Although the data in your EHR can offer significant insights, its primary purpose is to support patient care and billing, rather than analysis. If you’ve ever attempted to analyze EHR data, this has likely become apparent—cleaning and isolating concepts of interest is no small task.

Data is often scattered across tables in the EHR database, making it hard to get answers to seemingly simple questions. For example, how would you identify patients who have been approached about tobacco cessation? You might find evidence of this intervention in:

  • ICD-10 codes linked to a progress note
  • CPT codes linked to a bill or claim
  • Structured data entered in smart forms or templates
  • Free-text documentation in a progress note
  • Prescription data for tobacco cessation treatments

You could choose the most reliable of these data sources and ignore the others. But if you really need an accurate analysis—especially if you’re comparing data over time, across multiple configuration changes or upgrades to your EHR system—you might want to combine these data points into a single, comprehensive source of truth about tobacco cessation. This approach (spoiler alert!) is how Relevant facilitates accurate reporting for a variety of clinical concepts.
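The Python sketch below shows one way to combine these data points. It merges evidence records from several sources into a single row per patient, keeping the earliest evidence date and the set of sources that contributed. The `Evidence` record layout, the source labels, and `combine_cessation_evidence` are hypothetical names for illustration. This is not Relevant's schema or code.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Evidence:
    """One piece of evidence that a patient was approached about tobacco cessation.

    Hypothetical layout; real EHR extracts will differ.
    """
    patient_id: str
    source: str       # e.g. "icd10", "cpt", "smart_form", "note_text", "rx"
    event_date: date


def combine_cessation_evidence(records):
    """Collapse evidence from every source into one summary per patient.

    Returns {patient_id: {"first_date": date, "sources": set of source labels}}.
    """
    combined = {}
    for rec in records:
        summary = combined.setdefault(
            rec.patient_id, {"first_date": rec.event_date, "sources": set()}
        )
        # Keep the earliest date seen in any source.
        summary["first_date"] = min(summary["first_date"], rec.event_date)
        summary["sources"].add(rec.source)
    return combined


records = [
    Evidence("p1", "cpt", date(2023, 3, 9)),
    Evidence("p1", "note_text", date(2023, 2, 14)),
    Evidence("p2", "rx", date(2023, 5, 1)),
]
summary = combine_cessation_evidence(records)
# summary["p1"] == {"first_date": date(2023, 2, 14), "sources": {"cpt", "note_text"}}
```

Each patient is counted once no matter how many sources mention them. A change in documentation habits, such as moving from CPT codes to a smart form, therefore changes which sources contribute but not whether the patient is counted.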

In this post, we’ll review the benefits of Relevant’s approach to data cleaning and explore how you can use this process to simplify the creation and maintenance of reports and modules in our platform. Even if you don’t use Relevant, you can apply similar techniques to your own data warehouse.

At Relevant, the SQL code we create to clean and aggregate data is called a “transformer.” Transformers process raw EHR data from each health center, creating tables to store cleaned-up concepts (for example, diabetic patients). These tables reside in the data warehouse and ultimately power Relevant’s front-end visualizations.
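The sketch below makes the idea concrete with Python's built-in `sqlite3` module. A SQL statement reads a raw problem-list table and writes a cleaned `diabetes_patients` table. The table and column names are assumptions for illustration, and so is the simple ICD-10 prefix filter (E10 for type 1, E11 for type 2). Relevant's actual transformers and value sets are not shown here.

```python
import sqlite3

# Raw data as it might arrive from an EHR extract (hypothetical schema).
RAW_SETUP = """
CREATE TABLE raw_problem_list (
    patient_id TEXT,
    icd10_code TEXT,
    noted_date TEXT  -- ISO-8601 date string
);
INSERT INTO raw_problem_list VALUES
    ('p1', 'E11.9',   '2021-06-02'),
    ('p1', 'E11.65',  '2019-01-15'),
    ('p2', 'I10',     '2020-03-10'),
    ('p3', ' e10.9 ', '2022-08-30');
"""

# The "transformer": SQL that builds a cleaned concept table from raw rows.
DIABETES_TRANSFORMER = """
DROP TABLE IF EXISTS diabetes_patients;
CREATE TABLE diabetes_patients AS
SELECT
    patient_id,
    MIN(noted_date) AS first_diagnosis_date  -- earliest known diagnosis
FROM raw_problem_list
WHERE UPPER(TRIM(icd10_code)) LIKE 'E10%'    -- type 1 diabetes
   OR UPPER(TRIM(icd10_code)) LIKE 'E11%'    -- type 2 diabetes
GROUP BY patient_id;
"""

conn = sqlite3.connect(":memory:")
conn.executescript(RAW_SETUP)
conn.executescript(DIABETES_TRANSFORMER)

rows = conn.execute(
    "SELECT patient_id, first_diagnosis_date FROM diabetes_patients ORDER BY patient_id"
).fetchall()
# rows == [('p1', '2019-01-15'), ('p3', '2022-08-30')]
```

Downstream reports then query `diabetes_patients` directly instead of re-deriving the population from raw rows, and the cleanup (trimming, case-normalizing codes) happens once.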

The use of transformers has several implications:

Transformers ensure consistency in the way concepts are defined, minimizing redundant code

To see how this works, let’s stick with the example of diabetes. To calculate a quality measure like HbA1c control for diabetic patients, we typically use transformers to define and clean two concepts:

  • Diabetes cases. This transformer pulls cases of diabetes from the problem list and/or encounter-related assessments based on specified value sets and ICD-10 codes, and populates the earliest known diagnosis date for each case.

  • A1C labs. This transformer extracts A1C lab results based on standard codes (typically LOINC) and on text matching against known test names, and populates a clean, numeric result value.
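The A1C labs logic can be sketched in Python as two steps. The first identifies A1C labs by LOINC code, with a fallback match on the test name. The second turns free-text results into numbers. The LOINC set below is a small, illustrative sample rather than a complete value set. The regular expressions and function names are also hypothetical. Keeping only the numeric bound of results such as ">14" is a design assumption made for this example.

```python
import re
from typing import Optional

# Illustrative (not exhaustive) LOINC codes for hemoglobin A1c.
A1C_LOINC_CODES = {"4548-4", "17856-6"}

# Fallback text match on the test name, e.g. "HbA1c", "Hemoglobin A1C", "A1C POC".
A1C_NAME_PATTERN = re.compile(r"\b(?:hb\s*)?a1c\b", re.IGNORECASE)

# Optional comparator, a number, and an optional percent sign, e.g. "7.2", "6.8 %", ">14".
RESULT_PATTERN = re.compile(r"^\s*[<>]?\s*(\d+(?:\.\d+)?)\s*%?\s*$")


def is_a1c_lab(loinc_code: Optional[str], test_name: Optional[str]) -> bool:
    """True if the lab matches a known LOINC code or a known A1C test name."""
    if loinc_code and loinc_code.strip() in A1C_LOINC_CODES:
        return True
    return bool(test_name and A1C_NAME_PATTERN.search(test_name))


def clean_a1c_result(raw: Optional[str]) -> Optional[float]:
    """Convert a free-text result to a number; None if it cannot be parsed.

    Results reported as ">14" or "<4" keep only the numeric bound.
    """
    if raw is None:
        return None
    match = RESULT_PATTERN.match(raw)
    return float(match.group(1)) if match else None


raw_labs = [
    ("4548-4", "Hemoglobin A1c", "7.2%"),
    (None, "HbA1c POC", ">14"),
    ("2345-7", "Glucose", "105"),   # a non-A1C lab, filtered out
    (None, "A1C", "see note"),      # A1C lab with an unparseable result
]
a1c_labs = [
    (name, clean_a1c_result(result))
    for code, name, result in raw_labs
    if is_a1c_lab(code, name)
]
# a1c_labs == [("Hemoglobin A1c", 7.2), ("HbA1c POC", 14.0), ("A1C", None)]
```

A diabetes-cases transformer follows the same pattern. It matches problem-list and assessment rows against a code set and then keeps the earliest date per patient.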

These concepts power a number of reports and quality measures that are baked into Relevant, but they can also be repurposed. For example, if you wanted to track A1C values for patients with diabetes over time, you could build that report on the tables the transformers have already created instead of writing new diabetes and A1C logic. This reduces redundancy (you aren’t reinventing the wheel for each diabetes-related report) and promotes consistency (every time diabetes cases are used within Relevant, they refer to the same population).
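The sketch below shows this kind of reuse. An A1C-over-time report is built purely by combining the outputs of two existing transformers. The in-memory shapes, `diabetes_cases` and `a1c_labs`, are hypothetical stand-ins for the cleaned tables, and `a1c_trend_for_diabetics` is an illustrative name.

```python
from collections import defaultdict
from datetime import date

# Outputs of two existing transformers (hypothetical shapes):
# diabetes cases -> earliest diagnosis date per patient,
# A1C labs -> one cleaned numeric result per test (None if unparseable).
diabetes_cases = {"p1": date(2019, 1, 15), "p3": date(2022, 8, 30)}
a1c_labs = [
    ("p1", date(2023, 4, 1), 8.1),
    ("p1", date(2022, 10, 3), 9.4),
    ("p2", date(2023, 2, 7), 5.6),    # not a diabetes case, so excluded
    ("p3", date(2023, 1, 12), 7.0),
    ("p1", date(2023, 9, 20), None),  # unparseable result, skipped
]


def a1c_trend_for_diabetics(cases, labs):
    """Return each diabetic patient's A1C results sorted by date.

    Relies on the cleaned concepts instead of re-deriving diabetes or A1C logic.
    """
    trend = defaultdict(list)
    for patient_id, result_date, value in labs:
        if patient_id in cases and value is not None:
            trend[patient_id].append((result_date, value))
    return {pid: sorted(results) for pid, results in trend.items()}


trend = a1c_trend_for_diabetics(diabetes_cases, a1c_labs)
# trend["p1"] == [(date(2022, 10, 3), 9.4), (date(2023, 4, 1), 8.1)]
```

The report contains no ICD-10 or LOINC logic of its own, so it always describes the same diabetic population as every other report built on the diabetes cases table.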

When concepts require modification, updating transformers allows changes to propagate across Relevant

Let’s say your health center changes the way A1C labs are documented. (Maybe a new off-site lab starts performing tests, and a few more LOINC codes are incorporated into the EHR.) Here’s where transformers really shine. You wouldn’t need to update every module and report that references A1C labs; only the A1C labs transformer would need to change, and every report built on it would pick up the new codes.
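A toy model of this propagation follows. Two hypothetical reports depend only on a single transformer function. When the transformer's LOINC value set grows, both reports change without being edited. The value set, report names, and record shape are assumptions for illustration. The added code, 17856-6, simply stands in for a code introduced by the new lab.

```python
# Single place where the A1C concept is defined (hypothetical value set).
A1C_LOINC_CODES = {"4548-4"}


def a1c_transformer(raw_labs):
    """Keep only labs whose LOINC code is in the A1C value set."""
    return [lab for lab in raw_labs if lab["loinc"] in A1C_LOINC_CODES]


def a1c_count_report(raw_labs):
    """Report 1: number of A1C tests. Knows nothing about LOINC codes."""
    return len(a1c_transformer(raw_labs))


def patients_tested_report(raw_labs):
    """Report 2: patients with at least one A1C test."""
    return {lab["patient_id"] for lab in a1c_transformer(raw_labs)}


raw_labs = [
    {"patient_id": "p1", "loinc": "4548-4"},
    {"patient_id": "p2", "loinc": "17856-6"},  # code used by the new lab
]

before = (a1c_count_report(raw_labs), patients_tested_report(raw_labs))
# The new lab's code is added in one place: the transformer's value set.
A1C_LOINC_CODES.add("17856-6")
after = (a1c_count_report(raw_labs), patients_tested_report(raw_labs))
# before == (1, {"p1"}); after == (2, {"p1", "p2"})
```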

We don’t define every possible concept in a transformer—just the ones that we've found to appear frequently in health center analyses. As we encounter novel data challenges, we’ll sometimes add new transformers, and health centers with data analysts working in Relevant are encouraged to add them, too. Feel free to reach out to if you need any help.