Data quality checks: A framework for APCDs
Having worked in health care performance measurement for quite some time, I'm convinced the biggest improvement we will make in measurement will come not from better measures but from better data. So, when I was recently asked to examine the capacity of a given all-payer claims database (APCD) to produce performance measures suitable for accountability, my natural inclination was to first explore the quality of the data. I sought an organizing framework as the blueprint for systematically deploying data quality checks.
After some research, I unearthed materials from a session I attended at an AcademyHealth Annual Research Meeting and rediscovered a framework for conducting data quality checks. This method was created by the Observational Medical Outcomes Partnership (OMOP) project. OMOP breaks quality checks into three categories:
- Boundary: Checks to identify suspicious or implausible values.
  - Example: An end date that precedes the start date, or a rate whose numerator is greater than its denominator.
- Concept: Checks to identify concepts present in one source but missing in others, as well as concepts that differ substantially across sources.
  - Example: The number of diagnosis codes on inpatient claims in an APCD is markedly lower than in the state's all-payer inpatient data set.
- Temporal: Checks to review patterns over time, flagging results that differ from earlier measurement periods even though specifications have remained relatively stable.
  - Example: The average number of claims per health plan member drops precipitously from one 12-month period to the next.
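The three categories above translate naturally into simple programmatic checks. The sketch below is a minimal illustration, not OMOP's reference implementation; the column names (`start_date`, `end_date`, `period`, `member_id`, `claim_id`, `diagnosis_code`) are hypothetical stand-ins for whatever the APCD extract actually uses:

```python
import pandas as pd

def boundary_check(claims: pd.DataFrame) -> pd.DataFrame:
    """Boundary: flag rows with implausible values, here an end date
    that precedes the start date (column names are assumptions)."""
    return claims[claims["end_date"] < claims["start_date"]]

def concept_check(apcd: pd.DataFrame, reference: pd.DataFrame,
                  field: str = "diagnosis_code") -> float:
    """Concept: compare how often a field is populated in the APCD
    versus a reference source. A large negative gap suggests the
    concept is missing from the APCD feed."""
    return apcd[field].notna().mean() - reference[field].notna().mean()

def temporal_check(claims: pd.DataFrame,
                   drop_threshold: float = 0.5) -> pd.Series:
    """Temporal: flag periods where claims per member fall sharply
    relative to the prior period."""
    per_member = (claims.groupby("period")["claim_id"].count()
                  / claims.groupby("period")["member_id"].nunique())
    change = per_member.pct_change()
    return change[change < -drop_threshold]
```

Each function returns the suspect rows or periods rather than a pass/fail flag, so an analyst can review the flagged records before deciding whether the issue sits with the data or with the check itself.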
Starting with this framework, I added a second overlay before designing quality checks. In numerous discussions with colleagues, I have come to understand that data quality can be defined by two basic elements: completeness and accuracy.
With OMOP’s framework in mind and my added overlay of completeness and accuracy, I set out to design a set of quality checks for an APCD with a very modest budget. In developing the detailed workplan, I quickly came to understand that completeness checks were generally simpler, and thus less costly, to deploy. With this in mind, I decided to focus primarily on data completeness. Then, I designed two to three quality checks for each category: boundary, concept and temporal.
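As a rough illustration of why completeness checks are the simpler, cheaper place to start: many reduce to measuring how often required fields are populated. This is a sketch under my own assumptions about the extract, with hypothetical field names, not a prescribed APCD method:

```python
import pandas as pd

def completeness_report(df: pd.DataFrame, required: list[str]) -> pd.Series:
    """Share of non-missing values for each required field,
    sorted so the least complete fields appear first."""
    return df[required].notna().mean().sort_values()
```

A check like this can run against every incoming submission, with any field falling below an agreed threshold routed back to the data contributor for review.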
The upshot? I discovered substantial data quality issues. These problems had not been uncovered by the data contributors or the APCD vendor, despite numerous quality checks by each. I was stunned that such glaring issues slipped past these other filters, given that I had conducted very few quality checks myself. Why was this? I found that the vast majority of others' quality checks looked solely at temporal measures, that is, the consistency of results over time. Very few boundary or concept quality checks were included.
So, what are my tips for developing a data quality check approach?
- Pick a framework representing the array of quality issues you need to examine.
- Stick to your framework in designing and deploying your quality checks.
Above I describe my initial framework. While I plugged in only a very modest set of quality checks, the framework provided good discipline, forcing a balanced look at the data across several dimensions.
If I had not found significant issues in these initial checks, my next step would have been to seek a larger budget to develop a series of checks aimed at assessing accuracy. Again, these accuracy checks would have been designed within the OMOP framework.
If you would like to learn more, here are a few data quality resources I recommend:
- Foundation for the National Institutes of Health, on OMOP: https://fnih.org/what-we-do/major-completed-programs/omop
- Observational Health Data Sciences and Informatics: https://www.ohdsi.org/
- Society for Clinical Data Management: https://www.scdm.org/