
International Classification of Diseases, Period!

New Blog Series: Inconsistent Data Sets

How do you deal with inconsistent data?

Every experienced analyst knows that about 90% of the job is getting your data in shape for analysis. This 90% is not only time-consuming, but also more drudgery than enlightenment. This is especially true if you need to merge multiple data sets whose file formats are not identical. A classic example is the rich data from CMS’ Hospital Compare. Every quarter, CMS releases updated results. And too often, those updates are formatted differently than their predecessors: changed field names, new data formats, altered data structure, etc. How can a health care analyst spin Hospital Compare gold from the mess of data file straw?

For those who don’t know, Hospital Compare offers extensive quality and other data for dozens of measures, thousands of hospitals, and multiple years. The challenge is how to merge it all when file formats may vary from one release to the next.

In the next 6 posts, Hannah Sieber, Software Engineer at FHC, will discuss the challenges and a very real solution that speeds production time, reduces analyst angst, and prevents human errors.

Although Hannah’s example is Hospital Compare, FHC’s solution will work for all manner of merged data sets: hospital discharge data merged across several states; all-payer claims data (APCD) across multiple years or states; population health data for one provider organization from multiple insurance plans; and many more.

I hope you enjoy the series. If you need help with data tasks like these, please reach out. We’d be glad to help.

The International Statistical Classification of Diseases and Related Health Problems (ICD) is the most standardized set of codes used in healthcare and covers both diagnoses (CM) and procedures (PCS).

Every medical claim is required to have at least one ICD-CM diagnosis code to explain why the service was rendered. All acute inpatient claims are required to use ICD procedure codes to identify the service or services performed. On October 1, 2015, the United States healthcare system switched from using the 9th revision of ICD to the 10th revision.

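The two revisions format their codes differently. In version 9, ICD-CM diagnosis codes use up to 6 characters including the ‘.’ (XXX.XX), and procedure codes use up to 5 (XX.XX). In version 10, ICD-CM diagnosis codes can be longer, up to XXX.XXXX, and ICD-PCS procedure codes are always 7 characters with no ‘.’ (XXXXXXX). Some claim extracts will include the ‘.’ and some will drop it.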
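To make the formats concrete, here is a minimal Python sketch that puts the ‘.’ back into a code for display. The function name `add_dot` and the code-set labels are hypothetical, chosen for illustration. The special case for ICD-9 external-cause codes (EXXX.X) is not covered in the text above and is included as an assumption about your data.

```python
# Position of the '.' in each code set, counted in characters from the left.
# None means the code set never uses a '.'.
# The keys are hypothetical labels, not an official naming scheme.
DOT_POSITION = {
    "icd9_dx": 3,      # XXX.XX
    "icd9_proc": 2,    # XX.XX
    "icd10_cm": 3,     # XXX.XXXX
    "icd10_pcs": None, # XXXXXXX
}


def add_dot(code, code_set):
    """Return the code as a string with the '.' in its conventional place."""
    code = code.strip().upper().replace(".", "")
    pos = DOT_POSITION[code_set]
    if pos is None:
        return code
    # Assumption: ICD-9 external-cause (E) codes carry the '.' after 4 characters.
    if code_set == "icd9_dx" and code.startswith("E"):
        pos = 4
    # Codes with nothing after the '.' position are left as they are.
    if len(code) <= pos:
        return code
    return code[:pos] + "." + code[pos:]


print(add_dot("25000", "icd9_dx"))      # 250.00
print(add_dot("0066", "icd9_proc"))     # 00.66
print(add_dot("E11621", "icd10_cm"))    # E11.621
print(add_dot("0DT70ZZ", "icd10_pcs"))  # 0DT70ZZ
```

Note that the function needs to be told which code set it is looking at: a dotless string alone does not say where the ‘.’ belongs.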

Occasionally claims extracts will merge data from multiple sources and include observations with and without the ‘.’ as shown in the associated DPS data. For this reason, it is good practice to drop the ‘.’ from both your processed medical claims file and from any code maps (e.g. dx labels) and crosswalks (e.g. CCS).

Be careful to retain leading zeros when dropping the ‘.’. We’ll talk more about why in a future post.
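As a rough illustration of the advice above, the sketch below drops the ‘.’ while treating every code as text. The helper name `normalize_icd` and the sample label map are hypothetical. One common way leading zeros get lost is when a dotless code is read as a number, so the sketch never converts codes to integers.

```python
def normalize_icd(code):
    """Return an ICD code as an upper-case string with no '.' and no padding spaces."""
    if code is None:
        return None
    # str() keeps the code as text; the leading zeros stay in place.
    return str(code).strip().upper().replace(".", "")


# Codes as they might arrive from sources that disagree about the '.'.
mixed = ["250.00", "25000", "008.45", "00845", " e11.9 "]
cleaned = [normalize_icd(c) for c in mixed]
print(cleaned)  # ['25000', '25000', '00845', '00845', 'E119']

# Apply the same normalization to the keys of any code map or crosswalk,
# so claims and labels join on identical strings.
# The labels here are placeholders, not official descriptions.
raw_labels = {"250.00": "label for 250.00", "008.45": "label for 008.45"}
labels = {normalize_icd(k): v for k, v in raw_labels.items()}
print(labels["00845"])  # label for 008.45

# What goes wrong if a code is parsed as a number: the zeros are gone.
print(int("00845"))  # 845
```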

Dr. Olmsted has a Ph.D. in Economics and has been working with healthcare data for over twenty years for companies including IHCIS (now part of Optum), RTI International, and Health Dialog.
