
Finding Non-NULL NULL Dates


New Blog Series: Inconsistent Data Sets

How do you deal with inconsistent data?

Every experienced analyst knows that about 90% of the job is getting your data in shape for analysis. This 90% is not only time-consuming, but also more drudgery than enlightenment. This is especially true if you need to merge multiple data sets whose file formats are not identical. A classic example is the rich data from CMS’ Hospital Compare. Every quarter, CMS releases updated results. And too often, those updates are formatted differently than their predecessors, with changed field names, new data formats, altered data structures, and more. How can a health care analyst spin Hospital Compare gold from the mess of data file straw?

For those who don’t know, Hospital Compare offers extensive quality and other data for dozens of measures, thousands of hospitals, and multiple years. The challenge is how to merge it all when file formats may vary from year to year.

In the next 6 posts, Hannah Sieber, Software Engineer at FHC, will discuss the challenges and a very real solution that speeds production time, reduces analyst angst, and prevents human errors.

Although Hannah’s example is Hospital Compare, FHC’s solution will work for all manner of merged data sets: hospital discharge data merged across several states, all-payer claims data (APCD) across multiple years or states, population health data for one provider organization from multiple insurance plans, and many more.

I hope you enjoy the series. If you need help with data tasks like these, please reach out. We’d be glad to help.

As we saw in “Why do 5% of My Services Occur on January 1, 1960?”, NULL dates are frequently passed to claims extracts as 0 dates, causing issues with trending and length-of-stay calculations. To identify this issue, summarize your data by year and month as shown in the DPS Non-NULL NULL Dates dataset.
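As a minimal sketch of that year-and-month summary, the snippet below uses pandas; the file name and the `service_date` column are assumptions and should be replaced with the fields in your own extract.

```python
import pandas as pd

# Hypothetical claims extract with a service_date column; adjust names to your data.
claims = pd.read_csv("claims_extract.csv", parse_dates=["service_date"])

# Count records by service year and month; a spike at 1900-01 or 1960-01
# usually means 0 dates were translated into a sentinel value.
by_month = (
    claims.groupby([claims["service_date"].dt.year.rename("year"),
                    claims["service_date"].dt.month.rename("month")])
    .size()
    .rename("record_count")
)
print(by_month.sort_index())
```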

The results are not always as dramatic as displayed below, but the example illustrates how to identify possible problems. Significant event counts or dollar amounts prior to the time period of the analysis generally indicate that you have found the issue. It is easy enough to reset 0 values to nulls for key date fields. Keep in mind that a 0 date can be translated into a variety of values, and not all of them fall as far back in time as 1900 or 1960.
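One way to do that reset, continuing the pandas sketch above (the sentinel list and column names are assumptions, not values confirmed from any particular source system):

```python
import pandas as pd

# Sentinel values that a 0 date may have been translated into; extend this list
# with whatever your source system actually emits.
sentinel_dates = pd.to_datetime(["1900-01-01", "1960-01-01"])

# Replace sentinel values with NaT (pandas' NULL for datetimes) in key date fields.
for col in ["service_date", "admit_date", "discharge_date"]:
    claims[col] = claims[col].where(~claims[col].isin(sentinel_dates), pd.NaT)
```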

In general, it is good practice to look at all of your date fields with counts (or another unit of measurement like dollars or member months) by year and month. Look for outliers prior to the general analytic time period. Dates of birth can be especially tricky since most actually do precede the analytic time period. However, a chart showing counts will help identify outliers from the general trend.
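For dates of birth, a quick bar chart of counts by birth year makes such outliers easy to spot; the sketch below assumes a member file with a parseable `birth_date` column.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical member file; adjust the file name and column name to your data.
members = pd.read_csv("member_file.csv", parse_dates=["birth_date"])

# Most birth years legitimately precede the analytic period, so look for
# spikes that break the overall trend (e.g., an isolated pile-up at 1900).
birth_counts = members["birth_date"].dt.year.value_counts().sort_index()
birth_counts.plot(kind="bar", figsize=(12, 4), title="Member counts by birth year")
plt.tight_layout()
plt.show()
```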

Year and month analysis will also help identify other outlier dates, like ‘12-31-9999’ or other “non-termination” dates added during warehousing and extract production. These frequently show up in end/termination dates for enrollment spans or in discharge dates from facilities.
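A simple check for those placeholder termination dates might look like the following. Note that values such as 9999-12-31 overflow pandas’ nanosecond datetime range, so the field is read as text first; the file name, column name, cutoff date, and ISO (YYYY-MM-DD) string format are all assumptions.

```python
import pandas as pd

# Hypothetical enrollment extract; term_date is kept as text because placeholder
# values like '9999-12-31' cannot be represented as datetime64[ns].
enrollment = pd.read_csv("enrollment_spans.csv", dtype={"term_date": "string"})

analysis_end = "2019-12-31"  # assumed end of the analytic period, ISO-formatted
mask = (enrollment["term_date"] > analysis_end).fillna(False)
print(enrollment.loc[mask, "term_date"].value_counts())
```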

Dr. Olmsted has a Ph.D. in Economics and has been working with healthcare data for over twenty years for companies including IHCIS (now part of Optum), RTI International, and Health Dialog.
