Inconsistent Data Sets: A Summary of the ETL Normalization of CMS Quality Data
This series has explored the purpose and process of the Freedman Analytical Engine (FAE). More specifically, we have looked at the problems FAE solves, the need for standardized data before analysis can begin, and the roles programmers and analysts play in completing this ETL process. We have described how file structures are identified and how files are normalized so they can be merged into a single CSV, focusing on the need to standardize the quality data provided by the Centers for Medicare and Medicaid Services (CMS). Having built FAE, we have a unique understanding of the inconsistencies in the data’s formatting and the intricacies of the normalization process.
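To make that normalize-and-merge step concrete, the sketch below shows one way such a step might look in pandas. It is a minimal illustration, not FAE’s actual implementation; the file names, column mappings, and shared schema are hypothetical examples standing in for the varied CMS file layouts.

```python
# A minimal sketch (not FAE's actual code) of normalizing differently
# structured quality files onto a shared schema and merging them into one CSV.
# File names and column mappings below are hypothetical examples.

import pandas as pd

# Hypothetical per-file mappings from source column names to a shared schema.
COLUMN_MAPS = {
    "hospital_measures_2021.csv": {"Provider ID": "provider_id", "Score": "score"},
    "hospital_measures_2022.csv": {"Facility ID": "provider_id", "Measure Score": "score"},
}


def normalize(path: str, column_map: dict) -> pd.DataFrame:
    """Read one source file and rename its columns to the shared schema."""
    df = pd.read_csv(path, dtype=str)  # keep values as text; convert types later
    df = df.rename(columns=column_map)
    shared_columns = list(dict.fromkeys(column_map.values()))
    return df[shared_columns]  # keep only the columns every file can provide


def merge_to_csv(out_path: str) -> None:
    """Normalize each source file and stack the results into a single CSV."""
    frames = [normalize(path, cols) for path, cols in COLUMN_MAPS.items()]
    pd.concat(frames, ignore_index=True).to_csv(out_path, index=False)


if __name__ == "__main__":
    merge_to_csv("merged_quality_data.csv")
```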
The most important takeaways are the lessons learned in building FAE. Interpreting the file formats is complex and nuanced, and we have come to understand the different structures and how they map onto the final merged CSV file our team can analyze. FAE minimizes the manual effort required of programmers and analysts, freeing them to focus on maximizing the utility of the analytics rather than on cleaning files and preparing data for analysis.
FAE’s methods can extend beyond this dataset. We must ask ourselves, “What are the most meaningful ways to use this engine?” FAE can be generalized to other CMS datasets and can help process entirely new ones. The method is a solid foundation for handling data that does not arrive in a single standardized form. FAE teaches us how to inspect, identify, and process differences in data structure, as sketched below, and those lessons reach far beyond this quality data.
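As a rough illustration of that inspect-and-identify step, the sketch below probes each file’s delimiter, header, and column set before any normalization decision is made. Again, this is a hedged example under assumed inputs (a hypothetical raw_cms_files directory), not a description of FAE’s internals.

```python
# A minimal sketch of inspecting files to identify structural differences
# before normalization. The directory name and helper are hypothetical.

import csv
from pathlib import Path


def inspect_structure(path: Path, sample_bytes: int = 4096) -> dict:
    """Report a file's delimiter, whether it has a header, and its columns."""
    with path.open(newline="") as f:
        sample = f.read(sample_bytes)
        sniffer = csv.Sniffer()
        dialect = sniffer.sniff(sample)          # guess the delimiter/quoting
        has_header = sniffer.has_header(sample)  # guess whether row 1 is a header
        f.seek(0)
        first_row = next(csv.reader(f, dialect))
    return {
        "file": path.name,
        "delimiter": dialect.delimiter,
        "has_header": has_header,
        "columns": first_row if has_header else [],
    }


if __name__ == "__main__":
    # Summarize the structure of every file so differences are visible up front.
    for p in sorted(Path("raw_cms_files").glob("*.csv")):
        print(inspect_structure(p))
```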