If there’s any process data analysts, data scientists and actuaries would want to automate, it’s likely data preparation.
Data preparation is a cumbersome, laborious process. In many cases, data analysts spend up to 60 percent of their time just preparing their data before they can use it to drive business decisions. Sound familiar? We have all been through the data preparation nightmare! That number certainly rings true for Emily, a data analyst we know at a health plan, who is inundated with gigabytes of new data every month and loses precious time wrestling with dirty data. In an industry where data sets are complex and highly dimensional, and where timely, precise decisions and interventions are crucial, the time spent just combining and cleaning data becomes costly, both clinically and financially.
We need a better way. Health plans need the ability to put their data to work within the smallest possible timeframe so they can make faster, more precise business decisions that improve their members’ health, reduce costs and help them manage risk.
At Lumiata, we are motivated by the urgency to make sense of health data for the “Emilys” of the world. To reliably drive value, data needs to be corrected, standardized and contextualized. We transform raw data into enriched, standardized and longitudinal records for each individual member so our customers can put their data to work toward analysis and action faster.
Here’s what that means for someone like Emily:
The Need for Correctness: Raw Data Ingestion and Integrity Analysis
We take the time to understand the health plan’s data and assess how it serves the analysis we conduct. The first step in our Data-as-a-Service (DaaS) process is to ingest Emily’s data as-is from multiple sources, including claims, labs, EHRs and unstructured data. This data is almost always incomplete and has very high dimensionality. We perform a comprehensive data integrity analysis, not only to confirm that all the minimum required data is available, but also to ensure that the raw data is transformed properly. At the end of this largely automated process, we generate a data integrity report, which can be shared with Emily and her internal stakeholders to help them gain a deeper understanding of their own data.
The Need for Standardization: Raw FHIR Creation
For repeatable analytical or AI processes to be applied, Emily’s data has to be cleaned and standardized. Our proprietary ETL process automatically corrects formatting issues and identifies missing content, then transforms the data into standard, validated FHIR (Fast Healthcare Interoperability Resources) bundles, the emerging standard for medical data representation, exchange and interoperability.
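To make the FHIR output concrete, here is a minimal sketch in Python of what a standardized member record can look like once assembled into a FHIR bundle. Plain dictionaries are used for clarity; the member ID, names and codes below are illustrative, and Lumiata's actual pipeline and validation logic are not public, so this is only a hedged sketch of the general shape of a FHIR `Bundle`.

```python
import json

def make_member_bundle(member_id, family_name, condition_code, condition_display):
    """Assemble a minimal, FHIR-style Bundle for one member.

    Illustrative only: real bundles carry many more resources (claims,
    labs, medications) and must pass schema and terminology validation.
    """
    patient = {
        "resourceType": "Patient",
        "id": member_id,
        "name": [{"family": family_name}],
    }
    condition = {
        "resourceType": "Condition",
        "subject": {"reference": f"Patient/{member_id}"},
        "code": {
            "coding": [{
                "system": "http://hl7.org/fhir/sid/icd-10-cm",
                "code": condition_code,
                "display": condition_display,
            }]
        },
    }
    return {
        "resourceType": "Bundle",
        "type": "collection",
        "entry": [{"resource": patient}, {"resource": condition}],
    }

# Hypothetical member and diagnosis, for illustration only.
bundle = make_member_bundle("m-001", "Emily", "E11.9",
                            "Type 2 diabetes mellitus without complications")
print(json.dumps(bundle, indent=2))
```

Because every member record lands in this one shared structure, any downstream analysis or model can consume it without source-specific parsing, which is the point of standardization.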
The Need for Contextualization: Data Standardization and Enrichment
Just as humans apply knowledge to make sense of data, any AI or analytical process can be enhanced with knowledge applied to raw data, enriching it and making it more useful. This enrichment takes many forms: identifying and correcting inaccurate or incomplete codes; applying code mapping and abstraction; mapping medications to their active ingredients; and interpreting lab ranges. These processes ensure not just the clinical quality of the data, but also its clinical value. For instance, Lumiata would correct a 2007 CPT code in Emily’s data to its corresponding 2017 CPT code, ensuring both code consistency and reliable signal identification. Lumiata also maps all ICD-9 codes to their appropriate ICD-10 codes and applies SNOMED hierarchies to provide additional data abstraction. These kinds of processes enrich raw data for every downstream use.
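The code-mapping step above can be sketched as a simple crosswalk lookup. The two sample entries below follow the published CMS General Equivalence Mappings, but the tiny lookup table and the `map_icd9_to_icd10` helper are hypothetical simplifications: production crosswalks contain thousands of entries, many of them one-to-many, and require clinical review.

```python
# Illustrative ICD-9 -> ICD-10 crosswalk. Real mappings come from the
# CMS General Equivalence Mappings (GEMs) and are often one-to-many,
# so each ICD-9 code maps to a list of candidate ICD-10 codes.
ICD9_TO_ICD10 = {
    "250.00": ["E11.9"],  # Type 2 diabetes without complications
    "401.9": ["I10"],     # Essential (primary) hypertension
}

def map_icd9_to_icd10(code):
    """Return candidate ICD-10 codes for an ICD-9 code.

    An empty list means no known mapping; in a real pipeline such
    codes would be flagged for manual review rather than dropped.
    """
    return ICD9_TO_ICD10.get(code, [])

print(map_icd9_to_icd10("250.00"))  # ['E11.9']
```

Keeping unmapped codes visible (rather than silently discarding them) is what lets an integrity report tell an analyst like Emily exactly where her data needs attention.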
This rigorous enrichment quickly transforms Emily’s raw, incomplete data into a format that can be meaningfully used by any analytical or machine learning method. We have found that even when a data set consists primarily of claims data, Lumiata’s AI for Data Prep can enrich it with medical knowledge (from more than 50 million PubMed articles) and experience from other data sets (from more than 60 million patient records) to enable more precise, interpretable predictions.
Here’s how Emily’s day-to-day is transformed with just the first part of our DaaS process: she can start her analysis within hours, not months, allowing her to focus on strategic business considerations rather than wrestling with data; she is working with current, high-quality data, which gets her past the ‘garbage-in-garbage-out’ pain point; and because her data is standardized, all her processes are repeatable and she can drive new and additional insights and value from the enriched data.
We want to empower our customers to deliver clinical brilliance effortlessly through their data. It starts with an AI-powered data preparation process that is faster, more accurate and more efficient. We have found that our process drastically reduces data latency and shortens health plans’ time-to-intervention.
By: Ash Damle, Founder and CEO & Prerna Anand, Product Manager