A customer recently asked our VP of Data Science, Rohun Kshirsagar, “How does Lumiata’s model work?”

Here’s the answer.

“Lumiata takes the data you provide, including medical, behavioral, and pharmacy claims; diagnosis, drug, procedure, and lab codes (ICD-9, ICD-10, NDC, CPT, LOINC, etc.); eligibility information; and allowed amounts, and combines it with our internal 110-million-member data asset. Machine learning and AI models really shine when they have lots of data to train on, and we leverage our internal data asset to exploit this principle to great effect. Next, we extract millions of input variables (also known as machine learning features) from each member's record. For example, we take an ICD-10 code like E11 (type 2 diabetes) and compute “count of E11 in the last 6 months” or “count of E11 in the last year” for each member. Because there are hundreds of thousands of billing codes, this produces millions of input variables per member. We then apply machine learning techniques to pare the set of input variables down from five million to the four or five thousand most relevant to the machine learning task we are solving, for example, predicting “cost in 2019.” As a result, we are able to train a highly optimized model and excel at that task.

While we have a good understanding of which input variables are generally most useful for predicting response variables on claims data, we are also agnostic and open-minded about which input variables the models find most informative for maximum performance. It is for these reasons that we have had great success in building the next generation of risk models on claims data.”

Thanks for explaining, Rohun.
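
To make the feature steps in Rohun's answer a little more concrete, here is a minimal Python sketch, not Lumiata's actual pipeline. It builds windowed code counts such as “count of E11 in the last 6 months” and then keeps the few features most related to a cost target. Everything in it is an assumption for illustration: the toy claim rows, the member costs, the window lengths in days, the feature names, and the use of absolute Pearson correlation as a simple stand-in for whatever selection technique is really used.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical claim rows: (member_id, ICD-10 code, service date).
CLAIMS = [
    ("m1", "E11", date(2018, 11, 2)),
    ("m1", "E11", date(2018, 3, 15)),
    ("m1", "I10", date(2018, 12, 1)),
    ("m2", "E11", date(2017, 6, 30)),
    ("m2", "I10", date(2018, 10, 10)),
    ("m3", "E11", date(2018, 9, 5)),
    ("m4", "J45", date(2018, 8, 20)),
]

# Hypothetical target: each member's cost in the following year.
COST = {"m1": 9000.0, "m2": 1500.0, "m3": 5000.0, "m4": 1000.0}

# Assumed lookback windows in days ("last 6 months", "last year").
WINDOWS = {"6m": 183, "12m": 365}


def count_features(claims, as_of):
    """Return {member_id: Counter({"count_<code>_<window>": n})}."""
    features = {}
    for member_id, code, service_date in claims:
        member = features.setdefault(member_id, Counter())
        for label, days in WINDOWS.items():
            # Count only claims dated inside the window ending at as_of.
            if as_of - timedelta(days=days) < service_date <= as_of:
                member[f"count_{code}_{label}"] += 1
    return features


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def select_top_k(features, target, k):
    """Keep the k feature names with the largest |correlation| to the target."""
    members = sorted(features)
    names = sorted({name for counts in features.values() for name in counts})
    ys = [target[m] for m in members]
    scores = {
        # A Counter returns 0 for codes a member never had in the window.
        name: abs(pearson([features[m][name] for m in members], ys))
        for name in names
    }
    return sorted(names, key=lambda name: scores[name], reverse=True)[:k]


features = count_features(CLAIMS, as_of=date(2018, 12, 31))
selected = select_top_k(features, COST, k=2)
print(features["m1"])
print(selected)
```

At real scale the same idea runs over hundreds of thousands of codes, which is how the feature count reaches the millions before selection narrows it to a few thousand.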