Revolutionizing Data-Based Health Care Research
By Daniel Margolis, August 3, 2009
PearlDiver Technologies Inc., creator of what is believed to be the largest fully HIPAA-compliant, publicly available and searchable database of patient records in the United States, is using Indiana University's Big Red supercomputer for advanced data analysis.
"We're moving beyond just private-payer data and we're now bringing in Medicare data, so we're going to go from 130 million records to about 500 million records being added this year," said Benjamin Young, president of PearlDiver. "The size of our data is going to expand pretty tremendously, and it takes a lot of computer power to actually organize it in a way that you can ask questions and then get responses back. For example, [if you] go into Medicare and say, ‘What happens to a patient who is 65, has spine fusion and happens to have osteoporosis?' we'll be able to tell you that in a couple seconds." Young explained that with PearlDiver's previous computer capacity this would have taken six to nine months.
PearlDiver's mission is to improve the ways surgeons, product manufacturers, hospitals and regulators connect with and use health care-related information. Big Red's computational power will be used to analyze outcomes from millions of patients and condense the findings into information readily available to medical providers and policymakers.
Young explained, "We're looking into the relationship between how much is charged and reimbursed compared to the age, gender and morbidities of the patients, so that we can get a much better idea of health care costs - especially government-paid health care costs - and essentially what kinds of patients are we spending the money on, and then what kinds of outcomes do those patients have, and is there a direct correlation between spending more money on certain procedures and a better outcome?"
In essence, this is data mining of Medicare information provided by the federal government, all of which is "de-identified" to preserve patient confidentiality. The goal is not details about individual patients but a macro-level view of the state of health care in the United States.
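To illustrate the kind of macro-level question Young describes (does spending more go along with better outcomes?), here is a minimal Python sketch. It is not PearlDiver's method: the grouped figures, the field layout and the `pearson` helper are all hypothetical, and a real analysis would also have to adjust for age, gender and morbidities.

```python
from math import sqrt

# Hypothetical, made-up group-level aggregates (no individual patients):
# (patient group, average reimbursed amount in USD, share with a good outcome)
groups = [
    ("group A", 18000.0, 0.81),
    ("group B", 22000.0, 0.84),
    ("group C", 27000.0, 0.83),
    ("group D", 31000.0, 0.88),
]

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)

spending = [g[1] for g in groups]
outcomes = [g[2] for g in groups]
r = pearson(spending, outcomes)  # value in [-1, 1]; closer to 1 means they rise together
```

The inputs here are already aggregated by group, so the calculation never touches a record that could point to one person.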
"We're not trying to identify Joe Smith in Minneapolis and look at what happened to him," Young said. "We're looking at real life trends, at things [that] really happen to people who match specific criteria. The Medicare data will give us a very accurate picture of the 65-and-older population. Right now, we have a very large sample of the 65-and-under population only in orthopedics, but we are currently working with a couple other organizations to expand outside of orthopedics so that we can essentially cover all of medicine in the private-payer arena."
PearlDiver's use of Big Red is enabled by the Indiana Initiative for Economic Development (IIED), a program that fosters technology development and job growth in the state of Indiana. The IIED is a partnership involving the Indiana Economic Development Corp. (IEDC), Indiana University, Purdue University and IBM Corp. IU's activities as part of the IIED are now coordinated through the university's newly formed Pervasive Technology Institute.
PearlDiver is not moving the raw data itself to IU's supercomputer. "We're simply moving some roll-up tables, which are actually relatively small compared to the raw data, to the supercomputer," Young explained. "We're looking at maybe 30 or 40 terabytes of information that we need to get over to the supercomputer, and then I work with a chief scientist there who will get it loaded for me, and they've actually allocated 64 nodes that we're allowed to work on. Then I can upload my software and have it run across all those nodes and essentially do analysis on this data that I've uploaded."
So, PearlDiver's biggest challenge in this endeavor was to make its data mining as parallel as possible so it could be split into many different parts and still perform efficiently. "We've essentially managed that," Young said. "We've set up little miniclusters of machines and have managed that software and actually are doing a lot of our own data processing now on clusters of machines. But we're very anxious to get it onto the big cluster and do the kind of analysis that we don't have the computer power to do."
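The article does not describe PearlDiver's software, but the general idea of splitting a job into parts, building a roll-up table for each part and merging the results can be sketched in Python. Everything below is an assumption made for illustration: the record layout, the function names, and the use of threads in one process as a stand-in for cluster nodes.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical de-identified records:
# (age band, procedure, has osteoporosis, complication flag 0/1)
records = [
    ("65-69", "spine fusion", True, 1),
    ("65-69", "spine fusion", True, 0),
    ("65-69", "spine fusion", False, 0),
    ("70-74", "spine fusion", True, 1),
    ("70-74", "knee replacement", False, 0),
]

def split(rows, n_parts):
    """Deal rows round-robin into n_parts independent chunks."""
    return [rows[i::n_parts] for i in range(n_parts)]

def rollup(chunk):
    """Build a small roll-up table: patient and complication counts per key."""
    patients, complications = Counter(), Counter()
    for age_band, procedure, osteoporosis, complication in chunk:
        key = (age_band, procedure, osteoporosis)
        patients[key] += 1
        complications[key] += complication
    return patients, complications

def merge(partials):
    """Add the per-chunk roll-ups together; counts are additive, so order does not matter."""
    patients, complications = Counter(), Counter()
    for part_patients, part_complications in partials:
        patients.update(part_patients)
        complications.update(part_complications)
    return patients, complications

def parallel_rollup(rows, n_workers=4):
    """Roll up each chunk on its own worker (threads here stand in for nodes), then merge."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(rollup, split(rows, n_workers)))
    return merge(partials)

patients, complications = parallel_rollup(records, n_workers=3)
key = ("65-69", "spine fusion", True)
rate = complications[key] / patients[key]  # complication rate for that patient profile
```

Because the counts simply add up, each chunk can be processed independently and the merged table matches what a single pass over all the records would produce. A question such as the spine fusion example above then becomes a lookup in a table far smaller than the raw data.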