It doesn't take much imagination to think of ways in which all of the data being recorded into electronic health records (EHRs) could be used to study diseases and map trends. One of the problems in putting this data to work, however, is that these forms are not uniformly maintained, and the more structured sections with standardized codes lack nuance and might even be misleading, according to some researchers. New tools, making use of natural language search, could overcome these shortcomings and glean connections not seen in the standard parts of the EHRs.

The tools take advantage of the fact that a lot of telling—and arguably even more accurate and complete—information about patients is jotted down by healthcare workers in the records' free-text note sections, rather than encoded in standardized notation. "These notes are rich in detail about signs and symptoms of patients' conditions, their priorities for clinical care, and their willingness to take some medications but not others," Ashish Jha, of the Harvard School of Public Health, wrote in an essay in JAMA, Journal of the American Medical Association.

Like other forms of casual writing, though, the content can be quirky and unpredictable, which makes it tough to analyze quickly on a broad scale. Currently, these more freewheeling kinds of notes need to be read manually. But with a way to analyze these resources en masse, the scope of digital medical data could expand by many orders of magnitude.

"Natural language processing has the potential to alter the landscape by analyzing the context of words and phrases in medical records making them available for computer processing, resulting in the ability to automatically interpret EHRs," Jha noted.

One team of researchers assessed thousands of EHRs from different Veterans Health Administration medical centers to look for complications after common procedures. After performing a natural language search of records, they found about 12 times as many instances of poor outcomes such as sepsis and renal failure as when they just assessed the records' standard number-based codes. That study, led by Harvey Murff, of the Vanderbilt Epidemiology Center, was published online Tuesday in JAMA.

A Danish team has been searching for a way to assess these free-text sections across systems and languages. "Worldwide, the manually inserted medical terms in medical records are heavily biased by local practice and billing purposes," Soren Brunak, of the Technical University of Denmark, said in a prepared statement. Brunak and his colleagues developed a search program based on the International Classification of Diseases dictionary, which provides standard terms that translate across languages. They report finding some 10 times as much relevant clinical information about patients when their search is added as an analysis of the medical codes alone turns up. Their findings were published online August 25 in PLoS Computational Biology.

In addition to better spotting broad trends, these search tools might also help expedite the move to more personalized care. "Using our method we obtained a much more fine-grained clinical characterization of each patient, which ultimately also may be very valuable for choosing personalized treatment regimes," Brunak said.

He and his group took their findings a step further, matching the correlations they found to genetic data in hopes of creating a new model for "interfacing the electronic patient record data directly to the DNA sequencing," Brunak said.

Overall, this field of electronic health information data mining "may appear esoteric, but its significance should not be underestimated," Jha wrote. "These findings suggest that EHRs can transform health care delivery."