Creating a Software Tool for the Clinical Researcher – the IPS System
Bruce G. Buchanan, Ph.D., Wendy Chapman, Ph.D., Gregory F. Cooper, M.D., Ph.D., Paul Hanbury, B.S., Mehmet Kayaalp, M.D., M.S., Manoj Ramachandran, M.S.,
Melissa Saul, M.S.
University of Pittsburgh
Pittsburgh, Pennsylvania
Background: The University of Pittsburgh Medical Center and Schools of the Health Sciences (UPMC-HS) includes both a large integrated healthcare delivery system and a large medical research enterprise. Vast amounts of data have been collected and stored electronically, including narrative reports as well as data in structured databases. Clinical researchers at UPMC-HS have wanted more efficient and comprehensive access to these data. In response to that need, the Center for Biomedical Informatics at the University of Pittsburgh has been developing a software system called IPS (Identify Patient Sets). IPS has three components: De-ID, the IPS Kernel, and Encoder, which we discuss in turn.
Description: De-ID processes text records to find the eighteen HIPAA identifiers such as names, locations, dates, and ages. It is largely a pattern-directed program that looks for, and replaces, identifying information in the HIPAA categories. Two groups of researchers have evaluated the application, both to validate that it removes identifiers appropriately and to gauge the extent to which its over-marking (replacing text that is not actually identifying) creates problems. Both audits were successful, and further audits continue as part of the ongoing software development process.
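As an illustration of what "pattern-directed" replacement can look like, the following is a minimal sketch in Python. The patterns, category labels, placeholder format, and the function name deidentify are all assumptions made for this example; they are not De-ID's actual rules, which cover all eighteen HIPAA categories and are not described here.

```python
import re

# Hypothetical patterns for four identifier categories (illustration only;
# these are NOT the rules used by De-ID).
PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("PHONE", re.compile(r"\b\d{3}-\d{3}-\d{4}\b")),
    ("AGE", re.compile(r"\b\d{1,3}(?=[- ]year[- ]old\b)")),
    ("NAME", re.compile(r"\b(?:Dr|Mr|Mrs|Ms)\.\s+[A-Z][a-z]+")),
]


def deidentify(text):
    """Replace each pattern match with a placeholder naming its category."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"**{label}**", text)
    return text


note = "Dr. Smith saw the 67-year-old patient on 03/14/2002; call 412-555-0100."
print(deidentify(note))
```

Simple patterns of this kind will sometimes replace text that is not an identifier, which is the over-marking the audits were designed to measure.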
The IPS Kernel is an interactive Bayesian learning program that processes the text to construct phrases that serve as features. It uses relevance feedback from a researcher to build a model that identifies records of patients of interest. It is able to identify patient subsets that are difficult to find with a simple Boolean query, either because the relevant query terms are not easily identifiable or because the concept of interest is inherently complex. The IPS Kernel distinguishes assertions of the presence of findings and diseases from assertions of their absence. Thus the researcher may select both concepts of interest and the specific absence of a term, and the program can build models in which negated terms are an important part of the classifier.
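The general idea of learning from relevance feedback with negation-aware features can be sketched as follows. This is a simplified naive Bayes scorer over single-word features, written for illustration; the negation cue list, the NOT_ prefix convention, the smoothing, and the names extract_features and FeedbackModel are assumptions, and the IPS Kernel's actual phrase construction and Bayesian model are not specified in this abstract.

```python
import math
import re
from collections import Counter

# Hypothetical negation cues; a real system would use a much richer list.
NEGATION_CUES = {"no", "denies", "without"}


def extract_features(text):
    """Return the set of word features in a note. Words that follow a
    negation cue within the same sentence are prefixed with NOT_."""
    features = set()
    for sentence in re.split(r"[.;]", text.lower()):
        negated = False
        for token in re.findall(r"[a-z]+", sentence):
            if token in NEGATION_CUES:
                negated = True
                continue
            features.add(("NOT_" if negated else "") + token)
    return features


class FeedbackModel:
    """Naive Bayes over the features present in a note, updated one
    researcher judgment (relevant / not relevant) at a time."""

    def __init__(self):
        self.doc_counts = {True: 0, False: 0}
        self.feature_counts = {True: Counter(), False: Counter()}

    def feedback(self, text, relevant):
        self.doc_counts[relevant] += 1
        self.feature_counts[relevant].update(extract_features(text))

    def probability_relevant(self, text):
        features = extract_features(text)
        total = sum(self.doc_counts.values())
        log_score = {}
        for label in (True, False):
            n = self.doc_counts[label]
            score = math.log((n + 1) / (total + 2))  # smoothed class prior
            for feature in features:
                # Add-one smoothed estimate of P(feature present | label)
                count = self.feature_counts[label][feature]
                score += math.log((count + 1) / (n + 2))
            log_score[label] = score
        top = max(log_score.values())
        weights = {k: math.exp(v - top) for k, v in log_score.items()}
        return weights[True] / (weights[True] + weights[False])


model = FeedbackModel()
model.feedback("Patient reports severe abdominal pain.", True)
model.feedback("Complains of chest pain.", True)
model.feedback("Patient denies pain. No fever.", False)
model.feedback("No pain. Wound healing well.", False)
print(model.probability_relevant("Returns with severe pain."))
print(model.probability_relevant("Denies pain."))
```

Because "pain" and "NOT_pain" are distinct features, a note that denies pain counts as evidence against relevance rather than for it, which a Boolean query on the word "pain" alone would not capture.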
Once free-text records of interest are located, the Encoder program assists the researcher in coding those records into a relational database (currently Microsoft Access™). As a side effect of that encoding process, a data dictionary is created and is also stored in the relational database. The relational database can be exported into a format that is ready for analysis by statistical software packages.
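The way a data dictionary can arise as a side effect of coding can be sketched as follows. This example uses Python's built-in sqlite3 module as a stand-in for Microsoft Access, and the table layout, column names, CSV export format, and function names are all assumptions made for illustration, not the Encoder's actual schema.

```python
import csv
import io
import sqlite3


def open_store():
    """Create an in-memory database with a table of coded values and a
    data dictionary table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE coded_values (record_id TEXT, variable TEXT, value TEXT)"
    )
    conn.execute(
        "CREATE TABLE data_dictionary (variable TEXT PRIMARY KEY, description TEXT)"
    )
    return conn


def encode(conn, record_id, variable, value, description):
    """Store one coded value. The first use of a variable also registers it
    in the data dictionary; later uses keep the original description."""
    conn.execute(
        "INSERT OR IGNORE INTO data_dictionary VALUES (?, ?)",
        (variable, description),
    )
    conn.execute(
        "INSERT INTO coded_values VALUES (?, ?, ?)",
        (record_id, variable, value),
    )


def export_csv(conn):
    """Export the coded values as CSV text for a statistics package."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["record_id", "variable", "value"])
    writer.writerows(
        conn.execute(
            "SELECT record_id, variable, value FROM coded_values ORDER BY rowid"
        )
    )
    return out.getvalue()


conn = open_store()
encode(conn, "r1", "pain_site", "abdomen", "Anatomical site of pain")
encode(conn, "r2", "pain_site", "chest", "Anatomical site of pain")
print(export_csv(conn))
```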
Results: IPS has been deployed in cooperation with the University’s Office of Clinical Research (OCR). This arrangement provides a central resource for use of the IPS system. A researcher may choose to use any or all of the components of IPS. De-ID is used exclusively by the OCR staff for Institutional Review Board (IRB) approved projects. As one example of its use, the IPS Kernel has been applied to find re-admissions for ‘pain’ following ambulatory surgery. The concept of ‘pain’ may be difficult to find in coded data, but it is often apparent when reading through de-identified free-text emergency room (ER) notes. A clinical researcher at UPMC-HS used IPS to construct a probabilistic model for locating ER notes that indicate patients with pain. In this research, the model served two purposes. First, it showed phrases in the text that are strongly associated with pain, including phrases that indicate different types of pain. Second, once constructed, the model can be applied to future ER notes to easily identify cases in which patients suffered from pain.
Conclusion: IPS is a working software tool that represents more than four years of project effort. It responds to the conflicting needs of a large medical institution to (1) make data available to clinical researchers, and (2) withhold data that reveal patients’ identities, even to researchers within the system. To date, these tools appear to enhance the researcher’s ability to conduct research. The purpose of this theatre demonstration will be to show by example the main lessons we have learned in designing, implementing, and deploying the IPS system.