Rochester Regional Health authored publications and proceedings

Do Neural Information Extraction Algorithms Generalize Across Institutions?

Enrico Santus, Massachusetts Institute of Technology, Cambridge, MA.
Clara Li, Massachusetts Institute of Technology, Cambridge, MA.
Adam Yala, Massachusetts Institute of Technology, Cambridge, MA.
Donald Peck, Henry Ford Health System, Detroit, MI.
Rufina Soomro, Liaquat National Hospital & Medical College, Karachi, Pakistan.
Naveen Faridi, Liaquat National Hospital & Medical College, Karachi, Pakistan.
Isra Mamshad, Liaquat National Hospital & Medical College, Karachi, Pakistan.
Rong Tang, Rochester Regional Health
Conor R. Lanahan, Massachusetts General Hospital, Boston, MA.
Regina Barzilay, Massachusetts Institute of Technology, Cambridge, MA.
Kevin Hughes, Massachusetts General Hospital, Boston, MA.

Department

OB/GYN

Document Type

Article

Publication Title

JCO Clinical Cancer Informatics

Abstract

PURPOSE: Natural language processing (NLP) techniques have been adopted to reduce the curation costs of electronic health records. However, studies have questioned whether such techniques can be applied to data from previously unseen institutions. We investigated the performance of a common neural NLP algorithm on data from both known and heldout (ie, institutions whose data were withheld from the training set and only used for testing) hospitals. We also explored how diversity in the training data affects the system's generalization ability.

METHODS: We collected 24,881 breast pathology reports from seven hospitals and manually annotated them with nine key attributes that describe types of atypia and cancer. We trained a convolutional neural network (CNN) on annotations from either only one (CNN1), only two (CNN2), or only four (CNN4) hospitals. The trained systems were tested on data from five organizations, including both known and heldout ones. For every setting, we provide the accuracy scores as well as the learning curves that show how much data are necessary to achieve good performance and generalizability.

RESULTS: The system achieved a cross-institutional accuracy of 93.87% when trained on reports from only one hospital (CNN1). Performance improved to 95.7% and 96%, respectively, when the system was trained on reports from two (CNN2) and four (CNN4) hospitals. The introduction of diversity during training did not lead to improvements on the known institutions, but it boosted performance on the heldout institutions. When tested on reports from heldout hospitals, CNN4 outperformed CNN1 and CNN2 by 2.13% and 0.3%, respectively.

CONCLUSION: Real-world scenarios require that neural NLP approaches scale to data from previously unseen institutions. We show that a common neural NLP algorithm for information extraction can achieve this goal, especially when diverse data are used during training.
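The known versus heldout comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `accuracy_by_group`, the `(institution, text, label)` record layout, and the toy keyword classifier are all assumptions made for illustration. It only shows how test reports might be grouped by whether their institution contributed training data, with accuracy reported separately for each group.

```python
def accuracy_by_group(test_records, train_institutions, predict):
    """Return accuracy on known vs heldout institutions.

    test_records: iterable of (institution, text, label) tuples (assumed layout).
    train_institutions: set of institutions whose data were used for training.
    predict: any callable mapping report text to a predicted label.
    """
    correct = {"known": 0, "heldout": 0}
    total = {"known": 0, "heldout": 0}
    for institution, text, label in test_records:
        # A report is "heldout" if its institution contributed no training data.
        group = "known" if institution in train_institutions else "heldout"
        total[group] += 1
        correct[group] += int(predict(text) == label)
    # Skip groups with no test reports to avoid division by zero.
    return {g: correct[g] / total[g] for g in total if total[g]}


# Toy data and a stand-in keyword classifier (both hypothetical).
toy_test = [
    ("A", "carcinoma present", "cancer"),
    ("A", "benign tissue", "none"),
    ("B", "carcinoma noted", "cancer"),
    ("B", "atypia seen", "atypia"),
]
toy_predict = lambda text: "cancer" if "carcinoma" in text else "none"

scores = accuracy_by_group(toy_test, {"A"}, toy_predict)
print(scores)
```

In this toy run, institution "A" plays the role of a known hospital and "B" a heldout one; a real evaluation would replace `toy_predict` with the trained model.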

First Page

1

Last Page

8

DOI

10.1200/CCI.18.00160

Volume

3

Publication Date

7-1-2019

Medical Subject Headings

Algorithms; Databases, Factual; Electronic Health Records (economics, organization & administration, standards); Humans; Information Storage and Retrieval; Medical Informatics (economics, methods, organization & administration, standards); Natural Language Processing

PubMed ID

31310566

Recommended Citation

Santus, E., Li, C., Yala, A., Peck, D., Soomro, R., Faridi, N., Mamshad, I., Tang, R., Lanahan, C. R., Barzilay, R., & Hughes, K. (2019). Do Neural Information Extraction Algorithms Generalize Across Institutions? JCO Clinical Cancer Informatics, 3, 1-8. https://doi.org/10.1200/CCI.18.00160
