Accelerating Cancer Data Standardization and Sharing through CancerOntoGPT: A Novel Architecture Leveraging Large Language Models
Introduction
The heterogeneity of cancer data makes standardization and sharing difficult, which hinders progress in cancer research and patient care. The mCODE™ initiative addresses this issue by establishing a core set of structured data elements for oncology electronic health records (EHRs). Rapid advances in large language models (LLMs) such as GPT-4 and LLaMA offer new opportunities to accelerate this process. In this work, we introduce CancerOntoGPT, a novel architecture that leverages LLMs for cancer data standardization and sharing.
Methods
CancerOntoGPT combines the strengths of mCODE™ and LLMs to make data standardization more efficient and effective. Integrating LLMs into the standardization process lets CancerOntoGPT better interpret the complex language and semantics of oncology data, enabling more accurate and consistent data extraction and transformation. The architecture uses several novel components, including a code interpreter and nested parsing, together with the advanced natural language processing capabilities of LLMs. These can facilitate data sharing and collaboration among researchers and clinicians and improve the overall quality and accessibility of cancer data.
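The abstract does not define nested parsing, so the following is only a minimal sketch of one plausible reading: the note is queried once per group of related data elements, and the per-group answers are merged into a nested record. The schema, the field names, the prompt wording, and the `ask_llm` callable are all hypothetical and are not taken from CancerOntoGPT or from the mCODE™ specification.

```python
import json
from typing import Callable, Dict, List, Optional

# Hypothetical, simplified schema: groups of related fields.
# These names are illustrative only, not the actual mCODE profiles.
SCHEMA: Dict[str, List[str]] = {
    "patient": ["gender", "birth_date"],
    "disease": ["primary_cancer_condition", "stage"],
}


def nested_parse(
    note: str, ask_llm: Callable[[str], str]
) -> Dict[str, Dict[str, Optional[str]]]:
    """Query the model once per schema group and merge the answers.

    `ask_llm` is an assumed callable that takes a prompt string and
    returns the model's raw text reply.
    """
    record: Dict[str, Dict[str, Optional[str]]] = {}
    for group, fields in SCHEMA.items():
        prompt = (
            f"From the note below, return a JSON object with keys {fields}. "
            f"Use null for anything not stated.\n\n{note}"
        )
        raw = ask_llm(prompt)
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            parsed = {}  # unparseable reply: treat every field as missing
        if not isinstance(parsed, dict):
            parsed = {}  # valid JSON but not an object
        # Keep only the expected keys so stray output cannot leak in.
        record[group] = {field: parsed.get(field) for field in fields}
    return record
```

Passing the model call in as a function keeps the sketch independent of any particular LLM client, and it lets the parsing logic be exercised with a stub that returns canned replies.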
Results
We evaluated CancerOntoGPT on 200 synthetic oncology notes, achieving an average accuracy of 88.76% in data extraction and standardization, measured against the ground truth. This result suggests that the architecture can handle complex cancer data effectively. To showcase the capabilities of CancerOntoGPT, we have also developed a demo, available on the Hugging Face platform (https://mcodegpt.org).
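The abstract does not specify how the average accuracy is computed. As a hedged illustration, one common choice is to score each note by the fraction of ground-truth fields that were extracted with exactly the right value, then average those scores over all notes. The function names and the exact-match criterion below are assumptions, not the authors' evaluation protocol.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]


def note_accuracy(predicted: Record, truth: Record) -> float:
    """Fraction of ground-truth fields whose predicted value matches exactly."""
    if not truth:
        raise ValueError("ground-truth record has no fields")
    correct = sum(
        1 for field, value in truth.items() if predicted.get(field) == value
    )
    return correct / len(truth)


def average_accuracy(predictions: List[Record], truths: List[Record]) -> float:
    """Mean of the per-note accuracies (each note weighted equally)."""
    if len(predictions) != len(truths):
        raise ValueError("predictions and truths must have the same length")
    if not truths:
        raise ValueError("no notes to evaluate")
    scores = [note_accuracy(p, t) for p, t in zip(predictions, truths)]
    return sum(scores) / len(scores)
```

Averaging per note weights every note equally regardless of how many fields it contains; pooling all fields across notes before dividing would be an equally reasonable alternative and generally gives a different number.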
Conclusion
Through the implementation of CancerOntoGPT, we aim to accelerate the standardization and sharing of cancer data, ultimately contributing to the advancement of cancer research and the improvement of patient care. We hope to engage with leading experts in the field and foster collaborations that will drive the development and adoption of CancerOntoGPT in the oncology community.