WONDER: Accelerating Real World Data-driven Precision Oncology through Data Science and Informatics Excellence in Research
Introduction
There is significant potential to leverage various types of real-world data (RWD) sources to derive data-driven insights associated with the translation and implementation of novel cancer therapies. The digital transformation in healthcare has enabled us to create longitudinal patient records (LPRs) that combine data from a variety of sources throughout the healthcare system. Despite recent progress in using LPRs for cancer research, many of these studies are limited to imaging or structured data, even though critical information is embedded in clinical narratives. Additionally, the impact of LPR-based studies on underrepresented and disadvantaged groups is unclear, given the potential digital divide caused by disparities in health care.
Methods
The WONDER project aims to advance data science and informatics to accelerate real-world data-driven precision oncology research. Specifically, we will focus on cancer information extraction, computational phenotyping, and cyberinfrastructure to advance the use of RWD/LPRs for precision oncology. Accurately ascertaining cancer cases is crucial when using RWD/LPRs for cancer research. However, cancer diagnoses are often sub-optimally recorded, whereas an accurate and complete diagnosis may be found in progress notes at an oncology practice. Additionally, much of the detailed clinical information required for cancer research is embedded in narrative clinical notes, which are not directly available for computational analysis, and manual extraction is time consuming and costly. Meanwhile, cohort identification determines whether a patient meets established criteria based on the patient's current and past medical information. In the cancer domain, the inherent complexity of the criteria can make the screening process labor-intensive and costly. There has been an increasing effort to develop artificial intelligence (AI) solutions that leverage high-throughput methods to improve the efficiency and effectiveness of the identification process. However, the impact of these AI solutions on underrepresented and disadvantaged groups is unclear. The lack of well-defined guidelines, processes, and tools hinders the proportional representation of minority groups in cancer research. This problem can be exacerbated when applying LPR-based AI solutions, because disparities in clinical care can leave LPR data biased, incomplete, or non-interoperable.
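As a purely illustrative sketch (not the WONDER implementation), cohort identification and a simple check of proportional representation might look like the following. The `Patient` fields, the eligibility rule, the group labels, and the example records are all hypothetical assumptions introduced here; the ICD-10 code prefixes are used only as example criteria.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    """Hypothetical, highly simplified longitudinal patient record."""
    patient_id: str
    group: str             # assumed demographic label, for illustration only
    age: int
    diagnoses: frozenset   # current and past diagnosis codes


def meets_criteria(p: Patient) -> bool:
    """Assumed example criteria: adult, breast cancer code (C50), no heart failure code (I50)."""
    return p.age >= 18 and "C50" in p.diagnoses and "I50" not in p.diagnoses


def group_shares(patients):
    """Return the fraction of patients belonging to each group."""
    counts = Counter(p.group for p in patients)
    total = sum(counts.values())
    return {g: n / total for g, n in counts.items()} if total else {}


def representation_gap(population, cohort):
    """Cohort share minus population share, per group (negative = underrepresented)."""
    pop = group_shares(population)
    coh = group_shares(cohort)
    return {g: coh.get(g, 0.0) - share for g, share in pop.items()}


# Made-up records, used only to exercise the functions above.
population = [
    Patient("p1", "A", 54, frozenset({"C50"})),
    Patient("p2", "A", 61, frozenset({"C50", "E11"})),
    Patient("p3", "B", 47, frozenset({"C50", "I50"})),  # excluded by the I50 rule
    Patient("p4", "B", 58, frozenset({"E11"})),         # no qualifying diagnosis
]
cohort = [p for p in population if meets_criteria(p)]
gap = representation_gap(population, cohort)
```

In this toy example, group B makes up half of the population but none of the selected cohort, so its gap is negative. A real audit would need validated criteria, extraction from clinical narratives, and appropriate statistical treatment.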
Results
The WONDER project will build upon several existing projects: OHNLP, which focuses on multi-site information extraction; TRUST, which focuses on the scientific rigor process; IMPACT, which focuses on cohort identification; and LivES, which focuses on living evidence synthesis. An initial architecture has been drafted, and we are setting up the development environment.
Conclusion
The successful delivery of the WONDER project depends on stakeholders and partnerships across the community. We welcome conversations and engagement that can help us develop better real-world data solutions for the cancer research community.