# Integrating predictive coding and a user-centric interface for enhanced auditing and quality in cancer registry data

**Authors:** Hong-Jie Dai, Chien-Chang Chen, Tatheer Hussain Mir, Ting-Yu Wang, Chen-Kai Wang, Ya-Chen Chang, Shu-Jung Yu, Yi-Wen Shen, Cheng-Jiun Huang, Chia-Hsuan Tsai, Ching-Yun Wang, Hsiao-Jou Chen, Pei-Shan Weng, You-Xiang Lin, Sheng-Wei Chen, Ming-Ju Tsai, Shian-Fei Juang, Su-Ying Wu, Wen-Tsung Tsai, Ming-Yii Huang, Chih-Jen Huang, Chih-Jen Yang, Ping-Zun Liu, Chiao-Wen Huang, Chi-Yen Huang, William Yu Chung Wang, Inn-Wen Chong, Yi-Hsin Yang

PMC · DOI: 10.1016/j.csbj.2024.04.007 · Computational and Structural Biotechnology Journal · 2024-04-07

## TL;DR

A system combining AI and expert rules helps cancer registrars efficiently and accurately extract lung cancer data from patient records.

## Contribution

A hybrid neural-symbolic system with a user-centric interface improves cancer registry data quality and auditing.

## Key findings

- The system achieved F1-scores between 0.85 and 1.00 across 30 coding items.
- Registrar feedback confirmed the system's reliability in assisting and auditing data abstraction.
- The system reduces the labor and time required for data abstraction.
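
For readers unfamiliar with the metric, the F1-score reported above is the harmonic mean of precision and recall. The sketch below shows the standard computation from raw counts; the function name and the example counts are illustrative assumptions, and this summary does not state which averaging scheme (micro, macro, or per-code) the paper applies to each coding item.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for one coding item (not taken from the paper).
example = f1_score(tp=85, fp=15, fn=15)
```

With these made-up counts, precision and recall are both 0.85, so the F1-score is also 0.85.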

## Abstract

Data curation for a hospital-based cancer registry heavily relies on the labor-intensive manual abstraction process by cancer registrars to identify cancer-related information from free-text electronic health records. To streamline this process, a natural language processing system incorporating a hybrid of deep learning-based and rule-based approaches for identifying lung cancer registry-related concepts, along with a symbolic expert system that generates registry coding based on weighted rules, was developed. The system is integrated with the hospital information system at a medical center to provide cancer registrars with a patient journey visualization platform. The embedded system offers a comprehensive view of patient reports annotated with significant registry concepts to facilitate the manual coding process and elevate overall quality. Extensive evaluations, including comparisons with state-of-the-art methods, were conducted using a lung cancer dataset comprising 1428 patients from the medical center. The experimental results illustrate the effectiveness of the developed system, consistently achieving F1-scores between 0.85 and 1.00 across 30 coding items. Registrar feedback highlights the system’s reliability as a tool for assisting and auditing the abstraction. By presenting key registry items along the timeline of a patient’s reports with accurate code predictions, the system improves the quality of registrar outcomes and reduces the labor resources and time required for data abstraction. Our study highlights advancements in cancer registry coding practices, demonstrating that the proposed hybrid weighted neural-symbolic cancer registry system is reliable and efficient for assisting cancer registrars in the coding workflow and contributing to clinical outcomes.
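
To make the "registry coding based on weighted rules" idea concrete, here is a minimal sketch of one way weighted rules could turn extracted concepts into a code per registry item. It is not the authors' implementation: the `Rule` class, the `predict_codes` function, the "sum weights and take the highest" strategy, and every concept, item name, code label, and weight below are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    item: str      # registry coding item the rule votes on (hypothetical name)
    concept: str   # extracted concept that triggers the rule
    code: str      # code the rule proposes for that item
    weight: float  # strength of the rule's vote


def predict_codes(concepts: set[str], rules: list[Rule]) -> dict[str, str]:
    """Sum the weights of triggered rules per (item, code); keep the top code per item."""
    scores: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for rule in rules:
        if rule.concept in concepts:
            scores[rule.item][rule.code] += rule.weight
    return {item: max(codes, key=codes.get) for item, codes in scores.items()}


# Hypothetical rules and concepts; real registry rules and weights would differ.
rules = [
    Rule("laterality", "right upper lobe", "right", 1.0),
    Rule("laterality", "right lung", "right", 0.5),
    Rule("laterality", "left lung", "left", 0.6),
]
concepts = {"right upper lobe", "left lung"}
prediction = predict_codes(concepts, rules)
```

In this toy case the "right" candidate accumulates a weight of 1.0 against 0.6 for "left", so the sketch selects "right" for the hypothetical `laterality` item. Items with no triggered rule are simply absent from the result, which a real system would need to handle explicitly.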

## Linked entities

- **Diseases:** lung cancer (MONDO:0005138)

## Full-text entities

- **Diseases:** cancer (MESH:D009369), lung cancer (MESH:D008175)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC11059324/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC11059324/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/PMC11059324/full.md

---
Source: https://tomesphere.com/paper/PMC11059324