Medical Scientific Table-to-Text Generation with Human-in-the-Loop under the Data Sparsity Constraint
Heng-Yi Wu, Jingqing Zhang, Julia Ive, Tong Li, Vibhor Gupta, Bingyuan Chen, Yike Guo

TL;DR
This paper introduces a novel two-step table-to-text generation system with human-in-the-loop validation, addressing data sparsity issues in biomedical report generation and demonstrating improved accuracy and adaptability with limited data.
Contribution
The paper presents a new two-step architecture with auto-correction, copy mechanism, and synthetic data augmentation for biomedical table-to-text generation, effective under data sparsity constraints.
Findings
Improved precision in copying tabular values (up to 0.13 absolute increase).
Effective adaptation with only 40% of training data.
Validated outputs through human expert evaluation.
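To make the copy-precision finding concrete, here is a minimal sketch of how such a score could be computed. This summary does not give the paper's exact metric definition, so the function name `copy_precision`, the number-matching regex, and the exact-match rule are all illustrative assumptions, not the authors' evaluation code.

```python
import re

# Assumption: a "value" is any integer or decimal number in the text.
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def copy_precision(generated_text, table_values):
    """Fraction of numbers in the generated text that also occur in the table.

    Hypothetical helper for illustration; the paper may define precision
    differently (for example, over entities as well as numeric values).
    """
    table = {float(v) for v in table_values}
    generated = [float(m) for m in NUMBER.findall(generated_text)]
    if not generated:
        return 0.0  # nothing was copied, so there is nothing to score
    copied = sum(1 for g in generated if g in table)
    return copied / len(generated)

report = "Mean ALT was 42.5 U/L in the treated group and 39.0 U/L in controls."
score = copy_precision(report, [42.5, 38.0])
```

Under this definition, a report that states one correct and one incorrect value scores 0.5, so an absolute increase of 0.13 would correspond to noticeably fewer hallucinated numbers per report.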
Abstract
Structured (tabular) data in the preclinical and clinical domains contains valuable information about individuals, and an efficient table-to-text summarization system can drastically reduce the manual effort needed to condense this data into reports. In practice, however, the problem is heavily impeded by data paucity, data sparsity, and the inability of state-of-the-art natural language generation models (including T5, PEGASUS and GPT-Neo) to produce accurate and reliable outputs. In this paper, we propose a novel table-to-text approach that tackles these problems with a two-step architecture enhanced by auto-correction, a copy mechanism and synthetic data augmentation. The study shows that the proposed approach selects salient biomedical entities and values from structured data with improved precision (up to 0.13 absolute increase) of copying the tabular values to generate coherent…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling · Scientific Computing and Data Management
Methods: Attention Is All You Need · PEGASUS · Linear Layer · Softmax · Layer Normalization · Byte Pair Encoding · Dense Connections · Dropout · Inverse Square Root Schedule · SentencePiece
