Reasoning before Comparison: LLM-Enhanced Semantic Similarity Metrics for Domain Specialized Text Analysis
Shaochen Xu, Zihao Wu, Huaqin Zhao, Peng Shu, Zhengliang Liu, Wenxiong, Liao, Sheng Li, Andrea Sikora, Tianming Liu, Xiang Li

TL;DR
This paper introduces a novel framework using large language models like GPT-4 to improve semantic similarity metrics for domain-specific texts, especially radiology reports, outperforming traditional NLP metrics.
Contribution
The study presents a new LLM-based approach for semantic similarity assessment that leverages zero-shot label generation, enhancing accuracy in specialized domains.
Findings
GPT-4 generated labels align better with clinical ground truth.
The framework outperforms traditional metrics like ROUGE and BLEU.
Applicable to other specialized domains beyond radiology.
Abstract
In this study, we leverage LLM to enhance the semantic analysis and develop similarity metrics for texts, addressing the limitations of traditional unsupervised NLP metrics like ROUGE and BLEU. We develop a framework where LLMs such as GPT-4 are employed for zero-shot text identification and label generation for radiology reports, where the labels are then used as measurements for text similarity. By testing the proposed framework on the MIMIC data, we find that GPT-4 generated labels can significantly improve the semantic similarity assessment, with scores more closely aligned with clinical ground truth than traditional NLP metrics. Our work demonstrates the possibility of conducting semantic analysis of the text data using semi-quantitative reasoning results by the LLMs for highly specialized domains. While the framework is implemented for radiology report similarity analysis, its…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsNatural Language Processing Techniques · Topic Modeling · Biomedical Text Mining and Ontologies
MethodsAttention Is All You Need · Linear Layer · Dense Connections · Label Smoothing · Adam · Softmax · Multi-Head Attention · Layer Normalization · Dropout · Residual Connection
