Comparing LLM and Fine-Tuned Model Performance on NVDRS Circumstance Extraction with Varying Prompt Complexity
Geoffrey Martin, Xuan Zhong Feng, Yifan Peng

TL;DR
This study compares large language models and fine-tuned models in extracting complex circumstances from NVDRS death narratives, proposing a hybrid approach based on prompt complexity analysis.
Contribution
It introduces a Complexity Score algorithm that analyzes coding manual structure to predict when detailed prompts improve over name-only prompts, and demonstrates a hybrid approach that selects the prompt strategy per circumstance.
Findings
LLMs outperform fine-tuned models on low-prevalence, complex circumstances.
The framework generalizes across multiple frontier LLMs.
A hybrid approach improves overall extraction performance.
Abstract
Suicide is a leading cause of death in the United States, and understanding the circumstances that precede it requires extracting structured information from death investigation narratives. Many of these circumstances require semantic inference beyond simple keyword matching. We develop a "Complexity Score" algorithm that analyzes coding manual structure to predict when detailed prompts with full coding guidelines improve over name-only prompts. We then construct a hybrid approach that selects the prompt strategy per circumstance. We evaluate large language models (LLMs) against fine-tuned RoBERTa on 25 inferentially complex circumstances from the National Violent Death Reporting System (NVDRS). We find that LLMs substantially outperform fine-tuned RoBERTa on low-prevalence circumstances where training data is insufficient. We further demonstrate that our framework generalizes across frontier LLMs, with…
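The per-circumstance selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the Complexity Score is computed, so the score is treated here as a precomputed input, and the `Circumstance` class, the `threshold` value, the prompt wording, and the example circumstance names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Circumstance:
    name: str                # circumstance label (example values below are made up)
    guidelines: str          # coding-manual text for this circumstance
    complexity_score: float  # assumed precomputed; the formula is not given in the abstract


def choose_strategy(c: Circumstance, threshold: float = 0.5) -> str:
    """Pick a prompt strategy; the 0.5 cutoff is an illustrative assumption."""
    return "detailed" if c.complexity_score >= threshold else "name_only"


def build_prompt(c: Circumstance, narrative: str, threshold: float = 0.5) -> str:
    """Assemble a prompt using the strategy selected for this circumstance."""
    parts = [f"Circumstance: {c.name}"]
    if choose_strategy(c, threshold) == "detailed":
        # Detailed strategy: include the full coding guidelines.
        parts.append(f"Coding guidelines: {c.guidelines}")
    parts.append(f"Narrative: {narrative}")
    parts.append("Is this circumstance present? Answer yes or no.")
    return "\n".join(parts)


if __name__ == "__main__":
    hard = Circumstance("Example complex circumstance", "Code as present only if ...", 0.8)
    easy = Circumstance("Example simple circumstance", "Code as present if ...", 0.2)
    for c in (hard, easy):
        print(choose_strategy(c))
        print(build_prompt(c, "Example narrative text."))
        print()
```

Keeping the score as an input separates the selection logic from the scoring logic, so any scoring method can be swapped in without changing how prompts are assembled.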
