TL;DR
This paper introduces LiteCoST, a framework that combines structured reasoning templates with fine-tuned small language models (SLMs) to answer questions over long documents accurately and with low latency.
Contribution
It proposes a two-pillar approach: a Chain-of-Structured-Thought (CoST) template that guides a strong LLM to produce step-wise traces and structured outputs as supervision, and fine-tuning of small models so that they perform comparably to large models on long-document QA.
Findings
Achieves quality comparable to large LLMs with 3B and 7B models on multi-domain long-document QA.
Delivers 2-4x lower latency than GPT-4o and DeepSeek-R1.
Uses a structured reasoning template to guide small models in producing verifiable outputs.
Abstract
Large language models (LLMs) are widely applied to data analytics over documents, yet direct reasoning over long, noisy documents remains brittle and error-prone. Hence, we study document question answering (QA) that consolidates dispersed evidence into a structured output (e.g., a table, graph, or chunks) to support reliable, verifiable QA. We propose a two-pillar framework, LiteCoST, to achieve both high accuracy and low latency with small language models (SLMs). Pillar 1: Chain-of-Structured-Thought (CoST). We introduce a CoST template, a schema-aware instruction that guides a strong LLM to produce both a step-wise CoST trace and the corresponding structured output. The process induces a minimal structure, normalizes entities/units, aligns records, serializes the output, and verifies/refines it, yielding auditable supervision. Pillar 2: SLM fine-tuning. The compact models are trained…
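The structuring steps listed in the abstract (induce a minimal structure, normalize entities and units, align records, serialize, verify) can be illustrated with a toy sketch. This is not LiteCoST's implementation: in the paper these steps are carried out by a language model following the CoST template, whereas the deterministic Python below only stands in for them. Every name here (`UNIT_SCALE`, `normalize_entity`, `normalize_value`, `build_table`, `serialize`, `verify`) and the sample data are hypothetical.

```python
# Toy illustration of the structuring steps named in the abstract.
# All names and rules are assumptions for illustration, not LiteCoST's API.

# Hypothetical unit multipliers used to normalize values to a common scale.
UNIT_SCALE = {"thousand": 1e3, "million": 1e6, "billion": 1e9}


def normalize_entity(name: str) -> str:
    """Lowercase and collapse whitespace so name variants map to one key."""
    return " ".join(name.strip().lower().split())


def normalize_value(text: str) -> float:
    """Parse strings such as '$1.2 million' or '350' into a plain number."""
    parts = text.replace(",", "").replace("$", "").split()
    number = float(parts[0])
    scale = UNIT_SCALE.get(parts[1].lower(), 1.0) if len(parts) > 1 else 1.0
    return number * scale


def build_table(snippets):
    """Align (entity, attribute, raw_value) evidence into one row per entity."""
    table = {}
    for entity, attribute, raw in snippets:
        row = table.setdefault(normalize_entity(entity), {})
        row[attribute] = normalize_value(raw)
    return table


def serialize(table, columns):
    """Serialize the table as pipe-separated text with a header line."""
    lines = ["entity | " + " | ".join(columns)]
    for entity in sorted(table):
        cells = [
            format(table[entity][c], "g") if c in table[entity] else ""
            for c in columns
        ]
        lines.append(entity + " | " + " | ".join(cells))
    return "\n".join(lines)


def verify(table, columns):
    """Return the (entity, column) cells that are still missing."""
    return [(e, c) for e in sorted(table) for c in columns if c not in table[e]]


# Evidence scattered across a document, with inconsistent names and units.
snippets = [
    ("Acme Corp ", "revenue", "$1.2 million"),
    ("acme  corp", "employees", "350"),
    ("Beta Inc", "revenue", "900 thousand"),
]
columns = ["revenue", "employees"]
table = build_table(snippets)
print(serialize(table, columns))
print("missing cells:", verify(table, columns))
```

The missing cells reported by `verify` are what a refinement pass would try to fill or flag, which is one simple way to read the abstract's "verifies/refines" step.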
