TL;DR
This paper introduces DecompTune, a post-training approach that teaches language models to generate answer decompositions as intermediate reasoning steps, improving attribution accuracy on complex, multi-hop question answering tasks.
Contribution
The paper proposes a decomposition-based post-training method and a curated dataset of complex QA tasks annotated with decompositions, which together improve attribution in language models and outperform prior methods.
Findings
DecompTune improves attribution quality in complex QA tasks.
Models trained with DecompTune outperform prior attribution methods.
DecompTune matches or exceeds state-of-the-art models in attribution accuracy.
Abstract
Large language models (LLMs) are increasingly used for long-document question answering, where reliable attribution to sources is critical for trust. Existing post-hoc attribution methods work well for extractive QA but struggle in multi-hop, abstractive, and semi-extractive settings, where answers synthesize information across passages. To address these challenges, we argue that post-hoc attribution can be reframed as a reasoning problem, where answers are decomposed into constituent units, each tied to specific context. We first show that prompting models to generate such decompositions alongside attributions improves performance. Building on this, we introduce DecompTune, a post-training method that teaches models to produce answer decompositions as intermediate reasoning steps. We curate a diverse dataset of complex QA tasks, annotated with decompositions by a strong LLM, and…
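The core idea in the abstract, decomposing an answer into constituent units and tying each unit to specific context, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `AttributedUnit` and `attribute_units` are hypothetical, the decomposition is written by hand rather than generated by a model, and the word-overlap scorer is a simple stand-in, not the attribution method used in the paper.

```python
import re
from dataclasses import dataclass


@dataclass
class AttributedUnit:
    text: str        # one constituent unit of the answer
    passage_id: int  # index of the best-matching passage (-1 if none overlaps)
    overlap: float   # fraction of the unit's words found in that passage


def _words(s: str) -> set:
    # Lowercased alphanumeric tokens; a deliberately crude tokenizer.
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def attribute_units(units: list, passages: list) -> list:
    """Tie each answer unit to the passage with the highest word overlap.

    Stand-in scorer for illustration only (assumption), not the paper's method.
    """
    results = []
    for unit in units:
        unit_words = _words(unit)
        best_id, best_score = -1, 0.0
        for i, passage in enumerate(passages):
            score = len(unit_words & _words(passage)) / max(len(unit_words), 1)
            if score > best_score:
                best_id, best_score = i, score
        results.append(AttributedUnit(unit, best_id, best_score))
    return results


# Toy multi-hop example: the answer "Marie Curie was born in the capital of
# Poland" is decomposed by hand into two units, each supported by a
# different passage.
passages = [
    "Marie Curie was born in Warsaw in 1867.",
    "Warsaw is the capital of Poland.",
]
units = [
    "Marie Curie was born in Warsaw",
    "Warsaw is the capital of Poland",
]
attributed = attribute_units(units, passages)
for u in attributed:
    print(u.passage_id, round(u.overlap, 2), u.text)
```

Attributing the undecomposed answer to a single passage would miss one of the two hops; attributing each unit separately lets every hop point to its own source, which is the motivation the abstract gives for treating attribution as a reasoning problem.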