TL;DR
This paper investigates why long-context language models fail to use information evenly across their inputs in multi-hop question answering, particularly when the relevant information is spread across the context, and proposes methods to improve reasoning over long contexts.
Contribution
It demonstrates that the 'lost in the middle' bias, previously studied with a single piece of critical information, also affects multi-hop QA, and it applies techniques such as knowledge graph extraction, summarization, and chain-of-thought prompting to mitigate the issue.
Findings
Performance drops as relevant information moves away from the edges of the input and as the distance between the relevant pieces grows
Reducing extraneous content improves model reasoning
Chain-of-thought prompting enhances multi-hop reasoning
Abstract
Previous work finds that recent long-context language models fail to make equal use of information in the middle of their inputs, preferring pieces of information located at the tail ends, which creates an undue bias in situations where we would like models to be equally capable of using different parts of the input. Thus far, the problem has mainly been considered in settings with single pieces of critical information, leading us to question what happens when multiple necessary pieces of information are spread out over the inputs. Here, we demonstrate the effects of the "lost in the middle" problem in the multi-hop question answering setting -- in which multiple reasoning "hops" over disconnected documents are required -- and show that performance degrades not only with respect to the distance of information from the edges of the context, but also with respect to the distance between pieces of information.…
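To make the two position effects in the abstract concrete, here is a minimal sketch of how a multi-hop context could be assembled with the necessary ("gold") documents placed at chosen positions among distractors. It is not the authors' evaluation code: the function names `build_context` and `position_stats` and the document labels are hypothetical, chosen only for illustration.

```python
from typing import List, Optional, Tuple


def build_context(
    gold_docs: List[str], distractors: List[str], gold_positions: List[int]
) -> List[str]:
    """Place each gold document at its requested index and fill the
    remaining slots with distractors in their original order."""
    total = len(gold_docs) + len(distractors)
    if len(gold_positions) != len(gold_docs):
        raise ValueError("need exactly one position per gold document")
    if len(set(gold_positions)) != len(gold_positions):
        raise ValueError("gold positions must be distinct")
    if any(p < 0 or p >= total for p in gold_positions):
        raise ValueError("gold position out of range")

    context: List[Optional[str]] = [None] * total
    for doc, pos in zip(gold_docs, gold_positions):
        context[pos] = doc

    filler = iter(distractors)
    return [doc if doc is not None else next(filler) for doc in context]


def position_stats(
    gold_positions: List[int], total: int
) -> Tuple[List[int], List[int]]:
    """Return (distance of each gold document from the nearest edge,
    gaps between consecutive gold documents), both counted in documents."""
    edge_distances = [min(p, total - 1 - p) for p in gold_positions]
    ordered = sorted(gold_positions)
    gaps = [b - a for a, b in zip(ordered, ordered[1:])]
    return edge_distances, gaps


if __name__ == "__main__":
    # Hypothetical two-hop example: two gold documents, four distractors.
    context = build_context(["G1", "G2"], ["D1", "D2", "D3", "D4"], [1, 4])
    print(context)
    print(position_stats([1, 4], len(context)))
```

Sweeping `gold_positions` over a grid and recording answer accuracy for each setting would separate the two factors the abstract names: how far each gold document sits from the nearest edge, and how far apart the gold documents are from one another.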
