Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries
Kiran Vodrahalli, Santiago Ontanon, Nilesh Tripuraneni, Kelvin Xu, Sanil Jain, Rakesh Shivanna, Jeffrey Hui, Nishanth Dikkala, Mehran Kazemi, Bahare Fatemi, Rohan Anil, Ethan Dyer, Siamak Shakeri, Roopali Vij, Harsh Mehta, Vinay Ramasesh, Quoc Le, Ed Chi, Yifeng Lu

TL;DR
Michelangelo introduces a novel framework for evaluating large language models' ability to understand and manipulate long contexts by revealing latent structures, providing more meaningful diagnostics than traditional retrieval tasks.
Contribution
The paper presents the Latent Structure Queries framework, a new method for creating long-context evaluation tasks that measure deep understanding beyond simple retrieval.
Findings
The Michelangelo evaluations are high-signal and effective for assessing long-context understanding.
State-of-the-art models show significant room for improvement in long-context reasoning.
The framework applies across code and natural language domains.
Abstract
We introduce Michelangelo: a minimal, synthetic, and unleaked long-context reasoning evaluation for large language models which is also easy to automatically score. This evaluation is derived via a novel, unifying framework for evaluations over arbitrarily long contexts which measure the model's ability to do more than retrieve a single piece of information from its context. The central idea of the Latent Structure Queries framework (LSQ) is to construct tasks which require a model to "chisel away" the irrelevant information in the context, revealing a latent structure in the context. To verify a model's understanding of this latent structure, we query the model for details of the structure. Using LSQ, we produce three diagnostic long-context evaluations across code and natural-language domains intended to provide a stronger signal of long-context language model capabilities. We…
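The LSQ idea described in the abstract can be illustrated with a toy generator. This is a minimal sketch under stated assumptions, not one of the paper's three evaluations: the function names, the `target`/`other` variable names, and the sum-based query are all hypothetical and chosen only for illustration. The context interleaves a few relevant operations with many irrelevant ones, the latent structure (a list) is never printed, and the query asks for a detail of that structure.

```python
import random


def build_latent_list_task(num_relevant=5, num_distractors=20, seed=0):
    """Return (context, query, answer) for a toy latent-structure task.

    Hypothetical illustration only; not the task format used in the paper.
    """
    rng = random.Random(seed)
    state = []  # the latent structure: a list the context never prints
    lines = []
    total = num_relevant + num_distractors
    # Choose which positions in the context carry relevant operations.
    relevant_slots = set(rng.sample(range(total), num_relevant))
    for i in range(total):
        if i in relevant_slots:
            value = rng.randint(0, 9)
            state.append(value)
            lines.append(f"target.append({value})")
        else:
            # Distractor: touches a different list, so it must be ignored.
            lines.append(f"other.append({rng.randint(0, 9)})")
    context = "\n".join(lines)
    query = "What is sum(target) after all lines run?"
    return context, query, sum(state)


def score(model_answer, expected):
    """Exact-match scoring, which keeps grading fully automatic."""
    return int(str(model_answer).strip() == str(expected))
```

In this sketch, raising `num_distractors` lengthens the context without changing the amount of relevant information, so the answer depends on tracking every relevant line rather than retrieving a single one.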
Taxonomy
Topics: Semantic Web and Ontologies · Image Retrieval and Classification Techniques · Natural Language Processing Techniques
