Descriptive History Representations: Learning Representations by Answering Questions
Guy Tennenholtz, Jihwan Jeong, Chih-Wei Hsu, Yinlam Chow, Craig Boutilier

TL;DR
This paper introduces Descriptive History Representations (DHRs), a method for compressing interaction histories into informative summaries that answer relevant questions, improving decision making in partially observable environments.
Contribution
The paper proposes DHRs as sufficient statistics for history summarization and trains them in a multi-agent framework (representation, decision, and question-asking components) under a joint objective that balances reward maximization with question answering.
Findings
DHRs effectively summarize historical data for decision tasks.
The approach produces interpretable user profiles for preference prediction.
Validated on public datasets; reviewers report consistent improvements over standard LLM-based and specialised recommendation methods.
Abstract
Effective decision making in partially observable environments requires compressing long interaction histories into informative representations. We introduce Descriptive History Representations (DHRs): sufficient statistics characterized by their capacity to answer relevant questions about past interactions and potential future outcomes. DHRs focus on capturing the information necessary to address task-relevant queries, providing a structured way to summarize a history for optimal control. We propose a multi-agent learning framework, involving representation, decision, and question-asking components, optimized using a joint objective that balances reward maximization with the representation's ability to answer informative questions. This yields representations that capture the salient historical details and predictive structures needed for effective decision making. We validate our…
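The joint objective described in the abstract can be illustrated with a toy scalar score. This is a minimal sketch, not the paper's formulation: the additive form, the `qa_weight` trade-off parameter, and the function names `qa_log_likelihood` and `joint_objective` are all assumptions made for illustration. It only shows how a reward term and a question-answering term might be combined so that, at equal reward, a summary that answers more questions correctly scores higher.

```python
import math
from typing import Sequence


def qa_log_likelihood(answer_probs: Sequence[float]) -> float:
    """Mean log-probability assigned to the ground-truth answers.

    Each entry is the probability (in (0, 1]) that an answerer, given only
    the history summary, produces the correct answer to one question.
    """
    if not answer_probs:
        raise ValueError("need at least one question")
    if any(p <= 0.0 or p > 1.0 for p in answer_probs):
        raise ValueError("probabilities must lie in (0, 1]")
    return sum(math.log(p) for p in answer_probs) / len(answer_probs)


def joint_objective(expected_reward: float,
                    answer_probs: Sequence[float],
                    qa_weight: float = 1.0) -> float:
    """Hypothetical objective: reward plus weighted QA log-likelihood.

    The additive form and qa_weight are illustrative assumptions,
    not taken from the paper.
    """
    return expected_reward + qa_weight * qa_log_likelihood(answer_probs)


# Two candidate summaries with equal reward; the second answers the
# questions more reliably, so it receives the higher joint score.
vague = joint_objective(expected_reward=1.0, answer_probs=[0.5, 0.5])
descriptive = joint_objective(expected_reward=1.0, answer_probs=[0.9, 0.8])
assert descriptive > vague
```

Setting `qa_weight` to zero recovers a pure reward objective, which is one way to see why the question-answering term is what pushes the representation toward retaining descriptive detail.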
Peer Reviews
Decision·Submitted to ICLR 2026
(1) The formulation of representation learning through question-answering is novel. It provides an interpretable alternative to methods that learn representations implicitly through prediction or reconstruction losses. (2) The framework offers a high degree of interpretability. This is a significant advantage over existing methods. (3) The paper provides comprehensive experiments on several datasets. It includes extensive ablation studies (e.g., on history length, profile length, number of qu
(1) Dependence on a powerful QA-generator: The framework's performance is dependent on the quality and relevance of the questions generated by the oracle. While the use of a fixed LLM is practical, it introduces a dependency on the capabilities and potential biases of that specific model. The paper notes that adversarial training of the QA-generator yielded marginal gains, suggesting room for improvement in dynamically learning the optimal question set. (2) Computational complexity: The multi-
1. The core idea is novel: defining representation sufficiency in terms of the ability to answer task-relevant questions offers a new, principled approach to interpretable representation learning. 2. The experimental evaluation is thorough, covering multiple datasets and metrics. The results show consistent improvements over standard LLM-based and specialised recommendation methods. 3. The practical framework leverages off-the-shelf LLMs as QA generators and supports offline training, enabling d
1. The QA generator uses future user behaviour to construct questions and ground-truth answers. However, can the method work in online cold-start settings where new users have no explicit ratings or reviews? Without ground-truth answers to supervise the QA generator, it is unclear how meaningful questions could be constructed or how the DHR would learn. 2. The optimization objective is a complex min-max game involving the joint training of multiple components. Did the authors observe any trainin
* Novelty and Abstraction: The paper's main strength is its novel approach to representation learning. By focusing on answering high-level, semantically meaningful questions rather than predicting low-level observations, it shifts the representation burden to a more abstract and potentially more task-relevant level. * Interpretability: In an age of black-box models, the ability to generate an interpretable textual user profile as the history representation is a significant advantage, particular
* Reliance on QA-Generator: The entire framework is critically dependent on the availability of a high-quality QA-generator ($\nu_{QA}^{*}$) to provide "sufficient" questions and ground-truth answers during training. The paper acknowledges that designing this oracle is challenging and relies on a pre-trained LLM for its main experiments. This dependency might limit its applicability in domains where such questions are hard to formulate or where a powerful pre-trained generator is unavailable. *
Taxonomy
Topics: Educator Training and Historical Pedagogy
