Retrospective Learning from Interactions
Zizhao Chen, Mustafa Omer Gul, Yiwei Chen, Gloria Geng, Anne Wu, Yoav Artzi

TL;DR
ReSpect is a method that enables large language models to learn from implicit feedback signals in multi-turn interactions, improving task performance without extra annotations.
Contribution
The paper introduces ReSpect, a retrospective learning approach that leverages implicit feedback from interactions to improve LLM performance without additional labels.
Findings
Task completion rate improved from 31% to 82%.
ReSpect effectively learns from implicit signals in multimodal interactions.
No external annotations were needed for the learning process.
Abstract
Multi-turn interactions between large language models (LLMs) and users naturally include implicit feedback signals. If an LLM responds in an unexpected way to an instruction, the user is likely to signal it by rephrasing the request, expressing frustration, or pivoting to an alternative task. Such signals are task-independent and occupy a relatively constrained subspace of language, allowing the LLM to identify them even if it fails on the actual task. We introduce ReSpect, a method to learn from such signals in past interactions via retrospection without additional annotations. We deploy ReSpect in a new multimodal interaction scenario, where humans instruct a multimodal LLM to solve an abstract reasoning task with a combinatorial solution space. Through thousands of interactions with humans, we show how ReSpect gradually improves task completion rate from 31% to 82%, all without any…
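The abstract describes retrospection as revisiting past interactions and reading each user follow-up as implicit feedback on the model's preceding action. Below is a minimal sketch of that labeling step, assuming each logged turn stores the context, the model's action, and the user's next utterance. In ReSpect the LLM itself identifies the feedback; the keyword rule (`keyword_decoder`, `NEGATIVE_CUES`, `POSITIVE_CUES`) is a hypothetical stand-in used only to keep the example self-contained. `Turn`, `Example`, and `retrospect` are illustrative names, not the authors' code.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical cue lists; a real system would let the LLM itself decode feedback.
NEGATIVE_CUES = {"no", "not", "wrong", "undo"}
POSITIVE_CUES = {"yes", "good", "great", "correct", "perfect"}


@dataclass
class Turn:
    context: str              # what the model saw before acting
    action: str               # what the model did
    next_user_utterance: str  # the user's reply, read later as implicit feedback


@dataclass
class Example:
    context: str
    action: str
    feedback: str  # "positive", "neutral", or "negative"


def keyword_decoder(utterance: str) -> str:
    """Toy stand-in for the model decoding feedback from the user's reply."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    if words & NEGATIVE_CUES:  # checked first so "not good" counts as negative
        return "negative"
    if words & POSITIVE_CUES:
        return "positive"
    return "neutral"


def retrospect(turns: List[Turn],
               decode: Callable[[str], str] = keyword_decoder) -> List[Example]:
    """Label each past action with the feedback implied by the next user turn."""
    return [Example(t.context, t.action, decode(t.next_user_utterance)) for t in turns]


if __name__ == "__main__":
    log = [
        Turn("select the bird", "select A", "Yes, good. Now the boat"),
        Turn("select the boat", "select C", "No, that's wrong"),
        Turn("select the boat", "select D", "Okay, next one"),
    ]
    for ex in retrospect(log):
        print(ex.action, ex.feedback)
```

Because the feedback vocabulary is small and task-independent, the decoder never needs to know whether "select C" was actually correct; it only reads the user's reaction.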
Peer Reviews
Decision: Submitted to ICLR 2025
- The idea of learning from past mistakes is really interesting, and the proposed framework, RESPECT, does not depend on a particular optimization strategy. As highlighted in the paper, the framework can be used with various optimization strategies (supervised learning, reinforcement learning, utility maximization). - The paper also contributes a new task, MULTIREF, which will be very useful for future work in this domain.
**Major:** - **Excessive use of training data:** The proposed method relies heavily on data. The model is fine-tuned at each step with all the interaction data acquired in past steps. Although the authors mention that they take measures to avoid overfitting (lines 246-248), this much repeated use of the same data would eventually result in overfitting. - **Lack of metric evaluation:** Although the authors showcase various observations and results through plots and a confusion matrix, they lack
1. The paper proposes a learning method, RESPECT, that utilizes implicit human-in-the-loop feedback for explicit action improvement. 2. The paper experiments with three learning methods: supervised learning, REINFORCE, and KTO. 3. The paper conducts thorough experiments in a multimodal referential game. 4. The paper conducts pre-training as well as online testing for iterative model improvement and evaluation. 5. The paper is very well structured and well written. The paper analyzed in detail about le
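The review lists three learning methods (supervised learning, REINFORCE, KTO) that consume the same decoded feedback. The sketch below shows one plausible way to turn `(context, action, feedback)` triples into training data for each; the reward values, the choice to drop neutral turns for KTO, and every function name here are assumptions for illustration, not the paper's exact recipe. The KTO records use `prompt` / `completion` / boolean `label` fields, resembling the unpaired-preference layout that common KTO implementations accept.

```python
from typing import Dict, List, Tuple

# (context, action, feedback) triples, e.g. the output of a retrospection step.
Triple = Tuple[str, str, str]

# Assumed reward mapping for the REINFORCE variant; the paper's values may differ.
REWARD: Dict[str, float] = {"positive": 1.0, "neutral": 0.0, "negative": -1.0}


def sl_data(triples: List[Triple]) -> List[Tuple[str, str]]:
    """Supervised learning: imitate only actions that drew positive feedback."""
    return [(c, a) for c, a, f in triples if f == "positive"]


def reinforce_data(triples: List[Triple]) -> List[Tuple[str, str, float]]:
    """REINFORCE: keep every action, weighted by its decoded reward."""
    return [(c, a, REWARD[f]) for c, a, f in triples]


def kto_data(triples: List[Triple]) -> List[dict]:
    """KTO: unpaired desirable/undesirable examples; neutral turns are dropped."""
    return [
        {"prompt": c, "completion": a, "label": f == "positive"}
        for c, a, f in triples
        if f != "neutral"
    ]


def reinforce_loss(logprobs: List[float], rewards: List[float]) -> float:
    """Surrogate REINFORCE loss: -mean(reward * log pi(action | context))."""
    if not logprobs or len(logprobs) != len(rewards):
        raise ValueError("need equal-length, non-empty inputs")
    return -sum(r * lp for lp, r in zip(logprobs, rewards)) / len(logprobs)


if __name__ == "__main__":
    triples = [("c1", "a1", "positive"), ("c2", "a2", "neutral"), ("c3", "a3", "negative")]
    print(sl_data(triples))
    print(reinforce_data(triples))
    print(kto_data(triples))
```

The contrast is the point of the sketch: supervised learning discards negative signal entirely, while REINFORCE and KTO can also push probability away from actions the user rejected.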
The paper wishes to highlight its contribution to 'continual learning' and the model's iterative improvement with online human feedback, but the experiments actually conducted are slightly misleading. The authors were careful to distinguish between 'round' and 'turn'. - In the setup, each 'round' includes multiple 'turns' of interaction between a human and the bot. - The model is retrained after each 'round' with the history of all previous 'rounds'. - After fine-tuning at the end
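The round/turn structure the review describes (many turns per round, retraining after each round on the history of all rounds so far) can be sketched as a loop over rounds with a growing dataset. This is a minimal illustration under stated assumptions: `run_rounds`, `deploy`, `retrospect`, and `train` are hypothetical callables, and whether each retraining starts from the initial checkpoint or the previous model is left to `train`.

```python
from typing import Any, Callable, List, Tuple


def run_rounds(
    model: Any,
    num_rounds: int,
    deploy: Callable[[Any], List[Any]],
    retrospect: Callable[[Any, List[Any]], List[Any]],
    train: Callable[[Any, List[Any]], Any],
) -> Tuple[Any, List[int]]:
    """Deploy, label, and retrain once per round on ALL data gathered so far.

    deploy(model)              -> interactions from one round (many turns)
    retrospect(model, turns)   -> labeled training examples
    train(model, all_examples) -> updated model; a from-scratch variant
                                  can simply ignore the model argument
    """
    history: List[Any] = []
    sizes: List[int] = []  # training-set size used at the end of each round
    for _ in range(num_rounds):
        turns = deploy(model)
        history.extend(retrospect(model, turns))
        model = train(model, list(history))  # cumulative, not just this round
        sizes.append(len(history))
    return model, sizes


if __name__ == "__main__":
    # Stubs: 10 turns per round, identity labeling, "training" bumps a counter.
    final, sizes = run_rounds(
        0, 3,
        deploy=lambda m: [f"turn-{i}" for i in range(10)],
        retrospect=lambda m, turns: turns,
        train=lambda m, data: m + 1,
    )
    print(final, sizes, sum(sizes))  # 3 [10, 20, 30] 60
```

With one pass per retraining, round-1 examples are seen in all three retrainings, and the total work grows as n·R(R+1)/2 for n examples per round over R rounds; this is the repeated-data exposure the first review raises.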
- The use of continual learning in the RESPECT framework shows strong potential for developing LLMs that improve continuously from real-world interactions. - The retrospective aspect of RESPECT is particularly compelling, as it enables models to learn from users' corrective feedback.
- The experiments are confined to the MULTIREF scenario with abstract tangram shapes. This limited scope raises questions about how well RESPECT generalizes to other domains. Applying RESPECT to diverse settings, such as conversational agents, could demonstrate its robustness and adaptability across a broader range of applications, particularly those involving complex language or high-stakes interactions. - There's a risk that the model might overfit to specific patterns of implicit feedback r
Taxonomy
Topics: Semantic Web and Ontologies · Topic Modeling
