Building, Reusing, and Generalizing Abstract Representations from Concrete Sequences
Shuchen Wu, Mirko Thalmann, Peter Dayan, Zeynep Akata, Eric Schulz

TL;DR
This paper introduces a hierarchical variable learning model (HVM) that efficiently learns and abstracts patterns from sequences. It outperforms standard algorithms and LLMs on transfer and compression tasks, and its behavior aligns with human cognition.
Contribution
The paper presents a novel non-parametric hierarchical model that learns abstract sequence representations, and demonstrates both its effectiveness on language data and its human-like transfer capabilities.
Findings
HVM learns more efficient dictionaries than Lempel-Ziv.
HVM's sequence likelihood correlates with human recall times.
HVM effectively balances compression and generalization.
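To make the Lempel-Ziv baseline in the first finding concrete, here is a minimal sketch of an LZ78-style parse, which builds its dictionary by adding one new phrase each time the input leaves the set of known phrases. This is a textbook illustration only, not the evaluation code from the paper; the function names `lz78_parse` and `lz78_decode` are hypothetical.

```python
def lz78_parse(seq):
    """Parse a string into LZ78 phrases.

    Returns (dictionary, output), where dictionary maps each learned
    phrase to a 1-based index and output is a list of
    (prefix_index, next_symbol) pairs. Index 0 means the empty phrase.
    """
    dictionary = {}
    output = []
    phrase = ""
    for symbol in seq:
        candidate = phrase + symbol
        if candidate in dictionary:
            # Keep extending while the phrase is already known.
            phrase = candidate
        else:
            # Emit (known prefix, new symbol) and learn the new phrase.
            output.append((dictionary.get(phrase, 0), symbol))
            dictionary[candidate] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # The input ended inside a known phrase; emit it with no new symbol.
        output.append((dictionary[phrase], ""))
    return dictionary, output


def lz78_decode(output):
    """Rebuild the original string from (prefix_index, next_symbol) pairs."""
    phrases = [""]  # index 0 is the empty phrase
    pieces = []
    for index, symbol in output:
        phrase = phrases[index] + symbol
        phrases.append(phrase)
        pieces.append(phrase)
    return "".join(pieces)
```

Each dictionary entry here is a literal string, so two phrases that differ in a single symbol are stored separately. The abstract's claim is that HVM's variables avoid some of that redundancy.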
Abstract
Humans excel at learning abstract patterns across different sequences, filtering out irrelevant details, and transferring these generalized concepts to new sequences. In contrast, many sequence learning models lack the ability to abstract, which leads to memory inefficiency and poor transfer. We introduce a non-parametric hierarchical variable learning model (HVM) that learns chunks from sequences and abstracts contextually similar chunks as variables. HVM efficiently organizes memory while uncovering abstractions, leading to compact sequence representations. When learning on language datasets such as babyLM, HVM learns a more efficient dictionary than standard compression algorithms such as Lempel-Ziv. In a sequence recall task requiring the acquisition and transfer of variables embedded in sequences, we demonstrate HVM's sequence likelihood correlates with human recall times. In…
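The idea of abstracting "contextually similar chunks as variables" can be illustrated with a toy sketch. It assumes sequences are already segmented into chunks, and it treats two chunks as contextually similar when they occur between the same left and right neighbours. This is a deliberately simplified stand-in, not the HVM algorithm itself, whose hierarchical, non-parametric learning procedure is not described in this summary. The names `propose_variables`, `abstract_sequence`, and `min_members` are hypothetical.

```python
from collections import defaultdict


def propose_variables(chunk_sequences, min_members=2):
    """Group chunks that appear between the same (left, right) neighbours.

    Returns a dict mapping each (left, right) context to the sorted list
    of chunks seen in that slot, keeping only contexts with at least
    min_members distinct fillers.
    """
    contexts = defaultdict(set)
    for seq in chunk_sequences:
        for left, middle, right in zip(seq, seq[1:], seq[2:]):
            contexts[(left, right)].add(middle)
    return {
        ctx: sorted(members)
        for ctx, members in contexts.items()
        if len(members) >= min_members
    }


def abstract_sequence(seq, variables):
    """Replace slot fillers with a variable name such as 'VAR0'."""
    names = {ctx: f"VAR{i}" for i, ctx in enumerate(sorted(variables))}
    out = list(seq)
    for i in range(1, len(seq) - 1):
        # Contexts are read from the original sequence, not the rewritten one.
        ctx = (seq[i - 1], seq[i + 1])
        if ctx in variables and seq[i] in variables[ctx]:
            out[i] = names[ctx]
    return out
```

With the training sequences `["the", "cat", "sat"]`, `["the", "dog", "sat"]`, and `["the", "bird", "sat"]`, the three middle chunks share one context and collapse into a single variable, so all three sequences map to the same abstract form.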
Peer Reviews
Decision: ICLR 2025 Poster
* The model is an interesting, sensible, and original improvement over HCM.
* Empirical results show that the model learns well and works well on synthetic data and some natural language datasets.
* The work is very well put into wider perspective in the introduction.
* Many of the main experiments were conducted on data from a generative model that fits exactly the modeling assumptions for HVM (or tasks inspired by these assumptions). As is often the case when a paper proposes a novel model and a novel data set / generator, the fact that the model works well on data that was specifically designed for the model to work well is not a very strong argument. Luckily the paper also shows good results on “natural” data, and qualitatively matches some aspects of hum
Impressively, this paper presents a novel theoretical contribution: a clear theoretical framework for combining chunking and abstraction in sequence learning, with formal proofs and guarantees. The authors also evaluated their model from multiple angles (computational efficiency, correlation with human behavior, and comparison with LLMs), which makes for a good set of comparisons. I also enjoyed the connection to cognitive science: the work bridges computational and cognitive approaches, providing insights into h
1. The paper's comparison to LLMs is relatively narrow: it focuses primarily on a specific sequence recall task and is limited to short sequences, so it seems a slightly contrived situation. The paper would benefit from exploring slightly more complex abstraction tasks to study the general applicability of the method.
2. The comparisons in the paper are quite limited and don't adequately address the rich literature on sequence compression and pattern detection. A single example I ha
* I think this is a really innovative algorithm that builds on the recent HCM in a pretty novel and cool way. It's clearly motivated by the human ability to abstract.
* The evaluation is quite rigorous and showing performance on real world data as well as accounting for human behavior in a relevant task is a very nice touch.
* There are a lot of rigorous proofs in the appendix. The authors have clearly thought a lot about the theoretical foundations of this algorithm as well as shown good em
* I think the paper can mainly be improved in clarity. For example, when getting to Figure 3, it's kind of hard to figure out which exact datasets these results are from. In general, most of the text in the work is dedicated to describing the algorithm and results. I think adding some more information on what datasets are being used would be useful. For example, line 310: "BabyLM language dataset, which contain text snippets from a collection of data domains". What data domains are there? It was
