Characterizing Pattern Matching and Its Limits on Compositional Task Structures
Hoyeon Chang, Jinho Park, Hanseul Cho, Sohee Yang, Miyoung Ko, Hyeonbin Hwang, Seungpil Won, Dohaeng Lee, Youbin Ahn, Minjoon Seo

TL;DR
This paper formalizes pattern matching as functional equivalence in language models, providing theoretical bounds and empirical evidence on how models generalize in compositional tasks and the limitations posed by path ambiguity.
Contribution
It introduces a formal framework for pattern matching as functional equivalence and offers theoretical bounds and empirical validation for model generalization in compositional tasks.
Findings
Success is predicted by the number of contexts witnessing functional equivalence.
A tight sample complexity bound for learning two-hop structures is established.
Path ambiguity impairs model accuracy and interpretability.
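The first finding can be made concrete with a toy example. The sketch below is only an illustration under stated assumptions, not the paper's construction: it assumes a two-hop task of the form f(x1, x2, x3) = F2[F1[x1, x2], x3], where the lookup tables `F1` and `F2`, the training set, and the helper `count_witnesses` are all hypothetical. It counts how many contexts x3 in the training data show two first-hop pairs producing the same output.

```python
from typing import Dict, Tuple

# Hypothetical lookup tables for a toy two-hop task (illustrative assumptions only).
F1: Dict[Tuple[str, str], str] = {("a", "b"): "m", ("c", "d"): "m", ("a", "d"): "n"}
F2: Dict[Tuple[str, str], str] = {
    ("m", "x"): "p", ("m", "y"): "q",
    ("n", "x"): "p", ("n", "y"): "r",
}

def two_hop(x1: str, x2: str, x3: str) -> str:
    """Compose the two hops: f(x1, x2, x3) = F2[F1[x1, x2], x3]."""
    return F2[(F1[(x1, x2)], x3)]

def count_witnesses(
    train: Dict[Tuple[str, str, str], str],
    pair_a: Tuple[str, str],
    pair_b: Tuple[str, str],
) -> Tuple[int, int]:
    """Return (agreeing, disagreeing) counts over contexts x3 seen with both pairs."""
    out_a = {x3: y for (x1, x2, x3), y in train.items() if (x1, x2) == pair_a}
    out_b = {x3: y for (x1, x2, x3), y in train.items() if (x1, x2) == pair_b}
    shared = out_a.keys() & out_b.keys()
    agree = sum(1 for x3 in shared if out_a[x3] == out_b[x3])
    return agree, len(shared) - agree

# A hypothetical training set: only some (pair, context) combinations are observed.
observed = [("a", "b", "x"), ("a", "b", "y"), ("c", "d", "x"),
            ("a", "d", "x"), ("a", "d", "y")]
train = {ex: two_hop(*ex) for ex in observed}

print(count_witnesses(train, ("a", "b"), ("c", "d")))  # (1, 0)
print(count_witnesses(train, ("a", "b"), ("a", "d")))  # (1, 1)
```

In this toy, the pairs ("a", "b") and ("c", "d") share one agreeing context and no disagreeing one, so a held-out query such as ("c", "d", "y") could in principle be answered by transfer from ("a", "b", "y"). The pairs ("a", "b") and ("a", "d") agree in context "x" but disagree in "y", so a single agreeing context is weak evidence on its own. The toy does not reproduce the paper's result; it only shows what is being counted.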
Abstract
Despite impressive capabilities, LLMs' successes often rely on pattern-matching behaviors, yet these are also linked to OOD generalization failures in compositional tasks. However, behavioral studies commonly employ task setups that allow multiple generalization sources (e.g., algebraic invariances, structural repetition), obscuring a precise and testable account of how well LLMs perform generalization through pattern matching and their limitations. To address this ambiguity, we first formalize pattern matching as functional equivalence, i.e., identifying pairs of subsequences of inputs that consistently lead to identical results when the rest of the input is held constant. Then, we systematically study how decoder-only Transformer and Mamba behave in controlled tasks with compositional structures that isolate this mechanism. Our formalism yields predictive and quantitative insights:…
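The definition of functional equivalence in the abstract can be read as a direct check: swap one subsequence for the other, hold the rest of the input fixed, and compare outputs. The following is a minimal sketch of that reading, assuming inputs are tuples with the subsequence as a prefix. The function `functionally_equivalent`, the table `HOP1`, and `toy_task` are hypothetical names introduced for illustration and are not taken from the paper.

```python
from typing import Callable, Dict, Iterable, Tuple

Seq = Tuple[int, ...]

def functionally_equivalent(
    f: Callable[[Seq], int],
    sub_a: Seq,
    sub_b: Seq,
    contexts: Iterable[Seq],
) -> bool:
    """True if replacing sub_a with sub_b never changes f's output for any context."""
    return all(f(sub_a + ctx) == f(sub_b + ctx) for ctx in contexts)

# Hypothetical first-hop table: (0, 1) and (2, 2) map to the same intermediate value.
HOP1: Dict[Tuple[int, int], int] = {(0, 1): 7, (2, 2): 7, (0, 2): 5}

def toy_task(seq: Seq) -> int:
    """Arbitrary toy task: the output depends on the prefix only through HOP1."""
    x1, x2, x3 = seq
    return HOP1[(x1, x2)] * 10 + x3

# The "rest of the input" here is a single trailing token x3.
contexts = [(x3,) for x3 in range(3)]

print(functionally_equivalent(toy_task, (0, 1), (2, 2), contexts))  # True
print(functionally_equivalent(toy_task, (0, 1), (0, 2), contexts))  # False
```

This check enumerates every context, which is possible only because the toy domain is tiny. A model trained on data sees just a subset of contexts, which is why the number of observed witnessing contexts matters.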
Peer Reviews
Decision: ICLR 2026 Poster
1. This paper studies an important problem of the compositional generalization of language models.
2. The experiments include various settings of practical relevance.

1. The results are limited to small synthetic task structures.
2. The settings require a deterministic function and strict functional equivalence, which may be too restrictive in a real-world NLP dataset.
- At a high level, thinking about generalization in terms of many-to-one functions seems like it clearly captures a kind of task-level generalization. Completing the task correctly requires non-trivial logical reasoning. The task has the nice properties that (1) it is possible to get 100% accuracy when correctly applying logical reasoning / a graph algorithm and (2) the LLM never sees the exact problem instance it is evaluated on.
- The empirical results in the paper strongly support the narrati

- I found the exposition introducing the problem to be a bit confusing. It wasn’t clear to me whether pattern matching is a desirable or undesirable property of transformers (is it capturing overfitting or generalizing?) The abstract suggests that surface-level pattern matching is bad, but perhaps that deeper pattern matching (which survives multiple logical steps) is a good thing.
- I am also confused about why it is interesting to understand pattern matching in LLMs. I’m not sure how the toy p
The domain setup seems to eliminate other potential sources of information cleanly. The definition of functional equivalence and specifically k-equivalence are simple and naturalistic definitions. The large sweep over a variety of dataset sizes is also helpful for determining the role of data access.
The abstract and first paragraph of the introduction do not make it clear enough that "pattern matching" is undesirable. The first sentence could be read as "pattern matching" performed by LLMs as being too surface level. This reading recontextualizes later uses of the term to be neutral rather than negative, confusing such a reader. It should be made more clear that "pattern matching" specifically is being used to exclusively refer to undesirably syntactic/surface level heuristics. "Functional
Taxonomy
Topics: Economic, financial, and policy analysis · Italy: Economic History and Contemporary Issues · Economic Policies and Impacts
Methods: Sparse Evolutionary Training
