Universal Algorithm-Implicit Learning
Stefano Woerner, Seong Joon Oh, Christian F. Baumgartner

TL;DR
This paper introduces a theoretical framework for universal meta-learning and presents TAIL, a transformer-based algorithm-implicit meta-learner that generalizes across diverse tasks, domains, and modalities with state-of-the-art performance.
Contribution
The paper defines practical universality in meta-learning, introduces a new framework distinguishing algorithm-explicit and algorithm-implicit learning, and develops TAIL, a versatile meta-learner with novel encoding and processing techniques.
Findings
TAIL achieves state-of-the-art results on few-shot benchmarks.
TAIL generalizes to unseen domains and modalities, including text classification when trained on images.
TAIL handles larger label spaces and offers significant computational efficiency.
Abstract
Current meta-learning methods are constrained to narrow task distributions with fixed feature and label spaces, limiting applicability. Moreover, the current meta-learning literature uses key terms like "universal" and "general-purpose" inconsistently and lacks precise definitions, hindering comparability. We introduce a theoretical framework for meta-learning which formally defines practical universality and introduces a distinction between algorithm-explicit and algorithm-implicit learning, providing a principled vocabulary for reasoning about universal meta-learning methods. Guided by this framework, we present TAIL, a transformer-based algorithm-implicit meta-learner that functions across tasks with varying domains, modalities, and label configurations. TAIL features three innovations over prior transformer-based meta-learners: random projections for cross-modal feature encoding,…
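The abstract names random projections as the mechanism for cross-modal feature encoding, but the excerpt is cut off before it says how they are sampled or applied. As a rough illustration only, the sketch below uses a generic Gaussian random projection to map feature vectors of different sizes into one shared width. The names `make_projection`, `project`, and `SHARED_DIM`, the Gaussian sampling, the scaling, and the stand-in feature vectors are all assumptions made for this example and are not taken from the paper.

```python
import math
import random


def make_projection(in_dim, out_dim, seed):
    """Sample a fixed Gaussian random matrix with out_dim rows and in_dim columns.

    Entries are scaled by 1/sqrt(out_dim) so that squared norms are
    preserved in expectation. This is a generic choice, not TAIL's.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(out_dim)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(matrix, features):
    """Apply the projection to one feature vector (a list of floats)."""
    if len(matrix[0]) != len(features):
        raise ValueError("feature size does not match projection input size")
    return [sum(w * x for w, x in zip(row, features)) for row in matrix]


SHARED_DIM = 8  # hypothetical shared token width

# Stand-ins for encoder outputs of different sizes (hypothetical values).
image_features = [0.5] * 32
text_features = [0.1] * 12

image_proj = make_projection(len(image_features), SHARED_DIM, seed=0)
text_proj = make_projection(len(text_features), SHARED_DIM, seed=1)

image_token = project(image_proj, image_features)
text_token = project(text_proj, text_features)
```

Both `image_token` and `text_token` end up with `SHARED_DIM` entries, so a single downstream model could consume them regardless of which encoder produced the original features.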
Peer Reviews
Decision · Submitted to ICLR 2026
1. The motivation is clear and strong. 2. The paper’s claims are well-supported by both strong theoretical grounding and comprehensive empirical validation. - The theoretical categories of algorithm-implicit vs. explicit learning are novel. - Experiments cover a broad range of evaluations: in-domain, cross-domain, cross-modality, and label extrapolation, with ablation studies validating the design choices. Also, the computation is efficient.
1. Pretrained encoder dependency: Although random projections help, the reliance on large pretrained encoders complicates the claim of “from-scratch” universality, as part of the final performance may stem from the backbone’s prior knowledge. The importance of this encoder component should be explored more thoroughly, for example by using the same pretrained encoder for both TAIL and the baselines to isolate its contribution. 2. Computation cost in experiments: While the paper reports efficien
The overall architecture design makes the model flexible and generalizable to various input modalities and cardinalities. Specifically, the authors have devised random permutation mappings on top of the encoded features from the inputs, which prevent the model from overfitting to fixed-structure features and implicitly act as input augmentations that encourage model robustness. Furthermore, a global learnable dictionary is employed to enable the model to ada
The proposed method is fundamentally the same as the model-based meta-learning methods that date back to 2016, which the authors have identified in the Related Work section (with LSTMs or transformers): the few-shot training samples with labels are provided together with the query sample as a sequence to the model, which directly predicts the label for the query. While the authors have identified shortcomings of prior works (e.g. not being invariant to sample ordering, generaliz
- The extension of previous approaches such as CAML and GPICL to multiple modalities and a large number of classes is interesting and in line with the current research directions in foundation models. - The distinction between algorithm-explicit and algorithm-implicit learning and the formal formulation of universality is useful and clearly described. - Experimental results show the strength of the proposed approach.
- The formulation of the meta-learning problem and particularly the task definition could be improved by also providing references to survey papers (e.g., [1], [2], etc). - The motivation behind the choice of the vision and text encoder should be clarified. An ablation experiment on different encoders can strengthen the paper, similarly to what has been done in [3]. - The claim about the computational efficiency at scale is not well supported. Additional information about the memory usage for tr
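One of the reviews above describes the in-context formulation in which labeled few-shot samples and a query are passed to the model as one sequence, and the model predicts the query label directly. A minimal sketch of that data flow follows. The token layout and the names `build_sequence` and `stand_in_predictor` are hypothetical, and the predictor is a nearest-class-mean rule swapped in purely so the example runs; it is not a transformer and is not the paper's model.

```python
import random


def build_sequence(support, query_features):
    """Concatenate labeled support tokens with one unlabeled query token."""
    tokens = [{"features": f, "label": y} for f, y in support]
    tokens.append({"features": query_features, "label": None})
    return tokens


def stand_in_predictor(tokens):
    """Placeholder for the sequence model: nearest class mean.

    NOT a transformer; used only so the sketch runs end to end.
    """
    *context, query = tokens
    sums, counts = {}, {}
    for tok in context:
        y = tok["label"]
        acc = sums.setdefault(y, [0.0] * len(tok["features"]))
        for i, v in enumerate(tok["features"]):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1

    def sq_dist(y):
        # Squared distance from the query to the mean of class y.
        return sum((s / counts[y] - q) ** 2
                   for s, q in zip(sums[y], query["features"]))

    return min(sorted(sums), key=sq_dist)


# A toy 2-way, 2-shot episode with made-up feature vectors.
support = [([0.0, 0.1], 0), ([0.2, 0.0], 0),
           ([1.0, 0.9], 1), ([0.9, 1.1], 1)]
query = [0.95, 1.0]

prediction = stand_in_predictor(build_sequence(support, query))

# Reordering the support samples should not change the prediction.
shuffled = support[:]
random.Random(0).shuffle(shuffled)
shuffled_prediction = stand_in_predictor(build_sequence(shuffled, query))
```

The stand-in aggregates the support set by class, so its prediction does not depend on sample order. That is the invariance property the review mentions as missing from some earlier sequence models.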
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and Data Classification · Text and Document Classification Technologies
