Adapting, Fast and Slow: Transportable Circuits for Few-Shot Learning
Kasra Jalaldoust, Elias Bareinboim

TL;DR
This paper introduces a novel approach for zero-shot and few-shot learning across domains using transportable circuits based on causal graphs, enabling effective generalization with limited target data.
Contribution
It proposes Circuit-TR, a method leveraging causal transportability theory for domain adaptation and zero-shot generalization, with theoretical analysis and practical algorithms.
Findings
Theoretical characterization of few-shot learnability via circuit transportability.
Circuit-TR effectively adapts to new domains with limited data.
Simulations validate the theoretical insights and method effectiveness.
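For rough intuition on how the size of a circuit class can govern few-shot rates, consider the textbook finite-class bound below. It is not the paper's Theorem 2.7 or 3.2. It assumes a finite class $\mathcal{H}$ of candidate circuits, a loss bounded in $[0,1]$, and $n$ i.i.d. labeled target samples, and it applies Hoeffding's inequality with a union bound to empirical risk minimization (ERM) over $\mathcal{H}$. Writing $R_{\mathrm{tgt}}$ for target risk (notation introduced only for this illustration), with probability at least $1-\delta$ the ERM choice $\hat h$ satisfies

$$R_{\mathrm{tgt}}(\hat h) - \min_{h \in \mathcal{H}} R_{\mathrm{tgt}}(h) \le 2\sqrt{\frac{\log(2|\mathcal{H}|/\delta)}{2n}}.$$

The number of target samples needed therefore scales with $\log|\mathcal{H}|$ rather than $|\mathcal{H}|$. For example, a class of $2^T$ candidates needs a number of samples roughly linear in $T$, even though searching it exhaustively takes $2^T$ evaluations.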
Abstract
Generalization across domains is not possible without asserting a structure that constrains the unseen target domain w.r.t. the source domain. Building on causal transportability theory, we design an algorithm for zero-shot compositional generalization that relies on access to qualitative domain knowledge in the form of a causal graph for intra-domain structure and a discrepancy oracle for inter-domain mechanism sharing. Circuit-TR learns a collection of modules (i.e., local predictors) from the source data and transports/composes them to obtain a circuit for prediction in the target domain, if the causal structure licenses it. Furthermore, circuit transportability enables us to design a supervised domain adaptation scheme that operates without access to an explicit causal structure and instead uses limited target data. Our theoretical results characterize classes of few-shot…
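The transport-and-compose step described in the abstract can be illustrated with a toy sketch. Purely for illustration, it assumes a causal chain $X_0 \to X_1 \to \dots \to X_{T-1}$ over a small discrete vocabulary, with one module $p_t(x_t \mid x_{t-1})$ per edge estimated by counting source sequences. It stands in for the discrepancy oracle with a plain set of edge indices whose mechanisms differ in the target. The helper names `fit_modules` and `transport_circuit` are hypothetical. This is not the authors' Circuit-TR implementation, which handles general causal graphs.

```python
from collections import Counter, defaultdict


def fit_modules(source_seqs, T):
    """Estimate one module p_t(x_t | x_{t-1}) per chain edge t = 1..T-1 by counting.

    Hypothetical toy setup: a chain-structured causal graph over discrete tokens.
    """
    counts = {t: defaultdict(Counter) for t in range(1, T)}
    for seq in source_seqs:
        for t in range(1, T):
            counts[t][seq[t - 1]][seq[t]] += 1
    modules = {}
    for t, table in counts.items():
        # Normalize counts into a conditional distribution for each parent value.
        modules[t] = {
            parent: {child: n / sum(c.values()) for child, n in c.items()}
            for parent, c in table.items()
        }
    return modules


def transport_circuit(modules, discrepancies, T):
    """Compose source modules into a predictor of X_{T-1} from X_0, if licensed.

    `discrepancies` plays the role of a (hypothetical) discrepancy oracle: the set of
    edges whose mechanism differs between source and target.
    """
    if any(t in discrepancies for t in range(1, T)):
        return None  # a required mechanism is not shared, so the module cannot be reused

    def circuit(x0):
        belief = {x0: 1.0}
        for t in range(1, T):
            # Marginalize over the parent: p(x_t) = sum_parent p(parent) * p_t(x_t | parent).
            nxt = defaultdict(float)
            for parent, p in belief.items():
                for child, q in modules[t].get(parent, {}).items():
                    nxt[child] += p * q
            belief = nxt
        return max(belief, key=belief.get) if belief else None

    return circuit


# Toy source domain: arithmetic sequences x_t = (x_{t-1} + 1) mod V.
V, T = 5, 4
source = [[(x0 + t) % V for t in range(T)] for x0 in range(V)]
modules = fit_modules(source, T)

predict = transport_circuit(modules, set(), T)
print(predict(2))  # → 0, i.e. (2 + 3) mod 5
print(transport_circuit(modules, {2}, T))  # → None: edge 2 is flagged as discrepant
```

Composing full conditional distributions rather than point predictions keeps the sketch valid for stochastic mechanisms. In the deterministic toy domain, the two coincide.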
Peer Reviews
Decision: Submitted to ICLR 2026
1. The work appears to be technically solid.
2. The motivation for studying compositionality at the circuit/mechanism level is clear.
1. Broader context is unclear: the paper does not convincingly show how the proposed framework could be useful in real applications or what kinds of problems would benefit from it. It reads primarily as a technically detailed but narrowly scoped work rather than a conceptual contribution.
2. Empirical results are weak in both breadth and depth; they appear to be sanity checks as opposed to in-depth stress tests of the proposals.
3. Missing related work: the idea of reusing and composing operato
There are just too many mistakes in the draft and thus I did not see any strengths of the paper.
1. Some of the theory is wrong.
2. The claim of a "causal" graph is wrong.
3. The experiments are weak: only a simple simulation is considered, and from the current model presentation I can hardly believe that the proposed model would work in a real-world setting.
- The link between causal transportability theory and circuit complexity is novel. The authors map few-shot adaptation rates to circuit size complexity and reframe sample efficiency in terms of structural complexity.
- The work is theoretically sound: the authors prove strong control of excess risk for both structure-informed adaptation (Circuit-TR, Theorem 2.7) and agnostic adaptation (Circuit-AD, Theorem 3.2).
- The proposed gradient-based surrogate for Circuit-AD borrows attention-like compone
- Empirical validation is only on synthetic arithmetic sequences with a small number of observed variables ($T = 10$) and a small vocabulary size. The gap between these experiments and real domain adaptation challenges is huge. While real-world experiments are not necessary, more empirical evaluation on synthetic sequences of varying $T$ and $\mathcal{V}$ would help understand the computational viability of the proposed algorithms, especially since Circuit-AD is exponential in $T$ (if I underst
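The review's concern about computational cost can be made concrete with a generic sketch. The code below is not the paper's Circuit-AD. It is a plain exhaustive model-selection baseline, assumed here only for illustration. Every subset of the $T$ input positions defines a candidate predictor (a majority-vote lookup table fit on source data), and the few labeled target samples select the candidate with the lowest 0-1 error. The names `fit_lookup` and `select_by_target_risk` are hypothetical. The loop visits all $2^T$ subsets, which is why a search of this kind is only practical for small $T$.

```python
from collections import Counter, defaultdict
from itertools import combinations


def fit_lookup(data, subset):
    """Fit a majority-vote lookup table from the inputs at `subset` to the label."""
    votes = defaultdict(Counter)
    for x, y in data:
        votes[tuple(x[i] for i in subset)][y] += 1
    table = {key: c.most_common(1)[0][0] for key, c in votes.items()}
    # Unseen keys map to None, which the 0-1 loss below counts as an error.
    return lambda x: table.get(tuple(x[i] for i in subset))


def select_by_target_risk(source, target, T):
    """Enumerate all 2**T input subsets and keep the one with the lowest target error.

    Ties keep the earliest subset visited (smallest size first).
    """
    best, best_err, n_candidates = None, float("inf"), 0
    for k in range(T + 1):
        for subset in combinations(range(T), k):
            n_candidates += 1
            h = fit_lookup(source, subset)
            err = sum(h(x) != y for x, y in target) / len(target)
            if err < best_err:
                best, best_err = subset, err
    return best, best_err, n_candidates


# Source domain: position 1 is a spurious copy of position 0, and y = x[0].
source = [((a, a, c), a) for a in (0, 1) for c in (0, 1)]
# Target domain: the copy is broken, but y = x[0] still holds.
target = [((a, b, c), a) for a in (0, 1) for b in (0, 1) for c in (0, 1)]

print(select_by_target_risk(source, target, T=3))  # → ((0,), 0.0, 8)
```

In this toy example, the source data cannot distinguish positions 0 and 1 because they are always equal. Eight target samples are enough to select position 0, but only after evaluating all $2^3 = 8$ candidates.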
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Advanced Graph Neural Networks · Topic Modeling
