Combining Induction and Transduction for Abstract Reasoning

Wen-Ding Li; Keya Hu; Carter Larsen; Yuqing Wu; Simon Alford; Caleb Woo; Spencer M. Dunn; Hao Tang; Michelangelo Naim; Dat Nguyen; Wei-Long Zheng; Zenna Tavares; Yewen Pu; Kevin Ellis

arXiv:2411.02272·cs.LG·December 3, 2024·2 cites

TL;DR

This paper compares induction and transduction for abstract reasoning on ARC and shows that the two have complementary strengths; ensembling them approaches human-level performance.

Contribution

It trains neural models for both induction and transduction on ARC, shows that the two solve different kinds of problems, and ensembles them for improved performance.

Findings

01

Inductive models excel at precise computations and concept composition.

02

Transductive models perform better on perceptual concepts.

03

Ensembling the two comes close to human-level performance on ARC.

Abstract

When learning an input-output mapping from very few examples, is it better to first infer a latent function that explains the examples, or is it better to directly predict new test outputs, e.g. using a neural network? We study this question on ARC by training neural models for induction (inferring latent functions) and transduction (directly predicting the test output for a given test input). We train on synthetically generated variations of Python programs that solve ARC training tasks. We find inductive and transductive models solve different kinds of test problems, despite having the same training problems and sharing the same neural architecture: Inductive program synthesis excels at precise computations, and at composing multiple concepts, while transduction succeeds on fuzzier perceptual concepts. Ensembling them approaches human-level performance on ARC.
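The distinction drawn in the abstract can be made concrete with a toy sketch. Nothing below is the authors' implementation: the hand-written `candidates` list stands in for programs sampled from a neural model, `toy_transduce` stands in for a neural network that predicts the output grid directly, and the induction-first fallback order in `solve` is an assumption made for illustration.

```python
from typing import Callable, List, Optional, Tuple

Grid = List[List[int]]
Example = Tuple[Grid, Grid]          # (input grid, output grid)
Program = Callable[[Grid], Grid]


def flip_horizontal(grid: Grid) -> Grid:
    """Mirror each row left-to-right."""
    return [row[::-1] for row in grid]


def transpose(grid: Grid) -> Grid:
    """Swap rows and columns."""
    return [list(col) for col in zip(*grid)]


def induce(train: List[Example], candidates: List[Program]) -> Optional[Program]:
    """Induction: return the first candidate that reproduces every training pair."""
    for program in candidates:
        try:
            if all(program(x) == y for x, y in train):
                return program
        except Exception:
            continue  # a candidate that crashes explains nothing
    return None


def toy_transduce(train: List[Example], test_input: Grid) -> Grid:
    """Hypothetical stand-in for a neural transducer: guesses a copy of the input."""
    return [row[:] for row in test_input]


def solve(train, test_input, candidates, transduce):
    """Assumed ensemble rule: use induction if a program fits, else transduction."""
    program = induce(train, candidates)
    if program is not None:
        return program(test_input), "induction"
    return transduce(train, test_input), "transduction"


candidates = [transpose, flip_horizontal]

# A task that one candidate program explains exactly.
mirror_task = [([[1, 2], [3, 4]], [[2, 1], [4, 3]])]
mirror_result = solve(mirror_task, [[5, 6, 7]], candidates, toy_transduce)

# A task that no candidate explains, so the sketch falls back to transduction.
recolor_task = [([[1]], [[2]])]
recolor_result = solve(recolor_task, [[9]], candidates, toy_transduce)
```

The key design point is that an induced program can be checked against the training pairs before it is trusted, whereas a transductive prediction has no such built-in verification step.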

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 8 · Confidence 4

Strengths

* The study of the different types of problems that induction and transduction solve is of great conceptual interest.
* The synthetic dataset and its generation pipeline are valuable and may be of independent interest.
* They achieve compelling performance (40%) with Llama 3.1, matching or beating stronger models such as GPT-4o and Claude 3.5.
* The illustrations are informative.

Weaknesses

* A more principled understanding of the difference between induction and transduction, or a categorization of the problems each can solve, seems to be missing. Adding some minimal insight into this could enhance the paper's impact.

Reviewer 02 · Rating 3 · Confidence 3

Strengths

1. The authors have developed a diverse set of experiments around transduction and induction with respect to the ARC task.
2. The authors improve the performance on ARC tasks with an ensemble that includes both transduction and induction.

Weaknesses

1. The overall presentation could be improved. The presentation about the main contributions is confusing. The paper's title combines induction and transduction, but the abstract claims that inductive models and transductive models perform differently. The abstract does not summarize how the authors combine induction and transduction, nor does it provide a specific statement on the differences.
2. The paper does not provide a theoretical analysis, and the experimental validation is somewhat nar

Reviewer 03 · Rating 6 · Confidence 4

Strengths

1. The paper contributes an expanded ARC dataset generated by an LLM combined with manual effort. This is valuable to the community.
2. The paper performs a thorough evaluation of induction-based and transduction-based approaches.
3. The paper is clearly written and easy to follow.

Weaknesses

1. As we know, LLMs often hallucinate. How do the authors quality-check the generated dataset?
2. While the authors concluded that induction and transduction approaches are complementary, I hoped to see more in-depth analysis of why induction works better for some problems while transduction works better for others.
3. While the authors also discussed the limitation that the method is only tested on ARC, I still think this is a weakness. I wonder whether the conclusion would be different for some other real-

Code & Models

Repositories

xu3kev/barc
PyTorch · Official

Taxonomy

Topics · Semantic Web and Ontologies