GIFARC: Synthetic Dataset for Leveraging Human-Intuitive Analogies to Elevate AI Reasoning
Woochang Sim, Hyunseok Ryu, Kyungmin Choi, Sungwon Han, Sundong Kim

TL;DR
This paper introduces GIFARC, a synthetic dataset that incorporates human-like analogies into AI reasoning tasks to improve the performance of models on the challenging Abstraction and Reasoning Corpus (ARC).
Contribution
GIFARC is the first dataset to embed human-intuitive analogies into ARC-style tasks, guiding AI models to reason analogically and reducing problem complexity.
Findings
Guided LLMs adopt more human-like reasoning strategies.
GIFARC improves model accuracy on ARC-style tasks.
The analogical approach reduces problem complexity.
Abstract
The Abstraction and Reasoning Corpus (ARC) poses a stringent test of general AI capabilities, requiring solvers to infer abstract patterns from only a handful of examples. Despite substantial progress in deep learning, state-of-the-art models still achieve accuracy rates of merely 40-55% on the 2024 ARC Competition, indicating a significant gap between their performance and human-level reasoning. In this work, we seek to bridge that gap by introducing an analogy-inspired ARC dataset, GIFARC. Leveraging large language models (LLMs) and vision-language models (VLMs), we synthesize new ARC-style tasks from a variety of GIF images that include analogies. Each new task is paired with a ground-truth analogy, providing an explicit mapping between visual transformations and everyday concepts. By embedding robust human-intuitive analogies into ARC-style tasks, GIFARC guides AI agents to evaluate…
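To make the idea of pairing an ARC-style task with an everyday analogy concrete, here is a minimal sketch. It is not the authors' pipeline, which per the abstract relies on LLMs and VLMs applied to GIFs. The `AnalogyTask` record, the `analogy` field, and the hand-written `apply_gravity` transformation are all hypothetical names introduced for illustration. Only the `train`/`test` lists of `input`/`output` grids follow the public ARC task layout.

```python
from dataclasses import dataclass
from typing import List

Grid = List[List[int]]  # ARC grids are rows of color indices 0-9; 0 is background


@dataclass
class AnalogyTask:
    """Hypothetical record: ARC-style example pairs plus a free-text analogy."""
    analogy: str
    train: List[dict]
    test: List[dict]


def apply_gravity(grid: Grid) -> Grid:
    """Illustrative transformation: non-zero cells fall to the bottom of their column."""
    height, width = len(grid), len(grid[0])
    out = [[0] * width for _ in range(height)]
    for c in range(width):
        column = [grid[r][c] for r in range(height) if grid[r][c] != 0]
        # Stack the surviving cells at the bottom, preserving top-to-bottom order.
        for i, value in enumerate(reversed(column)):
            out[height - 1 - i][c] = value
    return out


def make_task(inputs: List[Grid], test_input: Grid) -> AnalogyTask:
    """Build input/output pairs from one transformation and attach its analogy."""
    pairs = [{"input": g, "output": apply_gravity(g)} for g in inputs]
    return AnalogyTask(
        analogy="objects fall under gravity",  # assumed wording, not from the paper
        train=pairs,
        test=[{"input": test_input, "output": apply_gravity(test_input)}],
    )
```

A solver given only the `train` pairs has to infer the rule from examples. The `analogy` string is the kind of extra hint that the abstract describes as mapping a visual transformation to an everyday concept.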
Peer Reviews
Decision · Submitted to ICLR 2026
- The authors investigate a highly relevant problem. The struggles of modern models with ARC-style tasks are indicative of a substantial gap in modern AI architectures or training approaches.
- The idea to extract visual analogies from GIF sources is original and promising.
While the idea of the paper is quite promising, I believe that there are very substantial flaws in the experimental evaluation. Some crucial aspects of evaluation are missing. For example, it's crucial to check that the generated problems are human-solvable. The provided human evaluations are insufficient. Not only is the sample very small (three experts and 12 problems), but analogy description is also very different from solvability. For example, while a human can, perhaps with some difficu
- The idea to use GIFs as a source of analogical transformations is interesting, and intuitively makes sense given that GIFs often contain visual motion, which is one of the important priors in ARC tasks.
- Synthetic data generation is a promising framework for improving abstract reasoning in LLMs.
- The fine-tuned model is evaluated on both ARC-AGI-1 and ARC-AGI-2.
- The primary weakness is that fine-tuning on the GIFARC dataset does not yield significant improvements. On ARC-AGI-1, fine-tuning only improves performance from 0.2% to 2.9%, which is a very small improvement, and performance is very poor both before and after fine-tuning. On ARC-AGI-2, performance is 0% both before and after fine-tuning. These results do not suggest that the dataset is successful at improving performance on ARC tasks.
- The LLM-evaluated similarity results shown in Figure 3
The paper attempts to incorporate analogical reasoning into a method for improving LLM ARC solvers, in an original way.
There are two main weaknesses. First, it was a struggle to understand this paper; there are many aspects of it I found unclear. It would really help to have a running example to illustrate the steps of starting with a GIF and generating a GIFARC task. It would also be really helpful to have examples for the "full-description", "without analogy", and "without analogy and solution" data. Also, Section 4.2 was particularly unclear. Second, the authors make claims that are not clearly su
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · AI-based Problem Solving and Planning · Time Series Analysis and Forecasting
