GIFARC: Synthetic Dataset for Leveraging Human-Intuitive Analogies to Elevate AI Reasoning

Woochang Sim; Hyunseok Ryu; Kyungmin Choi; Sungwon Han; Sundong Kim

arXiv:2505.20672·cs.AI·May 28, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces GIFARC, a synthetic dataset that incorporates human-like analogies into AI reasoning tasks to improve the performance of models on the challenging Abstraction and Reasoning Corpus (ARC).

Contribution

GIFARC is the first dataset to embed human-intuitive analogies into ARC-style tasks, guiding AI models to reason analogically and reducing problem complexity.

Findings

01 Guided LLMs adopt more human-like reasoning strategies.

02 GIFARC improves model accuracy on ARC-style tasks.

03 The analogical approach reduces problem complexity.

Abstract

The Abstraction and Reasoning Corpus (ARC) poses a stringent test of general AI capabilities, requiring solvers to infer abstract patterns from only a handful of examples. Despite substantial progress in deep learning, state-of-the-art models still achieve accuracy rates of merely 40-55% on the 2024 ARC Competition, indicating a significant gap between their performance and human-level reasoning. In this work, we seek to bridge that gap by introducing an analogy-inspired ARC dataset, GIFARC. Leveraging large language models (LLMs) and vision-language models (VLMs), we synthesize new ARC-style tasks from a variety of GIF images that include analogies. Each new task is paired with a ground-truth analogy, providing an explicit mapping between visual transformations and everyday concepts. By embedding robust human-intuitive analogies into ARC-style tasks, GIFARC guides AI agents to evaluate…

Peer Reviews

Decision·Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

- The authors investigate a highly relevant problem. The struggles of modern models with ARC-style tasks are indicative of a substantial gap in modern AI architectures or training approaches.
- The idea to extract visual analogies from GIF sources is original and promising.

Weaknesses

While the idea of the paper is quite promising, I believe that there are very substantial flaws with the experimental evaluation. Some crucial aspects of evaluation are missing. For example, it's crucial to check that the generated problems are human-solvable. The provided human evaluations are insufficient. Not only is the sample very small (three experts and 12 problems), but the analogy description is also very different from solvability. For example, while a human can, perhaps with some difficu

Reviewer 02 · Rating 2 · Confidence 5

Strengths

- The idea to use GIFs as a source of analogical transformations is interesting, and intuitively makes sense given that GIFs often contain visual motion, which is one of the important priors in ARC tasks.
- Synthetic data generation is a promising framework for improving abstract reasoning in LLMs.
- The fine-tuned model is evaluated on both ARC-AGI-1 and ARC-AGI-2.

Weaknesses

- The primary weakness is that fine-tuning on the GIFARC dataset does not yield significant improvements. On ARC-AGI-1, fine-tuning only improves performance from 0.2% to 2.9%, a very small improvement, with very poor performance both before and after fine-tuning. On ARC-AGI-2, performance is 0% both before and after fine-tuning. These results do not suggest that the dataset is successful at improving performance on ARC tasks.
- The LLM-evaluated similarity results shown in Figure 3

Reviewer 03 · Rating 2 · Confidence 4

Strengths

The paper attempts to incorporate analogical reasoning into a method for improving LLM ARC solvers, in an original way.

Weaknesses

There are two main weaknesses. First, it was a struggle to understand this paper; there are many aspects of it I found unclear. It would really help to have a running example to illustrate the steps of starting with a GIF and generating a GIFARC task. It would also be really helpful to have examples for the "full-description", "without analogy", and "without analogy and solution" data. Also, Section 4.2 was particularly unclear. Second, the authors make claims that are not clearly su

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · AI-based Problem Solving and Planning · Time Series Analysis and Forecasting