Read, Grep, and Synthesize: Diagnosing Cross-Domain Seed Exposure for LLM Research Ideation

Yunju Choi; Min Song

arXiv:2605.11532 · cs.AI · May 13, 2026

TL;DR

This paper investigates how cross-domain seed retrieval influences LLM ideation. It finds that exposure to diverse seeds helps, but that semantic relevance is not yet reliably exploited. Tools and datasets are released for further research.

Contribution

The paper introduces PaperGym, a three-stage pipeline for seed extraction, cross-domain seed retrieval, and method synthesis, and uses it to measure how diverse seed exposure affects LLM ideation.
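The three stages can be pictured with a toy sketch. This is not the authors' implementation: the regex scan stands in for tool-augmented extraction, Jaccard token overlap stands in for paraphrase-based retrieval, and the synthesis step only builds a prompt instead of calling a model. Every name here (`Seed`, `extract_seeds`, `retrieve_cross_domain`, `synthesize`) and all sample text are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Seed:
    domain: str  # hypothetical domain label, e.g. "vision"
    text: str    # one mechanism sentence pulled from a paper


def extract_seeds(paper_text: str, domain: str,
                  pattern: str = r"\b(we propose|we introduce)\b") -> list[Seed]:
    """Stage 1 stand-in: grep-like scan for sentences that announce a mechanism."""
    sentences = re.split(r"(?<=[.!?])\s+", paper_text.strip())
    return [Seed(domain, s) for s in sentences if re.search(pattern, s, re.IGNORECASE)]


def _tokens(text: str) -> set[str]:
    """Lowercased alphabetic tokens of a string."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve_cross_domain(query: str, pool: list[Seed],
                          home_domain: str, k: int = 2) -> list[Seed]:
    """Stage 2 stand-in: rank seeds from other domains by Jaccard token overlap."""
    q = _tokens(query)

    def score(seed: Seed) -> float:
        t = _tokens(seed.text)
        union = q | t
        return len(q & t) / len(union) if union else 0.0

    candidates = [s for s in pool if s.domain != home_domain]
    return sorted(candidates, key=score, reverse=True)[:k]


def synthesize(problem: str, seeds: list[Seed]) -> str:
    """Stage 3 stand-in: build the prompt a model would receive (no model call)."""
    lines = [f"Problem: {problem}", "Seeds:"]
    lines += [f"- [{s.domain}] {s.text}" for s in seeds]
    lines.append("Propose a method that adapts one or more seeds to the problem.")
    return "\n".join(lines)


# Made-up paper snippets, for illustration only.
papers = {
    "vision": "Prior work is slow. We propose sparse attention over image patches to cut cost.",
    "speech": "We introduce a curriculum over noise levels for robust training. Results improve.",
    "nlp": "We propose retrieval over token caches for long documents.",
}
pool = [seed for domain, text in papers.items() for seed in extract_seeds(text, domain)]
picked = retrieve_cross_domain("robust training for long documents under noise",
                               pool, home_domain="nlp")
prompt = synthesize("robust long-document modeling", picked)
print(prompt)
```

Excluding the home domain in stage 2 is what makes the retrieval cross-domain in this sketch; a same-domain baseline would invert that filter.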

Findings

01. Cross-domain retrieval increases seed diversity and novelty.

02. Tool-augmented extraction improves seed specificity.

03. Diverse seeds enhance ideation, but semantic relevance is underutilized.

Abstract

The discovery of novel methodologies for emerging problems is a continuing cycle in ML, often driven by the migration of techniques across domains. Building on this observation, we ask whether current LLM ideation systems benefit from targeted cross-domain retrieval or simply from exposure to diverse mechanisms. We study this question through PaperGym, a three-stage pipeline: (1) tool-augmented seed extraction via read, grep, and bash over an isolated paper environment, (2) cross-domain seed retrieval via paraphrasing across seven ML domains, and (3) method synthesis from retrieved seeds, each scored by rubric-based judges. Tool-augmented extraction improves specificity, and paraphrase-based retrieval broadens domain coverage. In synthesis, cross-domain retrieval receives more pairwise novelty wins than no-retrieval and same-domain baselines, but shows no significant difference from a…
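Here is one way to check whether a lead in pairwise novelty wins is distinguishable from chance. It is a sketch only: the abstract does not say which statistical test the authors use, so the exact two-sided sign test below is an assumption. The helper names and the verdict list are hypothetical, and the counts are invented for illustration, not taken from the paper.

```python
from math import comb


def tally(judgments: list[str]) -> tuple[int, int]:
    """Count 'A' and 'B' verdicts; anything else (e.g. 'tie') is ignored."""
    return judgments.count("A"), judgments.count("B")


def sign_test_p(wins_a: int, wins_b: int) -> float:
    """Two-sided exact sign test, ties dropped (H0: each side wins with p = 0.5)."""
    n = wins_a + wins_b
    if n == 0:
        return 1.0
    k = min(wins_a, wins_b)
    # Probability of seeing k or fewer wins for the weaker side under H0.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Invented verdicts: A = cross-domain retrieval, B = some baseline.
verdicts = ["A"] * 14 + ["B"] * 6 + ["tie"] * 3
a, b = tally(verdicts)
print(a, b, round(sign_test_p(a, b), 4))
```

With these invented counts, condition A wins more than twice as often as B, yet the p-value stays above 0.05. A gap of that kind can coexist with "more wins" and "no significant difference" being reported together.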

Code & Models

Repositories

yunjoochoi/PaperGym (GitHub)