From Brute Force to Semantic Insight: Performance-Guided Data Transformation Design with LLMs
Usha Shrestha, Dmitry Ignatov, Radu Timofte

TL;DR
This paper presents a method that lets LLMs autonomously design data transformations by internalizing empirical performance cues, shrinking the search space relative to brute-force search and improving task-specific code synthesis.
Contribution
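It introduces a performance-guided, closed-loop approach in which LLMs engineer data transformations without reinforcement learning, reward models, or symbolic objectives, trained on a repository of more than 6,000 PyTorch augmentation functions annotated with downstream model accuracy.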
Findings
Achieves up to 600x fewer candidate evaluations than brute-force methods.
Maintains competitive accuracy in code synthesis tasks.
Models internalize semantic performance cues rather than syntax.
Abstract
Large language models (LLMs) have achieved notable performance in code synthesis; however, data-aware augmentation remains a limiting factor, handled via heuristic design or brute-force approaches. We introduce a performance-aware, closed-loop solution in the NNGPT ecosystem of projects that enables LLMs to autonomously engineer optimal transformations by internalizing empirical performance cues. We fine-tune LLMs with Low-Rank Adaptation on a novel repository of more than 6,000 empirically evaluated PyTorch augmentation functions, each annotated solely by downstream model accuracy. Training uses pairwise performance ordering (better-worse transformations), enabling alignment through empirical feedback without reinforcement learning, reward models, or symbolic objectives. This reduces the need for exhaustive search, achieving up to 600x fewer evaluated candidates than brute-force…
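The pairwise performance ordering mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Candidate` class, its field names, the `build_pairs` helper, the `min_gap` threshold, and the accuracy values are all hypothetical, chosen only to show how accuracy-annotated transformations might be turned into (better, worse) training pairs.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Candidate:
    """One augmentation function plus its measured downstream accuracy.

    Hypothetical record layout; the paper's repository format is not shown here.
    """
    source: str      # source code of the transformation
    accuracy: float  # accuracy of the model trained with this transformation


def build_pairs(candidates, min_gap=0.0):
    """Return (better, worse) pairs whose accuracy gap exceeds min_gap."""
    pairs = []
    for a, b in combinations(candidates, 2):
        if abs(a.accuracy - b.accuracy) <= min_gap:
            continue  # skip ties and near-ties: the ordering carries no signal
        better, worse = (a, b) if a.accuracy > b.accuracy else (b, a)
        pairs.append((better, worse))
    return pairs


# Illustrative values only, not results from the paper.
candidates = [
    Candidate("def aug_a(x): ...", 0.71),
    Candidate("def aug_b(x): ...", 0.64),
    Candidate("def aug_c(x): ...", 0.71),
]
pairs = build_pairs(candidates)
```

Each resulting pair carries only an ordering, not a scalar reward, which matches the abstract's description of alignment without reward models. How the pairs are then fed to LoRA fine-tuning is not specified in this summary and is left out of the sketch.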
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Machine Learning in Materials Science
