TL;DR
FT-Dojo is an interactive benchmark environment for autonomous fine-tuning of large language models, covering 13 tasks across 5 domains. The paper pairs it with FT-Agent, an autonomous fine-tuning framework built around structured feedback and iterative refinement.
Contribution
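The paper presents FT-Dojo, a benchmark environment with a standardized task interface, and FT-Agent, a framework that fine-tunes LLMs autonomously through structured iteration planning, fail-fast validation, and multi-level feedback analysis. It frames end-to-end LLM fine-tuning as an interactive agent task, a setting the authors say has not been systematically studied.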
Findings
FT-Agent achieves top performance on 10 of 13 tasks.
Agents can recover from failures through cumulative learning.
The implementation is publicly available at https://github.com/microsoft/rd-agent.
Abstract
Fine-tuning large language models for vertical domains remains labor-intensive, requiring practitioners to curate data, configure training, and iteratively diagnose model behavior. Despite growing interest in autonomous machine learning and language agents, end-to-end LLM fine-tuning has not been systematically studied as an interactive agent task. We introduce FT-Dojo, an interactive benchmark environment for autonomous LLM fine-tuning, comprising 13 tasks across 5 domains. Rather than a new collection of static datasets, FT-Dojo standardizes a task interface, shared raw-data repository, sandboxed execution environment, structured feedback protocol, and held-out evaluation procedure. We further develop FT-Agent, a fine-tuning-oriented autonomous framework that uses structured iteration planning, fail-fast validation, and multi-level feedback analysis to refine data and training…
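The loop the abstract describes (plan an iteration, validate cheaply before training, then record structured feedback for later iterations) can be illustrated with a minimal sketch. This is not FT-Agent's implementation: every name here (`Feedback`, `History`, `fail_fast_validate`, `run_loop`, the config keys, and the toy planner) is a hypothetical stand-in chosen for illustration, and the validation rules are assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Feedback:
    """Structured result of one iteration (hypothetical schema)."""
    stage: str                      # "validation" or "evaluation"
    ok: bool
    score: Optional[float] = None   # held-out metric, if training ran
    message: str = ""


@dataclass
class History:
    """Cumulative record the planner can consult on later iterations."""
    entries: List[Feedback] = field(default_factory=list)

    def best_score(self) -> Optional[float]:
        scores = [f.score for f in self.entries if f.ok and f.score is not None]
        return max(scores) if scores else None


def fail_fast_validate(config: Dict) -> Feedback:
    """Cheap checks run before any expensive training job (assumed rules)."""
    if config.get("num_examples", 0) <= 0:
        return Feedback("validation", False, message="empty training set")
    if not (0.0 < config.get("learning_rate", 0.0) < 1.0):
        return Feedback("validation", False, message="learning rate out of range")
    return Feedback("validation", True)


def run_loop(
    plan: Callable[[History], Dict],
    train_and_eval: Callable[[Dict], float],
    max_iters: int = 5,
) -> History:
    """Plan, validate, train, and log feedback for a fixed number of iterations."""
    history = History()
    for _ in range(max_iters):
        config = plan(history)
        check = fail_fast_validate(config)
        if not check.ok:
            # Record the failure and skip training so the planner can react.
            history.entries.append(check)
            continue
        score = train_and_eval(config)
        history.entries.append(Feedback("evaluation", True, score=score))
    return history


def toy_plan(history: History) -> Dict:
    # Start with a deliberately bad config; repair it once a failure is logged.
    if any(not f.ok for f in history.entries):
        return {"num_examples": 100, "learning_rate": 1e-4}
    return {"num_examples": 100, "learning_rate": 5.0}


def toy_train_and_eval(config: Dict) -> float:
    return 0.5  # stand-in for a held-out metric; no real training happens


history = run_loop(toy_plan, toy_train_and_eval, max_iters=3)
```

In this toy run the first iteration fails validation without training, and the planner reads that failure from the history to propose a valid configuration on the next iteration. This is one simple way to picture recovery from failures through accumulated feedback.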
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
