Continually self-improving AI

Zitong Yang

arXiv:2603.18073·cs.AI·March 20, 2026

Open Access

TL;DR

This paper explores methods for building AI systems that improve themselves continually by generating synthetic data, bootstrapping knowledge from self-generated data, and autonomously searching over training algorithms, with the aim of reducing their dependence on human-generated data and human-designed training pipelines.

Contribution

The thesis introduces three techniques: synthetic data generation for data-efficient knowledge updates, pretraining on self-generated data, and automated search over learning algorithms.

Findings

01. Synthetic data enhances knowledge acquisition from limited sources.

02. Self-generated data can bootstrap pretraining without external models.

03. Automated algorithm search surpasses human-designed training paradigms.

Abstract

Modern language model-based AI systems are remarkably powerful, yet their capabilities remain fundamentally capped by their human creators in three key ways. First, although a model's weights can be updated via fine-tuning, acquiring new knowledge from small, specialized corpora after pretraining remains highly data-inefficient. Second, the training of these systems relies heavily on finite, human-generated data from across history. Third, the pipelines used to train AI models are confined by the algorithms that human researchers can discover and explore. This thesis takes a small step toward overcoming these inherent limitations, presenting three chapters aimed at breaking these dependencies to create continually self-improving AI. First, to overcome this data-efficiency barrier in knowledge acquisition, we propose a synthetic data approach that diversifies and amplifies small corpora…

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Big Data and Digital Economy