TL;DR
This paper investigates whether large language models can reinvent foundational algorithms: it first unlearns a target algorithm from the model, then tests whether the model can derive that algorithm again. The results reveal both potential and limitations.
Contribution
The study introduces an unlearning-and-reinvention pipeline and demonstrates that strong LLMs can successfully reinvent several foundational algorithms with minimal hints.
Findings
Qwen3-4B-Thinking-2507 reinvents 50% of the target algorithms with no hint and 70% at hint level 1, rising to 90% with stronger hints
High-level hints improve reinvention success, but complex algorithms often need more than high-level hints
Test-time reinforcement learning helps the model reinvent complex algorithms such as Strassen's algorithm
Abstract
LLMs have shown strong potential to advance scientific discovery. Whether they possess the capacity for foundational innovation, however, remains an open question. In this work, we focus on a prerequisite for foundational innovation: can LLMs reinvent foundational algorithms in computer science? Our Unlearn-and-Reinvent pipeline applies LLM unlearning to remove a specific foundational algorithm, such as Dijkstra's or Euclid's algorithm, from an LLM's pretrained knowledge, and then tests whether the model can reinvent it in a controlled environment. To enable effective unlearning, we adopt a GRPO-based, on-policy unlearning method. Across 10 target algorithms, 3 strong open-weight models, and 3 hint levels, our experiments demonstrate that (1) the strongest model Qwen3-4B-Thinking-2507 successfully reinvents 50% of the algorithms with no hint, 70% at hint level 1, and 90% at…
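For readers unfamiliar with the targets, here is a minimal Python sketch of Euclid's algorithm, one of the algorithms the abstract names. The `matches_reference` helper and the test cases are hypothetical illustrations of how a candidate solution might be checked against a trusted reference. They are not the paper's evaluation procedure, which this summary does not describe.

```python
import math


def euclid_gcd(a: int, b: int) -> int:
    """Greatest common divisor via Euclid's algorithm (repeated remainder)."""
    a, b = abs(a), abs(b)
    while b:
        # gcd(a, b) == gcd(b, a mod b); stop when the remainder reaches zero.
        a, b = b, a % b
    return a


def matches_reference(candidate, cases) -> bool:
    """Hypothetical check: compare a candidate against math.gcd on sample inputs."""
    return all(candidate(a, b) == math.gcd(a, b) for a, b in cases)


# Illustrative inputs only; not taken from the paper.
cases = [(48, 18), (17, 5), (0, 9), (270, 192)]
print(matches_reference(euclid_gcd, cases))
```

A check like this only tests input/output behaviour. Deciding whether a model truly reinvented the algorithm, rather than producing some other correct method, would need additional criteria that the abstract does not specify.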
