SoS1: O1 and R1-Like Reasoning LLMs are Sum-of-Square Solvers

Kechen Li; Wenqi Zhu; Coralia Cartis; Tianbo Ji; Shiwei Liu

arXiv:2502.20545 · cs.LG · March 3, 2025

TL;DR

This paper demonstrates that large language models can be guided to solve complex polynomial nonnegativity problems, achieving high accuracy with minimal fine-tuning and structured reasoning instructions.

Contribution

The introduction of the SoS-1K dataset and expert-designed reasoning instructions enables LLMs to effectively address a computationally intractable problem, advancing mathematical reasoning capabilities.

Findings

01. Structured instructions significantly improve LLM accuracy.

02. A fine-tuned 7B model outperforms larger models on polynomial nonnegativity tasks.

03. LLMs can be trained efficiently to solve NP-hard problems with minimal resources.

Abstract

Large Language Models (LLMs) have achieved human-level proficiency across diverse tasks, but their ability to perform rigorous mathematical problem solving remains an open challenge. In this work, we investigate a fundamental yet computationally intractable problem: determining whether a given multivariate polynomial is nonnegative. This problem, closely related to Hilbert's Seventeenth Problem, plays a crucial role in global polynomial optimization and has applications in various fields. First, we introduce SoS-1K, a meticulously curated dataset of approximately 1,000 polynomials, along with expert-designed reasoning instructions based on five progressively challenging criteria. Evaluating multiple state-of-the-art LLMs, we find that without structured guidance, all models perform only slightly above the 50% random-guess baseline. However, high-quality reasoning instructions…
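To make the sum-of-squares idea in the abstract concrete, here is a minimal sketch of a standard fact: a polynomial p is a sum of squares exactly when it can be written as p = z^T Q z for some vector of monomials z and some positive semidefinite matrix Q (a "Gram matrix"). The sketch is not taken from the paper. The helper names is_psd and gram_value, the tolerance, and the two example polynomials are illustrative assumptions. The code only checks a Gram matrix that is already given; finding one for a general polynomial typically requires semidefinite programming.

```python
import numpy as np

def is_psd(Q, tol=1e-9):
    """Return True if Q is symmetric and positive semidefinite (up to tol)."""
    Q = np.asarray(Q, dtype=float)
    if not np.allclose(Q, Q.T):
        return False
    # eigvalsh is for symmetric matrices and returns real eigenvalues
    return bool(np.linalg.eigvalsh(Q).min() >= -tol)

def gram_value(Q, z):
    """Evaluate the quadratic form z^T Q z for a monomial vector z."""
    z = np.asarray(z, dtype=float)
    return float(z @ Q @ z)

# Hypothetical example: p(x, y) = x^2 - 2xy + 2y^2 with z = [x, y].
# Here p = (x - y)^2 + y^2, so Q should be positive semidefinite.
Q = np.array([[1.0, -1.0],
              [-1.0, 2.0]])

# Hypothetical counterexample: q(x, y) = x^2 - 4xy + y^2 with z = [x, y].
# q(1, 1) = -2 < 0, so q is not nonnegative and this Q_bad is not PSD.
Q_bad = np.array([[1.0, -2.0],
                  [-2.0, 1.0]])

print(is_psd(Q), is_psd(Q_bad))
print(gram_value(Q, [1.0, 1.0]), gram_value(Q_bad, [1.0, 1.0]))
```

For quadratic forms like these, the Gram matrix in the basis [x, y] is unique, so the PSD test settles the question. For higher-degree polynomials the Gram matrix is generally not unique, so a single non-PSD candidate does not by itself rule out an SoS decomposition.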
