CoMind: Towards Community-Driven Agents for Machine Learning Engineering
Sijie Li, Weiwei Sun, Shanda Li, Ameet Talwalkar, Yiming Yang

TL;DR
CoMind is a multi-agent system that leverages collective knowledge and community engagement to improve machine learning engineering, achieving state-of-the-art results on Kaggle competitions and outperforming most human competitors.
Contribution
This paper introduces CoMind, a novel multi-agent system that systematically utilizes external community knowledge for ML engineering tasks, with a new evaluation framework and superior performance.
Findings
Achieves a 36% medal rate on past Kaggle competitions.
Outperforms 92.6% of human competitors in ongoing competitions.
Places in the top 5% and top 1% on multiple leaderboards.
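As a rough illustration of how figures like these are computed, here is a minimal Python sketch. The function names and the sample rank are hypothetical and not taken from the paper; the only source numbers used are the 36% medal rate and the 75 past competitions mentioned in the abstract, which would correspond to 27 medal-winning competitions if the rate is unrounded.

```python
def medal_rate(medals_won: int, competitions: int) -> float:
    """Fraction of competitions in which a medal was earned."""
    if competitions <= 0:
        raise ValueError("competitions must be positive")
    return medals_won / competitions


def fraction_outperformed(rank: int, num_participants: int) -> float:
    """Share of participants ranked strictly below `rank` (rank 1 is best)."""
    if not 1 <= rank <= num_participants:
        raise ValueError("rank must lie between 1 and num_participants")
    return (num_participants - rank) / num_participants


# 27 medals over 75 competitions gives a 36% medal rate.
print(f"{medal_rate(27, 75):.0%}")

# Hypothetical leaderboard: rank 50 out of 1000 participants.
print(f"{fraction_outperformed(50, 1000):.1%}")
```

Note that this definition counts only participants ranked strictly below the agent; leaderboards with ties may call for a different convention.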
Abstract
Large language model (LLM) agents show promise in automating machine learning (ML) engineering. However, existing agents typically operate in isolation on a given research problem, without engaging with the broader research community, where human researchers often gain insights and contribute by sharing knowledge. To bridge this gap, we introduce MLE-Live, a live evaluation framework designed to assess an agent's ability to communicate with and leverage collective knowledge from a simulated Kaggle research community. Building on this framework, we propose CoMind, a multi-agent system designed to systematically leverage external knowledge. CoMind employs an iterative parallel exploration mechanism, developing multiple solutions simultaneously to balance exploratory breadth with implementation depth. On 75 past Kaggle competitions within our MLE-Live framework, CoMind achieves a 36% medal…
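The abstract describes an iterative parallel exploration mechanism only at a high level. The following is a generic Python sketch of that idea, not CoMind's actual algorithm: several candidate solutions are refined side by side in each round, and the best one seen so far is tracked. The names `parallel_explore`, `refine`, and `score` are assumptions made for illustration, and the toy string-based example stands in for real solution drafts and validation scores.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def parallel_explore(
    seeds: List[str],
    refine: Callable[[str], str],
    score: Callable[[str], float],
    iterations: int,
) -> Tuple[str, float]:
    """Refine every candidate in parallel each round; return the best one seen."""
    if not seeds:
        raise ValueError("at least one seed candidate is required")
    candidates = list(seeds)
    best = max(candidates, key=score)
    with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        for _ in range(iterations):
            # Breadth: all candidates advance simultaneously.
            candidates = list(pool.map(refine, candidates))
            # Depth: each candidate keeps its own refinement history.
            round_best = max(candidates, key=score)
            if score(round_best) > score(best):
                best = round_best
    return best, score(best)


# Toy stand-ins (hypothetical): refinement appends a character,
# and the score is simply the length of the draft.
best, best_score = parallel_explore(
    seeds=["a", "bbb"],
    refine=lambda draft: draft + "x",
    score=len,
    iterations=2,
)
print(best, best_score)
```

In a real agent, `refine` would wrap an LLM call plus code execution and `score` would be a validation metric; a thread pool is used here only because such steps are typically I/O-bound.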
Peer Reviews
Decision: ICLR 2026 Poster
1. Novelty of the main idea The idea of incorporating real-time information from external communities for machine learning engineering agents is somewhat novel and interesting. In addition to machine learning competitions, CoMind has the potential to be leveraged for performing collaborative engineering tasks where human inputs are provided as real-time guidance, and this could be a more important potential application of CoMind. 2. Presentation Overall, the manuscript is easy to follow and
1. Comparison across MLE-Live and MLE-Bench While it is an interesting idea to take inputs from external communities, the community artifacts may contain too many hints. Although I think it makes sense to utilize the information under fair settings (such as comparisons with other Kaggle participants), comparing CoMind with other baselines on MLE-Bench (Table 1) is unlikely to be an apples-to-apples comparison. I'm aware that there are the results with CoMind but without the resources, but pres
1. The paper provides a live evaluation framework that contains not only competitions but also discussions and kernels. The paper cuts off the data before the competition deadline to mitigate data leakage. 2. CoMind has an analyzer to distill knowledge from community artifacts. The paper also evaluates on the eight live competitions. Apart from quantitative analysis, the paper includes some qualitative analysis of task category, win rate, and code complexity. 3. The paper provides the code and its model.
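The cutoff described in this review can be pictured as a timestamp filter over community artifacts. This is a minimal Python sketch under stated assumptions: the `Artifact` type, its fields, the function name, and all dates are hypothetical and do not come from the paper or its code.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Artifact:
    kind: str            # hypothetical label, e.g. "discussion" or "kernel"
    title: str
    published: datetime


def visible_artifacts(artifacts: List[Artifact], cutoff: datetime) -> List[Artifact]:
    """Keep only artifacts published strictly before the cutoff, oldest first."""
    kept = [a for a in artifacts if a.published < cutoff]
    return sorted(kept, key=lambda a: a.published)


# Hypothetical competition deadline and artifact pool.
deadline = datetime(2024, 6, 1)
pool = [
    Artifact("kernel", "baseline model", datetime(2024, 5, 10)),
    Artifact("discussion", "winning solution write-up", datetime(2024, 6, 3)),
    Artifact("discussion", "feature ideas", datetime(2024, 4, 20)),
]

visible = visible_artifacts(pool, deadline)
print([a.title for a in visible])
```

Anything posted on or after the cutoff, such as a post-deadline solution write-up, is excluded so it cannot leak into the agent's context.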
1. The evaluation metrics follow the standard MLE-Bench for the main table. The ablation study only reports the win rate. I did not see any metrics directly related to the collaborative nature of agents. The novelty of MLE-Live seems to be limited. 2. Some parts of the paper are not clear. For example, which tasks are included in MLE-Live? The paper does not list any statistics or details of those competitions. The paper also seems to categorize the tasks into three levels. However, it
- The paper introduces the novel concept of community-driven evaluation and agent collaboration in machine learning engineering. MLE-Live and CoMind simulate social learning and information exchange, a key feature of real-world research. I think this is quite meaningful. - The paper is well-organized and readable. - Its strong empirical gains and reproducible open-world setup could influence future benchmarks and frameworks in agentic AI, AutoML, and collaborative reasoning. - I like the ablatio
Gain came from the community data. First, I want to make the point that I believe an MLE-Bench with public resources is good. I have nothing against that. This weakness mainly concerns the proposed scaffold. Figure 4 shows an ablation where CoMind w/ R is much better than CoMind w/o R. CoMind w/o R is roughly equivalent to AIDE with the same resources and backend model. The improvement from the scaffold itself is limited. It would be great to see more evidence that CoMind is a
Taxonomy
Topics: Data Stream Mining Techniques · Scientific Computing and Data Management · Mobile Crowdsensing and Crowdsourcing
