LLM-ProS: Analyzing Large Language Models' Performance in Competitive Problem Solving
Md Sifat Hossain, Anika Tabassum, Md. Fahim Arefin, Tarannum Shaila Zaman

TL;DR
This paper introduces LLM-ProS, a new evaluation method for assessing large language models' performance on competitive programming problems, and benchmarks current models on ICPC problems to reveal their strengths and limitations.
Contribution
It presents a novel evaluation technique, LLM-ProS, and benchmarks five LLMs on a curated dataset of 166 ICPC World Finals problems, providing insights into their reasoning and problem-solving capabilities.
Findings
Significant performance differences among models in solving ICPC problems.
Impact of training methods and reasoning techniques on model accuracy.
Insights into optimizing LLMs for algorithmic problem-solving.
Abstract
The rapid advancement of large language models has opened new avenues for automating complex problem-solving tasks such as algorithmic coding and competitive programming. This paper introduces a novel evaluation technique, LLM-ProS, to assess the performance of state-of-the-art LLMs on International Collegiate Programming Contest (ICPC) problems. Using a curated dataset of 166 World Finals problems from 2011 to 2024, we benchmark the models' reasoning, accuracy, and efficiency. We evaluate five models (GPT-4o, Mistral Large, Llama-3.1-405B, and the o1 family, consisting of o1-mini and o1-preview) across critical metrics such as correctness, resource utilization, and response calibration. Our results reveal significant differences in the models' abilities to generalize, adapt, and solve novel problems. We also investigate the impact of training methodologies, dataset contamination, and…
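The abstract names correctness and resource utilization among the metrics but does not spell out the scoring procedure here. As a minimal sketch, assuming each model's generated solution is run through a judge that returns ICPC-style verdicts, a per-model acceptance rate and verdict breakdown could be tallied as below. The `Verdict` set, the `Submission` record, the `summarize` function, and the rule of counting only the first attempt per problem are all hypothetical illustrations, not the authors' LLM-ProS pipeline; the model names in the usage lines are placeholders, not reported results.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    # ICPC-style judge verdicts (an assumed set, not taken from the paper)
    ACCEPTED = "AC"
    WRONG_ANSWER = "WA"
    TIME_LIMIT_EXCEEDED = "TLE"
    MEMORY_LIMIT_EXCEEDED = "MLE"
    RUNTIME_ERROR = "RE"
    COMPILATION_ERROR = "CE"


@dataclass(frozen=True)
class Submission:
    model: str        # e.g. "o1-mini"
    problem_id: str   # hypothetical identifier for a World Finals problem
    verdict: Verdict


def summarize(submissions):
    """Return {model: (accepted_fraction, Counter of verdicts)}.

    Only the first submission per (model, problem) pair is counted,
    a pass@1-style convention chosen here as an assumption.
    """
    first = {}
    for s in submissions:
        # setdefault keeps the earliest verdict seen for each pair
        first.setdefault((s.model, s.problem_id), s.verdict)

    per_model = defaultdict(Counter)
    for (model, _problem), verdict in first.items():
        per_model[model][verdict] += 1

    report = {}
    for model, counts in per_model.items():
        total = sum(counts.values())
        # TLE/MLE counts give a rough view of resource-limit failures
        report[model] = (counts[Verdict.ACCEPTED] / total, counts)
    return report


# Usage with placeholder data (not results from the paper)
subs = [
    Submission("model-a", "P1", Verdict.ACCEPTED),
    Submission("model-a", "P2", Verdict.TIME_LIMIT_EXCEEDED),
    Submission("model-a", "P2", Verdict.ACCEPTED),  # ignored: not the first attempt
    Submission("model-b", "P1", Verdict.WRONG_ANSWER),
]
report = summarize(subs)
rate, counts = report["model-a"]
print(f"{rate:.2f}")  # 0.50
```

Counting only the first attempt keeps models comparable when they are allowed different numbers of retries; a study that permits resubmission after judge feedback would need a different rule.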
Taxonomy
Topics: Topic Modeling
