Cross-Task Benchmarking and Evaluation of General-Purpose and Code-Specific Large Language Models
Gunjan Das, Paheli Bhattacharya, Rishabh Gupta

TL;DR
This paper systematically compares general-purpose and code-specific large language models across multiple domains and finds that code-optimized models excel at reasoning and syntactic precision, even on non-coding tasks.
Contribution
It provides a comprehensive cross-domain benchmarking framework for LLMs, unifying linguistic, reasoning, and code understanding evaluations.
Findings
Code-optimized models like CodeLLaMA perform strongly in reasoning tasks.
Code-specific models show measurable advantages even in non-coding domains.
General-purpose models like Mistral-7B and Llama-3-8B lag behind in certain benchmarks.
Abstract
Large Language Models (LLMs) have revolutionized both general natural language processing and domain-specific applications such as code synthesis, legal reasoning, and finance. However, while prior studies have explored individual model capabilities, a systematic cross-domain comparison that unifies linguistic, reasoning, and code understanding abilities remains underexplored. In this work, we present a comprehensive evaluation of five general-purpose and three code-specific state-of-the-art LLMs across six diverse benchmarks encompassing linguistic competence, mathematical reasoning, and trustworthiness. Additionally, we analyze model behavior on the CoNaLa dataset for code explanation, comparing natural language and code-specialized LLMs. Our findings reveal that models optimized for code (e.g., CodeLLaMA variants) exhibit strong reasoning and syntactic precision, that even for…
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Text Readability and Simplification
