Cross-Task Benchmarking and Evaluation of General-Purpose and Code-Specific Large Language Models

Gunjan Das; Paheli Bhattacharya; Rishabh Gupta

arXiv:2512.04673·cs.SE·December 5, 2025

Open Access

TL;DR

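This paper systematically compares five general-purpose and three code-specific large language models across six benchmarks covering linguistic competence, mathematical reasoning, and trustworthiness. It finds that code-optimized models show strong reasoning and syntactic precision, with measurable advantages even on non-coding tasks.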

Contribution

The paper provides a cross-domain benchmarking framework for LLMs that unifies the evaluation of linguistic, reasoning, and code-understanding abilities.

Findings

01

Code-optimized models like CodeLLaMA perform strongly in reasoning tasks.

02

Code-specific models show measurable advantages even in non-coding domains.

03

General-purpose models like Mistral-7B and Llama-3-8B lag behind in certain benchmarks.

Abstract

Large Language Models (LLMs) have revolutionized both general natural language processing and domain-specific applications such as code synthesis, legal reasoning, and finance. However, while prior studies have explored individual model capabilities, a systematic cross-domain comparison that unifies linguistic, reasoning, and code understanding abilities remains underexplored. In this work, we present a comprehensive evaluation of five general-purpose and three code-specific state-of-the-art LLMs across six diverse benchmarks encompassing linguistic competence, mathematical reasoning, and trustworthiness. Additionally, we analyze model behavior on the CoNaLa dataset for code explanation, comparing natural language and code-specialized LLMs. Our findings reveal that models optimized for code (e.g., CodeLLaMA variants) exhibit strong reasoning and syntactic precision, that even for…

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Text Readability and Simplification