Let's Use ChatGPT To Write Our Paper! Benchmarking LLMs To Write the Introduction of a Research Paper

Krishna Garg; Firoz Shaik; Sambaran Bandyopadhyay; Cornelia Caragea

arXiv:2508.14273 · cs.CL · August 26, 2025

TL;DR

This paper benchmarks how well large language models generate research paper introductions, evaluating output quality across multiple dimensions. LLaMA-4 Maverick performs best on most metrics.

Contribution

It introduces the Scientific Introduction Generation (SciIG) task, curates new datasets from NAACL 2025 and ICLR 2025 papers, and provides an evaluation framework that combines automated metrics with LLM-as-a-judge assessments.

Findings

01. LLaMA-4 Maverick outperforms other models on most metrics.

02. Three-shot prompting improves performance over fewer-shot approaches.

03. The framework combines automated metrics with LLM-based evaluations.
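
To make the few-shot finding concrete, here is a minimal sketch of how a k-shot prompt for this task could be assembled from a title, an abstract, and related work. The `Paper` container, the field labels, and the instruction wording are all hypothetical; the prompt format the authors actually used is not given on this page.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical container; field names are illustrative, not from the paper.
    title: str
    abstract: str
    related_work: str
    introduction: str = ""  # left empty for the target paper


def format_inputs(paper: Paper) -> str:
    """Render the three task inputs as labeled lines."""
    return (
        f"Title: {paper.title}\n"
        f"Abstract: {paper.abstract}\n"
        f"Related work: {paper.related_work}\n"
    )


def build_prompt(target: Paper, examples: list[Paper], k: int = 3) -> str:
    """Prepend up to k worked examples (inputs plus introduction) to the target."""
    # Assumed instruction text; not the authors' prompt.
    parts = ["Write the introduction of a research paper from the fields below.\n"]
    for ex in examples[:k]:
        parts.append(format_inputs(ex) + f"Introduction: {ex.introduction}\n")
    # The target ends with an open "Introduction:" slot for the model to fill.
    parts.append(format_inputs(target) + "Introduction:")
    return "\n".join(parts)
```

Setting `k=0` gives a zero-shot prompt and `k=3` the three-shot setting, so the same builder covers both sides of the comparison.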

Abstract

As researchers increasingly adopt LLMs as writing assistants, generating high-quality research paper introductions remains both challenging and essential. We introduce Scientific Introduction Generation (SciIG), a task that evaluates LLMs' ability to produce coherent introductions from titles, abstracts, and related works. Curating new datasets from NAACL 2025 and ICLR 2025 papers, we assess five state-of-the-art models, four open-source (DeepSeek-v3, Gemma-3-12B, LLaMA-4 Maverick, MistralAI Small 3.1) and one closed-source (GPT-4o), across multiple dimensions: lexical overlap, semantic similarity, content coverage, faithfulness, consistency, citation correctness, and narrative quality. Our comprehensive framework combines automated metrics with LLM-as-a-judge evaluations. Results demonstrate LLaMA-4 Maverick's superior performance on most metrics, particularly in semantic…
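
As an illustration of the lexical overlap dimension, the sketch below computes a unigram-overlap F1 score between a generated introduction and a reference one. This is a generic ROUGE-1-style stand-in with a simple lowercase alphanumeric tokenizer (both are assumptions); the exact metrics and tokenization used in the paper are not stated in this excerpt.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-alphanumeric characters (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def unigram_f1(candidate: str, reference: str) -> float:
    """Harmonic mean of unigram precision and recall, with clipped counts."""
    cand = Counter(tokenize(candidate))
    ref = Counter(tokenize(reference))
    # Counter intersection keeps the minimum count per token (clipping).
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

A score like this only rewards shared wording, which is presumably why the framework pairs it with semantic similarity and LLM-as-a-judge evaluations.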
