Eliminating Agentic Workflow for Introduction Generation with Parametric Stage Tokens

Meicong Zhang, Tiancheng Su, Guoxiu He

arXiv:2601.09728 · cs.CL · January 16, 2026

TL;DR

This paper introduces STIG, a novel method that encodes the logical structure of introduction generation directly into LLMs using stage tokens, enabling single-inference multi-stage writing without external workflows.

Contribution

The paper proposes STIG, a parametric approach that replaces external agentic workflows with stage tokens embedded in the model, improving coherence and efficiency in introduction generation.

Findings

01

STIG outperforms traditional workflows on semantic similarity.

02

STIG produces more structurally rational introductions.

03

Single-inference generation reduces complexity and errors.

Abstract

In recent years, using predefined agentic workflows to guide large language models (LLMs) for literature classification and review has become a research focus. However, writing research introductions is more challenging. It requires rigorous logic, coherent structure, and abstract summarization. Existing workflows often suffer from long reasoning chains, error accumulation, and reduced textual coherence. To address these limitations, we propose eliminating external agentic workflows. Instead, we directly parameterize their logical structure into the LLM. This allows the generation of a complete introduction in a single inference. To this end, we introduce the Stage Token for Introduction Generation (STIG). STIG converts the multiple stages of the original workflow into explicit stage signals. These signals guide the model to follow different logical roles and functions during…
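The idea of turning workflow stages into explicit stage signals inside one output sequence can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the stage names in `STAGES`, the `<|stage:...|>` token format, and the helper functions `build_target` and `split_stages` are hypothetical and are not taken from the paper, whose actual token inventory and training setup are not described on this page.

```python
import re

# Hypothetical stage names (assumption): loosely modeled on a typical
# introduction structure, not the paper's actual token inventory.
STAGES = ["background", "problem", "method", "results"]

# Hypothetical token format (assumption): one marker per stage.
_STAGE_RE = re.compile(r"<\|stage:([a-z_]+)\|>")


def stage_token(name: str) -> str:
    """Return the marker string that opens the given stage."""
    return f"<|stage:{name}|>"


def build_target(sections: dict[str, str]) -> str:
    """Join per-stage texts into one sequence, each prefixed by its stage token.

    A model fine-tuned on such targets could emit the whole introduction
    in a single pass, with the tokens marking where each stage begins.
    """
    parts = []
    for name in STAGES:  # fixed order stands in for the workflow's stage order
        if name in sections:
            parts.append(stage_token(name) + " " + sections[name].strip())
    return "\n".join(parts)


def split_stages(text: str) -> dict[str, str]:
    """Recover the per-stage texts from a sequence that contains stage tokens."""
    # re.split with one capture group yields [prefix, name1, body1, name2, body2, ...]
    pieces = _STAGE_RE.split(text)
    return {pieces[i]: pieces[i + 1].strip() for i in range(1, len(pieces) - 1, 2)}


if __name__ == "__main__":
    example = {
        "background": "LLM workflows are used for literature review.",
        "method": "Stage signals are placed directly in the sequence.",
    }
    target = build_target(example)
    print(target)
    print(split_stages(target))
```

In this sketch the staged structure lives in the text itself, so no external controller is needed to move from one stage to the next; whether the paper's stage tokens are plain text markers or dedicated vocabulary entries is not stated here.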

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 4 · Confidence 4

Strengths

- The paper proposes STIG, a method that combines the multi-stage agentic workflow for research writing into a single inference pass.
- The paper constructs a high-quality dataset from over 3,800 scientific papers from ACL main conferences, utilizing MinerU, GPT-4o, and the Semantic Scholar API.

Weaknesses

The evaluation metrics are insufficient. Because the paper relies on automated metrics without human validation, we do not actually know whether STIG produces good introductions; we only know that it produces text that scores well on these specific metrics. Furthermore, for scientific writing, one of the most important things to evaluate is factual accuracy (whether the claims in the introduction are accurate, whether there is fabricated content, etc.). Furthermore, I am not sur…

Reviewer 02 · Rating 4 · Confidence 3

Strengths

1. STIG outperforms several training-free agentic baselines (AutoSurvey, Outline-Writing) in structural rationality and content coverage while using fewer tokens.
2. First work to parameterise an entire writing workflow into stage tokens.
3. Contributes a customized dataset tailored for training and testing introduction generation, derived from over 3,800 ACL main conference papers.

Weaknesses

1. Trained only on ACL NLP papers; no CV, Theory or other domains. Claims “research introductions” but evidence is NLP-only (ACL).
2. The eight stage tokens are defined for the four subsections that appear tailored to research-style papers; however, ACL also contains many dataset papers whose introductions do not necessarily follow the Background–Problem & Limitations–Method & Experimental Results & Contributions structure.
3. The 'SR' metric is aligned with STIG’s own staged structure, mak…

Reviewer 03 · Rating 2 · Confidence 4

Strengths

S1. The paper introduces an original idea that embeds workflow logic directly into model parameters via parametric stage tokens.
S2. This approach reduces multi-agent dependency and improves inference efficiency in structured text generation.
S3. The dataset of 3,800 ACL papers is large and well-structured. It represents a meaningful contribution for future research in the field.
S4. Five multi-dimensional evaluation metrics comprehensively capture semantic, structural, and narrative quality.

Weaknesses

W1. The paper lacks clarity in distinguishing between the STIG framework and the STIG fine-tuned model. While the conclusion claims STIG eliminates agentic workflows, the framework still depends on them for data construction and stage definition.
W2. Fine-tuning details are incomplete. No hyperparameter settings, training configurations, or sensitivity analyses are reported.
W3. A hyperparameter study is essential to confirm the stability of stage-token fine-tuning. All five evaluation metric…

Taxonomy

Topics: Topic Modeling · Machine Learning in Materials Science · Scientific Computing and Data Management