CAST: Achieving Stable LLM-based Text Analysis for Data Analytics
Jinxiang Xie, Zihao Li, Wei He, Rui Ding, Shi Han, Dongmei Zhang

TL;DR
CAST is a framework that improves the stability of large language models in text analysis tasks like summarization and tagging by constraining their reasoning process.
Contribution
It introduces a novel prompting framework combining algorithmic prompting and explicit intermediate commitments to enhance output stability.
Findings
CAST achieves up to 16.2% higher stability scores.
It maintains or improves output quality across benchmarks.
CAST outperforms all baseline methods in stability.
Abstract
Text analysis of tabular data relies on two core operations: summarization for corpus-level theme extraction and tagging for row-level labeling. A critical limitation of employing large language models (LLMs) for these tasks is their inability to meet the high standards of output stability demanded by data analytics. To address this challenge, we introduce CAST (Consistency via Algorithmic Prompting and Stable Thinking), a framework that enhances output stability by constraining the model's latent reasoning path. CAST combines (i) Algorithmic Prompting to impose a procedural scaffold over valid reasoning transitions and (ii) Thinking-before-Speaking to enforce explicit intermediate commitments before final generation. To measure progress, we introduce CAST-S and CAST-T, stability metrics for bulleted…
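To make "output stability" for row-level tagging concrete, here is a minimal sketch of one way to quantify it: run the same tagging task several times and measure how often pairs of runs assign the same tag to the same row. This is an illustrative stand-in only. It is not the paper's CAST-T metric, whose definition is not given in the text above, and the function name `pairwise_tag_agreement` and the sample tags are hypothetical.

```python
from itertools import combinations


def pairwise_tag_agreement(runs):
    """Mean fraction of rows with identical tags, averaged over all pairs of runs.

    Hypothetical stability measure for illustration; NOT the paper's CAST-T.
    `runs` is a list of tag lists, one list per repeated run, aligned by row.
    """
    if len(runs) < 2:
        raise ValueError("need at least two runs to compare")
    n_rows = len(runs[0])
    if n_rows == 0:
        raise ValueError("runs must contain at least one row")
    if any(len(run) != n_rows for run in runs):
        raise ValueError("all runs must label the same rows")

    scores = []
    for run_a, run_b in combinations(runs, 2):
        # Count rows where the two runs produced exactly the same tag.
        matches = sum(1 for tag_a, tag_b in zip(run_a, run_b) if tag_a == tag_b)
        scores.append(matches / n_rows)
    return sum(scores) / len(scores)


# Three repeated runs over the same four rows (made-up tags).
runs = [
    ["billing", "login", "billing", "other"],
    ["billing", "login", "shipping", "other"],
    ["billing", "login", "billing", "other"],
]
score = pairwise_tag_agreement(runs)
print(f"pairwise agreement: {score:.3f}")
```

A score of 1.0 means every run produced identical tags for every row; lower values indicate run-to-run drift. Exact-match agreement is the simplest choice here and would penalize near-synonymous tags, which a real metric might handle differently.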
