Encoder-Decoder or Decoder-Only? Revisiting Encoder-Decoder Large Language Model

Biao Zhang, Yong Cheng, Siamak Shakeri, Xinyi Wang, Min Ma, Orhan Firat

arXiv:2510.26622 · cs.CL · October 31, 2025


TL;DR

This paper compares encoder-decoder and decoder-only large language models at scales from roughly 150M to 8B parameters, finding that encoder-decoder models can be competitive in both performance and inference efficiency, which challenges the current dominance of decoder-only architectures.

Contribution

It provides a comprehensive, scale-aware comparison of encoder-decoder and decoder-only LLMs and shows that encoder-decoder models remain competitive once they adopt recent training recipes from decoder-only LLMs.

Findings

1. RedLLM (the revisited encoder-decoder LLM) shows strong scaling and extrapolation capabilities.
2. RedLLM achieves comparable or better downstream task performance after instruction tuning.
3. RedLLM offers substantially better inference efficiency than DecLLM (the decoder-only baseline).
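The first finding concerns how quality improves with model size. A common way to summarize such a trend is to fit a power law, loss ≈ a·N^(−b), to (parameter count, loss) pairs. The sketch below is a generic illustration, not the paper's fitting procedure: it assumes a pure power law with no irreducible-loss term, fits it by ordinary least squares in log-log space, and is exercised on synthetic points generated from a known law (a = 20, b = 0.1) rather than on any results from the paper. The function names are hypothetical.

```python
import math
from typing import List, Tuple


def fit_power_law(sizes: List[float], losses: List[float]) -> Tuple[float, float]:
    """Fit loss ~= a * N**(-b) by least squares on (log N, log loss).

    Assumes a pure power law with no irreducible-loss offset (a simplification).
    Returns (a, b).
    """
    if len(sizes) != len(losses) or len(sizes) < 2:
        raise ValueError("need at least two matching (size, loss) pairs")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("model sizes must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope in log-log space equals -b
    intercept = mean_y - slope * mean_x    # intercept equals log(a)
    return math.exp(intercept), -slope


def predict(a: float, b: float, n: float) -> float:
    """Evaluate the fitted power law at model size n."""
    return a * n ** (-b)


if __name__ == "__main__":
    # Synthetic points drawn from a known law; NOT measurements from the paper.
    sizes = [150e6, 1e9, 2e9, 8e9]
    losses = [predict(20.0, 0.1, n) for n in sizes]
    a, b = fit_power_law(sizes, losses)
    print(round(a, 3), round(b, 3))  # prints 20.0 0.1
```

Fitting RedLLM and DecLLM losses separately and comparing the fitted exponents b is one way to compare how steeply each architecture improves with scale; a law with an offset term (loss ≈ a·N^(−b) + c) needs a nonlinear solver instead of this closed-form fit.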

Abstract

Recent large language model (LLM) research has undergone an architectural shift from encoder-decoder modeling to the now-dominant decoder-only modeling. This rapid transition, however, has occurred without a rigorous comparative analysis, especially from the scaling perspective, raising concerns that the potential of encoder-decoder models may have been overlooked. To fill this gap, we revisit the encoder-decoder LLM (RedLLM), enhancing it with recent recipes from the decoder-only LLM (DecLLM). We conduct a comprehensive comparison between RedLLM, pretrained with prefix language modeling (LM), and DecLLM, pretrained with causal LM, at model scales ranging from ~150M to ~8B parameters. Using RedPajama V1 (1.6T tokens) for pretraining and FLAN for instruction tuning, our experiments show that RedLLM produces compelling scaling properties and surprisingly strong performance. While…
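The two pretraining objectives named in the abstract differ in what each model reads and what it is trained to predict. The sketch below is a minimal, framework-free illustration under stated assumptions: the split point between prefix and target is a free parameter here (how the paper chooses it is not described in this summary), `BOS_ID = 0` is a hypothetical token id, and all helper names are illustrative rather than taken from the paper's code. Under causal LM, a decoder-only model predicts every token from the tokens before it; under prefix LM in an encoder-decoder model, the encoder reads the prefix bidirectionally and the decoder predicts only the remaining tokens.

```python
from typing import Dict, List, Tuple

BOS_ID = 0  # hypothetical beginning-of-sequence id used to shift decoder inputs


def causal_lm_example(tokens: List[int]) -> Tuple[List[int], List[int]]:
    """Causal LM (decoder-only): input tokens[:-1], targets tokens[1:].

    Every position after the first is a training target.
    """
    return tokens[:-1], tokens[1:]


def prefix_lm_example(
    tokens: List[int], split: int
) -> Tuple[List[int], List[int], List[int]]:
    """Prefix LM (encoder-decoder): encoder reads tokens[:split];
    the decoder is teacher-forced to predict only tokens[split:].
    """
    if not 0 < split < len(tokens):
        raise ValueError("split must leave a non-empty prefix and a non-empty target")
    encoder_input = tokens[:split]
    target = tokens[split:]
    decoder_input = [BOS_ID] + target[:-1]  # targets shifted right by one
    return encoder_input, decoder_input, target


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when query i may attend to key j, i.e. j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def full_mask(rows: int, cols: int) -> List[List[bool]]:
    """Unrestricted attention: every query position sees every key position."""
    return [[True] * cols for _ in range(rows)]


def encoder_decoder_masks(enc_len: int, dec_len: int) -> Dict[str, List[List[bool]]]:
    """Attention masks for one prefix-LM example in an encoder-decoder model."""
    return {
        "encoder_self": full_mask(enc_len, enc_len),  # bidirectional over the prefix
        "decoder_self": causal_mask(dec_len),         # autoregressive over the target
        "cross": full_mask(dec_len, enc_len),         # decoder sees the whole prefix
    }


if __name__ == "__main__":
    seq = [5, 6, 7, 8, 9]
    print(causal_lm_example(seq))         # ([5, 6, 7, 8], [6, 7, 8, 9])
    print(prefix_lm_example(seq, 2))      # ([5, 6], [0, 7, 8], [7, 8, 9])
    masks = encoder_decoder_masks(2, 3)
    print(masks["decoder_self"])          # [[True, False, False], [True, True, False], [True, True, True]]
```

The contrast the sketch makes visible is that the causal-LM model attends left-to-right over the whole sequence and is supervised on every token, whereas the prefix-LM encoder attends in both directions over the prefix and supervision falls only on the decoder's targets.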
