Sequence Repetition Enhances Token Embeddings and Improves Sequence Labeling with Decoder-only Language Models
Matija Luka Kukić, Marko Čuljak, David Dukić, Martin Tutek, Jan Šnajder

TL;DR
This paper introduces sequence repetition as a simple method to make decoder-only language models bidirectional, enhancing token embeddings and sequence labeling performance without major model modifications.
Contribution
It demonstrates that sequence repetition naturally induces bidirectionality in decoder-only models, improving token-level embeddings and sequence labeling accuracy.
Findings
Sequence repetition (SR) improves token embedding quality.
Decoder-only models with SR surpass encoder-only models in sequence labeling.
Intermediate-layer embeddings are as effective as final-layer embeddings.
Abstract
Modern language models (LMs) are trained in an autoregressive manner, conditioned only on the prefix. In contrast, sequence labeling (SL) tasks assign labels to each individual input token, naturally benefiting from bidirectional context. This discrepancy has historically led SL to rely on inherently bidirectional encoder-only models. However, the rapid development of decoder-only models has raised the question of whether they can be adapted to SL. While causal mask removal has emerged as a viable technique for adapting decoder-only models to leverage the full context for SL, it requires considerable changes to the base model functionality. In this work, we explore sequence repetition (SR) as a less invasive alternative for enabling bidirectionality in decoder-only models. Through fine-tuning experiments, we show that SR inherently makes decoders bidirectional, improving the quality of…
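The idea behind SR can be illustrated with a minimal sketch, which is not the authors' implementation. Under a causal mask, a token sees only its prefix, but once the input is repeated, every token in the last copy has the entire original sequence in its prefix. The function names, the optional separator, the number of repetitions, and the choice to read embeddings from the last copy are all assumptions made for this illustration.

```python
def repeat_sequence(tokens, repeats=2, sep=None):
    """Concatenate `repeats` copies of `tokens` (hypothetical helper).

    Returns the repeated sequence and the index range of the last copy,
    which is where token embeddings would be read from in this sketch.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    repeated = []
    start = 0
    for r in range(repeats):
        if r > 0 and sep is not None:
            repeated.append(sep)  # optional separator between copies (assumption)
        start = len(repeated)
        repeated.extend(tokens)
    return repeated, range(start, start + len(tokens))


def causally_visible(sequence, position):
    """Tokens a decoder can attend to at `position` under a causal mask."""
    return sequence[: position + 1]


tokens = ["Ana", "lives", "in", "Zagreb"]
repeated, last_copy = repeat_sequence(tokens)

# Without repetition, the first token sees only itself.
first_token_context = causally_visible(tokens, 0)

# With repetition, every position in the last copy sees all original tokens.
full_context = all(
    set(tokens) <= set(causally_visible(repeated, pos)) for pos in last_copy
)
```

In this toy setting, `first_token_context` contains only the first token, while `full_context` is true: each position in the second copy has both its left and right neighbors from the original sentence in its prefix, which is the sense in which repetition gives a causal model access to bidirectional context.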
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education
