Failure by Interference: Language Models Make Balanced Parentheses Errors When Faulty Mechanisms Overshadow Sound Ones
Daking Rai, Samuel Miller, Kevin Moran, Ziyu Yao

TL;DR
This paper investigates why language models make balanced parentheses errors, revealing that faulty mechanisms overshadow sound ones, and introduces RASteer to enhance model accuracy by emphasizing reliable components.
Contribution
The study uncovers the internal mechanisms behind parentheses errors in LMs and proposes RASteer, a method to improve performance by boosting reliable components.
Findings
RASteer improves balanced-parentheses accuracy from 0% to ~100% for some models.
RASteer also improves performance on arithmetic reasoning tasks by up to about 20%.
Faulty mechanisms can dominate sound ones, causing errors.
Abstract
Despite remarkable advances in coding capabilities, language models (LMs) still struggle with simple syntactic tasks such as generating balanced parentheses. In this study, we investigate the underlying mechanisms behind the persistence of these errors across LMs of varying sizes (124M-7B) to both understand and mitigate the errors. Our study reveals that LMs rely on a number of components (attention heads and FF neurons) that independently make their own predictions. While some components reliably promote correct answers across a generalized range of inputs (i.e., implementing "sound mechanisms"), others are less reliable and introduce noise by promoting incorrect tokens (i.e., implementing "faulty mechanisms"). Errors occur when the faulty mechanisms overshadow the sound ones and dominantly affect the predictions. Motivated by this insight, we introduce RASteer, a steering method to…
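The interference described in the abstract can be illustrated with a toy numerical sketch. This is not the authors' implementation of RASteer: the candidate tokens, the per-component logit contributions, the correctness history, the `alpha` strength, and the `1 + alpha * reliability` scaling rule are all assumptions invented for illustration. The sketch only shows how one strong faulty component can outvote two sound ones when contributions are summed, and how upweighting components by an estimated reliability can flip the prediction back.

```python
import numpy as np

# Hypothetical candidate continuations when two parentheses are still open.
TOKENS = [")", "))", ")))"]
CORRECT = 1  # index of "))", the balanced continuation

# Toy per-component logit contributions (rows: components, columns: tokens).
# All numbers are invented for illustration, not measured from any model.
contributions = np.array([
    [0.2, 1.0, 0.1],  # sound component: favors "))"
    [0.1, 0.8, 0.2],  # sound component: favors "))"
    [2.5, 0.3, 0.1],  # faulty component: strongly favors ")"
])

# Hypothetical record of whether each component's own top token was correct
# on four earlier inputs (rows: inputs, columns: components).
history = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 1],
    [1, 1, 0],
])


def predict(contribs, scales=None):
    """Sum the (optionally rescaled) contributions and return the top token index."""
    if scales is None:
        scales = np.ones(contribs.shape[0])
    logits = (scales[:, None] * contribs).sum(axis=0)
    return int(np.argmax(logits))


def reliability(correct_history):
    """Fraction of past inputs on which each component promoted the correct token."""
    return correct_history.mean(axis=0)


# Without steering, the faulty component overshadows the two sound ones.
baseline = predict(contributions)

# Assumed steering rule: scale each component by 1 + alpha * reliability.
alpha = 2.0
scales = 1.0 + alpha * reliability(history)
steered = predict(contributions, scales)

print("baseline:", TOKENS[baseline])
print("steered: ", TOKENS[steered])
```

In this toy setup the unsteered sum picks `)` because the single faulty contribution is larger than the two sound ones combined, while the reliability-scaled sum picks `))`. A real model would need the per-component contributions and reliabilities to be measured rather than assumed.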
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques
