Ideology as a Problem: Lightweight Logit Steering for Annotator-Specific Alignment in Social Media Analysis
Wei Xia, Haowen Tang, Luozheng Li

TL;DR
This paper presents a lightweight linear probe method to measure and correct ideological misalignment in large language models for social media analysis, without retraining the entire model.
Contribution
The paper introduces a simple, efficient bias-correction technique that aligns model outputs with user-specific ideological preferences by adjusting output probabilities according to a bias score computed from the model's internal features.
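As an illustration only, the correction step can be sketched as a shift of the label logits proportional to a probe-derived bias score. The names `bias_score`, `steer_logits`, `label_lean` (an assumed ideological position for each output label), `target` (an assumed annotator position on the same scale) and `strength` are hypothetical, and the additive form of the shift is an assumption, not the authors' published formula.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def bias_score(hidden, probe_w, probe_b):
    """Linear probe: project hidden states onto a learned ideology direction."""
    return hidden @ probe_w + probe_b

def steer_logits(logits, hidden, probe_w, probe_b, label_lean, target, strength=1.0):
    """Shift label logits toward an annotator's ideological position.

    Hypothetical parameters (not from the paper):
      label_lean: assumed ideological position of each label, e.g. -1 (left) .. +1 (right)
      target:     assumed annotator position on the same scale
      strength:   scale of the correction; 0 leaves the logits unchanged
    """
    # gap > 0 means the probe reads the model as leaning further "right" than the annotator
    gap = np.asarray(bias_score(hidden, probe_w, probe_b) - target, dtype=float)
    # Additive shift (an assumption): lower the logits of labels on the side of the gap
    return logits - strength * gap[..., None] * label_lean

# Usage: one example, 2-d hidden state, three labels (left, neutral, right)
h = np.array([1.0, 0.0])
w = np.array([1.0, 0.0])
lean = np.array([-1.0, 0.0, 1.0])
adjusted = steer_logits(np.zeros(3), h, w, 0.0, lean, target=0.0)
print(softmax(adjusted))  # left label now has the highest probability
```

Because only the final logits change, the model's hidden computation is left untouched, which is consistent with the paper's claim of preserving the original reasoning; setting `strength` to 0 recovers the unmodified output.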
Findings
Effective quantification of ideological misalignment
Minimal correction improves alignment with user opinions
Preserves model reasoning while adjusting outputs
Abstract
LLMs internally organize political ideology along low-dimensional structures that are partially, but not fully, aligned with human ideological space. This misalignment is systematic, model-specific, and measurable. We introduce a lightweight linear probe that both quantifies the misalignment and minimally corrects the output layer, yielding a simple and efficient method for aligning a model with the opinions of a specific user. Instead of retraining the model, we compute a bias score from its internal features and directly adjust the final output probabilities. The approach is practical and low-cost, and it preserves the original reasoning ability of the model.
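One way to make "systematic, model-specific, and measurable" concrete is to fit two linear probes on the same hidden states, one to the model's own ideological readout and one to annotator ratings, and compare them. This sketch assumes ridge-regression probes, the cosine between probe directions as the alignment measure, and the mean signed gap as the systematic offset; `fit_probe`, `misalignment`, `model_lean` and `human_lean` are hypothetical names, not the paper's API.

```python
import numpy as np

def fit_probe(H, y, alpha=1.0):
    """Ridge-regression linear probe y ~ H @ w + b (closed form, intercept unpenalized)."""
    mu_h, mu_y = H.mean(axis=0), y.mean()
    Hc = H - mu_h
    d = H.shape[1]
    w = np.linalg.solve(Hc.T @ Hc + alpha * np.eye(d), Hc.T @ (y - mu_y))
    b = mu_y - mu_h @ w
    return w, b

def misalignment(H, model_lean, human_lean, alpha=1.0):
    """Compare the model's ideology axis with the annotators' axis in hidden space.

    Hypothetical inputs (not from the paper):
      H:          (n, d) hidden states, one row per post
      model_lean: (n,) ideological score read off the model's own outputs
      human_lean: (n,) annotator ideological score for the same posts
    """
    w_m, _ = fit_probe(H, model_lean, alpha)
    w_h, _ = fit_probe(H, human_lean, alpha)
    # Cosine between probe directions: 1 = same axis, 0 = unrelated axes
    cos = float(w_m @ w_h / (np.linalg.norm(w_m) * np.linalg.norm(w_h)))
    # Mean signed gap: a systematic offset between model and annotators
    gap = float(np.mean(model_lean - human_lean))
    return {"direction_cosine": cos, "mean_gap": gap}

# Usage on synthetic data: same axis, constant offset of 0.5
rng = np.random.default_rng(0)
H = rng.normal(size=(200, 8))
human = H[:, 0]
print(misalignment(H, human + 0.5, human))
```

Under these assumptions, a cosine near 1 with a nonzero gap would indicate a shared ideological axis with a constant offset, the case a minimal output-layer shift is suited to correct; a low cosine would indicate that the axes themselves diverge.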
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Computational and Text Analysis Methods · Mobile Crowdsensing and Crowdsourcing
