Improving and Assessing the Fidelity of Large Language Models Alignment to Online Communities
Minh Duc Chu, Zihao He, Rebecca Dorn, Kristina Lerman

TL;DR
This paper introduces a framework for aligning large language models with online communities and systematically evaluating their fidelity across multiple social and linguistic dimensions, demonstrating applications in health and moderation.
Contribution
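The paper presents an instruction-tuning approach for aligning LLMs with online communities and a method for evaluating that alignment across authenticity, emotional tone, toxicity, and harm, applied to online communities centered on dieting and body image.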
Findings
Aligned LLMs can identify unhealthy beliefs in online communities.
Administering an eating disorder psychometric test to the aligned LLMs differentiates communities with varying levels of eating disorder risk.
Aligned models show potential for automated moderation and social science research.
Abstract
Large language models (LLMs) have shown promise in representing individuals and communities, offering new ways to study complex social dynamics. However, effectively aligning LLMs with specific human groups and systematically assessing the fidelity of the alignment remains a challenge. This paper presents a robust framework for aligning LLMs with online communities via instruction-tuning and comprehensively evaluating alignment across various aspects of language, including authenticity, emotional tone, toxicity, and harm. We demonstrate the utility of our approach by applying it to online communities centered on dieting and body image. We administer an eating disorder psychometric test to the aligned LLMs to reveal unhealthy beliefs and successfully differentiate communities with varying levels of eating disorder risk. Our results highlight the potential of LLMs in automated moderation…
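The abstract mentions aligning LLMs with communities via instruction-tuning but does not show the data format. As a rough illustration only, the sketch below turns raw community posts into instruction/response records of the kind commonly used for instruction-tuning. The prompt wording, the field names `instruction` and `response`, and the helper names `build_instruction_examples` and `to_jsonl` are all assumptions made for this example, not the authors' actual pipeline.

```python
import json


def build_instruction_examples(posts, community):
    """Turn raw community posts into instruction/response records.

    Hypothetical format: the prompt template and field names are
    assumptions for illustration, not taken from the paper.
    """
    examples = []
    for text in posts:
        text = text.strip()
        if not text:
            continue  # skip empty or whitespace-only posts
        examples.append({
            "instruction": f"Write a post in the style of the {community} community.",
            "response": text,
        })
    return examples


def to_jsonl(examples):
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(e, ensure_ascii=False) for e in examples)


if __name__ == "__main__":
    # Made-up placeholder posts, not data from the paper.
    demo_posts = ["Meal prep ideas for the week?", "   ", "Felt good after today's run."]
    records = build_instruction_examples(demo_posts, "example_community")
    print(to_jsonl(records))
```

Each community would get its own set of records and its own tuned model under this sketch, so that responses to the same evaluation prompts can be compared across communities.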
Taxonomy
Topics: Knowledge Management and Sharing · Wikis in Education and Collaboration · Expert finding and Q&A systems
