COMMUNITY-CROSS-INSTRUCT: Unsupervised Instruction Generation for Aligning Large Language Models to Online Communities
Zihao He, Minh Duc Chu, Rebecca Dorn, Siyi Guo, Kristina Lerman

TL;DR
Community-Cross-Instruct is an unsupervised framework that uses online community discussions to fine-tune large language models, enabling automated, scalable, and cost-effective representation and alignment of community beliefs without human-labeled instructions.
Contribution
It introduces a fully unsupervised method for aligning LLMs to online communities, eliminating the need for human-authored instructions and improving scalability.
Findings
Accurately models political and diet communities on Reddit.
Outperforms prior methods requiring human instructions.
Enables automated and cost-effective community surveying.
Abstract
Social scientists use surveys to probe the opinions and beliefs of populations, but these methods are slow, costly, and prone to biases. Recent advances in large language models (LLMs) enable the creation of computational representations, or "digital twins", of populations that generate human-like responses mimicking the population's language, styles, and attitudes. We introduce Community-Cross-Instruct, an unsupervised framework for aligning LLMs to online communities to elicit their beliefs. Given a corpus of a community's online discussions, Community-Cross-Instruct uses an advanced LLM to automatically generate instruction-output pairs that serve to (1) finetune a foundational LLM to faithfully represent that community, and (2) evaluate the alignment of the finetuned model to the community. We demonstrate the method's utility in accurately representing political and diet communities on Reddit.…
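The data-generation step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt wording, the JSON output format, the names `InstructionPair`, `build_prompt`, `generate_pairs`, `split_pairs`, and `stub_llm`, and the simple holdout split between finetuning and evaluation data are all assumptions made for illustration. The abstract does not say how the two sets are actually produced. The "advanced LLM" is represented by any callable that maps a prompt string to a response string.

```python
import json
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class InstructionPair:
    instruction: str
    output: str


# Hypothetical prompt wording; the paper's actual prompts are not given here.
PROMPT_TEMPLATE = (
    "Below are posts from an online community.\n"
    "Write {n} instruction-output pairs that capture the community's views.\n"
    "Answer as a JSON list of objects with keys 'instruction' and 'output'.\n\n"
    "Posts:\n{posts}"
)


def build_prompt(posts: List[str], n: int) -> str:
    """Format community posts into a single generation prompt."""
    joined = "\n".join(f"- {p}" for p in posts)
    return PROMPT_TEMPLATE.format(n=n, posts=joined)


def generate_pairs(
    posts: List[str], llm: Callable[[str], str], n: int = 4
) -> List[InstructionPair]:
    """Ask the LLM for pairs and keep only well-formed entries."""
    raw = llm(build_prompt(posts, n))
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []  # unparseable response: no pairs
    if not isinstance(items, list):
        return []
    pairs = []
    for item in items:
        if isinstance(item, dict) and item.get("instruction") and item.get("output"):
            pairs.append(InstructionPair(str(item["instruction"]), str(item["output"])))
    return pairs


def split_pairs(
    pairs: List[InstructionPair], eval_fraction: float = 0.25
) -> Tuple[List[InstructionPair], List[InstructionPair]]:
    """Assumed holdout split: first part for finetuning, the rest for evaluation."""
    n_eval = int(len(pairs) * eval_fraction)
    cut = len(pairs) - n_eval
    return pairs[:cut], pairs[cut:]


def stub_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns four fixed placeholder pairs."""
    return json.dumps(
        [{"instruction": f"Question {i}", "output": f"Answer {i}"} for i in range(4)]
    )


if __name__ == "__main__":
    pairs = generate_pairs(["example post one", "example post two"], stub_llm)
    finetune_set, eval_set = split_pairs(pairs)
    print(len(finetune_set), "finetuning pairs,", len(eval_set), "evaluation pairs")
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM provider; a real client would replace `stub_llm`, and the finetuning step itself is outside the scope of this example.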
Taxonomy
Topics: Knowledge Management and Sharing · Topic Modeling · Wikis in Education and Collaboration
