Voices in a Crowd: Searching for Clusters of Unique Perspectives
Nikolas Vitsakis, Amit Parekh, Ioannis Konstas

TL;DR
This paper introduces a framework for identifying and clustering diverse perspectives in language models that captures minority opinions without relying on annotator metadata. The resulting clusters are validated through quantitative and qualitative analyses.
Contribution
The proposed method trains models without annotator metadata, extracts behavior-informed embeddings, and clusters similar opinions into voices, including minority perspectives. The resulting clusters are reported to be robust and to generalize.
Findings
Clusters effectively capture minority perspectives.
Framework generalizes well across datasets.
Clusters are validated with quantitative and qualitative metrics.
Abstract
Language models have been shown to reproduce underlying biases existing in their training data, which is the majority perspective by default. Proposed solutions aim to capture minority perspectives by either modelling annotator disagreements or grouping annotators based on shared metadata, both of which face significant challenges. We propose a framework that trains models without encoding annotator metadata, extracts latent embeddings informed by annotator behaviour, and creates clusters of similar opinions, which we refer to as voices. Resulting clusters are validated post-hoc via internal and external quantitative metrics, as well as a qualitative analysis to identify the type of voice that each cluster represents. Our results demonstrate the strong generalisation capability of our framework, indicated by resulting clusters being adequately robust, while also capturing minority…
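The clustering and internal-validation steps described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not name a clustering algorithm or a particular internal metric, so plain k-means and the silhouette score are swapped in, and the 2-D "embeddings" are made-up toy values rather than latent embeddings extracted from a trained model.

```python
import math
from collections import Counter

def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def kmeans(points, k, iters=20):
    """Plain k-means with deterministic farthest-point initialisation.

    Assumption: k-means stands in for whatever clustering the framework uses.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point farthest from all centroids chosen so far.
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new_centroids.append(mean(members) if members else centroids[j])
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels

def silhouette(points, labels):
    """Mean silhouette score, one common internal cluster-validity metric.

    Assumes at least two distinct cluster labels are present.
    """
    scores = []
    for i, p in enumerate(points):
        same = [dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(same) / len(same)  # mean distance within the point's own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(dist(p, q) for q, lab in zip(points, labels) if lab == other)
            / labels.count(other)
            for other in set(labels) if other != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical toy embeddings: six "majority" points and two "minority" points.
embeddings = [
    (0.0, 0.1), (0.1, 0.0), (0.2, 0.1), (0.1, 0.2), (0.0, 0.0), (0.2, 0.2),
    (2.0, 2.1), (2.1, 2.0),
]

labels = kmeans(embeddings, k=2)
sizes = Counter(labels)
score = silhouette(embeddings, labels)
print("cluster sizes:", dict(sizes))
print("silhouette:", round(score, 3))
```

On this toy data the two minority points form their own small cluster rather than being absorbed into the majority, and a silhouette score close to 1 indicates well-separated clusters. External metrics and the qualitative analysis mentioned in the abstract are outside the scope of this sketch.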
Taxonomy
Topics: Online and Blended Learning · Educational Tools and Methods · Education and Critical Thinking Development
