Conscious Data Contribution via Community-Driven Chain-of-Thought Distillation
Lena Libon, Meghana Bhange, Rushabh Solanki, Elliot Creager, Ulrich Aïvodji

TL;DR
This paper explores how communities can pool and distill chain-of-thought data into an alternate model better aligned with their goals, while addressing data privacy concerns, and validates the approach empirically.
Contribution
It introduces a novel community-based distillation method for chain-of-thought data that enhances model alignment and privacy, supported by empirical analysis.
Findings
Community diversity impacts distillation effectiveness.
Granularity of reasoning affects model performance.
Larger communities improve knowledge distillation results.
Abstract
The current era of AI development places a heavy emphasis on training large models on increasingly scaled-up datasets. This paradigm has catalyzed entirely new product categories, such as LLM chatbots, while also raising concerns about data privacy and consumer choice. In this paper, we consider questions of data portability and user autonomy in the context of LLMs that "reason" using chain-of-thought (CoT) traces, computing intermediate text artifacts from user input before producing a final output. We first interpret recent data privacy and portability law to argue that these intermediate computations qualify as users' personal data. Then, building on the existing framework of Conscious Data Contribution, we show how communities who receive low utility from an available model can aggregate and distill their shared knowledge into an alternate model better aligned with their goals. We…
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Ethics and Social Impacts of AI · Advanced Graph Neural Networks
