Aligning Language Models with Observational Data: Opportunities and Risks from a Causal Perspective
Erfan Loghmani

TL;DR
This paper explores how observational data can be used to better align large language models with human preferences by addressing confounding issues through causal methods, improving model reliability and performance.
Contribution
The paper introduces DeconfoundLM, a method that removes the influence of confounders from observational data to improve causal learning and model alignment.
Findings
DeconfoundLM improves causal relationship recovery in simulations.
Using observational data with causal corrections enhances model alignment.
Naive use of observational data can lead to learning spurious correlations.
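The last finding can be illustrated with a small simulation. The sketch below is not DeconfoundLM and is not taken from the paper; it is a toy linear example with made-up coefficients, assuming a single observed confounder `z` that drives both a content feature `t` and the outcome `y`. It only shows why a naive fit on observational outcomes can overstate an effect, and how adjusting for the confounder recovers it in this simplified setting.

```python
import numpy as np

def simulate(n=20000, true_effect=1.0, seed=0):
    """Generate toy observational data with one confounder (all values hypothetical)."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                               # confounder, e.g. seasonality
    t = 0.8 * z + rng.normal(size=n)                     # content feature, influenced by z
    y = true_effect * t + 2.0 * z + rng.normal(size=n)   # outcome, influenced by t and z
    return z, t, y

def ols_slope(columns, y):
    """Return the least-squares coefficient on the first column (intercept included)."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

z, t, y = simulate()
naive = ols_slope([t], y)        # ignores z: absorbs the spurious correlation
adjusted = ols_slope([t, z], y)  # controls for z: close to the true effect of 1.0

print(f"naive estimate:    {naive:.2f}")
print(f"adjusted estimate: {adjusted:.2f}")
```

In this setup the naive slope comes out close to twice the true effect, because `t` acts as a proxy for `z`. A model fine-tuned on such outcomes without correction would be rewarded for the proxy rather than for the feature's actual contribution.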
Abstract
Large language models are being widely used across industries to generate content that contributes directly to key performance metrics, such as conversion rates. Pretrained models, however, often fall short when it comes to aligning with human preferences or optimizing for business objectives. As a result, fine-tuning with good-quality labeled data is essential to guide models to generate content that achieves better results. Controlled experiments, like A/B tests, can provide such data, but they are often expensive and come with significant engineering and logistical challenges. Meanwhile, companies have access to a vast amount of historical (observational) data that remains underutilized. In this work, we study the challenges and opportunities of fine-tuning LLMs using observational data. We show that while observational outcomes can provide valuable supervision, directly fine-tuning…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
