TL;DR
This paper introduces TextCause, a method for estimating the causal effects of linguistic properties from observational data. It addresses two challenges: the properties of interest are observed only through noisy proxies, and the text itself must be adjusted for. The method is applied to review sentiment and to response times for complaints.
Contribution
It formalizes the causal effect of a linguistic property as the effect of a writer's intent. It then proposes an estimator, built on language models and distant supervision, whose bias is bounded when the text is adjusted for. Finally, it introduces the TextCause algorithm.
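Distant supervision here means labeling text automatically from a weak signal, such as a word list, instead of hand annotation. The sketch below shows how a lexicon-based proxy for politeness might be produced and why it is noisy. The word list, `lexicon_proxy`, and the example complaints are hypothetical. The source describes TextCause as pairing distant supervision with language models, and this sketch covers only the lexicon step.

```python
import re

# Hypothetical seed lexicon for politeness (illustrative only); a real
# lexicon would be larger and curated.
POLITE_WORDS = {"please", "thank", "thanks", "appreciate", "kindly"}


def tokenize(text: str) -> list[str]:
    # Lowercase the text and keep runs of ASCII letters as tokens.
    return re.findall(r"[a-z]+", text.lower())


def lexicon_proxy(text: str, lexicon: set[str] = POLITE_WORDS) -> int:
    """Return 1 if any lexicon word appears in the text, else 0.

    The result is a noisy proxy for the writer's intended politeness.
    It misses polite texts that use other wording, and it can fire on
    sarcastic or quoted uses of a lexicon word.
    """
    return int(any(tok in lexicon for tok in tokenize(text)))


complaints = [
    "Please fix the error on my statement, thank you.",
    "Fix my account now.",
    "I would appreciate a quick reply.",
    "Why has nobody answered me?",
]
proxies = [lexicon_proxy(c) for c in complaints]
print(proxies)
```

The third complaint is caught only because "appreciate" happens to be in the list. A polite message phrased differently, such as "Could you look into this when you have a moment?", would be labeled 0. Proxies like this are the source of the measurement error discussed in the abstract.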
Findings
TextCause outperforms related methods in semi-simulated experiments.
The method effectively estimates the impact of review sentiment on sales.
An applied case study finds that politeness affects bureaucratic response times.
Abstract
We consider the problem of using observational data to estimate the causal effects of linguistic properties. For example, does writing a complaint politely lead to a faster response time? How much will a positive product review increase sales? This paper addresses two technical challenges related to the problem before developing a practical method. First, we formalize the causal quantity of interest as the effect of a writer's intent, and establish the assumptions necessary to identify this from observational data. Second, in practice, we only have access to noisy proxies for the linguistic properties of interest -- e.g., predictions from classifiers and lexicons. We propose an estimator for this setting and prove that its bias is bounded when we perform an adjustment for the text. Based on these results, we introduce TextCause, an algorithm for estimating causal effects of linguistic…
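The toy simulation below makes the role of text adjustment concrete. It compares an unadjusted difference in means with a stratified (adjusted) estimate when only a noisy proxy of the treatment is observed. This is a sketch under stated assumptions, not TextCause:

- A single binary confounder `w` stands in for the text.
- The proxy flips the true label with probability `flip`.
- The data-generating process and all function names are hypothetical.

```python
import random


def simulate(n: int, flip: float, tau: float, seed: int = 0):
    """Simulate (w, t_proxy, y) triples under an assumed, illustrative DGP.

      w ~ Bernoulli(0.5)                      confounder, e.g. a topic in the text
      t ~ Bernoulli(0.2 + 0.6 * w)            true property (writer's intent)
      t_proxy = t, flipped with prob `flip`   noisy classifier/lexicon output
      y = tau * t + 2 * w + N(0, 1)           outcome, e.g. response time
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        w = int(rng.random() < 0.5)
        t = int(rng.random() < 0.2 + 0.6 * w)
        t_proxy = t if rng.random() >= flip else 1 - t
        y = tau * t + 2.0 * w + rng.gauss(0.0, 1.0)
        rows.append((w, t_proxy, y))
    return rows


def mean(xs):
    return sum(xs) / len(xs)


def naive_estimate(rows):
    # Difference in mean outcome by proxy treatment, ignoring the confounder.
    treated = [y for _, tp, y in rows if tp == 1]
    control = [y for _, tp, y in rows if tp == 0]
    return mean(treated) - mean(control)


def adjusted_estimate(rows):
    # Stratify on w, take the within-stratum difference in means by proxy
    # treatment, and average over the empirical distribution of w.
    n = len(rows)
    total = 0.0
    for w_val in (0, 1):
        stratum = [(tp, y) for w, tp, y in rows if w == w_val]
        treated = [y for tp, y in stratum if tp == 1]
        control = [y for tp, y in stratum if tp == 0]
        total += (len(stratum) / n) * (mean(treated) - mean(control))
    return total


rows = simulate(n=50_000, flip=0.1, tau=1.0)
print(f"naive:    {naive_estimate(rows):.3f}")
print(f"adjusted: {adjusted_estimate(rows):.3f}")
```

With `flip=0.1` and `tau=1.0`, the population values of the two quantities are 1.76 and about 0.665. The unadjusted value of 1.76 is inflated by confounding through `w`. The adjusted value of about 0.665 is pulled toward zero by the proxy's misclassification. The printed estimates should land near these values. Setting `flip=0.0` recovers the true effect of 1.0 after adjustment. In this toy setting, adjustment removes the confounding bias, and the remaining error comes only from the noisy proxy.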
