From Unstructured Data to Demand Counterfactuals: Theory and Practice
Timothy Christensen, Giovanni Compiani

TL;DR
This paper introduces a practical toolkit that corrects the bias arising when demand models rely on proxies built from high-dimensional unstructured data. The correction ensures valid counterfactual inference and improves predicted substitution patterns.
Contribution
It develops a bias-correction method for demand counterfactuals that use high-dimensional proxies, including ML embeddings. The method requires minimal additional computation and comes with diagnostic tools.
Findings
Improved counterfactual substitution prediction in simulations.
Effective bias correction with high-dimensional data proxies.
Toolkit applicable to market-level and individual-level data.
Abstract
Empirical models of demand for differentiated products rely on low-dimensional product representations to capture substitution patterns. These representations are increasingly proxied by applying ML methods to high-dimensional, unstructured data, including product descriptions and images. When proxies fail to capture the true dimensions of differentiation that drive substitution, standard workflows will deliver biased counterfactuals and invalid inference. We develop a practical toolkit that corrects this bias and ensures valid inference for a broad class of counterfactuals. Our approach applies to market-level and/or individual data, requires minimal additional computation, is efficient, delivers simple formulas for standard errors, and accommodates data-dependent proxies, including embeddings from fine-tuned ML models. It can also be used with standard quantitative attributes when…
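To make the abstract's central mechanism concrete, here is a minimal Python sketch of the *problem* the toolkit addresses. It is not the authors' bias correction, whose formulas are not given on this page. The sketch assumes a hypothetical random-coefficients logit with three products and two characteristics. Column 0 is captured by the proxy (think of a single embedding dimension), while column 1 also drives substitution but is missed. A proxy-only model is fitted to the same market shares with a BLP-style share inversion, and the two models' diversion ratios from product 0 are then compared. All names (`X_TRUE`, `SIGMA_TRUE`, `invert_shares`, and so on) and all parameter values are illustrative assumptions.

```python
import math
import random


def choice_probs(delta, x, sigma, draws):
    """Per-consumer choice probabilities in a random-coefficients logit.

    Utility of product j for simulated consumer i (outside good normalized to 0):
        V_ij = delta_j + sum_k sigma_k * nu_ik * x_jk  (+ logit error)
    """
    probs = []
    for nu in draws:
        v = [d + sum(s * n * xk for s, n, xk in zip(sigma, nu, xj))
             for d, xj in zip(delta, x)]
        ev = [math.exp(vj) for vj in v]
        denom = 1.0 + sum(ev)
        probs.append([e / denom for e in ev])
    return probs


def market_shares(probs):
    """Average individual choice probabilities into market shares."""
    n = len(probs)
    return [sum(p[j] for p in probs) / n for j in range(len(probs[0]))]


def invert_shares(target, x, sigma, draws, tol=1e-12, max_iter=10_000):
    """BLP-style contraction: find mean utilities delta that reproduce target shares."""
    delta = [0.0] * len(target)
    for _ in range(max_iter):
        s = market_shares(choice_probs(delta, x, sigma, draws))
        step = [math.log(t) - math.log(sj) for t, sj in zip(target, s)]
        delta = [d + st for d, st in zip(delta, step)]
        if max(abs(st) for st in step) < tol:
            break
    return delta


def diversion_from(probs, j):
    """Diversion ratios from product j after a small increase in its price.

    With a common (non-random) price coefficient, that coefficient cancels:
        D_jk = mean_i(s_ij * s_ik) / mean_i(s_ij * (1 - s_ij)),  k != j
    """
    own = sum(p[j] * (1.0 - p[j]) for p in probs)
    return [0.0 if k == j else sum(p[j] * p[k] for p in probs) / own
            for k in range(len(probs[0]))]


# Hypothetical market: 3 products, 2 "true" characteristics per product.
# Column 0 is captured by the proxy; column 1 drives substitution but is missed.
X_TRUE = [[0.5, 1.0], [0.0, 1.0], [1.0, -1.0]]
SIGMA_TRUE = [1.0, 2.0]
X_PROXY = [[row[0]] for row in X_TRUE]
SIGMA_PROXY = [1.0]

rng = random.Random(0)
DRAWS = [[rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)] for _ in range(500)]
PROXY_DRAWS = [[nu[0]] for nu in DRAWS]  # proxy model has one random coefficient

# "Observed" shares generated by the true model.
true_probs = choice_probs([0.0, 0.0, 0.0], X_TRUE, SIGMA_TRUE, DRAWS)
observed_shares = market_shares(true_probs)

# Proxy model fitted to exactly the same shares.
proxy_delta = invert_shares(observed_shares, X_PROXY, SIGMA_PROXY, PROXY_DRAWS)
proxy_probs = choice_probs(proxy_delta, X_PROXY, SIGMA_PROXY, PROXY_DRAWS)

d_true = diversion_from(true_probs, 0)
d_proxy = diversion_from(proxy_probs, 0)
print(f"true  model: D_01 = {d_true[1]:.3f}, D_02 = {d_true[2]:.3f}")
print(f"proxy model: D_01 = {d_proxy[1]:.3f}, D_02 = {d_proxy[2]:.3f}")
```

Because the proxy model reproduces the observed shares by construction, share fit alone cannot reveal the missing dimension. The discrepancy shows up only in the implied substitution pattern: in the true model, products 0 and 1 are close substitutes along the omitted characteristic, and the proxy model cannot represent this. Substitution patterns of this kind are what price and substitution counterfactuals depend on.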
Taxonomy
Topics: Consumer Market Behavior and Pricing · Digital Platforms and Economics · Supply Chain and Inventory Management
