Explaining Datasets in Words: Statistical Models with Natural Language Parameters

Ruiqi Zhong; Heng Wang; Dan Klein; Jacob Steinhardt

arXiv:2409.08466·cs.AI·January 14, 2025

TL;DR

This paper introduces a family of statistical models whose parameters are natural language predicates, so the fitted parameters can be read directly instead of being interpreted after the fact.

Contribution

We develop a model-agnostic algorithm that learns natural language parameters for diverse statistical models by optimizing continuous relaxations of the predicates with gradient descent and then discretizing them by prompting language models.
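
The relax-then-discretize idea can be illustrated with a minimal sketch. This is not the authors' implementation: the bag-of-words `embed` function, the logistic loss, the fixed `candidates` pool with keyword-based denotations (standing in for prompting an LM), and the correlation-based selection rule are all assumptions made for illustration.

```python
import numpy as np

# Toy corpus with binary labels (1 = texts the predicate should explain).
texts = [
    "covid cases rise again",
    "new covid vaccine approved",
    "covid lockdown extended",
    "football team wins final",
    "stock market hits record",
    "new phone model released",
]
labels = np.array([1, 1, 1, 0, 0, 0], dtype=float)

# Hypothetical stand-in for a text embedding: bag-of-words counts.
vocab = sorted({w for t in texts for w in t.split()})

def embed(text):
    words = text.split()
    return np.array([words.count(v) for v in vocab], dtype=float)

X = np.stack([embed(t) for t in texts])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_relaxed_predicate(X, y, steps=500, lr=0.5):
    """Gradient descent on a continuous relaxation: sigmoid(X @ w) acts as a
    soft predicate denotation in [0, 1] for each text."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / len(y)  # gradient of the mean logistic loss
        w -= lr * grad
    return w

# Hypothetical stand-in for LM prompting: a fixed pool of candidate predicates,
# each with a keyword-based 0/1 denotation (a real system would query an LM).
candidates = {
    "discusses COVID": lambda t: "covid" in t.split(),
    "discusses sports": lambda t: "football" in t.split(),
    "mentions something new": lambda t: "new" in t.split(),
}

def discretize(w, X, texts, candidates):
    """Return the candidate whose 0/1 denotation correlates best with the
    relaxed scores sigmoid(X @ w)."""
    soft = sigmoid(X @ w)

    def score(pred):
        hard = np.array([float(pred(t)) for t in texts])
        if hard.std() == 0:  # constant denotation carries no signal
            return -1.0
        return float(np.corrcoef(soft, hard)[0, 1])

    return max(candidates, key=lambda name: score(candidates[name]))

w = fit_relaxed_predicate(X, labels)
best = discretize(w, X, texts, candidates)
print(best)
```

In this sketch the continuous parameter `w` is only an intermediate: the model's final parameter is the selected predicate string, which is what makes it readable.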

Findings

01 Effective interpretation of high-dimensional parameters
02 Versatile application across text and visual data
03 Outperforms classical interpretability methods

Abstract

To make sense of massive data, we often fit simplified models and then interpret the parameters; for example, we cluster the text embeddings and then interpret the mean parameters of each cluster. However, these parameters are often high-dimensional and hard to interpret. To make model parameters directly interpretable, we introduce a family of statistical models -- including clustering, time series, and classification models -- parameterized by natural language predicates. For example, a cluster of text about COVID could be parameterized by the predicate "discusses COVID". To learn these statistical models effectively, we develop a model-agnostic algorithm that optimizes continuous relaxations of predicate parameters with gradient descent and discretizes them by prompting language models (LMs). Finally, we apply our framework to a wide range of problems: taxonomizing user chat…

Code & Models

Repositories

ruiqi-zhong/nlparam (official)

Taxonomy

Topics: Natural Language Processing Techniques · Computational and Text Analysis Methods
