Latent Factor Models Meets Instructions: Goal-conditioned Latent Factor Discovery without Task Supervision

Zhouhang Xie; Tushar Khot; Bhavana Dalvi Mishra; Harshit Surana; Julian McAuley; Peter Clark; Bodhisattwa Prasad Majumder

arXiv:2502.15147·cs.CL·April 29, 2025

TL;DR

Instruct-LF combines instruction-following large language models with statistical methods to discover hidden, goal-related concepts from noisy, unstructured datasets, enhancing interpretability and downstream task performance.

Contribution

The paper introduces Instruct-LF, a novel system that integrates LLMs with statistical models for goal-oriented latent factor discovery without task supervision.

Findings

01. Improves downstream task performance by 5-52%
02. Produces interpretable latent factors
03. Achieves higher human preference in evaluations

Abstract

Instruction-following LLMs have recently allowed systems to discover hidden concepts from a collection of unstructured documents based on a natural language description of the purpose of the discovery (i.e., the goal). Still, the quality of the discovered concepts remains mixed, as it depends heavily on the LLM's reasoning ability and drops when the data are noisy or beyond the LLM's knowledge. We present Instruct-LF, a goal-oriented latent factor discovery system that integrates LLMs' instruction-following ability with statistical models to handle large, noisy datasets where LLM reasoning alone falls short. Instruct-LF uses LLMs to propose fine-grained, goal-related properties from documents, estimates their presence across the dataset, and applies gradient-based optimization to uncover hidden factors, where each factor is represented by a cluster of co-occurring properties. We evaluate latent…
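The final step of the pipeline, grouping co-occurring properties into factors with gradient-based optimization, can be illustrated with a small sketch. It assumes the presence estimates have already been collected into a hypothetical binary document-by-property matrix, and it swaps in plain non-negative matrix factorization fitted by projected gradient descent. The abstract does not specify Instruct-LF's actual objective, so this is not the authors' method; the names `factorize` and `cluster_properties` and the example property strings are illustrative only.

```python
import random


def factorize(presence, k, steps=2000, lr=0.05, seed=0):
    """Fit presence ~= W @ H with W, H >= 0 by projected gradient descent.

    presence: n_docs x n_props list of 0/1 values (hypothetical input; real
    presence estimates may be soft scores instead).
    Minimizes 0.5 * ||W H - presence||^2. This plain NMF objective is an
    illustrative stand-in, not Instruct-LF's actual objective.
    """
    rng = random.Random(seed)
    n, m = len(presence), len(presence[0])
    W = [[rng.random() for _ in range(k)] for _ in range(n)]  # doc-factor weights
    H = [[rng.random() for _ in range(m)] for _ in range(k)]  # factor-property loadings
    for _ in range(steps):
        # Residual R = W H - X.
        R = [[sum(W[i][f] * H[f][j] for f in range(k)) - presence[i][j]
              for j in range(m)] for i in range(n)]
        # Gradients of the squared error: dW = R H^T, dH = W^T R.
        gW = [[sum(R[i][j] * H[f][j] for j in range(m)) for f in range(k)]
              for i in range(n)]
        gH = [[sum(W[i][f] * R[i][j] for i in range(n)) for j in range(m)]
              for f in range(k)]
        # Gradient step, then project back onto the non-negative orthant.
        W = [[max(0.0, W[i][f] - lr * gW[i][f]) for f in range(k)]
             for i in range(n)]
        H = [[max(0.0, H[f][j] - lr * gH[f][j]) for j in range(m)]
             for f in range(k)]
    return W, H


def cluster_properties(H, property_names):
    """Represent each factor as the cluster of properties it loads on most."""
    clusters = {f: [] for f in range(len(H))}
    for j, name in enumerate(property_names):
        best = max(range(len(H)), key=lambda f: H[f][j])
        clusters[best].append(name)
    return clusters


# Toy data (hypothetical): docs 0-2 share two properties, docs 3-5 the other two.
props = ["mentions price", "mentions discount",
         "mentions battery", "mentions charging"]
X = [[1, 1, 0, 0] for _ in range(3)] + [[0, 0, 1, 1] for _ in range(3)]
W, H = factorize(X, k=2)
print(cluster_properties(H, props))
```

Each property is assigned to the factor with the largest loading in `H`, so properties that tend to appear in the same documents land in the same cluster. A real system would likely use soft presence scores and a different objective; the hard argmax here is only for readability.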
