GenZ: Foundational models as latent variable generators within traditional statistical models
Marko Jojic, Nebojsa Jojic

TL;DR
GenZ introduces a hybrid approach that combines foundational models with statistical methods to discover interpretable semantic features, improving prediction accuracy on domain-specific tasks such as house price prediction and movie recommendation.
Contribution
The paper presents a novel generalized EM algorithm that jointly optimizes semantic feature descriptors and statistical model parameters, leveraging foundational models as latent variable generators.
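The abstract adds that the frozen foundational model's yes/no classifications are treated as noisy observations of latent binary features, and that those features predict a real-valued target. The block below is a minimal sketch of only the statistical half of that loop. Its modeling choices are ours, not the paper's: independent Bernoulli priors on each latent feature, per-feature "yes" rates alpha_k (feature present) and beta_k (feature absent), and a linear-Gaussian target y ~ N(w·z + b, σ²). The function name `em_noisy_binary_features` is hypothetical. The step in which the foundational model rewrites the feature descriptors is not modeled, so the observation matrix `O` stays fixed. Because the E-step enumerates all 2^K feature configurations, this exact form is only practical for a handful of features.

```python
import itertools
import numpy as np

def em_noisy_binary_features(O, y, n_iter=50):
    """EM for latent binary features seen through noisy yes/no judgments (hypothetical model).

    O: (N, K) 0/1 answers from a foundation model, one column per feature descriptor.
    y: (N,) real-valued targets.
    Assumed model: z_k ~ Bernoulli(pi_k); o_k | z_k ~ Bernoulli(alpha_k if z_k else beta_k);
    y | z ~ Normal(w @ z + b, sigma2).
    """
    O = np.asarray(O, dtype=float)
    y = np.asarray(y, dtype=float)
    N, K = O.shape
    configs = np.array(list(itertools.product([0, 1], repeat=K)), dtype=float)  # (C, K)
    X = np.hstack([configs, np.ones((len(configs), 1))])  # design rows [z, 1]
    pi = np.full(K, 0.5)
    alpha = np.full(K, 0.8)  # P(model says yes | feature present); init breaks label symmetry
    beta = np.full(K, 0.2)   # P(model says yes | feature absent)
    theta = np.zeros(K + 1)  # regression weights [w, b]
    sigma2 = np.var(y) + 1e-6
    eps = 1e-6
    for _ in range(n_iter):
        # E-step: unnormalized log posterior for every item n and configuration c.
        log_prior = configs @ np.log(pi) + (1 - configs) @ np.log(1 - pi)  # (C,)
        p_yes = configs * alpha + (1 - configs) * beta  # (C, K)
        log_obs = O @ np.log(p_yes).T + (1 - O) @ np.log(1 - p_yes).T  # (N, C)
        resid = y[:, None] - (X @ theta)[None, :]  # (N, C)
        log_lik_y = -0.5 * resid**2 / sigma2 - 0.5 * np.log(2 * np.pi * sigma2)
        log_joint = log_prior[None, :] + log_obs + log_lik_y
        log_joint -= log_joint.max(axis=1, keepdims=True)  # numerical stability
        q = np.exp(log_joint)
        q /= q.sum(axis=1, keepdims=True)  # posterior over configurations, (N, C)
        # M-step: closed-form updates from expected sufficient statistics.
        m = q @ configs  # (N, K) posterior marginals P(z_k = 1 | o, y)
        pi = np.clip(m.mean(axis=0), eps, 1 - eps)
        alpha = np.clip((m * O).sum(0) / (m.sum(0) + eps), eps, 1 - eps)
        beta = np.clip(((1 - m) * O).sum(0) / ((1 - m).sum(0) + eps), eps, 1 - eps)
        # Weighted least squares over (item, configuration) pairs with weights q[n, c].
        wc = q.sum(axis=0)  # (C,)
        A = X.T @ (wc[:, None] * X) + eps * np.eye(K + 1)
        theta = np.linalg.solve(A, X.T @ (q.T @ y))
        resid = y[:, None] - (X @ theta)[None, :]
        sigma2 = max((q * resid**2).sum() / N, eps)
    return dict(pi=pi, alpha=alpha, beta=beta, w=theta[:K], b=theta[K],
                sigma2=sigma2, posterior=m)
```

Initializing alpha above beta anchors each latent feature to mean "the model answered yes." Without that asymmetry, EM could settle on the equally likely solution in which a feature's meaning is flipped.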
Findings
Achieves 12% median relative error in house price prediction, outperforming a GPT-5 baseline.
Predicts movie embeddings with 0.59 cosine similarity, matching the performance achieved with 4000 user ratings.
Discovers dataset-specific features that diverge from general domain knowledge.
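The two headline metrics can be computed as shown below. This summary does not give the paper's exact evaluation protocol, so the sketch assumes a per-house relative error of |ŷ − y| / |y| and a plain average of per-movie cosine similarities. Both function names are illustrative. Under this definition, a 12% median relative error means that half of the test houses are predicted within 12% of their actual price.

```python
import numpy as np

def median_relative_error(y_true, y_pred):
    """Median over items of |y_pred - y_true| / |y_true|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.median(np.abs(y_pred - y_true) / np.abs(y_true)))

def mean_cosine_similarity(pred, target):
    """Cosine similarity between matching rows of two (N, D) arrays, averaged over rows."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    dots = np.sum(pred * target, axis=1)
    norms = np.linalg.norm(pred, axis=1) * np.linalg.norm(target, axis=1)
    return float(np.mean(dots / norms))

# Example: the per-house relative errors are 0.12, 0.10 and 0.0, so the median is 0.10.
prices = [500_000, 300_000, 800_000]
predicted = [560_000, 270_000, 800_000]
print(median_relative_error(prices, predicted))
```

The median is less sensitive than the mean to a few badly mispriced outliers, which is one common reason to report it for house prices.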
Abstract
We present GenZ, a hybrid model that bridges foundational models and statistical modeling through interpretable semantic features. While large language models possess broad domain knowledge, they often fail to capture dataset-specific patterns critical for prediction tasks. Our approach addresses this by discovering semantic feature descriptions through an iterative process that contrasts groups of items identified via statistical modeling errors, rather than relying solely on the foundational model's domain understanding. We formulate this as a generalized EM algorithm that jointly optimizes semantic feature descriptors and statistical model parameters. The method prompts a frozen foundational model to classify items based on discovered features, treating these judgments as noisy observations of latent binary features that predict real-valued targets through learned statistical…
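The abstract says features are discovered by contrasting groups of items identified via statistical modeling errors. One plausible reading, sketched here as an assumption, works in two steps. First, rank items by residual y − ŷ. Second, give the most under-predicted and most over-predicted items to the foundational model and ask for a property that separates them. The function names `select_contrast_groups` and `build_contrast_prompt`, as well as the prompt wording, are hypothetical. The call to the model is omitted.

```python
from typing import List, Sequence, Tuple

def select_contrast_groups(items: Sequence[str], y: Sequence[float],
                           y_pred: Sequence[float], k: int = 5) -> Tuple[List[str], List[str]]:
    """Return (most under-predicted, most over-predicted) items by residual y - y_pred."""
    order = sorted(range(len(items)), key=lambda i: y[i] - y_pred[i])
    over = [items[i] for i in order[:k]]          # most negative residuals: predicted too high
    under = [items[i] for i in order[::-1][:k]]   # most positive residuals: predicted too low
    return under, over

def build_contrast_prompt(group_a: Sequence[str], group_b: Sequence[str]) -> str:
    """Hypothetical prompt asking a foundation model for a feature separating two groups."""
    lines = ["Group A:"] + [f"- {x}" for x in group_a]
    lines += ["Group B:"] + [f"- {x}" for x in group_b]
    lines.append("Describe one yes/no property that is true for most items in Group A "
                 "and false for most items in Group B.")
    return "\n".join(lines)

items = ["loft downtown", "farmhouse", "condo near highway", "beach cottage"]
y = [900.0, 300.0, 250.0, 700.0]
y_pred = [600.0, 320.0, 400.0, 650.0]
under, over = select_contrast_groups(items, y, y_pred, k=1)
print(build_contrast_prompt(under, over))
```

If `k` exceeds half the number of items, the two groups overlap. Keeping `k` small relative to the dataset keeps the contrast sharp.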
Taxonomy
Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Recommender Systems and Techniques
