GenZ: Foundational models as latent variable generators within traditional statistical models

Marko Jojic; Nebojsa Jojic

arXiv:2512.24834 · cs.AI · January 1, 2026


Open Access

TL;DR

GenZ introduces a hybrid approach combining foundational models and statistical methods to discover interpretable semantic features that improve prediction accuracy in domain-specific tasks like real estate and movie recommendations.

Contribution

The paper presents a novel generalized EM algorithm that jointly optimizes semantic feature descriptors and statistical model parameters, leveraging foundational models as latent variable generators.

Findings

01. Achieves a 12% median relative error in house price prediction, outperforming a GPT-5 baseline.

02. Predicts movie embeddings with 0.59 cosine similarity, matching the performance of 4000 user ratings.

03. Discovers dataset-specific features that diverge from general domain knowledge.
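For readers unfamiliar with the two metrics quoted in the findings, the sketch below shows one common way to compute them. The exact definitions used in the paper are not given on this page, so the formulas here are assumptions (absolute error divided by the true value, and the standard cosine of the angle between two vectors), and the numbers are invented for illustration only.

```python
import math
from statistics import median


def median_relative_error(y_true, y_pred):
    """Median of |prediction - target| / |target| over all items (assumed definition)."""
    return median(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred))


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Illustrative numbers only, not data from the paper.
prices = [200_000, 350_000, 500_000]
predicted = [220_000, 330_000, 500_000]
mre = median_relative_error(prices, predicted)

cos = cosine_similarity([1.0, 0.0], [1.0, 1.0])
```

A median is used rather than a mean so that a few badly mispriced items do not dominate the score.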

Abstract

We present GenZ, a hybrid model that bridges foundational models and statistical modeling through interpretable semantic features. While large language models possess broad domain knowledge, they often fail to capture dataset-specific patterns critical for prediction tasks. Our approach addresses this by discovering semantic feature descriptions through an iterative process that contrasts groups of items identified via statistical modeling errors, rather than relying solely on the foundational model's domain understanding. We formulate this as a generalized EM algorithm that jointly optimizes semantic feature descriptors and statistical model parameters. The method prompts a frozen foundational model to classify items based on discovered features, treating these judgments as noisy observations of latent binary features that predict real-valued targets through learned statistical…
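The loop the abstract describes (classify items against a feature description, fit a statistical model on the resulting binary features, then contrast items the model gets wrong) can be illustrated with a deliberately reduced sketch. This is not the paper's algorithm: it uses a single feature, replaces the frozen foundational model with a hypothetical keyword-matching stub called `judge`, treats the judgments as exact rather than noisy, and omits the generalized EM updates entirely. All names and data are invented for illustration.

```python
from statistics import mean


def judge(item_text, feature_description):
    """Hypothetical stand-in for prompting a frozen foundation model:
    returns 1 if the feature keyword appears in the item text, else 0."""
    return int(feature_description in item_text)


def fit_single_feature(z, y):
    """Least-squares fit of y ~ b0 + b1 * z for one binary feature,
    which reduces to the two group means."""
    y0 = [t for zi, t in zip(z, y) if zi == 0]
    y1 = [t for zi, t in zip(z, y) if zi == 1]
    b0 = mean(y0)
    return b0, mean(y1) - b0


def contrast_groups(items, y, y_hat, k=1):
    """Return the k most under-predicted and k most over-predicted items."""
    order = sorted(range(len(items)), key=lambda i: y[i] - y_hat[i])
    over = [items[i] for i in order[:k]]    # most negative residuals
    under = [items[i] for i in order[-k:]]  # most positive residuals
    return under, over


# Toy data, invented for illustration (targets in arbitrary units).
items = [
    "condo with pool",
    "house with pool and garage",
    "small condo",
    "house with garage",
]
y = [300.0, 520.0, 200.0, 400.0]

feature = "pool"
z = [judge(text, feature) for text in items]
b0, b1 = fit_single_feature(z, y)
y_hat = [b0 + b1 * zi for zi in z]
under, over = contrast_groups(items, y, y_hat)
```

In this toy run the under-predicted item is a house with a garage and the over-predicted item is a condo, so contrasting the two groups would point toward a second feature that the "pool" feature alone misses. In the described method, that contrast is what drives the discovery of new feature descriptions.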


Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Recommender Systems and Techniques