Independent Density Estimation

Jiahao Liu; Senhao Cao

arXiv:2512.10067·cs.CV·December 23, 2025

Open Access

TL;DR

This paper introduces Independent Density Estimation (IDE), a method that aims to improve compositional generalization in vision-language models by learning how individual words in a sentence connect to the corresponding image features. The idea is instantiated in two models and paired with an entropy-based inference method.

Contribution

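The paper proposes IDE, a method for enhancing compositional generalization in vision-language models. It builds two models on this idea: one takes fully disentangled visual representations as input, and the other uses a Variational Auto-Encoder to obtain partially disentangled features from raw images. It also proposes an entropy-based compositional inference method that combines the predictions for each word in a sentence.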

Findings

01. Models outperform existing methods on unseen compositions

02. Disentangled representations improve generalization

03. Entropy-based inference effectively combines word predictions

Abstract

Large-scale Vision-Language models have achieved remarkable results in various domains, such as image captioning and conditioned image generation. Nevertheless, these models still encounter difficulties in achieving human-like compositional generalization. In this study, we propose a new method called Independent Density Estimation (IDE) to tackle this challenge. IDE aims to learn the connection between individual words in a sentence and the corresponding features in an image, enabling compositional generalization. We build two models based on the philosophy of IDE. The first one utilizes fully disentangled visual representations as input, and the second leverages a Variational Auto-Encoder to obtain partially disentangled features from raw images. Additionally, we propose an entropy-based compositional inference method to combine predictions of each word in the sentence. Our models…
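
As a rough illustration of what entropy-based combination of per-word predictions could look like, the sketch below assumes that each word yields a probability distribution over the same set of candidates, and that words with lower-entropy (more decisive) distributions should count for more. The weighting rule (log K minus entropy), the weighted log-linear pooling, and the names `entropy` and `combine_by_entropy` are all assumptions made for illustration; the abstract does not state the paper's actual formula.

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(q * math.log(q) for q in p if q > 0.0)


def combine_by_entropy(word_dists):
    """Combine per-word distributions over the same K candidates.

    Assumption (not taken from the paper): each word is weighted by
    log(K) - H(p), so a uniform, uninformative word gets weight ~0
    and a confident word dominates the combined prediction.
    """
    k = len(word_dists[0])
    max_h = math.log(k)
    # Clamp at zero to absorb tiny negative values from rounding.
    weights = [max(0.0, max_h - entropy(p)) for p in word_dists]
    total = sum(weights)
    if total <= 0.0:
        # No word is informative: fall back to a uniform prediction.
        return [1.0 / k] * k

    eps = 1e-12  # avoids log(0) for candidates a word rules out entirely
    log_scores = [
        sum(w * math.log(p[i] + eps) for w, p in zip(weights, word_dists)) / total
        for i in range(k)
    ]
    # Normalize with the max subtracted for numerical stability.
    m = max(log_scores)
    exp_scores = [math.exp(s - m) for s in log_scores]
    z = sum(exp_scores)
    return [e / z for e in exp_scores]


if __name__ == "__main__":
    # Hypothetical per-word distributions over three candidate images.
    per_word = [
        [0.9, 0.05, 0.05],        # a confident word
        [1 / 3, 1 / 3, 1 / 3],    # an uninformative word
    ]
    print(combine_by_entropy(per_word))
```

In this sketch the uninformative word contributes almost nothing, so the combined prediction follows the confident word. Other weightings, such as a softmax over negative entropies, would fit the same description equally well.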

Taxonomy

Topics: Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning