CFM: Language-aligned Concept Foundation Model for Vision

Kai Wittenmayer; Sukrut Rao; Amin Parchami-Araghi; Bernt Schiele; Jonas Fischer

arXiv:2601.13798 · cs.CV · March 18, 2026

Open Access

TL;DR

This paper introduces CFM, a language-aligned vision foundation model whose human-interpretable, spatially grounded concepts explain predictions on classification, segmentation, and captioning while keeping performance competitive.

Contribution

The work presents a model that provides fine-grained, spatially grounded concepts for vision tasks, improving interpretability while keeping task performance competitive.

Findings

01

CFM achieves competitive performance on classification, segmentation, and captioning.

02

CFM provides high-quality, fine-grained concept explanations.

03

Concept relationship analysis enables richer explanations.

Abstract

Language-aligned vision foundation models perform strongly across diverse downstream tasks. Yet their learned representations remain opaque, which makes their decision-making difficult to interpret. Recent work decomposes these representations into human-interpretable concepts, but provides poor spatial grounding and is limited to image classification tasks. In this work, we propose CFM, a language-aligned concept foundation model for vision that provides fine-grained concepts, which are human-interpretable and spatially grounded in the input image. When CFM is paired with a foundation model with strong semantic representations, we get explanations for any of that model's downstream tasks. Examining local co-occurrence dependencies of concepts allows us to define concept relationships, through which we improve concept naming and obtain richer explanations. On benchmark data, we show that CFM provides…
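The abstract does not say how local co-occurrence is measured, so the following is only an illustrative sketch and not the authors' method. It assumes a hypothetical array of per-patch concept scores of shape (H, W, K), a hypothetical activation threshold, and an 8-neighbour window as the definition of "local". The function names `local_cooccurrence` and `conditional_cooccurrence` are made up for this example.

```python
import numpy as np


def local_cooccurrence(activations, threshold=0.5):
    """Count local co-occurrences between concepts on a patch grid.

    activations: hypothetical (H, W, K) array of per-patch concept scores.
    Returns a (K, K) integer matrix where entry [i, j] is the number of
    patches in which concept i is active and concept j is active in the
    same patch or one of its 8 neighbours. The matrix need not be symmetric.
    """
    active = (activations > threshold).astype(np.int64)
    h, w, _ = active.shape

    # Zero-pad the spatial dimensions so border patches have a full 3x3 window.
    padded = np.pad(active, ((1, 1), (1, 1), (0, 0)))

    # nearby[y, x, j] == 1 if concept j is active anywhere in the 3x3 window
    # centred on patch (y, x).
    nearby = np.zeros_like(active)
    for dy in range(3):
        for dx in range(3):
            nearby = np.maximum(nearby, padded[dy:dy + h, dx:dx + w, :])

    # Sum over all patches of active[i] * nearby[j].
    return np.einsum("hwi,hwj->ij", active, nearby)


def conditional_cooccurrence(counts):
    """Normalise each row by how often its concept is active.

    The diagonal counts[i, i] equals the number of patches where concept i
    is active, so row i becomes an estimate of P(j nearby | i active).
    """
    totals = np.maximum(np.diag(counts), 1)  # avoid division by zero
    return counts / totals[:, None]
```

Under these assumptions, a high value at row i, column j would suggest that concept j tends to appear next to concept i. That is one plausible kind of signal a concept relationship could be built on, though the paper may define relationships differently.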

Taxonomy

Topics: Multimodal Machine Learning Applications · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning