The Unreasonable Effectiveness of Large Language-Vision Models for Source-free Video Domain Adaptation

Giacomo Zara, Alessandro Conti, Subhankar Roy, Stéphane Lathuilière, Paolo Rota, Elisa Ricci

arXiv:2308.09139 · cs.CV · August 23, 2023 · 2 cites

Open Access · 1 Repo

TL;DR

This paper shows that web-supervision from large language-vision models (LLVMs) can substantially improve source-free video domain adaptation for action recognition, with the resulting method outperforming existing approaches.

Contribution

It introduces DALL-V (Domain Adaptation with Large Language-Vision models), a simple and efficient approach that leverages LLVMs for unsupervised video domain adaptation without access to the source data.

Findings

01

DALL-V significantly outperforms state-of-the-art SFVUDA methods.

02

Web-supervision from LLVMs provides a rich world prior that is robust to domain shift.

03

The method is parameter-efficient and easy to implement.

Abstract

The Source-Free Video Unsupervised Domain Adaptation (SFVUDA) task consists of adapting an action recognition model, trained on a labelled source dataset, to an unlabelled target dataset without accessing the actual source data. Previous approaches have attempted to address SFVUDA by leveraging self-supervision (e.g., enforcing temporal consistency) derived from the target data itself. In this work, we take an orthogonal approach by exploiting "web-supervision" from Large Language-Vision Models (LLVMs), driven by the rationale that LLVMs contain a rich world prior that is surprisingly robust to domain shift. We showcase the unreasonable effectiveness of integrating LLVMs for SFVUDA by devising an intuitive and parameter-efficient method, which we name Domain Adaptation with Large Language-Vision models (DALL-V), that distills the world prior and complementary source model information into a…
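The abstract is cut off before it explains how DALL-V performs its distillation, so the sketch below is not the authors' method. It only illustrates, under stated assumptions, one generic way an LLVM's world prior can be combined with a source model on unlabelled target videos: CLIP-style zero-shot classification (mean-pooled frame embeddings compared by cosine similarity with one text embedding per class prompt), followed by a weighted fusion with the source model's probabilities to produce pseudo-labels. All function names, the `temperature`, `alpha` and `threshold` values, and the toy embeddings are hypothetical; in practice the embeddings would come from an LLVM's image and text encoders.

```python
import math
from typing import List, Optional

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def zero_shot_probs(frame_embs: List[Vector], class_text_embs: List[Vector],
                    temperature: float = 0.01) -> Vector:
    """Hypothetical CLIP-style zero-shot classification of one video.

    Frame embeddings are mean-pooled into a single video embedding, which is
    compared by cosine similarity with one text embedding per class prompt.
    """
    dim = len(frame_embs[0])
    video = l2_normalize([sum(f[d] for f in frame_embs) / len(frame_embs)
                          for d in range(dim)])
    sims = [sum(a * b for a, b in zip(video, l2_normalize(t)))
            for t in class_text_embs]
    return softmax([s / temperature for s in sims])


def fused_pseudo_label(llvm_probs: Vector, source_probs: Vector,
                       alpha: float = 0.5, threshold: float = 0.6) -> Optional[int]:
    """Hypothetical fusion: weighted average of LLVM and source-model probabilities.

    Returns the most likely class if its fused probability reaches `threshold`,
    otherwise None (the target video is left without a pseudo-label).
    """
    fused = [alpha * p + (1 - alpha) * q for p, q in zip(llvm_probs, source_probs)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best if fused[best] >= threshold else None


# Toy usage: two frames with 2-D embeddings, two action classes.
frames = [[1.0, 0.1], [0.9, 0.0]]
class_texts = [[1.0, 0.0], [0.0, 1.0]]
llvm = zero_shot_probs(frames, class_texts)
source = [0.7, 0.3]  # hypothetical source-model softmax output for this video
label = fused_pseudo_label(llvm, source)
print(label)
```

The confidence threshold is a common way to keep only reliable pseudo-labels when no target annotations exist; how DALL-V actually weighs the two sources of information should be taken from the paper itself.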

Code & Models

Repositories

giaczara/dallv
PyTorch · Official

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning