First Place Solution to the ECCV 2024 BRAVO Challenge: Evaluating Robustness of Vision Foundation Models for Semantic Segmentation
Tommie Kerssies, Daan de Geus, Gijs Dubbelman

TL;DR
This paper presents the winning solution to the ECCV 2024 BRAVO Challenge, which fine-tunes a vision foundation model with a simple decoder and achieves strong robustness in semantic segmentation across diverse out-of-distribution datasets.
Contribution
The paper introduces a straightforward approach to fine-tuning vision foundation models for semantic segmentation that outperforms more complex methods in robustness evaluation.
Findings
Achieved first place in the ECCV 2024 BRAVO Challenge.
Outperformed more complex existing approaches in robustness.
Demonstrated the effectiveness of simply fine-tuning foundation models.
Abstract
In this report, we present the first place solution to the ECCV 2024 BRAVO Challenge, where a model is trained on Cityscapes and its robustness is evaluated on several out-of-distribution datasets. Our solution leverages the powerful representations learned by vision foundation models, by attaching a simple segmentation decoder to DINOv2 and fine-tuning the entire model. This approach outperforms more complex existing approaches, and achieves first place in the challenge. Our code is publicly available at https://github.com/tue-mps/benchmark-vfm-ss.
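The decoder-on-backbone design described in the abstract can be sketched as follows. This is a minimal NumPy sketch, not the authors' implementation. It assumes that the "simple segmentation decoder" is a single linear projection from each patch token to class logits, followed by upsampling to pixel resolution. The function name `linear_decoder`, the token width, the grid size, and the random weights are hypothetical placeholders that stand in for DINOv2 outputs. Only the patch size of 14, which matches DINOv2, and the 19 classes, which match the Cityscapes evaluation label set, are taken from outside this sketch. Nearest-neighbour upsampling is used for brevity.

```python
import numpy as np


def linear_decoder(patch_tokens, weight, bias, grid_hw, patch_size):
    """Map ViT patch tokens to per-pixel class logits (hypothetical helper).

    patch_tokens: (N, D) array, one row per patch in row-major grid order
                  (class/register tokens assumed already removed).
    weight: (D, C) linear projection; bias: (C,).
    grid_hw: (h, w) patch grid, with h * w == N.
    patch_size: side length of each square patch in pixels.
    Returns logits of shape (C, h * patch_size, w * patch_size).
    """
    h, w = grid_hw
    if patch_tokens.shape[0] != h * w:
        raise ValueError("number of tokens must equal h * w")
    # Per-patch class logits: (N, C).
    logits = patch_tokens @ weight + bias
    # Restore the spatial grid and move classes first: (C, h, w).
    logits = logits.reshape(h, w, -1).transpose(2, 0, 1)
    # Assumption: nearest-neighbour upsampling back to pixel resolution;
    # bilinear interpolation is a common alternative.
    logits = np.repeat(np.repeat(logits, patch_size, axis=1), patch_size, axis=2)
    return logits


# Placeholder inputs standing in for backbone outputs on one image.
rng = np.random.default_rng(0)
D, C, patch = 64, 19, 14          # token width (placeholder), classes, patch size
grid = (4, 8)                     # patch grid for a 56 x 112 pixel crop
tokens = rng.standard_normal((grid[0] * grid[1], D))
W = rng.standard_normal((D, C)) * 0.01
b = np.zeros(C)

logits = linear_decoder(tokens, W, b, grid, patch)
pred = logits.argmax(axis=0)      # per-pixel class index
print(logits.shape, pred.shape)   # (19, 56, 112) (56, 112)
```

In full fine-tuning, as described in the abstract, the backbone weights that produce `patch_tokens` are updated together with `weight` and `bias` instead of keeping the backbone frozen. This sketch covers only the forward pass.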
