TriLite: Efficient Weakly Supervised Object Localization with Universal Visual Features and Tri-Region Disentanglement

Arian Sabaghi, José Oramas

arXiv:2602.23120·cs.CV·February 27, 2026

TL;DR

TriLite introduces a single-stage, parameter-efficient weakly supervised object localization framework leveraging a frozen Vision Transformer with a novel TriHead module for improved object coverage and state-of-the-art results.

Contribution

The paper proposes TriLite, a WSOL method that adds only a small number of trainable parameters on top of a frozen backbone and disentangles patch features into foreground, background, and ambiguous regions, improving object localization without full fine-tuning.

Findings

01  Sets a new state of the art on multiple datasets

02  Uses fewer than 800K trainable parameters (on ImageNet-1K)

03  Easier to train than prior WSOL methods

Abstract

Weakly supervised object localization (WSOL) aims to localize target objects in images using only image-level labels. Despite recent progress, many approaches still rely on multi-stage pipelines or full fine-tuning of large backbones, which increases training cost, and partial object coverage remains an open problem across the field. We present TriLite, a single-stage WSOL framework that leverages a frozen Vision Transformer with self-supervised DINOv2 pre-training and introduces only a minimal number of trainable parameters (fewer than 800K on ImageNet-1K) for both classification and localization. At its core is the proposed TriHead module, which decomposes patch features into foreground, background, and ambiguous regions, thereby improving object coverage while suppressing spurious activations. By disentangling classification and localization…
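
To make the tri-region idea concrete, the following is a minimal sketch and not the authors' TriHead. It assumes a per-patch linear head with three outputs followed by a softmax, so every patch receives foreground, background, and ambiguous probabilities that sum to one, and it derives a box from patches whose foreground probability exceeds a threshold. The function names, the hand-set weights, the toy two-dimensional features standing in for frozen ViT patch embeddings, and the 0.5 threshold are all hypothetical choices for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a short list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def tri_region_maps(patch_feats, weights, bias):
    """Assign each patch [foreground, background, ambiguous] probabilities.

    patch_feats: N feature vectors of dimension D (stand-ins for frozen
                 backbone patch embeddings).
    weights:     3 x D matrix of a hypothetical linear head.
    bias:        3 bias terms.
    """
    maps = []
    for feat in patch_feats:
        logits = [
            sum(w * x for w, x in zip(row, feat)) + b
            for row, b in zip(weights, bias)
        ]
        maps.append(softmax(logits))
    return maps


def foreground_box(maps, grid_w, threshold=0.5):
    """Tight box (x0, y0, x1, y1) in patch coordinates around patches whose
    foreground probability exceeds the threshold, or None if there are none."""
    coords = [
        (i % grid_w, i // grid_w)
        for i, probs in enumerate(maps)
        if probs[0] > threshold
    ]
    if not coords:
        return None
    xs = [c[0] for c in coords]
    ys = [c[1] for c in coords]
    return (min(xs), min(ys), max(xs), max(ys))


# Toy 4x4 patch grid: the central 2x2 patches carry an "object" feature.
GRID_W = 4
object_cells = {(1, 1), (2, 1), (1, 2), (2, 2)}
patch_feats = [
    [1.0, 0.0] if (i % GRID_W, i // GRID_W) in object_cells else [0.0, 1.0]
    for i in range(GRID_W * GRID_W)
]

# Hand-set head (an assumption, not learned): rows are fg, bg, ambiguous.
weights = [[4.0, -4.0], [-4.0, 4.0], [0.0, 0.0]]
bias = [0.0, 0.0, 0.0]

maps = tri_region_maps(patch_feats, weights, bias)
box = foreground_box(maps, GRID_W)
print(box)
```

In this sketch only `weights` and `bias` would be trainable, which mirrors the frozen-backbone setup the abstract describes; the paper's actual head design and training losses are not specified in the text above.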

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Face recognition and analysis