# Generalized Hierarchical Co-Saliency Learning for Label-Efficient Tracking

**Authors:** Jie Zhao, Ying Gao, Chunjuan Bo, Dong Wang

PMC · DOI: 10.3390/s25154691 · Sensors (Basel, Switzerland) · 2025-07-29

## TL;DR

This paper introduces a weakly supervised tracking method that reduces annotation costs while maintaining strong performance in visual object tracking.

## Contribution

The authors propose a weakly supervised approach based on co-saliency learning that can be integrated into existing tracking frameworks, enabling label-efficient training with minimal manual annotation.
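According to the abstract, the method computes co-salient attention maps from multiple frames and uses them to enhance the target representation in the current search image. This summary does not describe the hierarchical module itself, so the following is only a minimal sketch of a generic consensus-prototype co-attention, not the authors' method. It pools features from several frames into a shared prototype, scores each location by cosine similarity to that prototype, and uses the resulting map to re-weight search features. The function names, the `temperature` parameter, and the residual re-weighting are hypothetical choices made for illustration.

```python
import numpy as np

# Hypothetical sketch: a generic consensus-prototype co-attention,
# NOT the paper's hierarchical co-saliency module.


def l2_normalize(x, axis, eps=1e-8):
    """Scale vectors along `axis` to unit L2 norm (eps avoids division by zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def co_saliency_maps(features, temperature=0.1):
    """Score every location of every frame by how well it matches content shared across frames.

    features: array of shape (N, C, H, W), e.g. backbone features of N frames
              (labeled or unlabeled; no boxes are needed here).
    Returns an array of shape (N, H, W) with values in (0, 1).
    """
    n, c, h, w = features.shape
    # (N, C, H*W): each spatial feature vector scaled to unit length.
    flat = l2_normalize(features.reshape(n, c, h * w), axis=1)
    # Consensus prototype: average over all locations of all frames.
    # Content that recurs in every frame accumulates; frame-specific clutter tends to cancel.
    prototype = l2_normalize(flat.mean(axis=(0, 2)), axis=0)  # shape (C,)
    # Cosine similarity between the prototype and each location.
    sim = np.einsum("c,ncl->nl", prototype, flat)  # shape (N, H*W)
    # A sharpened sigmoid maps similarities in [-1, 1] to soft attention weights.
    attn = 1.0 / (1.0 + np.exp(-sim / temperature))
    return attn.reshape(n, h, w)


def enhance_search_features(search_features, attn_map):
    """Residual re-weighting: keep the (C, H, W) features and amplify co-salient locations."""
    return search_features * (1.0 + attn_map[None, :, :])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(3, 8, 5, 5))  # 3 frames of random "clutter"
    # Plant the same hypothetical target appearance in a 3x3 region of every frame.
    feats[:, :, 1:4, 1:4] = 1.0 + 0.1 * rng.normal(size=(3, 8, 3, 3))
    maps = co_saliency_maps(feats)
    print(maps[:, 1:4, 1:4].mean() > maps.mean())  # target region should score higher
    enhanced = enhance_search_features(feats[-1], maps[-1])  # last frame as the search image
    print(enhanced.shape)
```

The residual form `1 + attn` is one simple way to avoid suppressing search features entirely when the attention map is wrong. A learned fusion layer would be an equally plausible alternative.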

## Key findings

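- Using only 3.33% of the manual annotations, the weakly supervised method achieves performance competitive with fully supervised baseline trackers.
- On the egocentric TREK-150 benchmark, it obtains a success score of 0.538, 7.7% higher than the prior state-of-the-art fully supervised tracker.
- The method is effective when integrated into two CNN-based trackers and one Transformer-based tracker, as shown by experiments on four general tracking benchmarks.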
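An annotation ratio of 3.33% corresponds to roughly one annotated frame in every 30 (1/30 ≈ 0.0333). The summary does not say how the labeled frames are chosen. The uniform every-30th-frame scheme sketched here is therefore a hypothetical illustration of that budget, not the paper's sampling protocol, and `labeled_frame_indices`, `annotation_ratio`, and `interval` are made-up names.

```python
def labeled_frame_indices(num_frames, interval=30):
    """Hypothetical sparse-labeling scheme: annotate every `interval`-th frame, starting at frame 0."""
    return list(range(0, num_frames, interval))


def annotation_ratio(num_frames, interval=30):
    """Fraction of frames that carry a manual box under the scheme above."""
    return len(labeled_frame_indices(num_frames, interval)) / num_frames


if __name__ == "__main__":
    # A 300-frame video gets 10 labeled frames (0, 30, ..., 270): 10 / 300 = 1/30.
    print(labeled_frame_indices(300))
    print(f"{annotation_ratio(300):.2%}")
```

Under such a scheme, the remaining frames stay unlabeled and are exactly the kind of data the abstract says the model learns from.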

## Abstract

Visual object tracking is one of the core techniques in human-centered artificial intelligence and is very useful for human–machine interaction. State-of-the-art tracking methods have shown robustness and accuracy on many challenges. However, fully supervised training of their models requires large numbers of videos with dense, precise annotations. Because annotating videos frame by frame is labor-intensive and time-consuming, reducing reliance on manual annotations when training tracking models is an important problem to be resolved. To balance annotation cost against tracking performance, we propose a weakly supervised tracking method based on co-saliency learning, which can be flexibly integrated into various tracking frameworks to reduce annotation costs and further enhance the target representation in current search images. Because our method enables the model to exploit valuable visual information in unlabeled frames and to compute co-salient attention maps from multiple frames, our weakly supervised method achieves performance competitive with fully supervised baseline trackers while using only 3.33% of the manual annotations. We integrate our method into two CNN-based trackers and a Transformer-based tracker; extensive experiments on four general tracking benchmarks demonstrate its effectiveness. Furthermore, we demonstrate the advantages of our method on the egocentric tracking task: our weakly supervised method obtains a success score of 0.538 on TREK-150, exceeding the prior state-of-the-art fully supervised tracker by 7.7%.
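The abstract reports a success score of 0.538 on TREK-150. Under the standard OTB-style definition, which is an assumption here because the exact TREK-150 toolkit settings may differ, the success score is the area under the success curve. For each overlap threshold in [0, 1], the curve records the fraction of frames whose predicted box has IoU with the ground truth strictly above that threshold, and the score averages these fractions over 21 evenly spaced thresholds. The helper names and the `(x, y, w, h)` box format are illustrative assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h)."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    # Width and height of the intersection rectangle (zero if the boxes do not overlap).
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def success_score(pred_boxes, gt_boxes, num_thresholds=21):
    """OTB-style success score: mean of the success curve over evenly spaced IoU thresholds."""
    overlaps = [iou(p, g) for p, g in zip(pred_boxes, gt_boxes)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    # Success curve: fraction of frames whose IoU strictly exceeds each threshold.
    curve = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(curve) / len(curve)


if __name__ == "__main__":
    gt = [(0, 0, 10, 10), (0, 0, 10, 10)]
    pred = [(0, 0, 10, 10), (5, 0, 10, 10)]  # one perfect box, one with IoU = 1/3
    print(round(success_score(pred, gt), 3))
```

Because of the strict `>` comparison, a perfect tracker scores 20/21 rather than 1.0 under this definition. This quirk is one reason success scores should be compared only when they come from the same evaluation toolkit.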

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12349543/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12349543/full.md

## References

45 references — full list in the complete paper: https://tomesphere.com/paper/PMC12349543/full.md

---
Source: https://tomesphere.com/paper/PMC12349543