# Self-supervised structured object representation learning

**Authors:** Oussama Hadjerci, Antoine Letienne, Mohamed Abbas Hedjazi, Adel Hafiane

arXiv: 2508.19864 · 2025-08-28

## TL;DR

This paper introduces a self-supervised learning method that captures structured scene representations across multiple spatial scales, improving downstream object detection, especially when labeled data is limited.

## Contribution

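The paper proposes ProtoScale, a module for hierarchical visual representation learning that captures visual elements across multiple spatial scales. The surrounding training approach preserves full scene context across augmented views, which is intended to benefit dense prediction tasks.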

## Key findings

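- Reported to outperform state-of-the-art methods on downstream object detection, evaluated on a combined subset of COCO and UA-DETRAC
- Remains effective with limited annotated data and fewer fine-tuning epochs
- Learns object-centric representations aimed at dense prediction tasks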

## Abstract

Self-supervised learning (SSL) has emerged as a powerful technique for learning visual representations. While recent SSL approaches achieve strong results in global image understanding, they are limited in capturing the structured representation of scenes. In this work, we propose a self-supervised approach that progressively builds structured visual representations by combining semantic grouping, instance-level separation, and hierarchical structuring. Our approach, based on a novel ProtoScale module, captures visual elements across multiple spatial scales. Unlike common strategies such as DINO that rely on random cropping and global embeddings, we preserve full scene context across augmented views to improve performance in dense prediction tasks. We validate our method on downstream object detection tasks using a combined subset of multiple datasets (COCO and UA-DETRAC). Experimental results show that our method learns object-centric representations that enhance supervised object detection and outperform state-of-the-art methods, even when trained with limited annotated data and fewer fine-tuning epochs.
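
To make the ideas in the abstract concrete, the following is a minimal, illustrative sketch and not the paper's ProtoScale implementation, which this summary does not describe. It assumes a toy dense feature grid, average pooling as the multi-scale step, and cosine-similarity soft assignment to a fixed set of prototypes. All function names (`avg_pool`, `soft_assign`, `multiscale_assignments`, `hflip`) and parameters (`scales`, `temperature`) are hypothetical. It also shows why a view that keeps the full scene (here, a horizontal flip) lets dense assignments be aligned across views by simply undoing the flip, which a random crop would not allow.

```python
import math


def avg_pool(grid, k):
    """Average-pool an H x W grid of feature vectors with a k x k window, stride k."""
    h, w, d = len(grid), len(grid[0]), len(grid[0][0])
    pooled = []
    for i in range(0, h - k + 1, k):
        row = []
        for j in range(0, w - k + 1, k):
            cell = [0.0] * d
            for di in range(k):
                for dj in range(k):
                    for c in range(d):
                        cell[c] += grid[i + di][j + dj][c]
            row.append([v / (k * k) for v in cell])
        pooled.append(row)
    return pooled


def cosine(a, b):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def soft_assign(vec, prototypes, temperature=0.1):
    """Softmax over cosine similarities between one feature and each prototype."""
    logits = [cosine(vec, p) / temperature for p in prototypes]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps]


def multiscale_assignments(grid, prototypes, scales=(1, 2)):
    """Return {scale: grid of prototype probabilities}, one entry per pooling scale."""
    result = {}
    for k in scales:
        pooled = avg_pool(grid, k)
        result[k] = [[soft_assign(v, prototypes) for v in row] for row in pooled]
    return result


def hflip(grid):
    """Horizontal flip: an augmentation that keeps the whole scene in view."""
    return [list(reversed(row)) for row in grid]


# Toy scene (assumption): left half looks like prototype 0, right half like prototype 1.
toy_grid = [[[1.0, 0.0]] * 2 + [[0.0, 1.0]] * 2 for _ in range(4)]
toy_prototypes = [[1.0, 0.0], [0.0, 1.0]]

view_a = multiscale_assignments(toy_grid, toy_prototypes)
view_b = multiscale_assignments(hflip(toy_grid), toy_prototypes)
# Undo the flip on view_b's maps to compare them cell by cell with view_a.
aligned_b = {k: hflip(m) for k, m in view_b.items()}
```

In this sketch, `view_a[k]` and `aligned_b[k]` can be compared position by position at every scale, which is the kind of dense cross-view correspondence that global embeddings of random crops do not directly provide. A real training setup would learn both the features and the prototypes; here they are fixed by hand purely for illustration.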

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/2508.19864/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/2508.19864/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/2508.19864/full.md

---
Source: https://tomesphere.com/paper/2508.19864