IndoorCrowd: A Multi-Scene Dataset for Human Detection, Segmentation, and Tracking with an Automated Annotation Pipeline

Sebastian-Ion Nae; Radu Moldoveanu; Alexandra Stefania Ghita; Adina Magda Florea

arXiv:2604.02032·cs.CV·April 3, 2026

TL;DR

IndoorCrowd is a multi-scene indoor human dataset collected across four campus locations, with annotations for detection, instance segmentation, and multi-object tracking. It supports evaluation of foundation-model auto-annotators and baseline methods across varied indoor environments.

Contribution

The paper introduces a multi-scene indoor human dataset, benchmarks foundation-model auto-annotators against human labels, and reports baseline evaluations, addressing the lack of real-world indoor complexity in existing datasets.

Findings

01 Auto-annotators achieve varying accuracy compared to human labels.

02 Detection, segmentation, and tracking baselines reveal scene-dependent difficulty.

03 Crowd density and occlusion significantly impact task performance.

Abstract

Understanding human behaviour in crowded indoor environments is central to surveillance, smart buildings, and human-robot interaction, yet existing datasets rarely capture real-world indoor complexity at scale. We introduce IndoorCrowd, a multi-scene dataset for indoor human detection, instance segmentation, and multi-object tracking, collected across four campus locations (ACS-EC, ACS-EG, IE-Central, R-Central). It comprises 31 videos (9,913 frames at 5 fps) with human-verified, per-instance segmentation masks. A 620-frame control subset benchmarks three foundation-model auto-annotators (SAM3, GroundingSAM, and EfficientGroundingSAM) against human labels using Cohen's κ, AP, precision, recall, and mask IoU. A further 2,552-frame subset supports multi-object tracking with continuous identity tracks in MOTChallenge format. We establish detection, segmentation, and…
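As an illustration of two of the agreement metrics named in the abstract, the following is a minimal sketch of mask IoU and Cohen's κ for a pair of binary masks. It is not the authors' evaluation code: the function names `mask_iou` and `cohens_kappa` are hypothetical, and applying κ to flattened per-pixel labels is an assumption, since the abstract does not say at which granularity κ is computed.

```python
import numpy as np


def mask_iou(mask_a, mask_b):
    """Intersection over union of two binary masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must have the same shape")
    union = np.logical_or(a, b).sum()
    if union == 0:
        # Both masks are empty; returning 0.0 is a convention chosen here.
        return 0.0
    return float(np.logical_and(a, b).sum() / union)


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa between two label sequences of equal length.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from the label marginals.
    """
    a = np.asarray(labels_a).ravel()
    b = np.asarray(labels_b).ravel()
    if a.size == 0 or a.size != b.size:
        raise ValueError("label sequences must be non-empty and equal length")
    p_o = float(np.mean(a == b))
    categories = np.union1d(a, b)
    p_e = float(sum(np.mean(a == c) * np.mean(b == c) for c in categories))
    if p_e == 1.0:
        # Both annotators used a single identical label everywhere.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Toy example: a "human" mask and an "auto-annotator" mask (made-up values).
human = np.array([[1, 1], [0, 0]])
auto = np.array([[1, 0], [0, 0]])
iou = mask_iou(human, auto)
kappa = cohens_kappa(human, auto)  # per-pixel agreement (assumption)
```

Unlike raw pixel agreement, κ discounts agreement that would occur by chance, which matters when most pixels are background.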

Code & Models

Datasets

sebnae/IndoorCrowd
dataset · 106 dl
