TL;DR
This paper introduces a multi-camera unsupervised domain adaptation pipeline for object detection in cultural sites, combining adversarial learning and self-training to improve generalization across different real and synthetic datasets.
Contribution
It proposes a novel domain adaptation method that aligns feature and pixel domains and incorporates self-training, outperforming existing approaches.
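The self-training component mentioned above can be illustrated with a minimal sketch of pseudo-label selection on unlabeled target images. Everything here is an assumption for illustration: the `Detection` type, the `build_pseudo_labeled_set` helper, and the 0.8 confidence threshold are hypothetical and are not taken from the paper, whose actual procedure may differ.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Detection:
    """One predicted box (hypothetical container, not the paper's data format)."""
    box: Tuple[float, float, float, float]  # (x1, y1, x2, y2)
    label: int
    score: float  # detector confidence in [0, 1]


def build_pseudo_labeled_set(
    predictions: Dict[str, List[Detection]],
    threshold: float = 0.8,  # assumed value, chosen only for illustration
) -> Dict[str, List[Detection]]:
    """Keep confident detections on unlabeled target images as pseudo-labels.

    Images with no detection at or above the threshold are dropped, so they
    do not contribute empty annotations to the next training round.
    """
    pseudo: Dict[str, List[Detection]] = {}
    for image_id, detections in predictions.items():
        confident = [d for d in detections if d.score >= threshold]
        if confident:
            pseudo[image_id] = confident
    return pseudo


if __name__ == "__main__":
    # Toy predictions from a detector trained on labeled synthetic images.
    preds = {
        "cam1_0001.jpg": [
            Detection((10, 10, 50, 80), label=3, score=0.93),
            Detection((60, 20, 90, 70), label=5, score=0.41),
        ],
        "cam2_0001.jpg": [Detection((5, 5, 30, 30), label=1, score=0.35)],
    }
    for image_id, dets in build_pseudo_labeled_set(preds).items():
        print(image_id, [(d.label, d.score) for d in dets])
```

In a self-training loop of this kind, the retained detections would be treated as ground truth for the target images when the detector is retrained, and the selection step can be repeated with the updated model.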
Findings
Outperforms current state-of-the-art domain adaptation methods
Effectively generalizes object detection across multiple real camera domains
Provides a new dataset for multi-camera domain adaptation in cultural sites
Abstract
Object detection algorithms enable many interesting applications that can be implemented on different devices, such as smartphones and wearable devices. In the context of a cultural site, implementing these algorithms on a wearable device, such as a pair of smart glasses, enables the use of augmented reality (AR) to show extra information about the artworks and enrich the visitors' experience during their tour. However, object detection algorithms need to be trained on many well-annotated examples to achieve reasonable results. This is a major limitation, since the annotation process requires human supervision, which makes it expensive in terms of time and cost. A possible solution to reduce these costs consists in exploiting tools that automatically generate synthetic labeled images from a 3D model of the site. However, models trained with synthetic data do not…
Taxonomy
Methods: Self-training multi-target domain adaptive RetinaNet
