Weak-Annotation of HAR Datasets using Vision Foundation Models
Marius Bock, Kristof Van Laerhoven, Michael Moeller

TL;DR
This paper introduces a clustering-based weak annotation method using vision foundation models to efficiently label wearable-based human activity recognition datasets, achieving near-supervised accuracy with minimal human effort.
Contribution
The paper presents a novel annotation pipeline that leverages vision foundation models and clustering to reduce manual labeling effort in HAR datasets while reaching roughly 90% labeling accuracy.
Findings
Achieves ~90% labeling accuracy with minimal human annotation
Matches fully-supervised classifier performance using weakly annotated data
Reduces annotation time and effort significantly
Abstract
As wearable-based data annotation remains, to date, a tedious, time-consuming task requiring researchers to dedicate substantial time, benchmark datasets within the field of Human Activity Recognition lack richness and size compared to datasets available within related fields. Recently, vision foundation models such as CLIP have gained significant attention, helping the vision community advance in finding robust, generalizable feature representations. With the majority of researchers within the wearable community relying on vision modalities to overcome the limited expressiveness of wearable data and accurately label their to-be-released benchmark datasets offline, we propose a novel, clustering-based annotation pipeline to significantly reduce the amount of data that needs to be annotated by a human annotator. We show that using our approach, the annotation of centroid clips…
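The general idea the abstract describes (cluster clip-level features, have a human label only the clip nearest each cluster centre, then propagate that label to the rest of the cluster) might be sketched as follows. This is an illustrative sketch, not the authors' implementation: k-means with farthest-first seeding stands in for whichever clustering method the paper uses, the 2-D toy vectors stand in for vision-foundation-model embeddings, and every function name (`kmeans`, `centroid_clips`, `propagate`) is hypothetical.

```python
import math


def farthest_first_init(features, k):
    """Deterministic seeding: start at clip 0, then add the clip farthest from all seeds."""
    seeds = [features[0]]
    while len(seeds) < k:
        seeds.append(max(features, key=lambda f: min(math.dist(f, s) for s in seeds)))
    return [list(s) for s in seeds]


def assign_clusters(features, centroids):
    """Index of the nearest centroid for every clip."""
    return [min(range(len(centroids)), key=lambda j: math.dist(f, centroids[j]))
            for f in features]


def kmeans(features, k, n_iter=20):
    """Plain k-means (an assumed stand-in for the paper's clustering step)."""
    centroids = farthest_first_init(features, k)
    for _ in range(n_iter):
        assign = assign_clusters(features, centroids)
        for j in range(k):
            members = [f for f, a in zip(features, assign) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assign_clusters(features, centroids)


def centroid_clips(features, centroids, assign):
    """For each cluster, the index of the member clip closest to its centroid."""
    reps = {}
    for j, c in enumerate(centroids):
        members = [i for i, a in enumerate(assign) if a == j]
        if members:
            reps[j] = min(members, key=lambda i: math.dist(features[i], c))
    return reps


def propagate(assign, reps, ask_annotator):
    """Query the annotator once per cluster and copy that label to all members."""
    cluster_label = {j: ask_annotator(i) for j, i in reps.items()}
    return [cluster_label[a] for a in assign]


# Toy data: six clips whose (made-up) embeddings form two well-separated groups.
features = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
            [5.0, 5.1], [5.1, 5.0], [5.05, 5.05]]
true_labels = ["walk", "walk", "walk", "run", "run", "run"]

centroids, assign = kmeans(features, k=2)
reps = centroid_clips(features, centroids, assign)
# The lambda simulates a human annotator who labels only the representative clips.
weak_labels = propagate(assign, reps, lambda i: true_labels[i])

print(f"annotated {len(reps)} of {len(features)} clips")
print(weak_labels)
```

In this toy setting only one clip per cluster is shown to the annotator. Real embeddings are rarely this cleanly separated, so propagated labels would contain some noise, which is consistent with the summary above reporting roughly 90% rather than perfect labeling accuracy.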
Taxonomy
Topics: Advanced Neural Network Applications · Machine Learning and Data Classification · Advanced Image and Video Retrieval Techniques
Methods: Contrastive Language-Image Pre-training
