AirLetters: An Open Video Dataset of Characters Drawn in the Air
Rishit Dagli, Guillaume Berger, Joanna Materzynska, Ingo Bax, and Roland Memisevic

TL;DR
AirLetters is a new video dataset of humans drawing letters in the air. It highlights how hard it remains for current models to recognize complex articulated motions that humans perform easily.
Contribution
The paper introduces AirLetters, a novel dataset for evaluating video understanding models on articulated motion recognition tasks.
Findings
State-of-the-art models perform poorly on AirLetters
Current models lag behind human performance in recognizing air-drawn letters
Articulated motion recognition remains an open challenge for end-to-end learning
Abstract
We introduce AirLetters, a new video dataset consisting of real-world videos of human-generated, articulated motions. Specifically, our dataset requires a vision model to predict letters that humans draw in the air. Unlike existing video datasets, accurate classification predictions for AirLetters rely critically on discerning motion patterns and on integrating long-range information in the video over time. An extensive evaluation of state-of-the-art image and video understanding models on AirLetters shows that these methods perform poorly and fall far behind a human baseline. Our work shows that, despite recent progress in end-to-end video understanding, accurately representing complex articulated motions, a task that is trivial for humans, remains an open problem for end-to-end learning.
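A toy illustration of why the abstract stresses motion patterns and temporal integration. This is a hypothetical sketch and not the authors' models or data. A drawn letter leaves no visible trace, so each frame shows only where the hand currently is. The sketch assumes a video can be reduced to a sequence of 2D fingertip positions, and both `mean_pool` and `motion_signature` are illustrative names. An order-agnostic summary, such as averaging per-frame features, cannot tell apart two strokes that visit the same positions in a different order. An order-aware summary built from frame-to-frame displacements can.

```python
import numpy as np

def mean_pool(traj):
    # Order-agnostic summary: average (x, y) position over all frames.
    return traj.mean(axis=0)

def motion_signature(traj):
    # Order-aware summary: frame-to-frame displacements, flattened.
    return np.diff(traj, axis=0).ravel()

# Hypothetical fingertip positions (x, y) over 5 frames of one stroke.
forward = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [2.0, 1.0], [2.0, 2.0]])
# The same positions visited in reverse order (the stroke drawn backwards).
backward = forward[::-1]

print(np.allclose(mean_pool(forward), mean_pool(backward)))                # True
print(np.allclose(motion_signature(forward), motion_signature(backward)))  # False
```

The pooled positions are identical for both strokes, so any model that discards frame order sees them as the same input. The displacement sequences differ, which is the kind of temporal information the abstract says AirLetters requires.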
