AirLetters: An Open Video Dataset of Characters Drawn in the Air

Rishit Dagli, Guillaume Berger, Joanna Materzynska, Ingo Bax, and Roland Memisevic

arXiv:2410.02921 · cs.CV · October 7, 2024

TL;DR

AirLetters is a new video dataset of humans drawing letters in the air. It shows that current models struggle to recognize complex articulated motions that humans recognize easily.

Contribution

The paper introduces AirLetters, a novel dataset for evaluating video understanding models on articulated motion recognition tasks.

Findings

1. State-of-the-art models perform poorly on AirLetters.
2. Current models lag far behind human performance in recognizing air-drawn letters.
3. Articulated motion recognition remains an open challenge for end-to-end learning.

Abstract

We introduce AirLetters, a new video dataset consisting of real-world videos of human-generated, articulated motions. Specifically, our dataset requires a vision model to predict letters that humans draw in the air. Unlike existing video datasets, accurate classification on AirLetters relies critically on discerning motion patterns and on integrating long-range information across the video over time. An extensive evaluation of state-of-the-art image and video understanding models on AirLetters shows that these methods perform poorly and fall far behind a human baseline. Our work shows that, despite recent progress in end-to-end video understanding, accurately representing complex articulated motions, a task that is trivial for humans, remains an open problem for end-to-end learning.
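Why does temporal integration matter so much here? In any single frame, only the hand is visible, not the letter. The letter exists only in the path traced over time. The toy sketch below is not from the paper and does not use AirLetters data. It assumes, hypothetically, that the fingertip has already been reduced to one (x, y) point per frame. Under that assumption, it shows that an order-invariant summary (mean-pooling per-frame positions, loosely analogous to averaging frame-level features) cannot separate two different strokes that visit the same points. A summary of frame-to-frame motion can separate them.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def mean_pool(traj: List[Point]) -> Point:
    """Order-invariant summary: the average of per-frame positions."""
    n = len(traj)
    return (sum(p[0] for p in traj) / n, sum(p[1] for p in traj) / n)


def displacements(traj: List[Point]) -> List[Point]:
    """Order-sensitive summary: motion vectors between consecutive frames."""
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(traj, traj[1:])]


# Hypothetical fingertip tracks, one (x, y) point per frame, y pointing up.
# Both visit the same four corners of a unit square, in a different order.
stroke_a = [(0, 0), (1, 0), (1, 1), (0, 1)]  # bottom edge, right edge, top edge
stroke_b = [(0, 0), (0, 1), (1, 1), (1, 0)]  # left edge, top edge, right edge

print(mean_pool(stroke_a) == mean_pool(stroke_b))          # True: pooling cannot tell them apart
print(displacements(stroke_a) == displacements(stroke_b))  # False: motion can
```

Real models operate on pixels rather than tracked points, so this example illustrates only the abstract's claim about ordering and motion. It does not model any architecture evaluated in the paper.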
