PEANUT: A Human-AI Collaborative Tool for Annotating Audio-Visual Data

Zheng Zhang; Zheng Ning; Chenliang Xu; Yapeng Tian; Toby Jia-Jun Li

arXiv:2307.15167·cs.HC·July 31, 2023

TL;DR

Peanut is a human-AI collaborative tool for annotating audio-visual data. It splits the multi-modal annotation task into two single-modal tasks and uses object detection and sound-tagging models to reduce annotators' effort and time while maintaining accuracy.

Contribution

The paper introduces Peanut, an annotation tool that combines human input with AI models to annotate audio-visual datasets efficiently, addressing the shortage of high-quality datasets that holds back audio-visual models.

Findings

01. Significantly accelerates annotation process

02. Maintains high annotation accuracy

03. Reduces manual effort required

Abstract

Audio-visual learning seeks to enhance the computer's multi-modal perception by leveraging the correlation between the auditory and visual modalities. Despite their many useful downstream tasks, such as video retrieval, AR/VR, and accessibility, the performance and adoption of existing audio-visual models have been impeded by the limited availability of high-quality datasets. Annotating audio-visual datasets is laborious, expensive, and time-consuming. To address this challenge, we designed and developed an efficient audio-visual annotation tool called Peanut. Peanut's human-AI collaborative pipeline separates the multi-modal task into two single-modal tasks, and utilizes state-of-the-art object detection and sound-tagging models to reduce the annotators' effort to process each frame and the number of manually-annotated frames needed. A within-subject user study with 20 participants found that…
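
To make the idea of separating the multi-modal task into two single-modal tasks more concrete, here is a minimal Python sketch. It is not Peanut's actual implementation: the `AudioTag` and `Detection` types, the `sounding_candidates` function, and the sample data are all hypothetical, and the matching rule (same label, frame time inside the tagged sound interval) is an assumption chosen only for illustration. The sketch takes the output of a sound-tagging step and an object detection step and proposes which detected objects an annotator might only need to confirm as sound sources.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class AudioTag:
    """Hypothetical output of a sound-tagging step: a label over a time span."""
    label: str
    start: float  # seconds
    end: float    # seconds


@dataclass(frozen=True)
class Detection:
    """Hypothetical output of an object detection step for one video frame."""
    time: float                      # frame timestamp in seconds
    label: str
    box: Tuple[int, int, int, int]   # (x1, y1, x2, y2) in pixels


def sounding_candidates(tags: List[AudioTag],
                        detections: List[Detection]) -> List[Detection]:
    """Keep detections whose label matches a sound tag active at that frame.

    Assumed matching rule for illustration only: identical label strings and
    a frame timestamp that falls inside the tagged interval.
    """
    return [
        det for det in detections
        if any(t.label == det.label and t.start <= det.time <= t.end
               for t in tags)
    ]


# Made-up example data, not taken from the paper.
tags = [AudioTag("dog", 0.0, 2.0), AudioTag("guitar", 3.0, 5.0)]
detections = [
    Detection(1.0, "dog", (10, 20, 110, 220)),
    Detection(1.0, "guitar", (200, 40, 300, 260)),  # visible but silent at 1.0 s
    Detection(4.0, "guitar", (205, 42, 305, 262)),
]

candidates = sounding_candidates(tags, detections)
for det in candidates:
    print(det.time, det.label, det.box)
```

In a workflow like this, the annotator would review the proposed candidates rather than draw and label every box in every frame, which is the kind of effort reduction the abstract describes.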
