ANSEL Photobot: A Robot Event Photographer with Semantic Intelligence

Dmitriy Rivkin; Gregory Dudek; Nikhil Kakodkar; David Meger; Oliver Limoyo; Xue Liu; Francois Hogan

arXiv:2302.07931 · cs.RO · February 17, 2023

TL;DR

This paper presents ANSEL Photobot, a robot photographer that uses language and vision models to semantically understand events and capture relevant photos, outperforming existing methods in human evaluations.

Contribution

It introduces a novel approach combining language and vision models for semantic awareness in robotic photography, enabling event-specific photo documentation.

Findings

01. Human evaluators consistently rate the generated photo portfolios as more appropriate to the event than those produced by existing methods.

02. The method leverages recent advances in general-purpose language and vision-language models.

03. The approach improves the semantic relevance of the captured photos.

Abstract

Our work examines the way in which large language models can be used for robotic planning and sampling, specifically in the context of automated photographic documentation. In particular, we illustrate how to produce a photo-taking robot with an exceptional level of semantic awareness by leveraging recent advances in general-purpose language (LM) and vision-language (VLM) models. Given a high-level description of an event, we use an LM to generate a natural-language list of photo descriptions that one would expect a photographer to capture at the event. We then use a VLM to identify the best matches to these descriptions in the robot's video stream. The photo portfolios generated by our method are consistently rated as more appropriate to the event by human evaluators than those generated by existing methods.
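
The matching step described in the abstract (pairing each LM-generated photo description with the best frame from the video stream) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that descriptions and frames have already been embedded into a shared vector space by some vision-language model, and it swaps in plain cosine similarity as the matching score. The function names, the example descriptions, and the toy embedding values are all hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_portfolio(
    description_vecs: Dict[str, Sequence[float]],
    frame_vecs: List[Sequence[float]],
) -> Dict[str, int]:
    """For each photo description, return the index of the best-matching frame."""
    portfolio = {}
    for text, d in description_vecs.items():
        scores = [cosine(d, f) for f in frame_vecs]
        # Keep the frame with the highest similarity to this description.
        portfolio[text] = max(range(len(scores)), key=scores.__getitem__)
    return portfolio


# Toy, hand-made embeddings (hypothetical). A real system would obtain these
# from a vision-language model's text and image encoders.
descriptions = {
    "the couple cutting the cake": [0.9, 0.1, 0.0],
    "guests dancing": [0.0, 0.2, 0.9],
}
frames = [
    [0.1, 0.9, 0.1],   # frame 0
    [0.8, 0.2, 0.1],   # frame 1
    [0.1, 0.1, 0.95],  # frame 2
]

portfolio = select_portfolio(descriptions, frames)
print(portfolio)
```

With these toy vectors, the first description is paired with frame 1 and the second with frame 2. In a real pipeline, the description list would come from prompting a language model with the event description, which this sketch leaves out.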

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning