Egocentric Image Captioning for Privacy-Preserved Passive Dietary Intake Monitoring

Jianing Qiu; Frank P.-W. Lo; Xiao Gu; Modou L. Jobarteh; Wenyan Jia; Tom Baranowski; Matilda Steiner-Asiedu; Alex K. Anderson; Megan A McCrory; Edward Sazonov; Mingui Sun; Gary Frost; Benny Lo

arXiv:2107.00372·cs.CV·March 2, 2023

TL;DR

This paper introduces a privacy-preserving egocentric image captioning method for dietary monitoring, converting images into descriptive text to assess intake while safeguarding privacy in real-world settings.

Contribution

It presents a novel transformer-based architecture for egocentric dietary image captioning and creates a new dataset for real-life dietary assessment.

Findings

01 Effective in converting dietary images to descriptive captions

02 Reduces privacy risks by avoiding direct image sharing

03 First application of image captioning for dietary assessment in real-world scenarios
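
The privacy idea in the findings above (share captions rather than images) can be illustrated with a minimal Python sketch. It is not the paper's model or code: `CaptionRecord`, `captions_only`, and `placeholder_captioner` are hypothetical names introduced here, and the captioner is a stand-in for whatever trained captioning model is used. The sketch only shows the data flow, in which each frame is turned into text and the text alone is kept.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class CaptionRecord:
    """Text-only record (hypothetical); it holds no pixel data."""
    frame_id: int
    caption: str


def captions_only(
    frames: Iterable[bytes],
    captioner: Callable[[bytes], str],
) -> List[CaptionRecord]:
    """Caption each frame and keep only the text, never the image bytes."""
    records = []
    for frame_id, frame in enumerate(frames):
        # Only the caption string is stored; the frame is not retained.
        records.append(CaptionRecord(frame_id, captioner(frame)))
    return records


def placeholder_captioner(frame: bytes) -> str:
    """Hypothetical stand-in for a trained image captioning model."""
    return f"placeholder caption for a {len(frame)}-byte frame"


if __name__ == "__main__":
    fake_frames = [b"\x00" * 10, b"\x01" * 20]
    for record in captions_only(fake_frames, placeholder_captioner):
        print(record)
```

In this arrangement, downstream dietary assessment would read `CaptionRecord` objects only, so the raw egocentric images never need to be shared.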

Abstract

Camera-based passive dietary intake monitoring is able to continuously capture the eating episodes of a subject, recording rich visual information, such as the type and volume of food being consumed, as well as the eating behaviours of the subject. However, there currently is no method that is able to incorporate these visual clues and provide a comprehensive context of dietary intake from passive recording (e.g., is the subject sharing food with others, what food the subject is eating, and how much food is left in the bowl). On the other hand, privacy is a major concern while egocentric wearable cameras are used for capturing. In this paper, we propose a privacy-preserved secure solution (i.e., egocentric image captioning) for dietary assessment with passive monitoring, which unifies food recognition, volume estimation, and scene understanding. By converting images into rich text…
