TL;DR
SeeDS combines semantic feature synthesis with a diffusion model to improve zero-shot food detection, achieving state-of-the-art results on the ZSFooD and UECFOOD-256 datasets.
Contribution
The paper proposes the SeeDS framework with two modules that synthesize discriminative and diversified features for zero-shot food detection, addressing semantic complexity and intra-class diversity.
Findings
Achieves state-of-the-art zero-shot food detection performance on ZSFooD and UECFOOD-256 datasets.
Maintains effectiveness on general zero-shot detection datasets like PASCAL VOC and MS COCO.
Demonstrates the benefit of semantic feature synthesis and diffusion models in fine-grained recognition.
Abstract
Food detection is becoming a fundamental task in food computing that supports various multimedia applications, including food recommendation and dietary monitoring. To deal with real-world scenarios, food detection needs to localize and recognize novel food objects that are not seen during training, demanding Zero-Shot Detection (ZSD). However, the complexity of semantic attributes and intra-class feature diversity poses challenges for ZSD methods in distinguishing fine-grained food classes. To tackle this, we propose the Semantic Separable Diffusion Synthesizer (SeeDS) framework for Zero-Shot Food Detection (ZSFD). SeeDS consists of two modules: a Semantic Separable Synthesizing Module (SM) and a Region Feature Denoising Diffusion Model (RFDDM). The SM learns the disentangled semantic representation for complex food attributes from ingredients and cuisines, and synthesizes…
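The abstract names a Region Feature Denoising Diffusion Model but is cut off before describing it. As a rough illustration of what "denoising diffusion" over a feature vector usually involves, the sketch below implements the standard DDPM forward noising step on a toy vector. This is an assumption for illustration, not the paper's implementation: the linear schedule, the step count, the 8-dimensional feature, and the function names (linear_beta_schedule, cumulative_alphas, q_sample) are all hypothetical.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (a common DDPM default)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t from q(x_t | x_0) = N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise


rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

# Hypothetical 8-dimensional stand-in for a region feature.
region_feature = [rng.gauss(0.0, 1.0) for _ in range(8)]
noisy, eps = q_sample(region_feature, t=500, alpha_bars=alpha_bars, rng=rng)
```

In a typical DDPM setup, a denoiser network is trained to predict eps from the noisy vector and the step index t, and new features are then generated by running the learned reverse process from pure noise. How SeeDS conditions this process on semantic information is not stated in the truncated abstract above.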
Taxonomy
Methods: Multi-Head Attention · Attention Is All You Need · Diffusion · Softmax · Linear Layer · Synthesizer
