Swin-TUNA: A Novel PEFT Approach for Accurate Food Image Segmentation
Haotian Chen, Zhiyong Xiao

TL;DR
Swin-TUNA is a parameter-efficient fine-tuning method for Transformer-based food image segmentation. It reaches high accuracy while updating only 4% of the parameters, which substantially lowers the computational resources required.
Contribution
The paper presents Swin-TUNA, a novel PEFT approach that integrates multiscale trainable adapters into the Swin Transformer, enabling high-performance segmentation with minimal parameter updates.
Findings
Achieves state-of-the-art mIoU on FoodSeg103 and UECFoodPix datasets.
Reduces parameter count by 98.7% compared to FoodSAM.
Converges faster and generalizes better in low-data scenarios.
Abstract
In food image processing, efficient semantic segmentation is crucial for industrial applications. However, existing large-scale Transformer-based models (such as FoodSAM) struggle to meet practical deployment requirements because of their massive parameter counts and high computational demands. This paper introduces the TUNable Adapter module (Swin-TUNA), a Parameter-Efficient Fine-Tuning (PEFT) method that integrates multiscale trainable adapters into the Swin Transformer architecture and achieves high-performance food image segmentation by updating only 4% of the parameters. The core innovation of Swin-TUNA is its hierarchical feature adaptation mechanism: it uses depthwise separable convolutions and dimensional mappings of varying scales to address the differences between features in shallow and deep layers, combined with a dynamic balancing…
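To make the parameter-efficiency argument concrete, the following is a minimal sketch of the arithmetic behind adapter-style fine-tuning. It is not the authors' implementation: the adapter layout (down-projection, depthwise separable convolution, up-projection), the function names, and every numeric value in the usage lines are assumptions chosen for illustration. It shows why a depthwise separable convolution is much cheaper than a standard one, and how the trainable share is computed when the backbone is frozen.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Parameter count of a standard k x k convolution."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def depthwise_separable_params(c_in, c_out, k, bias=True):
    """Depthwise k x k conv (one filter per input channel) plus a 1 x 1 pointwise conv."""
    depthwise = c_in * k * k + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


def adapter_params(dim, bottleneck, k):
    """Hypothetical bottleneck adapter (an assumed layout, not taken from the paper)."""
    down = dim * bottleneck + bottleneck   # linear map: dim -> bottleneck
    conv = depthwise_separable_params(bottleneck, bottleneck, k)
    up = bottleneck * dim + dim            # linear map: bottleneck -> dim
    return down + conv + up


def trainable_fraction(frozen_params, adapter_param_counts):
    """Share of all parameters that are updated when only the adapters train."""
    trainable = sum(adapter_param_counts)
    return trainable / (frozen_params + trainable)


# Illustrative values only: stage widths, bottleneck sizes, and the frozen
# backbone size below are assumptions, not figures reported in the paper.
stage_dims = [96, 192, 384, 768]
adapters = [adapter_params(d, d // 4, k=3) for d in stage_dims]
print("standard 3x3 conv, 64 channels:  ", conv2d_params(64, 64, 3))
print("separable 3x3 conv, 64 channels: ", depthwise_separable_params(64, 64, 3))
print("trainable fraction (toy backbone):", trainable_fraction(10_000_000, adapters))
```

Using a larger bottleneck or kernel at some stages than at others is one way to picture the "varying scales" mentioned in the abstract; the actual per-stage configuration is not given in this summary.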
