Swin-TUNA: A Novel PEFT Approach for Accurate Food Image Segmentation

Haotian Chen; Zhiyong Xiao

arXiv:2507.17347·cs.CV·July 29, 2025

TL;DR

Swin-TUNA is a parameter-efficient, Transformer-based method for food image segmentation. It reaches high accuracy while updating only 4% of the parameters, which substantially reduces the computational resources needed.

Contribution

The paper presents Swin-TUNA, a novel PEFT approach that integrates multiscale trainable adapters into the Swin Transformer, enabling high-performance segmentation with minimal parameter updates.

Findings

01

Achieves state-of-the-art mIoU on FoodSeg103 and UECFoodPix datasets.

02

Reduces parameter count by 98.7% compared to FoodSAM.

03

Faster convergence and better generalization in low-data scenarios.
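
For readers unfamiliar with the mIoU metric cited in the findings, the following is a minimal sketch of mean intersection-over-union on flat label lists. The function name `mean_iou` and the toy labels are illustrative assumptions, not taken from the paper, and benchmark protocols may differ in details such as accumulating counts over the whole dataset or handling absent classes.

```python
def mean_iou(pred, target, num_classes):
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flat, equal-length sequences of integer class labels.
    """
    ious = []
    for c in range(num_classes):
        # Pixels where both prediction and ground truth are class c.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels where either prediction or ground truth is class c.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # skip classes absent from both (an assumption)
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example with three classes (hypothetical labels).
pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so `score` is 0.5.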

Abstract

In the field of food image processing, efficient semantic segmentation techniques are crucial for industrial applications. However, existing large-scale Transformer-based models (such as FoodSAM) struggle to meet practical deployment requirements because of their massive parameter counts and high computational resource demands. This paper introduces the TUNable Adapter module (Swin-TUNA), a Parameter-Efficient Fine-Tuning (PEFT) method that integrates multiscale trainable adapters into the Swin Transformer architecture, achieving high-performance food image segmentation by updating only 4% of the parameters. The core innovation of Swin-TUNA lies in its hierarchical feature adaptation mechanism: it designs depthwise separable convolutions and dimensional mappings of varying scales to address the differences in features between shallow and deep networks, combined with a dynamic balancing…
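
Depthwise separable convolutions, which the abstract appears to describe, are a common choice for lightweight adapters because they need far fewer weights than a standard convolution. The sketch below compares the two weight counts and computes a trainable-parameter fraction. The function names, the channel width of 96, the 3x3 kernel, and the 4M/96M split are illustrative assumptions, not figures or code from the paper; biases are ignored.

```python
def conv_params(c_in, c_out, k):
    """Weights in a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


def trainable_fraction(trainable, frozen):
    """Share of all parameters that are updated during fine-tuning."""
    return trainable / (trainable + frozen)


# Hypothetical layer: 96 input and output channels, 3 x 3 kernel.
standard = conv_params(96, 96, 3)
separable = depthwise_separable_params(96, 96, 3)
ratio = separable / standard

# Hypothetical split: 4M adapter weights trained, 96M backbone weights frozen.
fraction = trainable_fraction(4_000_000, 96_000_000)
```

With these assumed sizes the standard convolution has 82,944 weights and the separable version 10,080, roughly 12% as many, while the hypothetical split trains 4% of all parameters.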
