Efficient Learning for Product Attributes with Compact Multimodal Models

Mandar Kulkarni

arXiv:2507.19679 · cs.CV · July 29, 2025

TL;DR

This paper presents a label-efficient semi-supervised fine-tuning approach for compact vision-language models in e-commerce product attribute prediction, leveraging unlabeled data with minimal compute overhead.

Contribution

The paper introduces a semi-supervised fine-tuning method that combines parameter-efficient fine-tuning (PEFT) with Direct Preference Optimization (DPO) so that compact VLMs can learn from unlabeled data in e-commerce applications.

Findings

01. DPO-based fine-tuning outperforms supervised models on multiple verticals.

02. Accuracy improves as more unlabeled data is incorporated.

03. The method converges efficiently with minimal compute.

Abstract

Image-based product attribute prediction in e-commerce is a crucial task with numerous applications. Supervised fine-tuning of Vision Language Models (VLMs) is hard to scale because of the cost of manual or API-based annotation. In this paper, we investigate label-efficient semi-supervised fine-tuning strategies for compact VLMs (2B-3B parameters) that leverage unlabeled product listings through Direct Preference Optimization (DPO). Beginning with a small labeled set annotated via an API, we first employ PEFT to train low-rank adapter modules. To update the adapter weights with unlabeled data, we generate multiple reasoning-and-answer chains per unlabeled sample and separate these chains into preferred and dispreferred sets based on self-consistency. We then fine-tune the model with the DPO loss and use the updated model for the next iteration. By using PEFT fine-tuning…
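The self-consistency step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names (`split_by_self_consistency`, `build_preference_pairs`, `dpo_loss`), the strict-majority rule, the `min_agreement` threshold, the decision to skip unanimous samples, and the `beta` value are all assumptions made for illustration. The abstract does not specify any of them. The `dpo_loss` function is the standard per-pair DPO objective, computed here on scalar sequence log-probabilities.

```python
import math
from collections import Counter


def split_by_self_consistency(chains, min_agreement=0.5):
    """Split sampled (reasoning, answer) chains by majority vote.

    Assumption: chains agreeing with a strict-majority answer are
    'preferred', the rest 'dispreferred'. Returns None when no answer
    exceeds min_agreement or when every chain agrees (no contrast).
    """
    if not chains:
        return None
    counts = Counter(answer for _, answer in chains)
    majority_answer, votes = counts.most_common(1)[0]
    if votes / len(chains) <= min_agreement:
        return None  # too little agreement to trust the pseudo-label
    preferred = [c for c in chains if c[1] == majority_answer]
    dispreferred = [c for c in chains if c[1] != majority_answer]
    if not dispreferred:
        return None  # unanimous sample yields no preference pair
    return preferred, dispreferred


def build_preference_pairs(prompt, chains, min_agreement=0.5):
    """Pair every preferred chain with every dispreferred chain."""
    split = split_by_self_consistency(chains, min_agreement)
    if split is None:
        return []
    preferred, dispreferred = split
    return [
        {
            "prompt": prompt,
            "chosen": f"{good_r} Answer: {good_a}",
            "rejected": f"{bad_r} Answer: {bad_a}",
        }
        for good_r, good_a in preferred
        for bad_r, bad_a in dispreferred
    ]


def dpo_loss(policy_chosen_lp, policy_rejected_lp,
             ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(margin))."""
    margin = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    # Numerically stable softplus(-margin).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical sampled chains for one unlabeled listing.
chains = [
    ("The sleeve ends at the wrist.", "full sleeve"),
    ("The arm is fully covered.", "full sleeve"),
    ("The sleeve looks cropped.", "half sleeve"),
]
pairs = build_preference_pairs("What is the sleeve length?", chains)
```

In this sketch, a sample with three chains where two agree produces two preference pairs, each contrasting one majority chain with the single minority chain. The resulting `prompt`/`chosen`/`rejected` records follow a common layout for preference datasets; whether the paper uses this layout is not stated in the abstract.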

Taxonomy

Topics: Multimodal Machine Learning Applications · Text and Document Classification Technologies · Sentiment Analysis and Opinion Mining