Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference

Najmeh Forouzandehmehr; Nima Farrokhsiar; Ramin Giahi; Evren Korpeoglu; Kannan Achan

arXiv:2409.12150·cs.IR·September 19, 2024

TL;DR

This paper introduces a fine-tuning framework for large language models to improve personalized outfit recommendations by integrating visual data, direct feedback, and trend awareness, leading to more cohesive and stylish outfit suggestions.

Contribution

It presents a novel multimodal fine-tuning approach for LLMs that incorporates image captioning and direct preference feedback to enhance fashion recommendation accuracy.

Findings

1. Outperforms the base LLM in outfit recommendation tasks
2. Effectively integrates visual and textual data for style understanding
3. Continuously improves recommendations through a feedback loop

Abstract

Personalized outfit recommendation remains a complex challenge, demanding both an understanding of fashion compatibility and trend awareness. This paper presents a novel framework that harnesses the expressive power of large language models (LLMs) for this task, mitigating their "black box" and static nature through fine-tuning and direct feedback integration. We bridge the visual-textual gap in item descriptions by employing image captioning with a Multimodal Large Language Model (MLLM). This enables the LLM to extract style and color characteristics from human-curated fashion images, forming the basis for personalized recommendations. The LLM is efficiently fine-tuned on the open-source Polyvore dataset of curated fashion images, optimizing its ability to recommend stylish outfits. A direct preference mechanism using negative examples is employed to enhance the LLM's decision-making…
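The abstract is truncated here and does not spell out the preference mechanism, so the following is only an illustrative sketch. It assumes the "direct preference mechanism using negative examples" resembles Direct Preference Optimization (DPO), where each training record pairs a preferred completion with a rejected one. The names `PreferencePair`, `build_pair`, and `dpo_loss`, the prompt wording, and the value `beta=0.1` are all hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    """One preference record (hypothetical layout, not from the paper)."""
    prompt: str    # caption of the anchor item plus the request
    chosen: str    # caption of an item from the same curated outfit
    rejected: str  # caption of a negative example, e.g. an item from another outfit


def build_pair(anchor_caption: str, positive_caption: str,
               negative_caption: str) -> PreferencePair:
    # The prompt template is an assumption made for illustration only.
    prompt = f"Anchor item: {anchor_caption}\nRecommend a complementary item."
    return PreferencePair(prompt, positive_caption, negative_caption)


def dpo_loss(policy_chosen_lp: float, policy_rejected_lp: float,
             ref_chosen_lp: float, ref_rejected_lp: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for a single pair, given summed token log-probs.

    loss = -log(sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r))))
    """
    margin = beta * ((policy_chosen_lp - ref_chosen_lp)
                     - (policy_rejected_lp - ref_rejected_lp))
    # -log(sigmoid(m)) equals softplus(-m); this form avoids overflow.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


pair = build_pair(
    "navy wool blazer with notch lapels",
    "white cotton oxford shirt",
    "neon green running shorts",
)
```

When the fine-tuned policy and the frozen reference model score both completions identically, the margin is zero and the loss is log 2. The loss falls below that value as the policy assigns relatively more probability to the chosen item than the reference does.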

Taxonomy

Topics: Advanced Numerical Analysis Techniques · Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis

Methods: Balanced Selection