Decoding Style: Efficient Fine-Tuning of LLMs for Image-Guided Outfit Recommendation with Preference
Najmeh Forouzandehmehr, Nima Farrokhsiar, Ramin Giahi, Evren Korpeoglu, Kannan Achan

TL;DR
This paper introduces a fine-tuning framework for large language models to improve personalized outfit recommendations by integrating visual data, direct feedback, and trend awareness, leading to more cohesive and stylish outfit suggestions.
Contribution
It presents a novel multimodal fine-tuning approach for LLMs that incorporates image captioning and direct preference feedback to enhance fashion recommendation accuracy.
Findings
Outperforms the base LLM in outfit recommendation tasks
Effectively integrates visual and textual data for style understanding
Continuously improves recommendations through a feedback loop
Abstract
Personalized outfit recommendation remains a complex challenge, demanding both fashion compatibility understanding and trend awareness. This paper presents a novel framework that harnesses the expressive power of large language models (LLMs) for this task, mitigating their "black box" and static nature through fine-tuning and direct feedback integration. We bridge the visual-textual gap in item descriptions by employing image captioning with a Multimodal Large Language Model (MLLM). This enables the LLM to extract style and color characteristics from human-curated fashion images, forming the basis for personalized recommendations. The LLM is efficiently fine-tuned on the open-source Polyvore dataset of curated fashion images, optimizing its ability to recommend stylish outfits. A direct preference mechanism using negative examples is employed to enhance the LLM's decision-making…
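The abstract mentions a direct preference mechanism that uses negative examples but is cut off before giving details. As an illustration only, the sketch below assumes a DPO-style (Direct Preference Optimization) objective, which is one common way to learn from a preferred and a rejected output; the paper may use a different formulation. The function name `dpo_loss`, the `beta` value, and the log-probability inputs are all hypothetical.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO-style loss for one (chosen, rejected) pair.

    Assumption: each argument is the summed token log-probability of a
    full recommendation under either the model being tuned ("policy")
    or a frozen copy of the base model ("ref").
    """
    # How much more the tuned model favors each output than the base model does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    x = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)), written in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Hypothetical pair: the chosen outfit is a curated one, the rejected
# outfit is a negative example (for instance, a mismatched item).
loss_good = dpo_loss(-1.0, -5.0, -3.0, -3.0)  # policy prefers the chosen outfit
loss_bad = dpo_loss(-5.0, -1.0, -3.0, -3.0)   # policy prefers the rejected outfit
print(loss_good < loss_bad)
```

Under this assumed objective, the loss falls as the tuned model assigns relatively more probability to the curated outfit than to the negative example, measured against the base model.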
Taxonomy
Topics: Advanced Numerical Analysis Techniques · Handwritten Text Recognition Techniques · Generative Adversarial Networks and Image Synthesis
Methods: Balanced Selection
