User-Aware Prefix-Tuning is a Good Learner for Personalized Image Captioning
Xuan Wang, Guanhong Wang, Wenhao Chai, Jiayu Zhou, and Gaoang Wang

TL;DR
This paper introduces a user-aware prefix-tuning framework for personalized image captioning that efficiently leverages user context and large language models, achieving significant improvements over existing methods.
Contribution
The proposed framework combines prefix-tuning with a frozen large language model, improving personalized image captioning while training only a small set of parameters instead of retraining the entire caption model.
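The pairing of a frozen language model with a trainable, user-aware prefix can be sketched in NumPy. This is a hypothetical illustration, not the authors' implementation. The summary does not say how the prefix is built, so the sketch assumes a single linear mapping network (in the style of ClipCap) that turns an image feature and a user-context embedding into `PREFIX_LEN` input-level prefix vectors. Classic prefix-tuning (Li and Liang) instead inserts trainable key/value vectors at every attention layer. All names and sizes (`D_IMG`, `D_USER`, `D_MODEL`, `PREFIX_LEN`, `VOCAB`, `build_prefix`, `lm_inputs`) are invented for the example, and random weights stand in for a real vision encoder and language model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
D_IMG = 512      # image feature size (e.g. from a frozen vision encoder)
D_USER = 128     # user-context embedding size
D_MODEL = 768    # hidden size of the frozen language model
PREFIX_LEN = 10  # number of prefix vectors prepended to the caption tokens
VOCAB = 1000     # toy vocabulary size (real LMs are much larger)

# Frozen LM token-embedding table: never updated during training.
frozen_token_embeddings = rng.standard_normal((VOCAB, D_MODEL)).astype(np.float32)

# Trainable mapping network (assumption): one linear layer from the
# concatenated [image ; user] features to PREFIX_LEN * D_MODEL values.
# These are the only trainable weights in this sketch.
W_map = rng.standard_normal((D_IMG + D_USER, PREFIX_LEN * D_MODEL)).astype(np.float32) * 0.02
b_map = np.zeros(PREFIX_LEN * D_MODEL, dtype=np.float32)


def build_prefix(image_feat, user_feat):
    """Map one image feature and one user-context feature to prefix vectors."""
    x = np.concatenate([image_feat, user_feat])
    return (x @ W_map + b_map).reshape(PREFIX_LEN, D_MODEL)


def lm_inputs(image_feat, user_feat, token_ids):
    """Prepend the user-aware prefix to the frozen embeddings of caption tokens."""
    prefix = build_prefix(image_feat, user_feat)
    tokens = frozen_token_embeddings[token_ids]
    return np.concatenate([prefix, tokens], axis=0)


image_feat = rng.standard_normal(D_IMG).astype(np.float32)
user_feat = rng.standard_normal(D_USER).astype(np.float32)
token_ids = np.array([464, 329, 318])
inputs = lm_inputs(image_feat, user_feat, token_ids)
print(inputs.shape)  # (13, 768)

# Trainable parameters in the mapping network only.
trainable = W_map.size + b_map.size
print(trainable)  # 4922880
```

Only `W_map` and `b_map` would receive gradients here; the language model's own weights, including its embedding table and its transformer layers (not modeled in this sketch), stay fixed, which is where the reduction in trainable parameters comes from.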
Findings
Outperforms baseline models on Instagram and YFCC100M datasets
Achieves twofold improvements in BLEU-4 and CIDEr metrics
Reduces training parameters and computational costs
Abstract
Image captioning bridges the gap between vision and language by automatically generating natural language descriptions for images. Traditional image captioning methods often overlook the preferences and characteristics of users. Personalized image captioning solves this problem by incorporating user prior knowledge into the model, such as writing styles and preferred vocabularies. Most existing methods emphasize the user context fusion process through memory networks or transformers. However, these methods ignore the distinct domains of each dataset. Therefore, they need to update the entire caption model parameters when meeting new samples, which is time-consuming and computationally intensive. To address this challenge, we propose a novel personalized image captioning framework that leverages user context to consider personality factors. Additionally, our framework utilizes the prefix-tuning…
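The abstract describes user prior knowledge such as preferred vocabularies as an input to the model. This summary does not say how the paper constructs its user context, so the sketch below shows only one simple, hypothetical way to summarize a user's preferred vocabulary: count the non-stop-words in their past captions and keep the most frequent ones. The function name `user_context_words`, the `STOP_WORDS` list, and the sample captions are all invented for illustration.

```python
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "at", "of", "in", "on", "with", "my", "to", "is"}


def user_context_words(past_captions, k=5):
    """Return a user's k most frequent non-stop-words as a crude context."""
    counts = Counter()
    for caption in past_captions:
        for word in caption.lower().split():
            # Drop surrounding punctuation and hashtag/mention symbols.
            word = word.strip(".,!?#@\"'")
            if word and word not in STOP_WORDS:
                counts[word] += 1
    # most_common orders equal counts by first occurrence.
    return [word for word, _ in counts.most_common(k)]


history = [
    "Sunset at the beach with my dog #sunset",
    "Another lazy sunset, another happy dog",
    "Coffee and sunset vibes",
]
print(user_context_words(history, k=3))  # ['sunset', 'dog', 'another']
```

A word list like this could then be embedded and fed to the captioning model as user context; the embedding step is outside the scope of this sketch.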
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Image Retrieval and Classification Techniques
Methods: Attention Is All You Need · Discriminative Fine-Tuning · Weight Decay · Linear Layer · Cosine Annealing · Dense Connections · Linear Warmup With Cosine Annealing · Adam · Attention Dropout
