Multi-modal Learnable Queries for Image Aesthetics Assessment
Zhiwei Xiong, Yunfan Zhang, Zhiqi Shen, Peiran Ren, Han Yu

TL;DR
This paper introduces MMLQ, a multi-modal learnable query method that leverages visual and textual features from pre-trained models to improve image aesthetics assessment, achieving state-of-the-art results.
Contribution
The paper proposes a novel multi-modal learnable query framework that effectively combines visual and textual features for enhanced image aesthetics assessment.
Findings
MMLQ outperforms previous methods by 7.7% in SRCC.
MMLQ outperforms previous methods by 8.3% in PLCC.
MMLQ achieves new state-of-the-art performance on multi-modal IAA.
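The two metrics named above are the Spearman rank correlation coefficient (SRCC) and the Pearson linear correlation coefficient (PLCC), both computed between predicted and ground-truth aesthetic scores. The sketch below shows one way to compute them in plain Python. The function names and the example scores are hypothetical and are not taken from the paper.

```python
import math

def pearson(x, y):
    """PLCC: linear correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """SRCC: Pearson correlation computed on the ranks of the scores."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical scores for four images (illustration only).
predicted = [5.1, 6.3, 4.2, 7.0]
ground_truth = [5.0, 6.0, 4.5, 8.0]

srcc = spearman(predicted, ground_truth)
plcc = pearson(predicted, ground_truth)
```

In this toy example the predicted ordering matches the ground-truth ordering, so SRCC is at its maximum while PLCC is lower, because the score gaps are not perfectly linear. SRCC rewards correct ranking; PLCC rewards linear agreement in the score values themselves.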
Abstract
Image aesthetics assessment (IAA) is attracting wide interest with the prevalence of social media. The problem is challenging due to its subjective and ambiguous nature. Instead of directly extracting aesthetic features solely from the image, user comments associated with an image could potentially provide complementary knowledge that is useful for IAA. With existing large-scale pre-trained models demonstrating strong capabilities in extracting high-quality transferable visual and textual features, learnable queries are shown to be effective in extracting useful features from the pre-trained visual features. Therefore, in this paper, we propose MMLQ, which utilizes multi-modal learnable queries to extract aesthetics-related features from multi-modal pre-trained features. Extensive experimental results demonstrate that MMLQ achieves new state-of-the-art performance on multi-modal IAA,…
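The idea of learnable queries extracting features from pre-trained features can be illustrated with a minimal cross-attention sketch. This is not the authors' architecture: it assumes single-head attention without projection matrices, uses random vectors as stand-ins for the pre-trained visual and textual features, and fuses the two modalities by mean pooling, all of which are assumptions made for illustration. Every name in it is hypothetical. In a trained model the query vectors would be parameters updated by gradient descent; here they are fixed at their random initial values.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross_attend(queries, features):
    """Each query attends over all feature tokens (keys = values = features)."""
    dim = len(features[0])
    outputs = []
    for q in queries:
        weights = softmax([dot(q, f) / math.sqrt(dim) for f in features])
        # Weighted sum of feature tokens: one output vector per query.
        outputs.append(
            [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
        )
    return outputs

def random_tokens(count, dim):
    return [[random.gauss(0, 1) for _ in range(dim)] for _ in range(count)]

random.seed(0)
DIM = 4  # assumed toy feature width

# Random stand-ins for pre-trained features (assumption).
visual_feats = random_tokens(6, DIM)   # e.g. image patch tokens
text_feats = random_tokens(5, DIM)     # e.g. comment word tokens

# Learnable queries, one set per modality (untrained in this sketch).
visual_queries = random_tokens(2, DIM)
text_queries = random_tokens(2, DIM)

visual_out = cross_attend(visual_queries, visual_feats)
text_out = cross_attend(text_queries, text_feats)

# Assumed fusion: mean-pool all query outputs into a single vector.
fused = [sum(col) / len(col) for col in zip(*(visual_out + text_out))]
```

The point of the design is that the number of queries, not the number of input tokens, fixes the size of the extracted representation, so a small set of queries can summarize a long sequence of pre-trained features per modality.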
Taxonomy
Topics: Image Retrieval and Classification Techniques · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques
