CLIP-PCQA: Exploring Subjective-Aligned Vision-Language Modeling for Point Cloud Quality Assessment
Yating Liu, Yujie Zhang, Ziyu Shan, Yiling Xu

TL;DR
CLIP-PCQA introduces a novel vision-language model for point cloud quality assessment that aligns with human subjective evaluation by using descriptive quality labels and a retrieval-based approach, outperforming existing methods.
Contribution
It proposes a language-driven, retrieval-based point cloud quality assessment method leveraging CLIP, incorporating opinion score distribution for better subjective alignment.
Findings
Outperforms state-of-the-art PCQA methods
Utilizes descriptive quality labels for assessment
Incorporates opinion score distribution for improved accuracy
Abstract
In recent years, No-Reference Point Cloud Quality Assessment (NR-PCQA) research has achieved significant progress. However, existing methods mostly seek a direct mapping function from visual data to the Mean Opinion Score (MOS), which is contradictory to the mechanism of practical subjective evaluation. To address this, we propose a novel language-driven PCQA method named CLIP-PCQA. Considering that human beings prefer to describe visual quality using discrete quality descriptions (e.g., "excellent" and "poor") rather than specific scores, we adopt a retrieval-based mapping strategy to simulate the process of subjective assessment. More specifically, based on the philosophy of CLIP, we calculate the cosine similarity between the visual features and multiple textual features corresponding to different quality descriptions, in which process an effective contrastive loss and learnable…
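The retrieval step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the five-level label set, the 1-5 score values, the temperature, and the function names are all assumptions, and plain NumPy vectors stand in for the CLIP image and text encoder outputs.

```python
import numpy as np

# Hypothetical five-level quality vocabulary. The abstract names only
# "excellent" and "poor"; the other labels and the 1-5 scores are assumptions.
QUALITY_LABELS = ["bad", "poor", "fair", "good", "excellent"]
QUALITY_SCORES = np.array([1.0, 2.0, 3.0, 4.0, 5.0])


def l2_normalize(x, axis=-1):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def retrieve_quality(visual_feat, text_feats, temperature=0.01):
    """Map one visual feature to a distribution over quality labels and a score.

    visual_feat: shape (d,), stand-in for a visual encoder output.
    text_feats:  shape (k, d), one row per quality description.
    temperature: softmax temperature (assumed value, not from the paper).
    """
    # Cosine similarity between the visual feature and each text feature.
    sims = l2_normalize(text_feats) @ l2_normalize(visual_feat)  # shape (k,)
    logits = sims / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    # Expected score under the predicted distribution over quality levels.
    score = float(probs @ QUALITY_SCORES)
    return probs, score


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text_feats = rng.normal(size=(len(QUALITY_LABELS), 512))
    visual_feat = rng.normal(size=512)
    probs, score = retrieve_quality(visual_feat, text_feats)
    print(dict(zip(QUALITY_LABELS, probs.round(3))), round(score, 3))
```

The softmax output can be read as a predicted opinion score distribution over the quality labels, and its expectation gives a single MOS-like number. How the paper trains the encoders and prompts to match the subjective distribution is not covered by this sketch.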
Taxonomy
Topics: 3D Surveying and Cultural Heritage · Remote Sensing and LiDAR Applications · 3D Shape Modeling and Analysis
Methods: ADaptive gradient method with the OPTimal convergence rate · Contrastive Language-Image Pre-training
