TL;DR
This paper introduces UAT-MC (Untargeted Adversarial Training with Multimodal Coordination), an adversarial training method that aligns visual and textual perturbations to make multimodal recommender systems more robust to evasion-based promotion attacks.
Contribution
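The paper identifies a cross-modal gradient mismatch in multi-user promotion attacks, where visual and textual perturbations are optimized in inconsistent directions, and proposes a gradient alignment mechanism that synchronizes the two modalities' perturbations to address it.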
Findings
UAT-MC significantly enhances robustness against promotion attacks.
The method maintains recommendation performance with a favorable defense-accuracy trade-off.
Code is publicly available at https://github.com/gmXian/UAT-MC.
Abstract
Multimodal recommender systems exploit visual and textual signals to alleviate data sparsity, but this also makes them more vulnerable to evasion-based promotion attacks. Existing defenses are largely limited to single-modal settings and mainly focus on poisoning-based threats, leaving evasion-based threats underexplored. In this work, we first identify a cross-modal gradient mismatch under the multi-user promotion setting, where visual and textual perturbations are optimized in inconsistent directions due to the dominance of distinct user groups. This phenomenon dilutes the attack effectiveness and leads robust training to underestimate worst-case risks. To address this issue, we propose Untargeted Adversarial Training with Multimodal Coordination (UAT-MC). UAT-MC tackles the challenge of unknown targeted items in evasion-based attacks (as opposed to poisoning-based attacks) by…
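The mismatch described in the abstract can be illustrated with a toy diagnostic. The sketch below is not the paper's method: it assumes a hypothetical late-fusion scorer in which user u scores an item as `U_v[u] @ x_v + U_t[u] @ x_t`, so each user's contribution to the summed promotion gradient for a modality is that user's preference vector for that modality. The names `user_contribution_shares`, `dominance_overlap`, `U_v`, and `U_t` are illustrative assumptions, as is the choice to measure dominance by gradient norm.

```python
import numpy as np

def user_contribution_shares(user_weights):
    """Per-user share of the summed promotion gradient for one modality.

    user_weights: (n_users, dim) array. Under the assumed linear scorer,
    row u is user u's gradient contribution with respect to that
    modality's item feature, so its norm is used as the user's weight.
    """
    norms = np.linalg.norm(user_weights, axis=1)
    return norms / norms.sum()

def dominance_overlap(shares_a, shares_b):
    """Cosine similarity of two share vectors (1.0 means the same users dominate)."""
    denom = np.linalg.norm(shares_a) * np.linalg.norm(shares_b)
    return float(shares_a @ shares_b / denom)

rng = np.random.default_rng(0)
n_users, dim = 6, 4

# Hypothetical population: users 0-2 are visually driven, users 3-5 text driven.
scale_v = np.array([5.0, 5.0, 5.0, 0.1, 0.1, 0.1])
scale_t = scale_v[::-1]
U_v = rng.normal(size=(n_users, dim)) * scale_v[:, None]
U_t = rng.normal(size=(n_users, dim)) * scale_t[:, None]

shares_v = user_contribution_shares(U_v)
shares_t = user_contribution_shares(U_t)
overlap = dominance_overlap(shares_v, shares_t)

print("visual shares :", np.round(shares_v, 3))
print("textual shares:", np.round(shares_t, 3))
print("dominance overlap:", round(overlap, 3))
```

In this toy setup the visual gradient is driven almost entirely by users 0-2 and the textual gradient by users 3-5, so the overlap is close to zero. That is the kind of situation in which independently optimized perturbations serve different user groups rather than reinforcing each other.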
