Vid-SME: Membership Inference Attacks against Large Video Understanding Models
Qi Li, Runpeng Yu, Xinchao Wang

TL;DR
This paper introduces Vid-SME, a novel membership inference attack tailored for large video understanding models, effectively identifying whether specific videos were part of the training data by analyzing model confidence and temporal variations.
Contribution
The paper presents the first video-specific membership inference method, Vid-SME, which leverages Sharma-Mittal entropy and temporal frame analysis to improve attack accuracy on video models.
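For readers unfamiliar with the entropy named above, the following is a minimal sketch of the standard two-parameter Sharma-Mittal entropy of a discrete probability distribution. It is not the authors' scoring procedure: this summary does not describe how Vid-SME applies the entropy to model outputs or video frames. The function name `sharma_mittal_entropy` and the default values of `q` and `r` are illustrative assumptions.

```python
import math


def sharma_mittal_entropy(probs, q=0.5, r=2.0):
    """Sharma-Mittal entropy of a discrete distribution, for q != 1 and r != 1.

    S_{q,r}(p) = ((sum_i p_i^q)^((1 - r) / (1 - q)) - 1) / (1 - r)

    The defaults for q and r are arbitrary illustrative choices,
    not values taken from the paper.
    """
    if math.isclose(q, 1.0) or math.isclose(r, 1.0):
        raise ValueError("this sketch handles only q != 1 and r != 1")
    # Zero-probability entries contribute nothing to the sum.
    power_sum = sum(p ** q for p in probs if p > 0)
    return (power_sum ** ((1.0 - r) / (1.0 - q)) - 1.0) / (1.0 - r)


# A confident (peaked) distribution has lower entropy than a flat one.
peaked = sharma_mittal_entropy([0.97, 0.01, 0.01, 0.01])
flat = sharma_mittal_entropy([0.25, 0.25, 0.25, 0.25])
```

Setting `r` equal to `q` reduces the expression to the Tsallis entropy, and letting `r` approach 1 gives the Rényi entropy, which is why the family is often described as a generalization of both.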
Findings
Vid-SME achieves high true positive rates at low false positive rates.
It effectively captures temporal variations in videos for membership inference.
Experimental results show strong effectiveness on various models.
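The metric behind the first finding, true positive rate at a low false positive rate, can be sketched as below. This is a generic illustration of the metric rather than the paper's evaluation code. The function name `tpr_at_fpr`, the toy scores, and the convention that a higher score means "more likely a training member" are all assumptions made for the example.

```python
def tpr_at_fpr(member_scores, nonmember_scores, max_fpr=0.05):
    """Highest TPR over thresholds whose FPR does not exceed max_fpr.

    Assumes a higher score means the sample looks more like a training member;
    a sample is predicted "member" when its score is >= the threshold.
    """
    thresholds = sorted(set(member_scores) | set(nonmember_scores), reverse=True)
    best_tpr = 0.0
    for t in thresholds:
        fpr = sum(s >= t for s in nonmember_scores) / len(nonmember_scores)
        if fpr > max_fpr:
            # FPR only grows as the threshold drops, so no later threshold qualifies.
            break
        tpr = sum(s >= t for s in member_scores) / len(member_scores)
        best_tpr = max(best_tpr, tpr)
    return best_tpr


# Toy scores, invented purely for illustration.
members = [0.9, 0.8, 0.7, 0.2]
nonmembers = [0.6, 0.3, 0.1, 0.05]
strict = tpr_at_fpr(members, nonmembers, max_fpr=0.0)
```

Reporting the rate at a small `max_fpr` rewards attacks that identify some members with near certainty, which an average-case metric can hide.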
Abstract
Multimodal large language models (MLLMs) demonstrate remarkable capabilities in handling complex multimodal tasks and are increasingly adopted in video understanding applications. However, their rapid advancement raises serious data privacy concerns, particularly given the potential inclusion of sensitive video content, such as personal recordings and surveillance footage, in their training datasets. Determining improperly used videos during training remains a critical and unresolved challenge. Despite considerable progress on membership inference attacks (MIAs) for text and image data in MLLMs, existing methods fail to generalize effectively to the video domain. These methods suffer from poor scalability as more frames are sampled and generally achieve negligible true positive rates at low false positive rates (TPR@Low FPR), mainly due to their failure to capture the inherent temporal…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
