Mobile Recording Device Recognition Based Cross-Scale and Multi-Level Representation Learning
Chunyan Zeng, Yuhao Zhao, Zhifeng Wang

TL;DR
This paper presents a multi-scale, multi-level deep learning approach for mobile recording device recognition, combining convolutional, LSTM, and transformer models to achieve high accuracy and transferability.
Contribution
The paper introduces a novel multi-level, cross-scale feature learning framework integrating ConvLSTM, BiLSTM, and transformer encoders for improved device recognition.
Findings
Achieved 99.6% accuracy on the CCNU_Mobile dataset.
Improved recognition accuracy by 2-12% over the baseline.
Demonstrated 87.9% transferability on a new dataset.
Abstract
This paper introduces a modeling approach that employs multi-level global processing, encompassing both short-term frame-level and long-term sample-level feature scales. In the initial stage of shallow feature extraction, various scales are employed to extract multi-level features, including Mel-Frequency Cepstral Coefficients (MFCC) and pre-Fbank log energy spectrum. The construction of the identification network model involves considering the input two-dimensional temporal features from both frame and sample levels. Specifically, the model initially employs one-dimensional convolution-based Convolutional Long Short-Term Memory (ConvLSTM) to fuse spatiotemporal information and extract short-term frame-level features. Subsequently, Bidirectional Long Short-Term Memory (BiLSTM) is utilized to learn long-term sample-level sequential representations. The transformer encoder then performs…
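The distinction between the short-term frame-level scale and the long-term sample-level scale can be illustrated with a minimal sketch. It is not the paper's pipeline: plain per-frame log energy stands in for MFCC and Fbank features, and a mean/standard-deviation summary stands in for the learned sample-level representation. The function names, the 400-sample frame length, and the 160-sample hop are all assumptions chosen for illustration.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a 1-D list of samples into overlapping short-term frames."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]

def log_energy(frame, eps=1e-10):
    """Log of the frame's total energy; eps avoids log(0) on silent frames."""
    return math.log(sum(x * x for x in frame) + eps)

def frame_level_features(samples, frame_len=400, hop=160):
    """Frame-level scale: one value per short-term frame (hypothetical defaults)."""
    return [log_energy(f) for f in frame_signal(samples, frame_len, hop)]

def sample_level_summary(frame_feats):
    """Sample-level scale: mean and standard deviation across all frames.
    A simple stand-in for a learned long-term sequence model."""
    n = len(frame_feats)
    mean = sum(frame_feats) / n
    var = sum((v - mean) ** 2 for v in frame_feats) / n
    return mean, math.sqrt(var)

# One second of a 440 Hz tone at an assumed 16 kHz sampling rate.
signal = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
feats = frame_level_features(signal)
summary = sample_level_summary(feats)
```

Here `feats` holds one value per frame, while `summary` describes the whole recording. A sequence model would consume the full frame sequence rather than collapsing it to two numbers.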
Taxonomy
Topics: Digital Media Forensic Detection · Image Processing and 3D Reconstruction · Technology and Security Systems
