SIFT -- File Fragment Classification Without Metadata
Shahid Alam

TL;DR
This paper introduces SIFT, a novel method for classifying file fragments in digital forensics without metadata, outperforming existing techniques by at least 8% through unique byte-based features and an adapted TF-IDF weighting scheme.
Contribution
SIFT is the first approach to use single-byte features combined with TF-IDF for file fragment classification, achieving superior accuracy.
Findings
SIFT outperforms state-of-the-art methods by at least 8%.
Uses 256 byte-based features without information loss.
Employs TF-IDF to weight features for improved classification.
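To make the 256-feature idea concrete, here is a minimal sketch of extracting one count per byte value from a fragment. The function name `byte_features` and the use of raw counts are assumptions for illustration. The summary says only that each byte value (0x00 - 0xFF) is a separate feature, not how the paper stores or normalizes them.

```python
from collections import Counter


def byte_features(fragment: bytes) -> list[int]:
    """Return a 256-entry vector holding the count of each byte value 0x00..0xFF.

    Hypothetical helper: the name and the raw-count representation are
    illustrative assumptions, not taken from the paper.
    """
    counts = Counter(fragment)  # iterating over bytes yields ints in 0..255
    return [counts.get(value, 0) for value in range(256)]


# A tiny made-up fragment: two 0x00 bytes, one 0xFF byte, three 0x41 ('A') bytes.
fragment = b"\x00\x00\xff\x41\x41\x41"
features = byte_features(fragment)
```

Because every possible byte value has its own slot, no two byte values are merged into a shared bin, which is one plausible reading of the "without information loss" claim.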
Abstract
A vital issue in file carving for digital forensics is classifying the type of a file fragment when the filesystem metadata is missing. Over the past decades, there have been several efforts to develop methods for classifying file fragments. In this research, a novel sifting approach, named SIFT (Sifting File Types), is proposed. SIFT outperforms the other state-of-the-art techniques by at least 8%. (1) One of the significant differences between SIFT and other approaches is that SIFT uses each single byte value as a separate feature, i.e., a total of 256 (0x00 - 0xFF) features. We also call this lossless feature (information) extraction, i.e., there is no loss of information. (2) The other significant difference is the technique used to estimate the inter-class and intra-class information gain of a feature. Unlike others, SIFT adapts TF-IDF for this purpose, and computes and assigns weight to each byte…
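The abstract is truncated before it describes how TF-IDF is adapted, so the following is only a sketch of textbook TF-IDF applied to bytes, not the paper's method. It assumes each file-type class acts as a "document" and each byte value as a "term". The function name `tfidf_byte_weights`, the class labels, and the sample fragments are all hypothetical.

```python
import math
from collections import Counter


def tfidf_byte_weights(class_samples: dict[str, list[bytes]]) -> dict[str, list[float]]:
    """Compute a textbook TF-IDF weight for every byte value within every class.

    Assumption: a class (file type) plays the role of a document and a byte
    value plays the role of a term. The paper's actual adaptation may differ.
    """
    # Byte counts per class, pooling all training fragments of that class.
    counts = {label: Counter(b"".join(frags)) for label, frags in class_samples.items()}
    n_classes = len(counts)

    # Document frequency: in how many classes does each byte value occur?
    df = [sum(1 for c in counts.values() if c[value] > 0) for value in range(256)]

    weights: dict[str, list[float]] = {}
    for label, c in counts.items():
        total = sum(c.values()) or 1  # guard against a class with no bytes
        weights[label] = [
            (c[value] / total) * math.log(n_classes / df[value]) if df[value] else 0.0
            for value in range(256)
        ]
    return weights


# Made-up training data: two classes with one short fragment each.
samples = {"text": [b"AAAB"], "bin": [b"\x00\x00B"]}
weights = tfidf_byte_weights(samples)
```

In this toy data the byte `B` (0x42) appears in both classes, so its inverse document frequency is log(2/2) = 0 and it gets zero weight everywhere. The byte `A` (0x41) appears only in the "text" class and gets a positive weight there. This is the intuition behind using TF-IDF to favor bytes that discriminate between classes.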
Taxonomy
Topics: Digital and Cyber Forensics · Digital Media Forensic Detection · Advanced Malware Detection Techniques
