Robust PDF Files Forensics Using Coding Style
Supriya Adhatarao, Cédric Lauradoux

TL;DR
This paper presents a method to identify the software used to create PDF files based on coding style patterns, achieving high accuracy in distinguishing among different PDF producers and analyzing online PDF services.
Contribution
The study introduces a set of 192 rules for PDF producer identification and demonstrates high accuracy in real-world PDF classification tasks.
Findings
Achieved 100% accuracy for some PDF producers.
Overall detection accuracy of 74% on large datasets.
Applied method to analyze online PDF services.
Abstract
Identifying how a file has been created is often interesting in security. It can be used by both attackers and defenders. Attackers can exploit this information to tune their attacks and defenders can understand how a malicious file has been created after an incident. In this work, we want to identify how a PDF file has been created. This problem is important because PDF files are extremely popular: many organizations publish PDF files online and malicious PDF files are commonly used by attackers. Our approach to detecting which software has been used to produce a PDF file is based on coding style: certain patterns are only created by specific PDF producers. We have analyzed the coding style of 900 PDF files produced using 11 PDF producers on 3 different operating systems. We have obtained a set of 192 rules which can be used to identify 11 PDF producers. We have tested our detection…
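The rule-based idea in the abstract (patterns that only certain producers emit) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the regular expressions, the labels `producer_a` and `producer_b`, and the majority-vote scoring are hypothetical placeholders, not the paper's 192 rules or its actual detection tool.

```python
import re
from collections import Counter

# Hypothetical rules: (regex over the raw PDF bytes, producer label).
# These patterns are illustrative placeholders, NOT the rules from the paper.
RULES = [
    (re.compile(rb"^%PDF-1\.\d\r\n"), "producer_a"),                  # CRLF right after the header
    (re.compile(rb"\d+ 0 obj\r\n<<"), "producer_a"),                  # CRLF between "obj" and "<<"
    (re.compile(rb"^%PDF-1\.\d\n%\xd0\xd4\xc5\xd8"), "producer_b"),   # a particular binary comment line
    (re.compile(rb"\d+ 0 obj\n<<\n"), "producer_b"),                  # LF-only object layout
]

def guess_producer(data: bytes):
    """Return the label whose rules match most often, or None if no rule matches."""
    votes = Counter(label for pattern, label in RULES if pattern.search(data))
    if not votes:
        return None
    return votes.most_common(1)[0][0]

# Tiny synthetic inputs (assumed for the example, not real producer output).
sample_a = b"%PDF-1.4\r\n1 0 obj\r\n<< /Type /Catalog >>\r\nendobj\r\n"
sample_b = b"%PDF-1.5\n%\xd0\xd4\xc5\xd8\n3 0 obj\n<<\n/Length 10\n>>\n"
```

Calling `guess_producer(sample_a)` counts two matching rules for `producer_a` and none for `producer_b`. A real rule set would need to be derived from files whose producer is known, as the abstract describes, and would likely need a more careful way to resolve conflicting matches than a simple vote.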
Taxonomy
Topics: Advanced Malware Detection Techniques · Digital and Cyber Forensics · Network Security and Intrusion Detection
