Sharpening Your Tools: Updating bulk_extractor for the 2020s
Simson Garfinkel, Jonathan Stewart

TL;DR
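This paper describes the modernization of bulk_extractor, a high-performance digital forensics tool, between 2018 and 2022: an update from C++98 to C++17, a complete code refactoring, and the adoption of a unit test framework. The new version typically delivers 75% more throughput than the previous one, and the authors offer recommendations for maintainers of similar tools.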
Contribution
The paper presents a comprehensive update to bulk_extractor, including language modernization, code refactoring, and performance enhancements, with practical recommendations for digital forensics tool developers.
Findings
Typically 75% more throughput than the previous version, which the authors attribute to improved multithreading
Successful migration from C++98 to C++17
Guidelines for maintaining digital forensics tools
Abstract
Bulk_extractor is a high-performance digital forensics tool written in C++. Between 2018 and 2022 we updated the program from C++98 to C++17, performed a complete code refactoring, and adopted a unit test framework. The new version typically runs with 75% more throughput than the previous version, which we attribute to improved multithreading. We provide lessons and recommendations for other digital forensics tool maintainers.
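Illustrative sketch
The abstract credits the throughput gain to improved multithreading under C++17. The following is a minimal sketch of that general idea, not bulk_extractor's actual code: a fixed set of worker threads pulls fixed-size pages of an in-memory image from a shared atomic counter and scans each one independently. The names `scan_page` and `scan_image`, the page size parameter, and the toy "count `@` characters" scanner are all assumptions made for illustration.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string_view>
#include <thread>
#include <vector>

// Hypothetical scanner: counts '@' characters in one page.
// A real forensic scanner would look for e-mail addresses, URLs, etc.
std::size_t scan_page(std::string_view page) {
    std::size_t hits = 0;
    for (char c : page) {
        if (c == '@') ++hits;
    }
    return hits;
}

// Hypothetical driver: splits `image` into pages of `page_size` bytes and
// scans them with `n_threads` workers. Each worker claims the next unscanned
// page via an atomic counter, so no mutex is needed to hand out work.
std::size_t scan_image(std::string_view image, std::size_t page_size,
                       unsigned n_threads) {
    assert(page_size > 0 && n_threads > 0);
    const std::size_t n_pages = (image.size() + page_size - 1) / page_size;
    std::atomic<std::size_t> next_page{0};
    std::atomic<std::size_t> total{0};

    auto worker = [&] {
        for (;;) {
            const std::size_t i = next_page.fetch_add(1);
            if (i >= n_pages) break;
            // substr clamps the length, so the final short page is handled.
            total += scan_page(image.substr(i * page_size, page_size));
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < n_threads; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    return total.load();
}
```

Calling `scan_image(data, 4096, std::thread::hardware_concurrency())` on a buffer would return the same count as a single-threaded pass, because pages are disjoint and the per-page results are summed atomically. A scanner that must match features spanning a page boundary would need overlapping pages, which this sketch deliberately omits.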
Taxonomy
Topics: Digital and Cyber Forensics · Advanced Malware Detection Techniques · Digital Media Forensic Detection
