# Methodology for the Automated Metadata-Based Classification of Incriminating Digital Forensic Artefacts

**Authors:** Xiaoyu Du, Mark Scanlon

arXiv: 1907.01421 · 2019-07-03

## TL;DR

This paper presents an automated, machine learning-based methodology for prioritizing suspicious digital forensic artefacts, aiming to reduce manual effort in analyzing large volumes of data during investigations.

## Contribution

It introduces a supervised learning approach, trained on the recorded results of previously processed cases, together with a toolkit for extracting data from disk images, enabling automated prioritization of relevant forensic artefacts in investigations.
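As a rough illustration of what metadata-based features might look like, the sketch below turns one file's metadata record into categorical features. The `FileRecord` fields and the feature set in `extract_features` are hypothetical assumptions chosen for illustration; they are not the paper's feature list or the toolkit's output format.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass
class FileRecord:
    # Hypothetical metadata fields, as an extraction toolkit might emit them
    # for one file found in a disk image.
    path: str
    size: int      # size in bytes
    mtime: int     # last-modified time, Unix epoch seconds (UTC)
    deleted: bool  # True if the file was recovered rather than allocated


def extract_features(rec: FileRecord) -> dict:
    """Map one metadata record to categorical features (illustrative choice)."""
    p = PurePosixPath(rec.path)
    return {
        # Lower-cased extension, or a placeholder when there is none.
        "ext": p.suffix.lower().lstrip(".") or "<none>",
        # Name of the immediate parent directory, e.g. "downloads".
        "parent_dir": p.parent.name.lower() or "<root>",
        # log2-style size bucket: files of similar magnitude share a value.
        "size_bucket": str(rec.size.bit_length()),
        # Hour of day (UTC) of the last modification.
        "mtime_hour": str((rec.mtime // 3600) % 24),
        "deleted": str(rec.deleted),
    }


rec = FileRecord("/home/user/Downloads/IMG_0042.JPG", 2048, 7200, False)
print(extract_features(rec))
# {'ext': 'jpg', 'parent_dir': 'downloads', 'size_bucket': '12', 'mtime_hour': '2', 'deleted': 'False'}
```

Keeping every feature categorical (strings) means the same records can feed a simple count-based classifier without any scaling step.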

## Key findings

- Demonstrates that suspicious artefacts can be predicted effectively from file metadata
- Aims to reduce the manual analysis effort required in digital investigations
- Integrates with existing forensic workflows through a toolkit for extracting data from disk images
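One way to make a claim of reduced manual effort measurable is to score a ranking by how far down the list an examiner must read. The two metrics below, `review_effort` and `recall_at_k`, are illustrative assumptions and hypothetical names; they are not necessarily the evaluation measures reported in the paper.

```python
def review_effort(ranked_labels):
    """Fraction of artefacts opened, in ranked order, until the last suspicious one is seen.

    ranked_labels: ground-truth bools (True = suspicious), in the order
    the prioritisation presents artefacts to the examiner.
    """
    if not ranked_labels:
        raise ValueError("empty ranking")
    hits = [i for i, is_susp in enumerate(ranked_labels) if is_susp]
    if not hits:
        return 0.0  # nothing to find, so nothing needs opening
    return (hits[-1] + 1) / len(ranked_labels)


def recall_at_k(ranked_labels, k):
    """Share of all suspicious artefacts that appear in the top-k positions."""
    total = sum(ranked_labels)
    if total == 0:
        raise ValueError("no suspicious artefacts: recall is undefined")
    return sum(ranked_labels[:k]) / total


good = [True, True, False, False, False, False, False, False, False, False]
poor = [False, True, False, False, False, False, False, False, False, True]
print(review_effort(good), review_effort(poor))  # 0.2 1.0
print(recall_at_k(good, 2), recall_at_k(poor, 2))  # 1.0 0.5
```

With a good ranking the examiner finds every suspicious artefact after opening 20% of the files; a poor ranking forces a full review.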

## Abstract

The ever-increasing volume of data in digital forensic investigation is one of the most discussed challenges in the field. Usually, most of the file artefacts on seized devices are not pertinent to the investigation. Manually retrieving suspicious files relevant to the investigation is akin to finding a needle in a haystack. In this paper, a methodology for the automatic prioritisation of suspicious file artefacts (i.e., file artefacts that are pertinent to the investigation) is proposed to reduce the manual analysis effort required. This methodology is designed to work in a human-in-the-loop fashion. In other words, it predicts/recommends that an artefact is likely to be suspicious rather than giving the final analysis result. A supervised machine learning approach is employed, which leverages the recorded results of previously processed cases. The feature extraction, dataset generation, training and evaluation processes are presented in this paper. In addition, a toolkit for data extraction from disk images is outlined, which enables this method to be integrated with the conventional investigation process and work in an automated fashion.
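The training and prioritisation steps described in the abstract can be sketched as follows, assuming each artefact is already a dictionary of categorical metadata features and that labels come from previously processed cases. The classifier is categorical naive Bayes with Laplace smoothing, swapped in for simplicity; it is not necessarily the model the authors use, and `train_nb`, `prob_suspicious` and `prioritise` are hypothetical names. In keeping with the human-in-the-loop design, the output is only a ranking for the examiner to review, not a verdict.

```python
import math
from collections import Counter, defaultdict


def train_nb(samples, labels, alpha=1.0):
    """Fit a categorical naive Bayes model on labelled artefacts from past cases.

    samples: list of feature dicts (feature name -> categorical value)
    labels:  list of bools, True if the artefact was marked suspicious
    """
    class_counts = Counter(labels)
    counts = {c: defaultdict(Counter) for c in class_counts}  # counts[c][f][v]
    vocab = defaultdict(set)  # distinct values seen per feature
    for x, y in zip(samples, labels):
        for f, v in x.items():
            counts[y][f][v] += 1
            vocab[f].add(v)
    return class_counts, counts, vocab, alpha


def prob_suspicious(model, x):
    """Posterior P(suspicious | x), with Laplace smoothing on each feature."""
    class_counts, counts, vocab, alpha = model
    if True not in class_counts:
        return 0.0
    if False not in class_counts:
        return 1.0
    total = sum(class_counts.values())
    log_post = {}
    for c, n_c in class_counts.items():
        lp = math.log(n_c / total)  # log prior
        for f, v in x.items():
            k = len(vocab.get(f, ())) + 1  # +1 reserves mass for unseen values
            n_cfv = counts[c].get(f, Counter())[v]
            lp += math.log((n_cfv + alpha) / (n_c + alpha * k))
        log_post[c] = lp
    # Normalise in log space to avoid underflow when there are many features.
    m = max(log_post.values())
    z = sum(math.exp(lp - m) for lp in log_post.values())
    return math.exp(log_post[True] - m) / z


def prioritise(model, artefacts):
    """Rank a new case's artefacts for the examiner, most likely suspicious first."""
    return sorted(artefacts, key=lambda x: prob_suspicious(model, x), reverse=True)


past = [({"ext": "jpg", "parent_dir": "downloads"}, True),
        ({"ext": "jpg", "parent_dir": "downloads"}, True),
        ({"ext": "dll", "parent_dir": "system32"}, False),
        ({"ext": "dll", "parent_dir": "system32"}, False),
        ({"ext": "txt", "parent_dir": "documents"}, False)]
model = train_nb([x for x, _ in past], [y for _, y in past])
new_case = [{"ext": "dll", "parent_dir": "system32"},
            {"ext": "jpg", "parent_dir": "downloads"}]
print(prioritise(model, new_case)[0])  # {'ext': 'jpg', 'parent_dir': 'downloads'}
```

Naive Bayes fits sparse categorical metadata with simple counting and yields probabilities that sort directly; any classifier that outputs a score could replace it in the ranking step.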

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1907.01421/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1907.01421/full.md

## References

30 references — full list in the complete paper: https://tomesphere.com/paper/1907.01421/full.md

---
Source: https://tomesphere.com/paper/1907.01421