A multi-task learning model for malware classification with useful file access pattern from API call sequence

Xin Wang; Siu Ming Yiu

arXiv:1610.05945·cs.SD·October 20, 2016·29 cites

TL;DR

This paper introduces a multi-task deep learning model that automatically learns malware representations from API call sequences, enabling both accurate classification and generation of useful file access pattern insights, reducing feature engineering and improving interpretability.

Contribution

This work is the first to apply a multi-task seq2seq model to malware classification and file access pattern (FAP) generation from API call sequences, enhancing interpretability and reducing manual feature engineering.

Findings

01. Competitive classification accuracy achieved.

02. Effective generation of file access patterns.

03. Reduced need for manual feature engineering.

Abstract

Semantic-aware, machine learning (ML) based malware classifiers can be built from API call sequences for malware detection or classification. Previous work concentrates on crafting and extracting various features from malware binaries, disassembled binaries, or API calls via static or dynamic analysis, and then resorts to ML to build classifiers. However, these approaches tend to involve too much feature engineering and fail to provide interpretability. We solve these two problems with recent advances in deep learning: 1) RNN-based autoencoders (RNN-AEs) can automatically learn a low-dimensional representation of a malware sample from its raw API call sequence. 2) Multiple decoders can be trained under different supervision to give more information than the class or family label of a malware sample alone. Inspired by work on document classification and automatic sentence summarization, each API call…
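The encoder-plus-multiple-decoders idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in a plain tanh RNN for the LSTM-based encoder, uses random untrained weights, and every name here (`MultiTaskSketch`, `API_VOCAB`, `FAP_VOCAB`, `NUM_CLASSES`, `HIDDEN`) is hypothetical. It only shows how one shared representation of an API call sequence can feed two heads, one predicting a class and one emitting a token sequence that stands in for a file access pattern.

```python
import math
import random

random.seed(0)

# Hypothetical toy vocabularies and sizes, chosen for illustration only.
API_VOCAB = ["CreateFile", "ReadFile", "WriteFile", "DeleteFile", "CloseHandle"]
FAP_VOCAB = ["<eos>", "read", "write", "delete"]
NUM_CLASSES = 3
HIDDEN = 8


def rand_matrix(rows, cols):
    """Small random weights; a real model would learn these by training."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    """Numerically stable softmax over a list of scores."""
    top = max(z)
    exps = [math.exp(x - top) for x in z]
    total = sum(exps)
    return [e / total for e in exps]


class MultiTaskSketch:
    """Shared encoder with a classification head and a sequence decoder."""

    def __init__(self):
        self.embed = rand_matrix(len(API_VOCAB), HIDDEN)  # one row per API call
        self.w_in = rand_matrix(HIDDEN, HIDDEN)           # input-to-hidden
        self.w_rec = rand_matrix(HIDDEN, HIDDEN)          # hidden-to-hidden
        self.w_cls = rand_matrix(NUM_CLASSES, HIDDEN)     # task 1: class label
        self.w_dec = rand_matrix(HIDDEN, HIDDEN)          # task 2: decoder state
        self.w_out = rand_matrix(len(FAP_VOCAB), HIDDEN)  # task 2: token scores

    def encode(self, api_calls):
        """Fold the whole API call sequence into one fixed-size vector."""
        h = [0.0] * HIDDEN
        for call in api_calls:
            x = self.embed[API_VOCAB.index(call)]
            a = matvec(self.w_in, x)
            b = matvec(self.w_rec, h)
            h = [math.tanh(p + q) for p, q in zip(a, b)]
        return h

    def classify(self, h):
        """Return a probability distribution over the toy classes."""
        return softmax(matvec(self.w_cls, h))

    def decode(self, h, max_len=5):
        """Greedily emit tokens from the shared vector until <eos> or max_len."""
        tokens = []
        state = h
        for _ in range(max_len):
            state = [math.tanh(s) for s in matvec(self.w_dec, state)]
            probs = softmax(matvec(self.w_out, state))
            tok = FAP_VOCAB[probs.index(max(probs))]
            if tok == "<eos>":
                break
            tokens.append(tok)
        return tokens


model = MultiTaskSketch()
h = model.encode(["CreateFile", "WriteFile", "CloseHandle"])
print(model.classify(h))  # class probabilities from the shared representation
print(model.decode(h))    # token sequence from the same representation
```

Because the weights are random, the outputs carry no meaning. In a trained multi-task setup, one would typically sum a classification loss and a sequence loss so that both heads shape the shared encoder; how the paper weights or schedules those losses is not stated in this record.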

Taxonomy

Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Digital and Cyber Forensics

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence