DEHYDRATOR: Enhancing Provenance Graph Storage via Hierarchical Encoding and Sequence Generation
Jie Ying, Tiantian Zhu, Mingqi Lv, Tieming Chen

TL;DR
Dehydrator is a provenance graph storage system that combines field mapping encoding, hierarchical encoding, and a deep neural network for batch querying. It reduces storage space by 84.55% and outperforms PostgreSQL, Neo4j, and Leonard.
Contribution
Dehydrator introduces hierarchical encoding and a deep neural network that supports batch querying, addressing the high storage overhead of keeping provenance graphs in traditional databases.
Findings
Reduces storage space by 84.55%.
Outperforms PostgreSQL, Neo4j, and Leonard in efficiency.
Evaluated on seven datasets totaling over one billion log entries.
Abstract
As the scope and impact of cyber threats have expanded, analysts use audit logs to hunt threats and investigate attacks. Provenance graphs constructed from kernel logs are increasingly considered an ideal data source because of their strong semantic expressiveness and their ability to correlate attack history. However, storing provenance graphs in traditional databases incurs high storage overhead, given the high frequency of kernel events and the persistence of attacks. To address this, we propose Dehydrator, an efficient provenance graph storage system. For the logs generated by auditing frameworks, Dehydrator uses field mapping encoding to filter field-level redundancy, hierarchical encoding to filter structure-level redundancy, and finally learns a deep neural network to support batch querying. We have conducted evaluations on seven datasets totaling over one…
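The two encoding steps named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the event schema (`src`, `op`, `dst`, `ts`), the function names, and the choice to group edges under their destination node are all assumptions made for illustration. The first step replaces repeated field values with small integer ids (a plain dictionary encoding standing in for field mapping), and the second nests edges so that repeated (source, operation) pairs are stored once with a list of timestamps (one possible reading of a hierarchical layout).

```python
from collections import defaultdict

# Hypothetical audit events; the field names are assumptions, not the paper's schema.
events = [
    {"src": "/usr/bin/bash", "op": "read", "dst": "/etc/passwd", "ts": 100},
    {"src": "/usr/bin/bash", "op": "read", "dst": "/etc/passwd", "ts": 105},
    {"src": "/usr/bin/curl", "op": "write", "dst": "/tmp/out", "ts": 110},
]


def build_field_maps(events, fields):
    """Assign each distinct value of each field an integer id, in first-seen order."""
    maps = {f: {} for f in fields}
    for ev in events:
        for f in fields:
            # setdefault keeps the existing id if the value was already seen.
            maps[f].setdefault(ev[f], len(maps[f]))
    return maps


def encode_events(events, maps):
    """Replace string fields with their integer ids (field-level redundancy)."""
    return [
        (maps["src"][e["src"]], maps["op"][e["op"]], maps["dst"][e["dst"]], e["ts"])
        for e in events
    ]


def group_by_destination(encoded):
    """Nest edges as dst -> (src, op) -> [timestamps] (structure-level redundancy)."""
    grouped = defaultdict(lambda: defaultdict(list))
    for src, op, dst, ts in encoded:
        grouped[dst][(src, op)].append(ts)
    return {dst: dict(inner) for dst, inner in grouped.items()}


maps = build_field_maps(events, ["src", "op", "dst"])
encoded = encode_events(events, maps)
grouped = group_by_destination(encoded)
```

In this toy input the two identical `bash read /etc/passwd` events collapse into a single nested entry holding two timestamps, which is the kind of redundancy the abstract describes removing. The learned model that supports batch querying is not sketched here.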
Taxonomy
Topics: Scientific Computing and Data Management · Software Testing and Debugging Techniques · Software Engineering Research
