From Data Dump to Digestible Chunks: Automated Segmentation and Summarization of Provenance Logs for Communication
Jeremy E. Block, Donald Honeycutt, Brett Benda, Benjamin Rheault, Eric D. Ragan

TL;DR
This paper introduces an automated method for segmenting and summarizing provenance logs in intelligence analysis, enabling clearer communication of complex analytical processes through textual blurbs.
Contribution
It presents a novel segmentation and summarization pipeline tailored for interaction provenance logs, demonstrated on multiple datasets including classified logs, with validation from domain experts.
Findings
Effective generation of key event cards from logs
Facilitates sharing of analysis progress in complex domains
Highlights need for improved justifications and pattern controls
Abstract
Communicating one's sensemaking during a complex analysis session to explain thought processes is hard, yet most intelligence occurs in collaborative settings. Team members require a deeper understanding of the work being completed by their peers and subordinates, but little research has fully articulated best practices for analytic provenance consumers. This work proposes an automatic summarization technique that separates an analysis session and summarizes interaction provenance as textual blurbs to allow for meta-analysis of work done. Focusing on the domain of intelligence analysis, we demonstrate our segmentation technique using five datasets, including both publicly available and classified interaction logs. We shared our demonstration with a notoriously inaccessible population of expert reviewers with experience as United States Department of Defense analysts. Our findings…
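To make the idea of "separating an analysis session and summarizing it as textual blurbs" concrete, here is a minimal sketch. It is not the authors' pipeline, which this page does not describe. It substitutes a simple time-gap heuristic for segmentation and a count-based template for the blurb. The `Event` fields, the 60-second `gap` threshold, and the sample log are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Event:
    """One interaction-log record (hypothetical schema)."""
    t: float      # seconds since the session started
    action: str   # e.g. "search", "open_doc", "highlight"
    target: str   # what the action was applied to


def segment(events, gap=60.0):
    """Split a time-ordered log wherever consecutive events are
    more than `gap` seconds apart (stand-in heuristic)."""
    segments = []
    for ev in events:
        if segments and ev.t - segments[-1][-1].t <= gap:
            segments[-1].append(ev)   # continue the current segment
        else:
            segments.append([ev])     # start a new segment
    return segments


def blurb(seg):
    """Summarize one segment as a short line of text."""
    counts = Counter(ev.action for ev in seg)
    top_action, n = counts.most_common(1)[0]
    targets = ", ".join(sorted({ev.target for ev in seg}))
    return (f"{seg[0].t:.0f}s-{seg[-1].t:.0f}s: {len(seg)} events, "
            f"mostly {top_action} ({n}); touched {targets}")


# Made-up sample log, not data from the paper.
log = [
    Event(0, "search", "arms"),
    Event(10, "open_doc", "doc1"),
    Event(20, "open_doc", "doc2"),
    Event(200, "highlight", "doc2"),
]
segments = segment(log)
for seg in segments:
    print(blurb(seg))
```

A fixed time gap is the simplest possible boundary rule. A real system would likely weigh changes in topic or interaction type as well, and the blurb template would need domain-specific wording to be useful to a reviewer.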
Taxonomy
Topics: Scientific Computing and Data Management · Data Quality and Management · Semantic Web and Ontologies
