Impact of Event Encoding and Dissimilarity Measures on Traffic Crash Characterization Based on Sequence of Events
Yu Song, Madhav V. Chitturi, David A. Noyce

TL;DR
This study evaluates how different encoding schemes and dissimilarity measures affect crash sequence clustering, highlighting the importance of domain-aware methods for accurate traffic crash characterization.
Contribution
The paper systematically compares encoding schemes and dissimilarity measures for crash sequence analysis and identifies the combination that agrees best with a benchmark crash categorization.
Findings
Transition-rate-based dissimilarity performs best
Consolidated encoding scheme yields highest agreement with benchmark
Selection of encoding and dissimilarity measures critically influences clustering results
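To make "agreement with benchmark" concrete, the sketch below scores how well a set of cluster labels matches a benchmark categorization. It uses the adjusted Rand index, which is an assumption: this summary does not say which agreement statistic the paper used. The function name and the toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Chance-corrected agreement between two partitions of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    n = len(labels_a)
    # Contingency counts: items sharing a pair of (cluster, benchmark) labels.
    joint = Counter(zip(labels_a, labels_b))
    sum_joint = sum(comb(c, 2) for c in joint.values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # zero or one item: the partitions agree trivially
    expected = sum_a * sum_b / total_pairs
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (sum_joint - expected) / (max_index - expected)


# Hypothetical labels: cluster ids from one encoding/dissimilarity setup
# versus made-up benchmark crash categories for the same six crashes.
clusters = [0, 0, 1, 1, 2, 2]
benchmark = ["rollover", "rollover", "fixed_object", "fixed_object",
             "other", "other"]
score = adjusted_rand_index(clusters, benchmark)
```

A score of 1 means the clustering reproduces the benchmark grouping up to label names, and values near 0 mean agreement no better than chance. Repeating the calculation for each encoding and dissimilarity combination gives one way to rank them.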
Abstract
Crash sequence analysis has been shown in prior studies to be useful for characterizing crashes and identifying safety countermeasures. Sequence analysis is highly domain-specific, but its various techniques have not been evaluated for adaptation to crash sequences. This paper evaluates the impact of encoding and dissimilarity measures on crash sequence analysis and clustering. Sequence data for single-vehicle crashes on interstate highways in the United States from 2016 to 2018 were studied. Two encoding schemes and five optimal-matching-based dissimilarity measures were compared by evaluating the sequence clustering results. The five dissimilarity measures were categorized into two groups based on correlations between dissimilarity matrices. The optimal dissimilarity measure and encoding scheme were identified based on the agreements with a benchmark crash categorization. The…
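As an illustration of an optimal-matching dissimilarity with transition-rate-based substitution costs, the sketch below computes an edit distance between two event sequences. It assumes the common formulation in which substituting event a for event b costs 2 - p(b|a) - p(a|b) and an insertion or deletion costs 1. The paper's exact cost settings are not given in this summary, and the event codes and function names are hypothetical.

```python
from collections import Counter
from itertools import product


def transition_rates(sequences):
    """Estimate p(b | a): the share of transitions leaving a that go to b."""
    pair_counts, from_counts = Counter(), Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            from_counts[a] += 1
    states = sorted({s for seq in sequences for s in seq})
    return {
        (a, b): pair_counts[(a, b)] / from_counts[a] if from_counts[a] else 0.0
        for a, b in product(states, repeat=2)
    }


def substitution_cost(a, b, rates):
    """Events that often follow one another are cheaper to substitute."""
    if a == b:
        return 0.0
    return 2.0 - rates.get((a, b), 0.0) - rates.get((b, a), 0.0)


def om_distance(x, y, rates, indel=1.0):
    """Optimal matching distance via dynamic programming over edit operations."""
    n, m = len(x), len(y)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + indel,  # delete x[i-1]
                d[i][j - 1] + indel,  # insert y[j-1]
                d[i - 1][j - 1] + substitution_cost(x[i - 1], y[j - 1], rates),
            )
    return d[n][m]


# Made-up crash sequences with hypothetical event codes.
crashes = [
    ["RUN_OFF_ROAD", "HIT_GUARDRAIL", "ROLLOVER"],
    ["RUN_OFF_ROAD", "HIT_TREE"],
    ["RUN_OFF_ROAD", "HIT_GUARDRAIL"],
]
rates = transition_rates(crashes)
matrix = [[om_distance(a, b, rates) for b in crashes] for a in crashes]
```

The resulting pairwise matrix is the kind of input a clustering step would consume. Changing the substitution-cost rule, or consolidating event codes before computing it, changes the matrix and therefore the clusters.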
Taxonomy
Topics: Traffic Prediction and Management Techniques · Traffic and Road Safety · Data-Driven Disease Surveillance
