Hierarchical Delay Attribution Classification using Unstructured Text in Train Management Systems
Anton Borg, Per Lingvall, Martin Svensson

TL;DR
This paper explores machine learning methods to automate delay attribution coding in train management systems using unstructured text, comparing hierarchical and flat classification approaches.
Contribution
It introduces a hierarchical classification approach for delay attribution and evaluates its performance against flat models and manual coding.
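The difference between a flat and a hierarchical delay attribution classifier can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The codes are assumed to have two levels written as "PARENT/CHILD", and the labels INFRA/SIGNAL, VEHICLE/DOOR, etc. are hypothetical. A simple nearest-centroid classifier over bag-of-words counts (hypothetical class CentroidClassifier) stands in for the Random Forest and SVM models. The flat model chooses among all full codes at once. The hierarchical model first predicts the parent code and then chooses only among that parent's children.

```python
import math
from collections import Counter, defaultdict


def bow(text):
    """Bag-of-words term counts (simplified; the paper uses TF-IDF)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class CentroidClassifier:
    """Hypothetical nearest-centroid stand-in for the RF/SVM models."""

    def fit(self, texts, labels):
        sums = defaultdict(Counter)
        for text, label in zip(texts, labels):
            # Summed counts; cosine is scale-invariant, so this matches the mean.
            sums[label].update(bow(text))
        self.centroids = dict(sums)
        return self

    def predict(self, text):
        v = bow(text)
        return max(self.centroids, key=lambda lbl: cosine(v, self.centroids[lbl]))


class HierarchicalClassifier:
    """Predict the parent code first, then a child code within that parent."""

    def fit(self, texts, codes):
        parents = [code.split("/")[0] for code in codes]
        self.top = CentroidClassifier().fit(texts, parents)
        grouped = defaultdict(lambda: ([], []))
        for text, code, parent in zip(texts, codes, parents):
            grouped[parent][0].append(text)
            grouped[parent][1].append(code)
        # One child-level classifier per parent code.
        self.children = {
            parent: CentroidClassifier().fit(ts, cs)
            for parent, (ts, cs) in grouped.items()
        }
        return self

    def predict(self, text):
        parent = self.top.predict(text)
        return self.children[parent].predict(text)


# Hypothetical event descriptions and two-level codes.
TEXTS = [
    "signal failure interlocking", "signal fault at interlocking",
    "broken rail track", "track buckled heat",
    "brake fault on train", "brake pressure loss",
    "door stuck open", "door fault passenger",
]
CODES = [
    "INFRA/SIGNAL", "INFRA/SIGNAL", "INFRA/TRACK", "INFRA/TRACK",
    "VEHICLE/BRAKE", "VEHICLE/BRAKE", "VEHICLE/DOOR", "VEHICLE/DOOR",
]

flat = CentroidClassifier().fit(TEXTS, CODES)
hierarchical = HierarchicalClassifier().fit(TEXTS, CODES)
```

On this toy data both models agree. The point of the hierarchical design is that each child-level decision is made among far fewer candidate codes than the flat model must consider.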
Findings
The hierarchical approach outperforms flat classification.
The machine learning models outperform a random uniform classifier.
Performance remains below that of manual classification by the Swedish Transport Administration.
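A random uniform classifier picks each of the K candidate codes with equal probability. Its expected accuracy is therefore 1/K, whatever the label distribution, which makes it a floor that any useful model must clear. The short simulation below is a sketch of that baseline. The function name uniform_random_accuracy is hypothetical, and scikit-learn's DummyClassifier with strategy="uniform" provides the same behaviour.

```python
import random


def uniform_random_accuracy(y_true, classes, seed=0):
    """Accuracy of a classifier that picks each class with equal probability."""
    rng = random.Random(seed)  # seeded so the result is reproducible
    predictions = [rng.choice(classes) for _ in y_true]
    correct = sum(p == t for p, t in zip(predictions, y_true))
    return correct / len(y_true)
```

With four hypothetical classes, the simulated accuracy settles near 0.25. With the many codes of a real delay attribution scheme, the baseline is correspondingly lower.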
Abstract
EU directives stipulate a systematic follow-up of train delays. In Sweden, the Swedish Transport Administration registers train delays and assigns each an appropriate delay attribution code. However, this code is assigned manually, which is a complex task. In this paper, a machine learning-based decision support for assigning delay attribution codes based on event descriptions is investigated. The text is transformed using TF-IDF, and two models, Random Forest and Support Vector Machine, are evaluated against a random uniform classifier and against the manual classification performed by the Swedish Transport Administration. Further, the problem is modeled using both a hierarchical and a flat approach. The results indicate that the hierarchical approach performs better than the flat approach. Both approaches perform better than the random uniform classifier but worse than the manual classification.
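TF-IDF turns each free-text event description into a sparse vector of term weights. A term is weighted up when it is frequent in a description and weighted down when it appears in many descriptions. The sketch below uses one common variant, tf = count / document length and idf = ln(N / df). This is an assumption, because the summary does not state which weighting the authors used. scikit-learn's TfidfVectorizer, for example, smooths the idf and L2-normalizes each row by default. The function name tfidf and the whitespace tokenizer are illustrative only.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = raw count / document length, idf = ln(N / df).
    """
    tokenized = [doc.lower().split() for doc in docs]  # naive whitespace tokens
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens)
        vectors.append({
            term: (count / length) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors
```

A term that occurs in every description, such as a station name in a small sample, receives weight 0. Rarer, more discriminative terms such as "failure" dominate the vector that the classifier sees.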
Taxonomy
Topics: Data Quality and Management · Natural Language Processing Techniques
