TL;DR
This paper proposes modeling crime drama episodes, such as those of CSI, as natural language understanding inference tasks in which an LSTM-based model identifies the perpetrator from multi-modal (textual, visual, and acoustic) data.
Contribution
It introduces a new dataset based on CSI episodes, formalizes perpetrator identification as a sequence labeling task, and shows that an incremental inference strategy over multi-modal input is key to making accurate guesses.
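To make the sequence labeling formulation concrete, here is a minimal sketch. It assumes an episode is split into time steps (for example, screenplay sentences) and that each step carries a binary gold label saying whether it refers to the perpetrator. The names `Step`, `to_sequence_labeling`, and `step_accuracy`, and the toy sentences, are hypothetical illustrations, not the dataset's actual format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Step:
    """One hypothetical time step of an episode (e.g. a screenplay sentence)."""
    text: str
    mentions_perpetrator: bool  # assumed gold label for this step


def to_sequence_labeling(episode: List[Step]) -> Tuple[List[str], List[int]]:
    """Split an episode into parallel input and label sequences (1 = perpetrator)."""
    inputs = [s.text for s in episode]
    labels = [int(s.mentions_perpetrator) for s in episode]
    return inputs, labels


def step_accuracy(gold: List[int], predicted: List[int]) -> float:
    """Fraction of time steps whose predicted label matches the gold label."""
    if not gold or len(gold) != len(predicted):
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Toy episode with invented sentences, for illustration only.
episode = [
    Step("The investigator examines the body.", False),
    Step("The neighbour says he heard nothing.", True),
    Step("Fibres are found on the window frame.", False),
]
inputs, labels = to_sequence_labeling(episode)
```

Framing the task this way means a model is scored at every step rather than only once per episode, which is what allows its guesses to be evaluated as the story unfolds.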
Findings
Incremental inference improves accuracy
Fusing textual, visual, and acoustic input improves performance
An LSTM-based model learns effectively from fused multi-modal representations
Abstract
In this paper, we argue that crime drama exemplified in television programs such as CSI: Crime Scene Investigation is an ideal testbed for approximating real-world natural language understanding and the complex inferences associated with it. We propose to treat crime drama as a new inference task, capitalizing on the fact that each episode poses the same basic question (i.e., who committed the crime) and naturally provides the answer when the perpetrator is revealed. We develop a new dataset based on CSI episodes, formalize perpetrator identification as a sequence labeling problem, and develop an LSTM-based model which learns from multi-modal data. Experimental results show that an incremental inference strategy is key to making accurate guesses as well as learning from representations fusing textual, visual, and acoustic input.
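The abstract describes an LSTM that reads fused textual, visual, and acoustic input and guesses incrementally. The following pure-Python sketch illustrates that idea under stated assumptions: modalities are fused by plain concatenation, the network is a single untrained LSTM layer with one sigmoid score per step, and the names `fuse` and `TinyLSTMTagger` are hypothetical, not the authors' implementation. "Incremental" here means the score at step t depends only on steps 1 to t.

```python
import math
import random
from typing import List, Sequence, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def fuse(text: Sequence[float], visual: Sequence[float], acoustic: Sequence[float]) -> List[float]:
    # Assumed fusion strategy: concatenate the per-step modality vectors.
    return list(text) + list(visual) + list(acoustic)


class TinyLSTMTagger:
    """Hypothetical, untrained LSTM tagger emitting one probability per time step."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        n = input_dim + hidden_dim
        # One weight matrix and bias vector per gate: input (i), forget (f),
        # output (o), and candidate cell (g).
        self.W = {k: [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(hidden_dim)]
                  for k in "ifog"}
        self.b = {k: [0.0] * hidden_dim for k in "ifog"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.hidden_dim = hidden_dim

    def _affine(self, k: str, z: List[float]) -> List[float]:
        # Pre-activation for gate k: W_k z + b_k.
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(self.W[k], self.b[k])]

    def step(self, x: Sequence[float], h: List[float], c: List[float]) -> Tuple[List[float], List[float]]:
        z = list(x) + h  # current input concatenated with previous hidden state
        i = [sigmoid(a) for a in self._affine("i", z)]
        f = [sigmoid(a) for a in self._affine("f", z)]
        o = [sigmoid(a) for a in self._affine("o", z)]
        g = [math.tanh(a) for a in self._affine("g", z)]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def predict_incrementally(self, steps: Sequence[Sequence[float]]) -> List[float]:
        # Left-to-right pass: the score at step t sees only steps 1..t.
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        probs = []
        for x in steps:
            h, c = self.step(x, h, c)
            probs.append(sigmoid(sum(w * v for w, v in zip(self.w_out, h))))
        return probs


# Usage with random stand-in features (4 textual, 3 visual, 2 acoustic dims).
rng = random.Random(1)


def random_vec(dim: int) -> List[float]:
    return [rng.random() for _ in range(dim)]


episode = [fuse(random_vec(4), random_vec(3), random_vec(2)) for _ in range(5)]
tagger = TinyLSTMTagger(input_dim=9, hidden_dim=6)
probs = tagger.predict_incrementally(episode)
```

Because the state is carried strictly forward, scoring a prefix of the episode yields exactly the first scores of a full run. That property is what lets such a model commit to a guess before the perpetrator is revealed; a bidirectional model would not have it.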
