TL;DR
iPay is a multimodal framework combining RGB and skeleton data with adaptive spatial priors for robust payment action recognition in transit surveillance, outperforming prior methods and suitable for edge deployment.
Contribution
The paper introduces iPay, a novel multimodal architecture with a prior-driven discriminator and dual-attention fusion, advancing automated payment action recognition in noisy onboard environments.
Findings
iPay achieves 83.45% recognition accuracy on real onboard footage.
The framework outperforms existing methods in accuracy and efficiency.
The authors collected over 55 hours of surveillance data containing 500+ payment clips.
Abstract
Automated transit payment analysis is vital for scalable fare auditing and passenger analytics, yet practice still relies on limited manual inspection. Prior vision- and skeleton-based methods remain brittle under noisy onboard surveillance and often depend on handcrafted features that generalize poorly. Building on the success of graph convolutional networks in human action recognition, we observe that skeleton features excel at modeling global spatiotemporal dependencies but tend to underemphasize the subtle local relative motions that distinguish payment actions. In contrast, RGB features preserve fine-grained spatial details yet often lack reliable temporal continuity in surveillance footage. To address both system-level deployment needs and model-level design challenges, we present iPay, an integrated payment action recognition framework for onboard transit surveillance systems. iPay…
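The abstract motivates combining skeleton and RGB features because each covers the other's weakness, but the excerpt here does not spell out how the fusion is computed. As a purely illustrative sketch (not the paper's dual-attention module), the snippet below shows one simple way two same-sized modality feature vectors could be combined: a softmax gate turns one score per modality into weights that sum to 1, and the fused feature is the weighted sum. The function names `softmax` and `fuse_modalities` and the scoring vectors `w_rgb` and `w_skel` are hypothetical and exist only for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(rgb_feat, skel_feat, w_rgb, w_skel):
    """Softmax-gated fusion of two equal-length feature vectors.

    w_rgb and w_skel are hypothetical scoring vectors (assumed to be
    learned elsewhere); they are NOT taken from the iPay paper.
    """
    if len(rgb_feat) != len(skel_feat):
        raise ValueError("feature vectors must have the same length")

    # One scalar relevance score per modality (dot product with its scorer).
    score_rgb = sum(f * w for f, w in zip(rgb_feat, w_rgb))
    score_skel = sum(f * w for f, w in zip(skel_feat, w_skel))

    # Convert the two scores into weights that sum to 1.
    a_rgb, a_skel = softmax([score_rgb, score_skel])

    # Weighted sum of the two modality features.
    fused = [a_rgb * r + a_skel * s for r, s in zip(rgb_feat, skel_feat)]
    return fused, (a_rgb, a_skel)


if __name__ == "__main__":
    fused, weights = fuse_modalities(
        rgb_feat=[1.0, 0.0],
        skel_feat=[0.0, 1.0],
        w_rgb=[2.0, 0.0],   # hypothetical scorer favouring the RGB stream
        w_skel=[0.0, 0.5],
    )
    print("weights:", weights)
    print("fused:", fused)
```

With equal scores the gate reduces to a plain average of the two feature vectors. A higher score for one stream shifts the fused feature toward that stream, which is the intuition behind letting a model lean on RGB detail or skeleton dynamics depending on the clip.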
