iPay: Integrated Payment Action Recognition via Multimodal Networks and Adaptive Spatial Prior Learning

Kaicong Huang; Weiheng Oh; Thomas Guggisberg; Ruimin Ke

arXiv:2605.10732·cs.CV·May 12, 2026

TL;DR

iPay is a multimodal framework combining RGB and skeleton data with adaptive spatial priors for robust payment action recognition in transit surveillance, outperforming prior methods and suitable for edge deployment.

Contribution

The paper introduces iPay, a novel multimodal architecture with a prior-driven discriminator and dual-attention fusion, advancing automated payment action recognition in noisy onboard environments.

Findings

01

iPay achieves 83.45% recognition accuracy on real onboard footage.

02

The framework outperforms existing methods in accuracy and efficiency.

03

The dataset comprises over 55 hours of surveillance footage with 500+ payment clips.

Abstract

Automated transit payment analysis is vital for scalable fare auditing and passenger analytics, yet practice still relies on limited manual inspection. Prior vision- and skeleton-based methods remain brittle under noisy onboard surveillance and often depend on handcrafted features that generalize poorly. Building on the success of graph convolutional networks in human action recognition, we observe that skeleton features excel at modeling global spatiotemporal dependencies but tend to underemphasize the subtle local relative motions that distinguish payment actions. In contrast, RGB features preserve fine-grained spatial details yet often lack reliable temporal continuity in surveillance footage. To address both system-level deployment needs and model-level design challenges, we present iPay, an integrated payment action recognition framework for onboard transit surveillance systems. iPay…
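The abstract describes RGB and skeleton features as complementary. As a rough illustration of that idea only, the following is a minimal sketch of generic attention-weighted late fusion of two per-clip feature vectors. It is not the iPay architecture or its dual-attention fusion module, whose details are not given here; the function names, the `score_weights` vector, and the toy inputs are all hypothetical.

```python
import math


def softmax(scores):
    # Subtract the max before exponentiating for numerical stability.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_modalities(rgb_feat, skel_feat, score_weights):
    """Blend two equal-length feature vectors with softmax attention weights.

    score_weights is a hypothetical scoring vector; in a trained model it
    would be learned. Returns (fused_vector, [rgb_weight, skel_weight]).
    """
    feats = [rgb_feat, skel_feat]
    # One scalar relevance score per modality (dot product with score_weights).
    scores = [sum(w * x for w, x in zip(score_weights, f)) for f in feats]
    weights = softmax(scores)
    # Convex combination of the two modality vectors.
    fused = [weights[0] * r + weights[1] * s for r, s in zip(rgb_feat, skel_feat)]
    return fused, weights


# Toy inputs (made up for illustration, not data from the paper).
fused, weights = fuse_modalities([1.0, 0.0], [0.0, 1.0], [2.0, 0.0])
print(weights, fused)
```

With these toy inputs the scoring vector favors the first feature dimension, so the RGB vector receives the larger weight; a zero scoring vector would weight both modalities equally.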

Code & Models

Repositories

ccoopq/iPay (GitHub)
