TRec: Learning Hand-Object Interactions through 2D Point Track Motion

Dennis Holzmann; Sven Wachsmuth

arXiv:2601.03667 · cs.CV · January 12, 2026

TL;DR

This paper introduces TRec, a novel method for hand-object action recognition that uses 2D point tracks as motion cues, improving accuracy without relying on explicit hand or object detection.

Contribution

The work demonstrates that tracking randomly sampled points across frames with CoTracker and using these trajectories in a Transformer model enhances hand-object interaction recognition.
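The first step of this pipeline, choosing random points to track, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: the helper `sample_query_points` is hypothetical, sampling is assumed to be uniform over the first frame (the paper only says the points are randomly initialized), and the `(t, x, y)` layout is an assumption chosen to resemble the query format described in the CoTracker repository, which should be checked against its documentation before use.

```python
import random
from typing import List, Tuple

# (frame index, x, y); layout is an assumption modeled on CoTracker-style queries
Query = Tuple[float, float, float]


def sample_query_points(num_points: int, width: int, height: int,
                        seed: int = 0) -> List[Query]:
    """Hypothetical helper: draw random query points on frame 0.

    Points are drawn uniformly over the image area (an assumption) with a
    seeded generator so the same video always gets the same queries.
    """
    rng = random.Random(seed)
    return [
        (0.0, rng.uniform(0, width - 1), rng.uniform(0, height - 1))
        for _ in range(num_points)
    ]


queries = sample_query_points(256, width=640, height=480)
print(len(queries), queries[0][0])
```

Seeding the generator keeps the sampled points reproducible across runs; whether the original method fixes, re-samples, or grid-jitters its points is not stated in this summary.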

Findings

01. Point tracks improve recognition accuracy.

02. The method yields gains even with minimal video input (only the initial frame plus the point tracks).

03. The approach is lightweight and requires no explicit hand or object detection.

Abstract

We present a novel approach for hand-object action recognition that leverages 2D point tracks as an additional motion cue. While most existing methods rely on RGB appearance, human pose estimation, or their combination, our work demonstrates that tracking randomly sampled image points across video frames can substantially improve recognition accuracy. Unlike prior approaches, we do not detect hands, objects, or interaction regions. Instead, we employ CoTracker to follow a set of randomly initialized points through each video and use the resulting trajectories, together with the corresponding image frames, as input to a Transformer-based recognition model. Surprisingly, our method achieves notable gains even when only the initial frame and the point tracks are provided, without incorporating the full video sequence. Experimental results confirm that integrating 2D point tracks…
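One way trajectories could be turned into inputs for a Transformer is to build one feature vector ("token") per tracked point. The sketch below is a hypothetical encoding, not the paper's: `tracks_to_tokens` is an illustrative name, and the choice of features (normalized start position plus per-frame displacement from that start) is an assumption. It also ignores the visibility estimates a point tracker may provide.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def tracks_to_tokens(tracks: Sequence[Sequence[Point]], width: int,
                     height: int) -> List[List[float]]:
    """Hypothetical encoding: one feature vector per tracked point.

    tracks[t][n] is the (x, y) position of point n in frame t. Each token
    holds the point's normalized start position followed by its normalized
    displacement from that start in every later frame, giving 2 * T values.
    """
    num_frames = len(tracks)
    num_points = len(tracks[0])
    tokens: List[List[float]] = []
    for n in range(num_points):
        x0, y0 = tracks[0][n]
        token = [x0 / width, y0 / height]  # where the point starts
        for t in range(1, num_frames):
            x, y = tracks[t][n]
            # motion relative to the start, scaled by image size
            token.extend([(x - x0) / width, (y - y0) / height])
        tokens.append(token)
    return tokens


# Two frames, two points: point 0 moves, point 1 stays still.
tracks = [[(0.0, 0.0), (10.0, 20.0)],
          [(64.0, 48.0), (10.0, 20.0)]]
print(tracks_to_tokens(tracks, width=640, height=480))
```

In a model like the one described, such tokens would typically be linearly projected to the Transformer's hidden size and combined with features from the image frames; how TRec actually fuses tracks and frames is not specified in this summary.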

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Robot Manipulation and Learning