PCIE_LAM Solution for Ego4D Looking At Me Challenge

Kanokphan Lertniphonphan, Jun Xie, Yaqing Meng, Shijing Wang, Feng Chen, and Zhepeng Wang

arXiv:2406.12211 · cs.CV · June 19, 2024

TL;DR

This paper introduces the InternLSTM model for the Ego4D Looking At Me Challenge. The model combines spatial and temporal features with gaze smoothing and took first place with 0.81 mAP and 0.93 accuracy.

Contribution

The novel InternLSTM architecture combines spatial features (from an InternVL image encoder) with temporal features (from a Bi-LSTM) to decide whether a person is looking at the camera wearer in blurry, moving egocentric video. It won the challenge.
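
The summary gives no layer sizes or training details, so the following is only a minimal pure-Python sketch of the data flow it describes. Per-frame feature vectors go through a bidirectional LSTM, and a linear head with a sigmoid produces a per-frame "looking at me" probability. The features are random stand-ins for InternVL embeddings (an assumption). All names (`LSTMDirection`, `BiLSTMHead`), dimensions, and the random weights are hypothetical. The official PyTorch repository will differ.

```python
import math
import random

# Hypothetical sketch: random vectors stand in for InternVL face embeddings.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, x):
    # Dense matrix-vector product on nested lists.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng, scale=0.5):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class LSTMDirection:
    """One LSTM direction; gates i, f, g, o are stacked in a 4H pre-activation."""

    def __init__(self, in_dim, hidden, rng):
        self.H = hidden
        self.W = rand_matrix(4 * hidden, in_dim, rng)  # input-to-gate weights
        self.U = rand_matrix(4 * hidden, hidden, rng)  # hidden-to-gate weights
        self.b = [0.0] * (4 * hidden)

    def run(self, xs):
        H = self.H
        h, c = [0.0] * H, [0.0] * H
        outs = []
        for x in xs:
            z = [a + u + b for a, u, b in zip(matvec(self.W, x), matvec(self.U, h), self.b)]
            i = [sigmoid(v) for v in z[0:H]]             # input gate
            f = [sigmoid(v) for v in z[H:2 * H]]         # forget gate
            g = [math.tanh(v) for v in z[2 * H:3 * H]]   # candidate cell state
            o = [sigmoid(v) for v in z[3 * H:4 * H]]     # output gate
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
            outs.append(h)
        return outs

class BiLSTMHead:
    """Bi-LSTM over a face track plus a per-frame binary classifier."""

    def __init__(self, feat_dim, hidden, seed=0):
        rng = random.Random(seed)
        self.fwd = LSTMDirection(feat_dim, hidden, rng)
        self.bwd = LSTMDirection(feat_dim, hidden, rng)
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(2 * hidden)]
        self.b_out = 0.0

    def __call__(self, frame_feats):
        fwd = self.fwd.run(frame_feats)
        # Run the backward LSTM on the reversed track, then re-align to frame order.
        bwd = self.bwd.run(frame_feats[::-1])[::-1]
        probs = []
        for hf, hb in zip(fwd, bwd):
            h = hf + hb  # concatenate forward and backward states
            logit = sum(w * v for w, v in zip(self.w_out, h)) + self.b_out
            probs.append(sigmoid(logit))
        return probs

# Usage: a 10-frame track with 8-dimensional stand-in features.
rng = random.Random(42)
feats = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(10)]
model = BiLSTMHead(feat_dim=8, hidden=4)
probs = model(feats)  # one probability per frame
```

Because the backward direction reads the whole track, the prediction for the first frame also depends on later frames. A forward-only LSTM could not provide that context.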

Findings

01. Achieved 1st place in the challenge with 0.81 mAP and 0.93 accuracy.
02. Proposed InternLSTM, which combines an InternVL image encoder with a Bi-LSTM network.
03. Implemented a Gaze Smoothing filter to improve output stability.
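
This summary does not specify the form of the Gaze Smoothing filter. One plausible variant, assumed here rather than taken from the authors' code, is a centered sliding-window median over the per-frame probabilities, followed by a 0.5 threshold. A median suppresses isolated one-frame spikes and dips while keeping longer runs intact. The function names, window size, and toy scores are hypothetical.

```python
from statistics import median

def smooth_scores(scores, window=5):
    """Centered sliding-window median (hypothetical smoothing choice).

    Near the start and end of the track the window is truncated
    to the frames that exist.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(scores)
    return [median(scores[max(0, t - half):min(n, t + half + 1)]) for t in range(n)]

def to_labels(scores, threshold=0.5):
    # 1 = "looking at me", 0 = not looking.
    return [1 if s >= threshold else 0 for s in scores]

raw = [0.9, 0.85, 0.1, 0.88, 0.92, 0.2, 0.15, 0.8, 0.1, 0.12]
smoothed = smooth_scores(raw, window=3)
labels = to_labels(smoothed)  # [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
```

Without smoothing, the same scores threshold to `[1, 1, 0, 1, 1, 0, 0, 1, 0, 0]`. The one-frame dip at frame 2 and the spike at frame 7 disappear after filtering.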

Abstract

This report presents our team's 'PCIE_LAM' solution for the Ego4D Looking At Me Challenge at CVPR 2024. The challenge asks whether a person in the scene is looking at the camera wearer, given a video in which the faces of social partners have already been localized. Our proposed solution, InternLSTM, consists of an InternVL image encoder and a Bi-LSTM network. InternVL extracts spatial features, and the Bi-LSTM extracts temporal features. The task is highly challenging because two factors blur the face images significantly: the distance between the person in the scene and the camera wearer, and camera movement. To address this, we applied a Gaze Smoothing filter that removes noise and spikes from the output. Our approach took 1st place in the Looking At Me Challenge with 0.81 mAP and 0.93 accuracy. Code is available…
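
To make the reported numbers concrete, the sketch below computes two metrics: accuracy at a 0.5 threshold and non-interpolated average precision for the positive ("looking at me") class. Treating mAP as the AP of the positive class is an assumption, and the official Ego4D evaluation script may aggregate differently. The toy labels and scores are made up for illustration.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of frames whose thresholded score matches the ground-truth label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def average_precision(labels, scores):
    """Non-interpolated AP: mean precision at the rank of each positive frame."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.6]
acc = accuracy(labels, scores)          # 3 of 6 frames correct -> 0.5
ap = average_precision(labels, scores)  # (1/1 + 2/3 + 3/5) / 3 = 34/45
```

AP depends only on how the scores are ranked, whereas accuracy depends on the chosen threshold. That is why a model can score very differently on the two metrics.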

Code & Models

Repository: kanokphanl/ego4d_lam_internlstm (PyTorch, official)
