VIS-iTrack: Visual Intention through Gaze Tracking using Low-Cost Webcam

Shahed Anzarus Sabab (1, 2, 3, 4, and 5), Mohammad Ridwan Kabir (1, 2, and 3), Sayed Rizban Hussain (1, 2, and 3), Hasan Mahmud (1, 2, and 3), Md. Kamrul Hasan (1, 2, and 3), Husne Ara Rubaiyeat (6) ((1) Systems and Software Lab (SSL), (2) Department of Computer Science and Engineering, (3) Islamic University of Technology (IUT), Gazipur, Bangladesh, (4) Department of Computer Science, (5) University of Manitoba, Winnipeg, Canada, (6) National University, Bangladesh)

arXiv:2202.02587 · cs.HC · July 7, 2022

TL;DR

This paper presents VIS-iTrack, a low-cost webcam-based system that predicts whether users intend to view text or images by analyzing eye gaze data. A Support Vector Machine classifier reaches 92.19% accuracy on this task, which points toward more intuitive human-computer interaction.

Contribution

The study introduces a novel approach using low-cost webcam eye tracking combined with machine learning to identify visual intention, with detailed analysis across different age groups.

Findings

01

Support Vector Machine achieved 92.19% accuracy in classifying visual intention.

02

Younger users preferred graphical content, while older users favored textual information.

03

Real-time eye gaze analysis can effectively infer user intention for interactive interface design.
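To make the idea of a gaze feature concrete, here is a minimal sketch of how a Fixation Count could be derived from a stream of gaze coordinates. It uses a generic dispersion-threshold approach, which is an assumption for illustration and not necessarily the method used in the paper. The function names, the `max_dispersion` threshold (in pixels), and the `min_samples` window length are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) gaze coordinate in screen pixels


def dispersion(window: List[Point]) -> float:
    """Spread of a window of gaze points: x-range plus y-range."""
    xs = [p[0] for p in window]
    ys = [p[1] for p in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def count_fixations(
    gaze: List[Point],
    max_dispersion: float = 30.0,  # hypothetical threshold, in pixels
    min_samples: int = 5,          # hypothetical minimum fixation length
) -> int:
    """Count fixations with a simple dispersion-threshold scan."""
    count = 0
    start = 0
    n = len(gaze)
    while start + min_samples <= n:
        end = start + min_samples
        if dispersion(gaze[start:end]) <= max_dispersion:
            # Grow the window while the points stay tightly clustered.
            while end < n and dispersion(gaze[start:end + 1]) <= max_dispersion:
                end += 1
            count += 1
            start = end  # continue scanning after this fixation
        else:
            start += 1   # slide past a point that belongs to a saccade
    return count


# Two tight clusters of gaze points, separated by a large jump.
cluster_a = [(100, 100), (102, 101), (101, 99), (99, 100), (100, 102), (101, 101)]
cluster_b = [(400, 300), (401, 302), (399, 301), (400, 299), (402, 300), (401, 301)]
fixations = count_fixations(cluster_a + cluster_b)
```

A count like this would be one entry in a per-sample feature vector. A real webcam pipeline would also need gaze estimation and noise filtering, which this sketch leaves out.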

Abstract

Human intention is an internal, mental characterization for acquiring desired information. On interactive interfaces containing either textual or graphical information, the intention to perceive desired information is subjective and strongly connected with eye gaze. In this work, we determine such intention by analyzing real-time eye gaze data captured with a low-cost regular webcam. We extracted unique features (e.g., Fixation Count, Eye Movement Ratio) from the eye gaze data of 31 participants to generate a dataset containing 124 samples of visual intention for perceiving textual or graphical information, labeled as either TEXT (48.39%) or IMAGE (51.61%). Using this dataset, we analyzed 5 classifiers, including Support Vector Machine (SVM) (Accuracy: 92.19%). Using the trained SVM, we investigated the variation of visual intention among 30 participants,…
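As a quick sanity check on the reported class balance, the sketch below recovers per-class sample counts from the stated percentages and the total of 124 samples. The counts of 60 TEXT and 64 IMAGE samples are inferred from those figures and are not stated in the abstract. The function name `class_shares` is hypothetical.

```python
from typing import Dict

TOTAL_SAMPLES = 124  # stated in the abstract


def class_shares(counts: Dict[str, int]) -> Dict[str, float]:
    """Return each class's share of the dataset as a percentage (2 decimals)."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 2) for label, n in counts.items()}


# Counts inferred from the reported 48.39% / 51.61% split (an assumption).
inferred_counts = {
    "TEXT": round(0.4839 * TOTAL_SAMPLES),
    "IMAGE": round(0.5161 * TOTAL_SAMPLES),
}
shares = class_shares(inferred_counts)

# Accuracy of always predicting the larger class, for comparison with 92.19%.
majority_baseline = max(shares.values())
```

Under these inferred counts, a classifier that always predicts IMAGE would score about 51.61%, which gives a reference point for the reported 92.19% SVM accuracy.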

Taxonomy

Methods: Attentive Walk-Aggregating Graph Neural Network · Support Vector Machine