Attention vs LSTM: Improving Word-level BISINDO Recognition

Muchammad Daniyal Kautsar, Afra Majida Hariono, and Ridwan Akmal

arXiv:2409.01975·cs.CV·February 10, 2025

TL;DR

This paper compares LSTM and 1D CNN + Transformer (1DCNNTrans) models for word-level Indonesian sign language (BISINDO) recognition. The LSTM has lower inference latency, while 1DCNNTrans is more accurate and more stable on complex gestures.

Contribution

The paper presents a comparative analysis of LSTM and 1DCNNTrans models for sign language recognition, highlighting each model's strengths and weaknesses in a practical application.

Findings

01. LSTM achieved 94.67% accuracy with lower inference latency.

02. 1DCNNTrans achieved 96.12% accuracy and better stability for complex classes.

03. Both models exceeded 90% validation accuracy for sign language gestures.
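
The findings above rest on two measurements per model: classification accuracy and inference latency. A minimal sketch of how such a comparison could be computed is shown here. It is not the authors' evaluation code: the `evaluate` helper, the toy samples, and the two stand-in predictors are all hypothetical and only illustrate the bookkeeping.

```python
import time
from typing import Callable, List, Sequence, Tuple


def evaluate(
    predict: Callable[[Sequence[float]], int],
    samples: List[Sequence[float]],
    labels: List[int],
    repeats: int = 3,
) -> Tuple[float, float]:
    """Return (accuracy, latency in milliseconds per sample).

    Latency is the best of `repeats` timed passes over all samples,
    which reduces the influence of one-off scheduling noise.
    """
    if not samples or len(samples) != len(labels):
        raise ValueError("samples and labels must be non-empty and equal in length")

    # Accuracy: fraction of samples whose predicted class matches the label.
    correct = sum(1 for x, y in zip(samples, labels) if predict(x) == y)

    # Latency: time a full pass several times and keep the fastest one.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for x in samples:
            predict(x)
        best = min(best, time.perf_counter() - start)

    return correct / len(samples), 1000.0 * best / len(samples)


# Hypothetical stand-ins for two trained classifiers (NOT real models).
def model_a(x: Sequence[float]) -> int:
    return 1 if sum(x) > 0 else 0


def model_b(x: Sequence[float]) -> int:
    return 1 if x[-1] > 0 else 0


# Toy feature vectors and labels, invented for illustration only.
samples = [[0.1, 0.2], [-0.3, -0.1], [0.5, -0.1], [-0.2, 0.1]]
labels = [1, 0, 1, 0]

for name, model in (("model_a", model_a), ("model_b", model_b)):
    acc, ms = evaluate(model, samples, labels)
    print(f"{name}: accuracy={acc:.2%}, latency={ms:.4f} ms/sample")
```

In practice the predictors would wrap the trained LSTM and 1DCNNTrans models and the samples would come from a held-out set, so both numbers are measured on identical inputs.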

Abstract

Indonesia ranks fourth globally in the number of deaf cases. Individuals with hearing impairments often find communication challenging, necessitating the use of sign language. However, there are limited public services that offer such inclusivity. On the other hand, advancements in artificial intelligence (AI) present promising solutions to overcome communication barriers faced by the deaf. This study aims to explore the application of AI in developing models for a simplified sign language translation app and dictionary, designed for integration into public service facilities, to facilitate communication for individuals with hearing impairments, thereby enhancing inclusivity in public services. The researchers compared the performance of LSTM and 1D CNN + Transformer (1DCNNTrans) models for sign language recognition. Through rigorous testing and validation, it was found that the LSTM…

Taxonomy

Topics: Interpreting and Communication in Healthcare

Methods: Attention Is All You Need · Sigmoid Activation · Tanh Activation · Byte Pair Encoding · Absolute Position Encodings · Softmax · Label Smoothing · Long Short-Term Memory · Linear Layer