Interactive Cycle Model: The Linkage Combination among Automatic Speech Recognition, Large Language Models and Smart Glasses

Libo Wang

arXiv:2411.10362·cs.HC·January 24, 2025

Open Access 1 Repo

TL;DR

This paper introduces an interaction loop model combining automatic speech recognition, large language models, and smart glasses to enhance seamless human-computer interaction, with theoretical evaluation of accuracy, coherence, and latency.

Contribution

It presents a novel integrated model and methodology for multimodal human-computer interaction involving speech, language understanding, and visual display, with performance quantification.

Findings

01. The model integrates ASR, LLMs, and smart glasses into a single interaction loop.

02. A theoretical evaluation addresses the model's feasibility and its performance metrics.

03. An open-source implementation is provided on GitHub.

Abstract

This research proposes the interaction loop model "ASR-LLMs-Smart Glasses", which combines automatic speech recognition, large language models, and smart glasses to facilitate seamless human-computer interaction. The methodology decomposes the interaction process into distinct stages and elements. Speech is captured and processed by ASR, then analyzed and interpreted by LLMs. The results are then transmitted to the smart glasses for display. The feedback loop is complete when the user interacts with the displayed data. Mathematical formulas quantify the model's performance around three core evaluation points: accuracy, coherence, and latency during ASR speech-to-text conversion. The research results are provided theoretically to test and evaluate the feasibility and performance of the model. Detailed architectural details and…
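The staged loop described in the abstract (speech captured by ASR, interpreted by an LLM, shown on the glasses) can be sketched in a few lines. This is only an illustration under stated assumptions: the paper's own formulas are not reproduced in this listing, so word error rate (WER) is used here as a standard stand-in for ASR accuracy, coherence is not modeled, and `run_cycle`, `fake_asr`, `fake_llm`, and `fake_display` are hypothetical names, not taken from the paper or its repository.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length (standard WER).

    Assumption: used as a stand-in accuracy metric; the paper's exact
    formula is not shown in this listing.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


@dataclass
class CycleResult:
    transcript: str
    response: str
    displayed: str
    latency_s: Dict[str, float] = field(default_factory=dict)


def run_cycle(audio: bytes,
              asr: Callable[[bytes], str],
              llm: Callable[[str], str],
              display: Callable[[str], str]) -> CycleResult:
    """Run one pass of the loop and record per-stage wall-clock latency."""
    t0 = time.perf_counter()
    transcript = asr(audio)        # stage 1: speech to text
    t1 = time.perf_counter()
    response = llm(transcript)     # stage 2: language understanding
    t2 = time.perf_counter()
    displayed = display(response)  # stage 3: render on the glasses
    t3 = time.perf_counter()
    latency = {"asr": t1 - t0, "llm": t2 - t1,
               "display": t3 - t2, "total": t3 - t0}
    return CycleResult(transcript, response, displayed, latency)


# Hypothetical stand-ins for real ASR, LLM, and display back ends.
def fake_asr(audio: bytes) -> str:
    return "show me the weather today"


def fake_llm(text: str) -> str:
    return f"Answer to: {text}"


def fake_display(text: str) -> str:
    return text[:40]  # assume a short heads-up display line


if __name__ == "__main__":
    result = run_cycle(b"\x00\x01", fake_asr, fake_llm, fake_display)
    print(result.displayed, result.latency_s)
    print(word_error_rate("show me the weather", "show me weather"))
```

With the stub stages, `word_error_rate("show me the weather", "show me weather")` is 0.25 (one deleted word out of four). The user's reaction to the displayed text would feed the next call to `run_cycle`, which is what closes the loop.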

Code & Models

Repositories

brucewang123456789/GeniusTrail (Official)

Taxonomy

Topics: Speech Recognition and Synthesis