Interactive Cycle Model: The Linkage Combination among Automatic Speech Recognition, Large Language Models and Smart Glasses
Libo Wang

TL;DR
This paper introduces an interaction loop model combining automatic speech recognition, large language models, and smart glasses to enhance seamless human-computer interaction, with theoretical evaluation of accuracy, coherence, and latency.
Contribution
It presents a novel integrated model and methodology for multimodal human-computer interaction involving speech, language understanding, and visual display, with performance quantification.
Findings
The model integrates ASR, LLMs, and smart glasses into a single interaction loop.
A theoretical evaluation indicates feasibility and reports performance metrics for accuracy, coherence, and latency.
An open-source implementation is provided on GitHub.
Abstract
This research proposes the "ASR-LLMs-Smart Glasses" interaction loop model, which combines automatic speech recognition (ASR), large language models (LLMs), and smart glasses to facilitate seamless human-computer interaction. The methodology decomposes the interaction process into distinct stages and elements. Speech is captured and processed by ASR, then analyzed and interpreted by LLMs. The results are then transmitted to the smart glasses for display. The feedback loop is complete when the user interacts with the displayed data. Mathematical formulas quantify the model's performance around three core evaluation points: accuracy, coherence, and latency during ASR speech-to-text conversion. The results are presented theoretically, as a way to test and evaluate the feasibility and performance of the model. Detailed architectural details and…
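The loop described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper or its repository: the names `run_cycle`, `CycleResult`, and `word_error_rate` are hypothetical, the three stages are plain callables standing in for a real ASR engine, LLM, and glasses display, and word error rate is used only as one common way to score ASR accuracy. The paper's own formulas for accuracy, coherence, and latency are not reproduced here, and coherence is left out entirely.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class CycleResult:
    """Outcome of one pass through the hypothetical ASR -> LLM -> display loop."""
    transcript: str
    response: str
    latency_s: Dict[str, float] = field(default_factory=dict)


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length (illustrative accuracy metric)."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Classic dynamic-programming Levenshtein distance, computed row by row.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitution))
        prev = curr
    return prev[-1] / len(ref)


def run_cycle(
    audio: bytes,
    asr: Callable[[bytes], str],
    llm: Callable[[str], str],
    display: Callable[[str], None],
) -> CycleResult:
    """Run one cycle and record the wall-clock time spent in each stage."""
    latency: Dict[str, float] = {}

    start = time.perf_counter()
    transcript = asr(audio)          # speech-to-text stage
    latency["asr"] = time.perf_counter() - start

    start = time.perf_counter()
    response = llm(transcript)       # language understanding stage
    latency["llm"] = time.perf_counter() - start

    start = time.perf_counter()
    display(response)                # rendering on the glasses
    latency["display"] = time.perf_counter() - start

    latency["total"] = latency["asr"] + latency["llm"] + latency["display"]
    return CycleResult(transcript, response, latency)


if __name__ == "__main__":
    shown = []  # stands in for the glasses' screen
    result = run_cycle(
        audio=b"turn on the light",              # stub: "audio" is just encoded text
        asr=lambda raw: raw.decode("utf-8"),     # stub recognizer
        llm=lambda text: f"Understood: {text}",  # stub language model
        display=shown.append,                    # stub display
    )
    print(result.transcript, "|", shown[0])
    print("WER:", word_error_rate("turn on the lights", result.transcript))
```

Passing the stages in as callables keeps the timing logic independent of any particular ASR or LLM backend, so real components could be substituted without changing how latency is measured.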
Taxonomy
Topics: Speech Recognition and Synthesis
