uTalk: Bridging the Gap Between Humans and AI
Hussam Azzuni, Sharim Jamal, Abdulmotaleb Elsaddik

TL;DR
uTalk presents an innovative platform integrating large language models and visual avatars to enable human-like interactions, content generation, and improved performance in AI-assisted communication.
Contribution
This work introduces uTalk, a user-friendly system combining LLMs, visual models, and real-time optimization for enhanced human-AI interaction and content creation.
Findings
SadTalker's runtime optimized by 27.69% at 25 FPS
System performance improved by 9.8% through integration and parallelization
uTalk enables engaging, avatar-based conversations and content generation
Abstract
Large Language Models (LLMs) have revolutionized various industries, improving productivity and facilitating learning across different fields. One intriguing application combines LLMs with visual models to create a novel approach to Human-Computer Interaction. The core idea of this system is a user-friendly platform that lets people use ChatGPT's features in their everyday lives. uTalk comprises technologies such as Whisper, ChatGPT, Microsoft Speech Services, and the state-of-the-art (SOTA) talking head system SadTalker. Users can engage in human-like conversation with a digital twin and receive answers to their questions. uTalk can also generate content from a submitted image and an input (text or audio). The system is hosted on Streamlit, where users are prompted to provide an image to serve as their AI assistant. Then, as…
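The abstract names four components (Whisper, ChatGPT, Microsoft Speech Services, SadTalker) but this summary does not show how they are wired together. The following is a minimal sketch of one plausible ordering: speech-to-text, then an LLM reply, then text-to-speech, then talking-head animation. Every name here (`run_turn`, `Turn`, and the four stage callables) is hypothetical and not taken from the paper. The stages are passed in as plain functions so that no real library API is assumed, and the stand-ins at the bottom only exist to make the sketch runnable.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Turn:
    """Artifacts produced by one conversational turn (hypothetical structure)."""
    transcript: str   # user input as text
    reply: str        # LLM answer
    speech: bytes     # synthesized audio for the answer
    video: str        # path of the generated talking-head video


def run_turn(
    user_input: Union[str, bytes],
    image_path: str,
    transcribe: Callable[[bytes], str],    # stands in for a Whisper-like step
    chat: Callable[[str], str],            # stands in for a ChatGPT-like step
    synthesize: Callable[[str], bytes],    # stands in for a text-to-speech step
    animate: Callable[[str, bytes], str],  # stands in for a SadTalker-like step
) -> Turn:
    """Run the stages in sequence; text input skips transcription."""
    if isinstance(user_input, bytes):
        transcript = transcribe(user_input)
    else:
        transcript = user_input
    reply = chat(transcript)
    speech = synthesize(reply)
    video = animate(image_path, speech)
    return Turn(transcript, reply, speech, video)


# Trivial stand-ins so the sketch runs without any external service.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_chat(prompt: str) -> str:
    return f"echo: {prompt}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")


def fake_animate(image_path: str, speech: bytes) -> str:
    return f"{image_path}.mp4"


turn = run_turn(b"hello", "avatar.png",
                fake_transcribe, fake_chat, fake_synthesize, fake_animate)
print(turn.reply, turn.video)
```

Keeping each stage behind a plain callable means any one of them can be swapped, or moved to a worker thread or process, without touching the others. That is one way the integration and parallelization mentioned in the findings could be approached, though the paper's actual design may differ.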
Taxonomy
Topics: COVID-19 diagnosis using AI
