VoiceMoji: A Novel On-Device Pipeline for Seamless Emoji Insertion in Dictation
Sumit Kumar, Harichandana B S S, and Himanshu Arora

TL;DR
This paper introduces VoiceMoji, an on-device pipeline that intelligently inserts emojis into transcribed speech to enhance emotional expression, using a novel architecture that is efficient and suitable for deployment on resource-constrained devices.
Contribution
It presents what the authors describe as the first on-device system for emoji insertion in speech transcription, built around a novel Attention-based Char Aware (ACA) LSTM architecture that handles out-of-vocabulary (OOV) words and reduces model size.
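The Contribution line credits the character-aware design with two properties: OOV handling and a smaller model. The following is a minimal sketch of those two ideas only, and it assumes nothing else about the authors' ACA LSTM. Word vectors are composed from a small character embedding table, with mean pooling standing in for a learned character encoder. Dot-product attention then pools a sequence of vectors into one. In a real model the pooled vectors would be LSTM hidden states; here the word vectors stand in for them to keep the sketch short. The character set, `DIM`, the query vector, and all function names are hypothetical.

```python
import math
import random

# Hypothetical sizes for illustration; not the paper's actual dimensions.
CHARSET = "abcdefghijklmnopqrstuvwxyz'"
DIM = 8

rng = random.Random(0)
# One small embedding table over characters instead of a large table over words.
CHAR_EMB = {c: [rng.uniform(-1, 1) for _ in range(DIM)] for c in CHARSET}
UNK_CHAR = [0.0] * DIM


def word_vector(word):
    """Compose a word vector from its characters (mean pooling stands in for a
    learned character encoder). Any word, seen in training or not, gets a vector."""
    chars = [CHAR_EMB.get(c, UNK_CHAR) for c in word.lower()]
    if not chars:
        return [0.0] * DIM
    return [sum(col) / len(chars) for col in zip(*chars)]


def attention_pool(states, query):
    """Dot-product attention: softmax over h_t . query, then a weighted sum of states."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in states]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    pooled = [sum(w * h[i] for w, h in zip(weights, states)) for i in range(len(query))]
    return pooled, weights


def param_counts(vocab_size, dim=DIM, charset_size=len(CHARSET)):
    """Embedding-table parameters: (word-level table, character-level table)."""
    return vocab_size * dim, charset_size * dim


# Usage: "greaaat" is unlikely to be in any word vocabulary, yet it still gets a vector.
states = [word_vector(w) for w in "had a greaaat day".split()]
query = [1.0] * DIM  # hypothetical learned query vector
pooled, weights = attention_pool(states, query)
print(round(sum(weights), 6))  # 1.0
print(param_counts(50_000))    # (400000, 216)
```

The last line shows why a character-level table is cheap: its size grows with the character set rather than the vocabulary. The recurrent and attention layers, which this sketch omits, add their own parameters on top.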
Findings
Achieves emoji prediction accuracy comparable to that of neural approaches while using 80% fewer parameters.
Runs entirely on-device with a small 4 MB footprint.
Demonstrates effective semantic analysis for emoji placement.
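The footprint and parameter figures can be related with simple arithmetic. The sketch below assumes, purely for illustration, that "4 MB" means 4 MiB and that the whole footprint is model weights. The paper's actual numeric precision and parameter count are not stated here, so the printed values are upper bounds under that assumption, not reported numbers. It also makes explicit that "80% fewer parameters" means the compared baseline has 5× as many. `max_params` and `implied_baseline` are hypothetical helper names.

```python
# Assumptions (not from the paper): "4 MB" = 4 MiB, and the footprint is all weights.
FOOTPRINT_BYTES = 4 * 1024 * 1024


def max_params(footprint_bytes: int, bytes_per_param: int) -> int:
    """Upper bound on the parameter count that fits in the footprint at a given precision."""
    return footprint_bytes // bytes_per_param


def implied_baseline(reduced_params: int, reduction_pct: int = 80) -> int:
    """If a model has reduction_pct% fewer parameters than a baseline, then
    reduced = baseline * (100 - reduction_pct) / 100; solve for the baseline."""
    return reduced_params * 100 // (100 - reduction_pct)


for name, width in [("float32", 4), ("float16", 2), ("int8", 1)]:
    bound = max_params(FOOTPRINT_BYTES, width)
    print(f"{name}: <= {bound:,} params; implied baseline <= {implied_baseline(bound):,}")
# float32: <= 1,048,576 params; implied baseline <= 5,242,880
# float16: <= 2,097,152 params; implied baseline <= 10,485,760
# int8: <= 4,194,304 params; implied baseline <= 20,971,520
```

Integer arithmetic is used in `implied_baseline` so that the 5× factor is exact rather than subject to floating-point rounding of `1 - 0.8`.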
Abstract
Most speech recognition systems recover only the words in speech and fail to capture emotion. Users must add emojis to the text manually to convey tone and make communication fun. Although much work has been done on adding punctuation to transcribed speech, emotion addition remains untouched. In this paper, we propose a novel on-device pipeline to enrich the voice input experience. Given a blob of transcribed text, the pipeline intelligently processes it and identifies the structure where emoji insertion makes sense. It also performs semantic text analysis to predict an emoji for each sub-part, for which we propose a novel architecture, the Attention-based Char Aware (ACA) LSTM, which also handles Out-Of-Vocabulary (OOV) words. All these tasks run entirely on-device and can therefore aid on-device dictation systems. To the best of our knowledge, this is the…
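The abstract describes a pipeline with three stages: split the transcript into sub-parts, predict an emoji (or none) for each sub-part, and insert the emoji after it. The sketch below shows only that control flow under loud assumptions. The segmentation rule (break after sentence punctuation or before "but"/"so") and the keyword lexicon are hypothetical stand-ins; the paper's structure identification and its ACA LSTM classifier are not reproduced here. `segment`, `predict_emoji`, `insert_emojis`, and `EMOJI_LEXICON` are illustrative names.

```python
import re

# Hypothetical stand-in for the ACA LSTM classifier: a tiny keyword lexicon.
EMOJI_LEXICON = {
    "happy": "😊", "great": "😊", "love": "😍", "sad": "😢",
    "sorry": "😢", "party": "🎉", "congrats": "🎉", "lol": "😂",
}

# Assumed boundary rule: after sentence punctuation, or before "but"/"so".
# Splitting on whitespace only keeps every dictated word in the output.
BOUNDARY = r"(?<=[.!?])\s+|\s+(?=(?:but|so)\b)"


def segment(transcript):
    """Split a transcript into sub-parts where an emoji might be inserted."""
    parts = re.split(BOUNDARY, transcript.strip(), flags=re.IGNORECASE)
    return [p.strip() for p in parts if p.strip()]


def predict_emoji(part):
    """Return an emoji for a sub-part, or None when no emotion cue is found."""
    for token in re.findall(r"[a-z']+", part.lower()):
        if token in EMOJI_LEXICON:
            return EMOJI_LEXICON[token]
    return None


def insert_emojis(transcript):
    """Run segmentation and prediction, appending each predicted emoji to its sub-part."""
    out = []
    for part in segment(transcript):
        emoji = predict_emoji(part)
        out.append(f"{part} {emoji}" if emoji else part)
    return " ".join(out)


print(insert_emojis("I passed the exam so happy but sad that it is over"))
# I passed the exam so happy 😊 but sad that it is over 😢
```

Keeping segmentation and prediction as separate functions mirrors the abstract's split between structure identification and semantic analysis: either stage could be replaced by a trained model without changing the insertion step.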
Taxonomy
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Attentive Walk-Aggregating Graph Neural Network
