Feasibility of Post-Editing Speech Transcriptions with a Mismatched Crowd
Purushotam Radadia, Shirish Karande

TL;DR
This study investigates whether a mismatched crowd can effectively post-edit speech transcriptions across five languages by selecting from phonetically similar options, demonstrating the crowd's potential in speech transcription correction.
Contribution
It provides evidence that mismatched crowd workers can reliably choose among fine-grained, phonetically similar transcription options in multiple languages, supporting their use in post-editing speech transcriptions.
Findings
The crowd can select the correct option among phonetically similar choices
Effectiveness is observed across five diverse languages
A mismatched crowd shows non-trivial transcription correction ability
Abstract
Manual correction of speech transcription can involve a selection from plausible transcriptions. Recent work has shown the feasibility of employing a mismatched crowd for speech transcription. However, it is yet to be established whether a mismatched worker has sufficiently fine-granular speech perception to choose among the phonetically proximate options that are likely to be generated from the trellis of an ASRU. Hence, we consider five languages: Arabic, German, Hindi, Russian, and Spanish. For each, we generate synthetic, phonetically proximate options that emulate post-editing scenarios of varying difficulty. We consistently observe a non-trivial crowd ability to choose among fine-granular options.
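The abstract does not spell out how the synthetic options are built, so the following is only a minimal sketch of one way such a task could be set up, not the authors' procedure. It assumes a transcription is given as a list of phoneme symbols, and it uses a small hand-written table of confusable phoneme pairs (`CONFUSABLE`), which is purely illustrative. The function name `make_options` and the parameters `n_options`, `n_edits`, and `seed` are likewise hypothetical. Each distractor differs from the reference by exactly `n_edits` substitutions, so a smaller `n_edits` yields options that sit closer to the reference and are presumably harder to tell apart.

```python
import random

# Hypothetical confusion table: each phoneme maps to acoustically similar ones.
# These pairings are illustrative assumptions, not taken from the paper.
CONFUSABLE = {
    "p": ["b"], "b": ["p"],
    "t": ["d"], "d": ["t"],
    "k": ["g"], "g": ["k"],
    "s": ["z"], "z": ["s"],
    "m": ["n"], "n": ["m"],
    "i": ["e"], "e": ["i"],
    "o": ["u"], "u": ["o"],
}


def make_options(phonemes, n_options=4, n_edits=1, seed=0):
    """Build a multiple-choice item: the reference plus phonetically close distractors.

    Returns (options, index_of_reference). Every distractor differs from the
    reference in exactly n_edits positions.
    """
    rng = random.Random(seed)
    reference = tuple(phonemes)
    # Positions whose phoneme has at least one confusable alternative.
    editable = [i for i, p in enumerate(reference) if p in CONFUSABLE]
    if n_edits < 1 or n_edits > len(editable):
        raise ValueError("n_edits must be between 1 and the number of editable phonemes")

    distractors = []
    for _ in range(1000):  # bounded retries, since random draws can repeat
        if len(distractors) == n_options - 1:
            break
        candidate = list(reference)
        # Substitute n_edits distinct positions with a confusable phoneme.
        for i in rng.sample(editable, n_edits):
            candidate[i] = rng.choice(CONFUSABLE[candidate[i]])
        candidate = tuple(candidate)
        if candidate not in distractors:
            distractors.append(candidate)
    if len(distractors) < n_options - 1:
        raise ValueError("could not generate enough distinct distractors")

    options = [reference] + distractors
    rng.shuffle(options)  # hide the position of the reference
    return options, options.index(reference)


if __name__ == "__main__":
    options, answer = make_options(["m", "a", "d", "r", "i", "d"], n_edits=1, seed=1)
    for k, option in enumerate(options):
        marker = "*" if k == answer else " "
        print(marker, " ".join(option))
```

In a crowd task built this way, a worker would listen to the audio and pick one of the printed options. Comparing how often workers pick the starred reference against the 1-in-`n_options` chance rate gives a simple measure of whether their selection ability is non-trivial.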
Taxonomy
Topics: Mobile Crowdsensing and Crowdsourcing · Tactile and Sensory Interactions · Hate Speech and Cyberbullying Detection
