LLaMA based Punctuation Restoration With Forward Pass Only Decoding
Yutong Pang, Debjyoti Paul, Kevin Jiang, Xuedong Zhang, Xin Lei

TL;DR
This paper shows that LLaMA is effective for punctuation restoration and introduces Forward Pass Only Decoding (FPOD), a decoding method that speeds up inference by 19.8x while reducing hallucinations.
Contribution
The paper makes two contributions: it applies LLaMA to punctuation restoration, and it proposes FPOD, a decoding approach for annotation tasks that improves inference speed and reduces hallucinations.
Findings
LLaMA outperforms the established benchmark on punctuation restoration.
FPOD achieves a 19.8x speedup in inference.
FPOD reduces hallucinations during decoding.
Abstract
This paper introduces two advancements in the field of Large Language Model Annotation with a focus on punctuation restoration tasks. Our first contribution is the application of LLaMA for punctuation restoration, which demonstrates superior performance compared to the established benchmark. Despite its impressive quality, LLaMA faces challenges regarding inference speed and hallucinations. To address this, our second contribution presents Forward Pass Only Decoding (FPOD), a novel decoding approach for annotation tasks. This innovative method results in a substantial 19.8x improvement in inference speed, effectively addressing a critical bottleneck and enhancing the practical utility of LLaMA for large-scale data annotation tasks without hallucinations. The combination of these contributions not only solidifies LLaMA as a powerful tool for punctuation restoration but also…
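The abstract does not spell out how FPOD works. One plausible reading, assumed here purely for illustration, is that the unpunctuated text goes through the model in a single forward pass, and the next-token prediction at each position decides whether a punctuation mark follows that word. The sketch below uses a hypothetical `toy_forward_pass` in place of a real LLaMA call; the function names, the punctuation set, and the scoring rules are all assumptions, not the authors' implementation.

```python
from typing import Callable, Dict, List

# Assumed punctuation inventory for this sketch.
PUNCTUATION = [",", ".", "?"]


def toy_forward_pass(words: List[str]) -> List[Dict[str, float]]:
    """Hypothetical stand-in for one causal-LM forward pass.

    Returns, for every position i, scores for the token that would
    follow words[: i + 1]. A real system would read these from the
    model's logits; the rules here are invented for the example.
    """
    rows = []
    for i, word in enumerate(words):
        row = {",": 0.0, ".": 0.0, "?": 0.0, "<word>": 1.0}
        if i == len(words) - 1:
            row["."] = 2.0          # end of input: favor a period
        elif word in {"yes", "however"}:
            row[","] = 2.0          # toy rule: comma after these words
        rows.append(row)
    return rows


def restore_punctuation(
    words: List[str],
    forward_pass: Callable[[List[str]], List[Dict[str, float]]] = toy_forward_pass,
) -> str:
    # One pass over the whole input; no token-by-token generation loop.
    per_position = forward_pass(words)
    out = []
    for word, row in zip(words, per_position):
        best = max(row, key=row.get)
        # Input words are copied verbatim; only punctuation can be added.
        out.append(word + best if best in PUNCTUATION else word)
    return " ".join(out)


print(restore_punctuation("yes i agree".split()))
```

Under this reading, every output word is copied from the input, so the decoder cannot invent or drop words, and the cost is one forward pass rather than one pass per generated token. Whether the paper's FPOD matches this sketch in detail is not stated in the abstract.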
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Distributed systems and fault tolerance · Logic, programming, and type systems
Methods: LLaMA · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Focus
