Punctuation Prediction for Polish Texts using Transformers
Jakub Pokrywka

TL;DR
This paper presents a punctuation prediction method for Polish texts using a fine-tuned HerBERT model, improving text readability by restoring punctuation in speech recognition outputs.
Contribution
The paper introduces a HerBERT-based approach to punctuation prediction in Polish, fine-tuned on the competition data together with an external dataset, and scores 71.44 Weighted F1 on Poleval 2022 Task 1.
Findings
Achieved a Weighted F1 score of 71.44 on Poleval 2022 Task 1
Used a single HerBERT model fine-tuned on the competition data and an external dataset
Showed that this single-model setup is effective for Polish punctuation prediction
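To make the reported metric concrete, the following is a minimal sketch of a support-weighted F1 over punctuation labels. It assumes that "Weighted F1" means the per-mark F1 scores averaged with weights proportional to how often each mark occurs in the reference; the official Poleval scorer may differ in details. The function name `weighted_f1` and the `"O"` label for "no punctuation" are illustrative and do not come from the paper.

```python
from collections import Counter


def weighted_f1(gold, pred, ignore_label="O"):
    """Average per-mark F1, weighted by each mark's frequency in `gold`.

    Assumption: `gold` and `pred` are equal-length lists with one label per
    word, where `ignore_label` means "no punctuation after this word".
    """
    support = Counter(g for g in gold if g != ignore_label)
    total = sum(support.values())
    if total == 0:
        return 0.0

    score = 0.0
    for label in sorted(support):
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        # Weight each mark's F1 by its share of the reference punctuation.
        score += f1 * support[label] / total
    return score


if __name__ == "__main__":
    gold = ["O", ",", "O", ".", "O", ","]
    pred = ["O", ",", ",", ".", "O", "O"]
    print(round(weighted_f1(gold, pred), 4))
```

Under this definition a frequent mark such as the comma dominates the score, so errors on rare marks move the total less than errors on common ones.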
Abstract
Speech recognition systems typically output text lacking punctuation. However, punctuation is crucial for written text comprehension. To tackle this problem, punctuation prediction models are developed. This paper describes a solution for Poleval 2022 Task 1: Punctuation Prediction for Polish Texts, which scores 71.44 Weighted F1. The method uses a single HerBERT model fine-tuned on the competition data and an external dataset.
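The abstract does not spell out how the task is framed for the model. A common way to fine-tune an encoder such as HerBERT for this problem is per-word labeling: each word receives the punctuation mark that follows it, or a "none" label. The sketch below illustrates only that data framing, as an assumption rather than the author's documented pipeline. The label set `PUNCT`, the `"O"` label, and the helper names `to_labels` and `restore` are hypothetical.

```python
# Hypothetical label set; the competition's actual inventory may differ.
PUNCT = {",", ".", "?", "!", ":", ";", "-"}


def to_labels(text):
    """Turn punctuated text into (lowercased word, following-mark) pairs.

    This mimics building training targets: the input side looks like
    speech recognition output, and the label says which mark to restore.
    """
    pairs = []
    for token in text.split():
        label = "O"  # "O" = no punctuation after this word
        if len(token) > 1 and token[-1] in PUNCT:
            label = token[-1]
            token = token[:-1]
        pairs.append((token.lower(), label))
    return pairs


def restore(words, labels):
    """Reattach predicted marks to an unpunctuated word sequence."""
    return " ".join(w if l == "O" else w + l for w, l in zip(words, labels))


if __name__ == "__main__":
    pairs = to_labels("Dzień dobry, jak się masz?")
    words = [w for w, _ in pairs]
    labels = [l for _, l in pairs]
    print(pairs)
    print(restore(words, labels))
```

In a full system the labels passed to `restore` would come from the fine-tuned model's predictions rather than from the reference text.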
