Sentence Correction Based on Large-scale Language Modelling

Ji Wen

arXiv:1709.07777 · cs.CL · November 3, 2017

TL;DR

This paper presents a large-scale language model approach for sentence correction and text restoration, introducing new measurement and optimization techniques to efficiently recover missing words in large datasets.

Contribution

It introduces a novel measurement for missing word detection, a comprehensive candidate lexicon, and effective optimization methods to improve efficiency in sentence correction tasks.

Findings

01

Restores missing text with high accuracy

02

Reduces processing time to 3.6 seconds for 1000 sentences

03

Enhances efficiency of large-scale sentence correction

Abstract

With the continued growth of digital information, more and more data is stored as text. Some of this text is lost during generation and transmission. This paper aims to build a language model from a large-scale corpus to restore the missing text. We introduce a novel measurement for locating missing words and a method for building a comprehensive candidate lexicon from which the correct word is chosen for insertion. We also introduce several effective optimization methods that greatly improve the efficiency of text restoration, reducing the time to process 1000 sentences to 3.6 seconds.

Keywords: language model, sentence correction, word imputation, parallel optimization
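The abstract does not spell out the paper's measurement or how its candidate lexicon is built, so the following is only a minimal sketch of the general idea of language-model-based word imputation, not the author's method. It assumes a bigram model with add-one smoothing, uses the training vocabulary as a stand-in candidate lexicon, and assumes exactly one word is missing. The names `train_bigram`, `log_prob`, and `restore` are hypothetical.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def log_prob(prev, word, unigrams, bigrams):
    """Add-one smoothed log P(word | prev). Smoothing choice is an assumption."""
    vocab_size = len(unigrams)
    return math.log((bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size))


def restore(sentence, unigrams, bigrams):
    """Insert the single word that most improves the sentence's bigram score.

    Every gap between adjacent tokens is tried with every candidate word;
    the score is the change in log-probability caused by the insertion.
    """
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    # Stand-in candidate lexicon: every word seen in training.
    candidates = [w for w in unigrams if w not in ("<s>", "</s>")]
    best = None
    for i in range(len(tokens) - 1):
        left, right = tokens[i], tokens[i + 1]
        base = log_prob(left, right, unigrams, bigrams)
        for w in candidates:
            gain = (log_prob(left, w, unigrams, bigrams)
                    + log_prob(w, right, unigrams, bigrams)
                    - base)
            if best is None or gain > best[0]:
                best = (gain, i, w)
    _, i, w = best
    # Drop the boundary markers and splice the chosen word into the gap.
    return " ".join(tokens[1:i + 1] + [w] + tokens[i + 1:-1])


corpus = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat sat on the mat",
]
unigrams, bigrams = train_bigram(corpus)
print(restore("the cat on the mat", unigrams, bigrams))
```

This exhaustive search costs one score per (gap, candidate) pair, which is the kind of cost that pruning the candidate list or scoring sentences in parallel would be expected to reduce on a large corpus.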

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Web Data Mining and Analysis