Fine-Tuning Language Models for Scientific Writing Support
Justin Mücke, Daria Waldow, Luise Metzger, Philipp Schauz, Marcel Hoffman, Nicolas Lell, and Ansgar Scherp

TL;DR
This paper introduces fine-tuned models that score the scientificness of a sentence, classify which section it belongs to, and paraphrase it to support scientific writing. Surrounding context improves section classification, and large models such as T5 produce the best paraphrases.
Contribution
It presents a system that combines scientificness scoring, section classification, and paraphrasing, trained on sentences from arXiv papers. Performance improves when surrounding sentences are used as context and when large language models are fine-tuned.
Findings
Models achieve a mean squared error (MSE) below 2% on the scientificness score
With surrounding sentences as context, BERT-based section classification reaches an F1-score of up to 90%
Large models like T5 outperform others in paraphrasing quality
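To make the first finding concrete, here is a minimal sketch of how an MSE is computed for a regression score. It assumes scientificness scores lie in [0, 1], which the summary above does not state, so that "below 2%" corresponds to an MSE under 0.02. The gold labels and predictions are made up for illustration.

```python
def mean_squared_error(y_true, y_pred):
    """Average of squared differences between gold and predicted scores."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical data: 1.0 = scientific sentence, 0.0 = non-scientific sentence.
# The [0, 1] scale is an assumption, not taken from the paper.
gold = [1.0, 1.0, 0.0, 0.0]
pred = [0.9, 0.9, 0.1, 0.2]

mse = mean_squared_error(gold, pred)
print(f"MSE = {mse:.4f} ({mse * 100:.2f}%)")
print("below the 2% mark" if mse < 0.02 else "above the 2% mark")
```

Because errors are squared, a single prediction that is off by 0.2 contributes four times as much as one that is off by 0.1.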
Abstract
We support scientific writers in determining whether a written sentence is scientific, to which section it belongs, and suggest paraphrasings to improve the sentence. Firstly, we propose a regression model trained on a corpus of scientific sentences extracted from peer-reviewed scientific papers and non-scientific text to assign a score that indicates the scientificness of a sentence. We investigate the effect of equations and citations on this score to test the model for potential biases. Secondly, we create a mapping of section titles to a standard paper layout in AI and machine learning to classify a sentence to its most likely section. We study the impact of context, i.e., surrounding sentences, on the section classification performance. Finally, we propose a paraphraser, which suggests an alternative for a given sentence that includes word substitutions, additions to the sentence,…
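The section-title mapping and the use of surrounding sentences as context can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the mapping table `STANDARD_SECTIONS`, the helper names, and the symmetric context window are all hypothetical.

```python
import re

# Hypothetical mapping from raw section titles to a standard paper layout.
# The paper's actual mapping is not reproduced here.
STANDARD_SECTIONS = {
    "introduction": "introduction",
    "related work": "related work",
    "background": "related work",
    "method": "methods",
    "methods": "methods",
    "methodology": "methods",
    "experiments": "experiments",
    "evaluation": "experiments",
    "results": "results",
    "discussion": "discussion",
    "conclusion": "conclusion",
    "conclusions": "conclusion",
}


def normalize_title(title):
    """Lowercase a title and strip leading numbering such as '3.' or '3.1'."""
    title = title.strip().lower()
    return re.sub(r"^\d+(\.\d+)*\.?\s*", "", title)


def map_section(title):
    """Return the standard section label, or None if the title is unmapped."""
    return STANDARD_SECTIONS.get(normalize_title(title))


def with_context(sentences, index, window=1):
    """Join the target sentence with up to `window` neighbours on each side."""
    start = max(0, index - window)
    end = min(len(sentences), index + window + 1)
    return " ".join(sentences[start:end])


print(map_section("3.1 Methodology"))
print(with_context(["We propose X.", "X uses Y.", "Y is fast."], 1))
```

The string returned by `with_context` would be the classifier input in place of the single sentence. How the original work encodes or separates the context sentences is not specified in the text above.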
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
Methods: Gated Linear Unit · Attention Is All You Need · Byte Pair Encoding · Linear Layer · SentencePiece · Layer Normalization · Multi-Head Attention · Adam · Adafactor
