How Many Pages? Paper Length Prediction from the Metadata
Erion Çano, Ondřej Bojar

TL;DR
This paper introduces a regression-based approach to predict scientific paper length from metadata, presents a large dataset for research, and discusses future directions involving neural networks and language models.
Contribution
It defines the paper length prediction task, creates a large public dataset, and provides baseline experimental results using popular machine learning models.
Findings
Regression models can predict paper length from metadata
A large dataset of publication metadata and page counts is released
Future work includes neural network and language model approaches
Abstract
Being able to predict the length of a scientific paper may be helpful in numerous situations. This work defines the paper length prediction task as a regression problem and reports several experimental results using popular machine learning models. We also create a huge dataset of publication metadata and the respective lengths in number of pages. The dataset will be freely available and is intended to foster research in this domain. As future work, we would like to explore more advanced regressors based on neural networks and big pretrained language models.
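To make the regression framing concrete, here is a minimal sketch of what predicting page count from a metadata feature could look like. It is not the authors' pipeline: the abstract does not name the features, models, or metric, so the single feature (abstract word count), the closed-form least-squares fit, the mean absolute error metric, and every record in `TOY_PAPERS` are assumptions made up for illustration.

```python
from statistics import mean

# Hypothetical toy records, NOT taken from the paper's dataset.
# Each record pairs one metadata feature with a page count.
TOY_PAPERS = [
    {"abstract_words": 90, "pages": 4},
    {"abstract_words": 120, "pages": 6},
    {"abstract_words": 150, "pages": 8},
    {"abstract_words": 180, "pages": 9},
    {"abstract_words": 210, "pages": 11},
    {"abstract_words": 240, "pages": 12},
    {"abstract_words": 130, "pages": 7},
    {"abstract_words": 200, "pages": 10},
]


def fit_simple_ols(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (closed form)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted page counts."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


# Assumed split: first six records for fitting, last two held out.
train, held_out = TOY_PAPERS[:6], TOY_PAPERS[6:]

intercept, slope = fit_simple_ols(
    [p["abstract_words"] for p in train],
    [p["pages"] for p in train],
)

predictions = [intercept + slope * p["abstract_words"] for p in held_out]
mae = mean_absolute_error([p["pages"] for p in held_out], predictions)
print(f"MAE on held-out toy papers: {mae:.2f} pages")
```

A realistic baseline would use many more features (for example title text, venue, or author count) and a held-out split drawn from the released dataset. The point of the sketch is only that the target is a number, so the task is scored with a regression error rather than classification accuracy.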
Taxonomy
Topics: Advanced Text Analysis Techniques · Data Visualization and Analytics · Topic Modeling
