Common Library 1.0: A Corpus of Victorian Novels Reflecting the Population in Terms of Publication Year and Author Gender
Allen Riddell, Troy J. Bassett, Laura Schneider, Hannah Mills, Amy Yarnell, Rachel Condon, Joseph Bassett, Sara Duke

TL;DR
This paper introduces the Common Library, a carefully sampled corpus of Victorian novels that reflects the population in terms of publication year and author gender, aiming to improve research in literary history and sociology.
Contribution
The paper presents a new corpus of Victorian novels whose composition matches the population of published novels in publication year and author gender, addressing biases in existing literary corpora and supporting more accurate sociological and historical analysis.
Findings
The corpus matches the population in publication year and author gender proportions.
Existing corpora are biased towards certain periods and male authors.
The Common Library provides a more representative sample for research.
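The first finding, that corpus shares match population shares, can be illustrated with a proportional allocation over strata. The following is a minimal sketch only: this summary does not describe the authors' actual sampling procedure, the function name `proportional_allocation` is hypothetical, and the stratum counts are invented for illustration rather than taken from the bibliography. It uses the largest-remainder method to turn fractional quotas into whole numbers of novels.

```python
from math import floor

def proportional_allocation(population_counts, sample_size):
    """Split sample_size across strata in proportion to population counts.

    Uses the largest-remainder method so the allocations sum exactly
    to sample_size. Illustrative only; not the paper's procedure.
    """
    total = sum(population_counts.values())
    # Ideal (fractional) number of sampled novels per stratum.
    quotas = {s: sample_size * n / total for s, n in population_counts.items()}
    # Start from the integer part of each quota.
    alloc = {s: floor(q) for s, q in quotas.items()}
    leftover = sample_size - sum(alloc.values())
    # Hand the remaining slots to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc

# Hypothetical stratum sizes (decade, author gender); NOT figures from the paper.
population = {
    ("1880s", "woman"): 900,
    ("1880s", "man"): 1100,
    ("1890s", "woman"): 1200,
    ("1890s", "man"): 1800,
}

allocation = proportional_allocation(population, 75)
for stratum, n in allocation.items():
    share_pop = population[stratum] / sum(population.values())
    share_sample = n / 75
    print(stratum, n, f"population {share_pop:.3f} vs sample {share_sample:.3f}")
```

With a sample as small as 75, shares can only match the population approximately, which is consistent with the abstract's wording that the proportions are "approximately the same".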
Abstract
Research in 19th-century book history, sociology of literature, and quantitative literary history is blocked by the absence of a collection of novels which captures the diversity of literary production. We introduce a corpus of 75 Victorian novels sampled from a 15,322-record bibliography of novels published between 1837 and 1901 in the British Isles. This corpus, the Common Library, is distinctive in the following way: the shares of novels in the corpus associated with sociologically important subgroups match the shares in the broader population. For example, the proportion of novels written by women in the 1880s in the corpus is approximately the same as in the population. Although we do not, in this particular paper, claim that the corpus is a representative sample in the familiar sense--a sample is representative if "characteristics of interest in the population can be estimated from…
Taxonomy
Topics: Computational and Text Analysis Methods · Data Analysis and Archiving · Census and Population Estimation
