Biclustering Readings and Manuscripts via Non-negative Matrix Factorization, with Application to the Text of Jude
Joey McCollum, Stephen Brown

TL;DR
This paper applies non-negative matrix factorization to cluster manuscripts and readings simultaneously in textual criticism, addressing contamination and the co-dependence of characteristic readings and manuscripts, and demonstrates the method on a collation of the epistle of Jude.
Contribution
It introduces an unsupervised NMF-based method for simultaneous clustering of manuscripts and readings, improving textual family identification in biblical studies.
Findings
Clusters match established textual families
Effectively handles manuscript contamination
Provides interpretable mixture models
Abstract
The text-critical practice of grouping witnesses into families or texttypes often faces two obstacles: contamination in the manuscript tradition, and co-dependence in identifying characteristic readings and manuscripts. We introduce non-negative matrix factorization (NMF) as a simple, unsupervised, and efficient way to cluster large numbers of manuscripts and readings simultaneously while summarizing contamination using an easy-to-interpret mixture model. We apply this method to an extensive collation of the New Testament epistle of Jude and show that the resulting clusters correspond to human-identified textual families from existing research.
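To make the idea in the abstract concrete, the following is a minimal, dependency-free sketch of NMF biclustering on a toy collation. It is not the authors' implementation. The assumptions are all illustrative: the matrix orientation (rows are manuscripts, columns are readings), the invented witnesses `A1`, `A2`, `B1`, `B2`, `M`, the choice of Lee-Seung multiplicative updates for the Frobenius loss, and the helper names `nmf` and `mixture_shares`.

```python
import random


def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(row) for row in zip(*A)]


def nmf(X, k, iters=1000, seed=0, eps=1e-9):
    """Factor X (m x n, non-negative) as W (m x k) times H (k x n).

    Uses multiplicative updates, which keep every entry non-negative.
    """
    rng = random.Random(seed)
    m, n = len(X), len(X[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(k)]
    for _ in range(iters):
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(k)]
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(m)]
    return W, H


def mixture_shares(W, H):
    """Turn each row of W into cluster shares that sum to 1.

    W and H are only defined up to a per-cluster scale, so each weight is
    first multiplied by the total mass of its cluster's profile in H.
    """
    scales = [sum(row) for row in H]
    shares = []
    for w in W:
        mass = [w[c] * scales[c] for c in range(len(scales))]
        total = sum(mass) or 1.0
        shares.append([x / total for x in mass])
    return shares


# Toy collation (hypothetical): 1 means the manuscript attests the reading.
names = ["A1", "A2", "B1", "B2", "M"]
X = [
    [1, 1, 1, 0, 0, 0],  # A1
    [1, 1, 1, 0, 0, 0],  # A2
    [0, 0, 0, 1, 1, 1],  # B1
    [0, 0, 0, 1, 1, 1],  # B2
    [1, 1, 0, 1, 0, 0],  # M: a contaminated witness mixing both groups
]

W, H = nmf(X, k=2)
shares = mixture_shares(W, H)
# Dominant cluster per manuscript: the cluster with the largest share.
dominant = {name: max(range(2), key=lambda c: s[c])
            for name, s in zip(names, shares)}

for name, s in zip(names, shares):
    print(name, [round(v, 2) for v in s])
```

Each row of `H` scores how characteristic every reading is of a cluster, and each row of `shares` describes a manuscript as a mixture of clusters. Because both factors come from the same decomposition, readings and manuscripts are grouped together rather than in separate passes, and a contaminated witness such as `M` shows up with weight on more than one cluster.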
Taxonomy
Topics: Handwritten Text Recognition Techniques · Text and Document Classification Technologies
