Biclustering Readings and Manuscripts via Non-negative Matrix Factorization, with Application to the Text of Jude

Joey McCollum; Stephen Brown

arXiv:1602.01323·cs.LG·February 4, 2016·2 cites

TL;DR

This paper applies non-negative matrix factorization (NMF) to cluster manuscripts and readings simultaneously in textual criticism, addressing the problems of contamination and co-dependence, and demonstrates the method on a collation of the epistle of Jude.

Contribution

The paper introduces an unsupervised NMF-based method that clusters manuscripts and readings simultaneously, as an aid to identifying textual families in New Testament textual criticism.

Findings

01. Clusters match established textual families

02. Effectively handles manuscript contamination

03. Provides interpretable mixture models

Abstract

The text-critical practice of grouping witnesses into families or texttypes often faces two obstacles: Contamination in the manuscript tradition, and co-dependence in identifying characteristic readings and manuscripts. We introduce non-negative matrix factorization (NMF) as a simple, unsupervised, and efficient way to cluster large numbers of manuscripts and readings simultaneously while summarizing contamination using an easy-to-interpret mixture model. We apply this method to an extensive collation of the New Testament epistle of Jude and show that the resulting clusters correspond to human-identified textual families from existing research.
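
As a rough illustration of the idea in the abstract, the sketch below factors a small readings-by-manuscripts matrix into two non-negative factors and reads a mixture profile for each manuscript from the result. Everything in it is an assumption made for illustration rather than the authors' implementation: the toy collation data, the orientation of the matrix (readings as rows, manuscripts as columns), the rank of 2, the use of Lee-Seung multiplicative updates, and the helper name `nmf`.

```python
import numpy as np

# Toy collation (hypothetical data): rows are readings, columns are manuscripts.
# Four variation units, each with an "A" reading (rows 0-3) and a "B" reading (rows 4-7).
X = np.zeros((8, 7))
X[0:4, 0:3] = 1.0          # manuscripts 0-2 attest the A readings
X[4:8, 3:6] = 1.0          # manuscripts 3-5 attest the B readings
X[[0, 1, 6, 7], 6] = 1.0   # manuscript 6 is contaminated: A in units 1-2, B in units 3-4


def nmf(X, rank, n_iter=500, seed=0, eps=1e-9):
    """Approximate X ~ W @ H with W, H >= 0 using multiplicative updates
    that reduce the squared Frobenius reconstruction error."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    W = rng.random((n_rows, rank)) + eps   # readings x clusters
    H = rng.random((rank, n_cols)) + eps   # clusters x manuscripts
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


W, H = nmf(X, rank=2)

# Normalize each basis column of W to sum to 1 and move the scale into H,
# so that the columns of H are comparable across clusters.
scale = W.sum(axis=0)
W_norm = W / scale
H_scaled = H * scale[:, None]

# Each column of `mixture` sums to 1: the share of each cluster in a manuscript.
mixture = H_scaled / H_scaled.sum(axis=0, keepdims=True)
labels = mixture.argmax(axis=0)  # hard cluster assignment per manuscript

print(np.round(mixture, 2))
print(labels)
```

In this sketch the columns of `W_norm` indicate which readings characterize each cluster, while `mixture` gives each manuscript a share in every cluster, so a contaminated manuscript shows up as a mixed column instead of being forced into a single group. The cluster order is arbitrary and depends on the random initialization.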

Taxonomy

Topics: Handwritten Text Recognition Techniques · Text and Document Classification Technologies