Guided Semi-Supervised Non-negative Matrix Factorization on Legal Documents
Pengyu Li, Christine Tseng, Yaxuan Zheng, Joyce A. Chew, Longxiu Huang, Benjamin Jarman, Deanna Needell

TL;DR
This paper introduces GSSNMF, a novel semi-supervised matrix factorization method that enhances classification and topic modeling of legal documents by integrating labels and seed words, outperforming previous methods.
Contribution
The paper presents GSSNMF, a new guided semi-supervised non-negative matrix factorization technique that jointly improves classification accuracy and topic coherence in legal document analysis.
Findings
GSSNMF outperforms SSNMF and Guided NMF in classification accuracy.
GSSNMF achieves higher topic coherence.
Application to legal documents demonstrates practical effectiveness.
Abstract
Classification and topic modeling are popular techniques in machine learning that extract information from large-scale datasets. By incorporating a priori information such as labels or important features, methods have been developed to perform classification and topic modeling tasks; however, most methods that can perform both do not allow for guidance of the topics or features. In this paper, we propose a method, namely Guided Semi-Supervised Non-negative Matrix Factorization (GSSNMF), that performs both classification and topic modeling by incorporating supervision from both pre-assigned document class labels and user-designed seed words. We test the performance of this method through its application to legal documents provided by the California Innocence Project, a nonprofit that works to free innocent convicted persons and reform the justice system. The results show that our…
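To make the idea of combining label supervision with seed-word guidance concrete, the following is a minimal sketch, not the authors' implementation. It assumes one plausible joint objective, ||X - WH||^2 + lam * ||Y - CH||^2 + mu * ||S - WB||^2, with all factors non-negative, and minimizes it with standard Lee-Seung-style multiplicative updates. The function name gssnmf_sketch, the matrix names (X, Y, S, W, H, C, B), the weights lam and mu, and the toy data are all hypothetical; the sketch also assumes every document has a label, which a truly semi-supervised setting would relax.

```python
import numpy as np

def gssnmf_sketch(X, Y, S, k, lam=1.0, mu=1.0, n_iter=200, seed=0, eps=1e-10):
    """Illustrative joint factorization (assumed objective, not the paper's code).

    X: (n_words, n_docs) non-negative word-document matrix
    Y: (n_classes, n_docs) one-hot class labels
    S: (n_words, n_seed_topics) seed-word indicator matrix
    k: number of topics
    """
    rng = np.random.default_rng(seed)
    n_words, n_docs = X.shape
    W = rng.random((n_words, k))        # word-topic dictionary
    H = rng.random((k, n_docs))         # topic-document coefficients
    C = rng.random((Y.shape[0], k))     # maps topic coefficients to class scores
    B = rng.random((k, S.shape[1]))     # maps topics to seed-word targets

    def loss():
        return (np.linalg.norm(X - W @ H) ** 2
                + lam * np.linalg.norm(Y - C @ H) ** 2
                + mu * np.linalg.norm(S - W @ B) ** 2)

    history = [loss()]
    for _ in range(n_iter):
        # Each update rescales one factor elementwise, so non-negativity is kept.
        W *= (X @ H.T + mu * S @ B.T) / (W @ (H @ H.T) + mu * W @ (B @ B.T) + eps)
        H *= (W.T @ X + lam * C.T @ Y) / ((W.T @ W) @ H + lam * (C.T @ C) @ H + eps)
        C *= (Y @ H.T) / (C @ (H @ H.T) + eps)
        B *= (W.T @ S) / ((W.T @ W) @ B + eps)
        history.append(loss())
    return W, H, C, B, history

# Toy data (hypothetical): 6 vocabulary words, 8 documents, 2 classes.
rng = np.random.default_rng(1)
X = rng.random((6, 8))
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
Y = np.eye(2)[:, labels]                # one-hot labels, shape (2, 8)
S = np.zeros((6, 1))
S[[0, 1], 0] = 1.0                      # words 0 and 1 act as seed words

W, H, C, B, history = gssnmf_sketch(X, Y, S, k=2)
predicted = np.argmax(C @ H, axis=0)    # class with the largest score per document
```

In this sketch, the columns of W play the role of topics, the seed term pulls some combination of topics toward the seed words, and the label term encourages H to be linearly predictive of the classes. The weights lam and mu trade reconstruction quality against the two kinds of supervision.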
Taxonomy
Topics: Digital and Cyber Forensics · Handwritten Text Recognition Techniques · Machine Learning and Data Classification
