Optimal Weighting of Multi-View Data with Low Dimensional Hidden States
Yichao Lu, Dean P. Foster

TL;DR
This paper introduces an unsupervised method to optimally weight multi-view data derived from low-dimensional hidden states, improving feature integration in NLP tasks with multiple data views and limited labeled data.
Contribution
It proposes a novel unsupervised algorithm for optimal feature weighting across multiple views generated from low-dimensional hidden states, applicable to models like HMM and LDA.
Findings
Effective weighting improves supervised learning performance.
Applicable to various models with low-dimensional hidden states.
Enhances utilization of unlabeled data in NLP tasks.
Abstract
In Natural Language Processing (NLP) tasks, data often has the following two properties. First, data can be split into multiple views, which has been used successfully for dimension reduction. For example, in topic classification, every paper can be split into the title, the main text and the references. However, some views are commonly less noisy than others for supervised learning problems. Second, unlabeled data are easy to obtain while labeled data are relatively rare. For example, articles that appeared in the New York Times over the last 10 years are easy to collect, but classifying them as 'Politics', 'Finance' or 'Sports' requires human labor. Hence less noisy features are preferred before running supervised learning methods. In this paper we propose an unsupervised algorithm which optimally weights features from different views when these views are generated…
Taxonomy
Topics: Text and Document Classification Technologies · Topic Modeling · Natural Language Processing Techniques
