Optimal Weighting of Multi-View Data with Low Dimensional Hidden States

Yichao Lu; Dean P. Foster

arXiv:1209.5477 · stat.ML · September 27, 2012 · 1 citation

Open Access

TL;DR

This paper introduces an unsupervised method to optimally weight multi-view data derived from low-dimensional hidden states, improving feature integration in NLP tasks with multiple data views and limited labeled data.

Contribution

It proposes a novel unsupervised algorithm for optimal feature weighting across multiple views generated from low-dimensional hidden states, applicable to models like HMM and LDA.

Findings

01. Effective weighting improves supervised learning performance.

02. Applicable to various models with low-dimensional hidden states.

03. Enhances utilization of unlabeled data in NLP tasks.

Abstract

In Natural Language Processing (NLP) tasks, data often have the following two properties. First, data can be split into multiple views, a structure that has been used successfully for dimension reduction. For example, in topic classification, every paper can be split into the title, the main text, and the references. However, some views are commonly less noisy than others for supervised learning problems. Second, unlabeled data are easy to obtain while labeled data are relatively rare. For example, articles that appeared in the New York Times over the past 10 years are easy to collect, but classifying them as 'Politics', 'Finance', or 'Sports' requires human labor. Hence, less noisy features are preferred before running supervised learning methods. In this paper we propose an unsupervised algorithm which optimally weights features from different views when these views are generated…

Taxonomy

Topics: Text and Document Classification Technologies · Topic Modeling · Natural Language Processing Techniques