# Learning Effective Embeddings From Crowdsourced Labels: An Educational Case Study

**Authors:** Guowei Xu, Wenbiao Ding, Jiliang Tang, Songfan Yang, Gale Yan Huang, Zitao Liu

arXiv: 1908.00086 · 2019-08-02

## TL;DR

This paper introduces RLL, a framework for learning data representations from limited and inconsistent crowdsourced labels, and reports that it outperforms state-of-the-art baselines on two real-world education applications.

## Contribution

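The paper proposes RLL (Representation Learning with crowdsourced Labels), a framework that learns representations by jointly addressing the two problems crowdsourced labels pose: there are few of them, and crowd workers with diverse expertise often disagree.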

## Key findings

- RLL outperforms state-of-the-art baselines on two real-world education applications.
- RLL effectively manages label inconsistency and scarcity.
- Detailed analysis confirms the importance of key components in RLL.
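
To make "label inconsistency" concrete, the following minimal Python sketch turns the binary votes of several crowd workers into a smoothed confidence score per item. It is an illustration only and is not taken from the paper: the function name `vote_confidence`, the item ids, the vote data, and the choice of one pseudo-count per class are all assumptions, and this summary does not say how RLL itself handles disagreement.

```python
def vote_confidence(votes, prior_pos=1.0, prior_neg=1.0):
    """Smoothed estimate of P(label = 1) from binary crowd votes.

    Hypothetical helper for illustration, not RLL's estimator.
    prior_pos and prior_neg are pseudo-counts that keep the estimate
    away from 0 or 1 when only a handful of votes are available.
    """
    positives = sum(votes)
    return (positives + prior_pos) / (len(votes) + prior_pos + prior_neg)


# Made-up votes: five workers each label three items as 0 or 1.
crowd_votes = {
    "item_a": [1, 1, 1, 1, 1],  # unanimous
    "item_b": [1, 1, 1, 0, 0],  # contested, majority says 1
    "item_c": [0, 0, 0, 0, 1],  # mostly 0
}

confidences = {item: vote_confidence(v) for item, v in crowd_votes.items()}

for item, conf in confidences.items():
    print(f"{item}: {conf:.3f}")
```

A plain majority vote would assign the same label to `item_a` and `item_b`; the smoothed score keeps the information that `item_b` is contested, which is the kind of signal a method for inconsistent labels can exploit.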

## Abstract

Learning representation has been proven to be helpful in numerous machine learning tasks. The success of the majority of existing representation learning approaches often requires a large amount of consistent and noise-free labels. However, labels are not accessible in many real-world scenarios and they are usually annotated by the crowds. In practice, the crowdsourced labels are usually inconsistent among crowd workers given their diverse expertise and the number of crowdsourced labels is very limited. Thus, directly adopting crowdsourced labels for existing representation learning algorithms is inappropriate and suboptimal. In this paper, we investigate the above problem and propose a novel framework of **R**epresentation **L**earning with crowdsourced **L**abels, i.e., "RLL", which learns representation of data with crowdsourced labels by jointly and coherently solving the challenges introduced by limited and inconsistent labels. The proposed representation learning framework is evaluated in two real-world education applications. The experimental results demonstrate the benefits of our approach on learning representation from limited labeled data from the crowds, and show RLL is able to outperform state-of-the-art baselines. Moreover, detailed experiments are conducted on RLL to fully understand its key components and the corresponding performance.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1908.00086/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/1908.00086/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1908.00086/full.md

---
Source: https://tomesphere.com/paper/1908.00086