# Probabilistic Perspectives on Collecting Human Uncertainty in Predictive Data Mining

**Authors:** Kevin Jasberg, Sergej Sizov

arXiv: 1702.08826 · 2017-03-01

## TL;DR

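This paper examines how people respond on ordinal scales and finds that their choices are volatile: the same cognition can lead to a distribution of possible responses rather than a single one. It proposes two approaches for modelling this human uncertainty and demonstrates its impact on personalisation and algorithm evaluation.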

## Contribution

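The paper introduces two approaches for modelling human uncertainty in user responses, with techniques to measure it at the level of user inputs and at the level of user cognition. Both approaches are compared systematically through user experiments and large-scale simulations.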

## Key findings

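- A significant number of users submit a response (action) that differs from what they actually have in mind (cognition).
- Human uncertainty makes statistically sound algorithm assessment hard to achieve, especially when explicit rankings are to be built.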
- Modeling human uncertainty is crucial for improving personalization and decision-making.
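
The effect on algorithm assessment can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' method: each user-item cognition is assumed to be a normal distribution with a known mean and a shared standard deviation, a response is a draw from it rounded and clipped to an assumed 5-point scale, and the names `sample_response` and `rmse_spread` are hypothetical. The sketch re-collects the ratings many times and reports how much the RMSE of one fixed set of predictions moves between collections.

```python
import math
import random
import statistics

SCALE_MIN, SCALE_MAX = 1, 5  # assumed 5-point ordinal scale


def sample_response(mean, sd, rng):
    """Draw one response: sample the latent cognition, round, clip to the scale."""
    latent = rng.gauss(mean, sd)
    return min(max(round(latent), SCALE_MIN), SCALE_MAX)


def rmse(predictions, ratings):
    """Root mean squared error between predictions and observed ratings."""
    squared = [(p - r) ** 2 for p, r in zip(predictions, ratings)]
    return math.sqrt(sum(squared) / len(squared))


def rmse_spread(predictions, means, sd, trials, seed=0):
    """Re-collect ratings `trials` times; return mean and std dev of the RMSE."""
    rng = random.Random(seed)
    scores = []
    for _ in range(trials):
        ratings = [sample_response(m, sd, rng) for m in means]
        scores.append(rmse(predictions, ratings))
    return statistics.mean(scores), statistics.stdev(scores)


if __name__ == "__main__":
    # Hypothetical data: 20 items whose latent means cover the whole scale.
    means = [1.0, 2.0, 3.0, 4.0, 5.0] * 4
    predictions = [3.0] * len(means)  # a trivial constant predictor
    avg, spread = rmse_spread(predictions, means, sd=0.8, trials=500)
    print(f"RMSE mean={avg:.3f}, std dev={spread:.3f}")
```

Under these assumptions, the RMSE of a fixed predictor is itself a random variable. If its spread is comparable to the RMSE gap between two algorithms, a ranking built from a single rating collection may not be reproducible.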

## Abstract

In many areas of data mining, data is collected from human beings. In this contribution, we ask the question of how people actually respond to ordinal scales. The main problem observed is that users tend to be volatile in their choices, i.e. complex cognitions do not always lead to the same decisions, but to distributions of possible decision outputs. This human uncertainty may sometimes have quite an impact on common data mining approaches and thus, the question of effectively modelling this so-called human uncertainty emerges naturally. Our contribution introduces two different approaches for modelling the human uncertainty of user responses. In doing so, we develop techniques in order to measure this uncertainty at the level of user inputs as well as the level of user cognition. With the support of comprehensive user experiments and large-scale simulations, we systematically compare both methodologies along with their implications for personalisation approaches. Our findings demonstrate that a significant number of users submit something completely different (action) from what they really have in mind (cognition). Moreover, we demonstrate that statistically sound evidence with respect to algorithm assessment becomes quite hard to realise, especially when explicit rankings are to be built.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1702.08826/full.md

## Figures

17 figures with captions in the complete paper: https://tomesphere.com/paper/1702.08826/full.md

## References

20 references — full list in the complete paper: https://tomesphere.com/paper/1702.08826/full.md

---
Source: https://tomesphere.com/paper/1702.08826