Unsupervised anomaly detection for discrete sequence healthcare data

Victoria Snorovikhina; Alexey Zaytsev

arXiv:2007.10098 · cs.LG · October 13, 2020

TL;DR

This paper introduces an unsupervised deep learning framework that uses LSTM and seq2seq models to detect healthcare fraud in sequences of patient visits. It handles class imbalance by normalizing anomaly scores with an Empirical Distribution Function (EDF) method and is validated on real data from Allianz.

Contribution

It presents a novel unsupervised fraud detection method combining LSTM, seq2seq, and EDF normalization, outperforming existing techniques on healthcare data.

Findings

01 State-of-the-art results in unsupervised healthcare fraud detection

02 EDF normalization improves anomaly score quality

03 Models effectively handle high class imbalance

Abstract

Fraud in healthcare is widespread, as doctors can prescribe unnecessary treatments to increase bills. Insurance companies want to detect these anomalous, fraudulent bills and reduce their losses. Traditional fraud detection methods rely on expert rules and manual data processing. More recently, machine learning techniques have automated this process, but hand-labeled data is extremely costly and usually out of date. We propose a machine learning model that automates fraud detection in an unsupervised way. We consider two deep learning approaches: an LSTM neural network that predicts the next patient visit, and a seq2seq model. To normalize the produced anomaly scores, we propose an Empirical Distribution Function (EDF) approach, which lets the algorithm work with highly class-imbalanced problems. For validation, we use real data on sequences of patients' visits from Allianz. The models provide…
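The abstract does not spell out how the EDF normalization is computed, so the following is only a minimal sketch of one plausible reading: each raw anomaly score is replaced by the fraction of reference scores that are less than or equal to it, which puts scores from differently scaled groups on a common [0, 1] range. The function name `fit_edf`, the choice of reference sample, and the toy numbers are assumptions for illustration and are not taken from the paper.

```python
import numpy as np


def fit_edf(reference_scores):
    """Build an empirical distribution function from reference scores.

    Hypothetical helper: the returned callable maps a raw score s to the
    fraction of reference scores that are <= s.
    """
    sorted_ref = np.sort(np.asarray(reference_scores, dtype=float))
    if sorted_ref.size == 0:
        raise ValueError("reference_scores must not be empty")

    def edf(scores):
        scores = np.asarray(scores, dtype=float)
        # side="right" counts reference values that are <= each score.
        counts = np.searchsorted(sorted_ref, scores, side="right")
        return counts / sorted_ref.size

    return edf


# Toy raw anomaly scores (made up) with one extreme value.
reference = np.array([0.1, 0.4, 0.2, 5.0, 0.3])
edf = fit_edf(reference)

# 0.25 is >= two of the five reference scores, so it maps to 0.4;
# the largest reference score maps to 1.0, and anything below all
# reference scores maps to 0.0.
normalized = edf(np.array([0.25, 5.0, 0.0]))
```

Because the mapping depends only on ranks, a single extreme raw score cannot stretch the scale for the rest. One way to apply this to imbalanced data, again as an assumption, would be to fit a separate EDF per visit type so that rare and frequent types become comparable.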

Taxonomy

Methods: Sequence to Sequence · Tanh Activation · Sigmoid Activation · Long Short-Term Memory