Unsupervised anomaly detection for discrete sequence healthcare data
Victoria Snorovikhina, Alexey Zaytsev

TL;DR
This paper introduces an unsupervised deep learning framework using LSTM and seq2seq models to detect healthcare fraud from patient visit sequences, addressing class imbalance with an EDF normalization method, validated on real Allianz data.
Contribution
It presents a novel unsupervised fraud detection method combining LSTM, seq2seq, and EDF normalization, outperforming existing techniques on healthcare data.
Findings
State-of-the-art results in unsupervised healthcare fraud detection
EDF normalization improves anomaly score quality
Models effectively handle high class imbalance
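The EDF normalization named in the findings can be illustrated with a generic empirical distribution function. This is a minimal sketch, not the authors' exact formulation: the function name `edf_normalize`, the use of a separate reference sample, and the example numbers are all assumptions made for illustration. Each raw score is mapped to the fraction of reference scores that do not exceed it, so normalized values fall in [0, 1] whatever the scale of the raw scores.

```python
import numpy as np

def edf_normalize(reference_scores, new_scores):
    """Map raw anomaly scores to [0, 1] using the empirical distribution
    function of a reference sample: F(s) = (# reference scores <= s) / n.

    Hypothetical helper for illustration; not taken from the paper.
    """
    ref = np.sort(np.asarray(reference_scores, dtype=float))
    new = np.asarray(new_scores, dtype=float)
    # side="right" counts how many reference scores are <= each new score
    ranks = np.searchsorted(ref, new, side="right")
    return ranks / ref.size

# Made-up raw scores, e.g. from a model evaluated on historical sequences
reference = [0.2, 0.5, 0.1, 3.0, 0.4]
normalized = edf_normalize(reference, [0.1, 0.45, 10.0])
print(normalized)
```

Because the output depends only on ranks, a single extreme raw score cannot stretch the scale, which is one plausible reason such a normalization helps when anomalies are rare.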
Abstract
Fraud in healthcare is widespread, as doctors may prescribe unnecessary treatments to increase bills. Insurance companies want to detect these anomalous, fraudulent bills and reduce their losses. Traditional fraud detection methods rely on expert rules and manual data processing. More recently, machine learning techniques have automated this process, but hand-labeled data is extremely costly and usually out of date. We propose a machine learning model that automates fraud detection in an unsupervised way. We consider two deep learning approaches: an LSTM neural network that predicts the next patient visit, and a seq2seq model. To normalize the resulting anomaly scores, we propose an Empirical Distribution Function (EDF) approach, which lets the algorithm work under high class imbalance. We validate the approach on real sequences of patient visits provided by Allianz. The models provide…
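One common way to turn a next-visit predictor into an anomaly detector is to score a sequence by how poorly the model anticipated the visits that actually occurred. The sketch below assumes this scoring rule (average negative log-likelihood); the abstract does not state the authors' exact score, and the function name, array layout, and probabilities are hypothetical stand-ins for real model output.

```python
import numpy as np

def next_visit_anomaly_score(pred_probs, visits, eps=1e-12):
    """Average negative log-likelihood of the observed visits.

    pred_probs: array of shape (T, V); row t is an assumed model output,
                the predicted distribution over V visit codes at step t.
    visits:     integer array of shape (T,), the codes actually observed.
    Higher scores mean the sequence was less expected by the model.
    """
    probs = np.asarray(pred_probs, dtype=float)
    idx = np.asarray(visits, dtype=int)
    # Probability the model assigned to each visit that really happened
    observed = probs[np.arange(idx.size), idx]
    return float(-np.mean(np.log(observed + eps)))

# Made-up predictions for a two-step sequence over three visit codes
probs = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1]]
score_typical = next_visit_anomaly_score(probs, [0, 1])  # likely visits
score_unusual = next_visit_anomaly_score(probs, [2, 0])  # unlikely visits
print(score_typical, score_unusual)
```

No labels are needed to compute such a score, which is consistent with the unsupervised setting described above.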
Taxonomy
Methods: Sequence to Sequence · Tanh Activation · Sigmoid Activation · Long Short-Term Memory
