Data-Driven Investigative Journalism For Connectas Dataset
Aniket Jain, Bhavya Sharma, Paridhi Choudhary, Rohan Sangave, William Yang

TL;DR
This paper explores how machine learning can be applied to government contract data from Colombia (2007 to 2012) to flag possible corruption and malpractice, through data cleaning, feature engineering, and anomaly detection models.
Contribution
It introduces a methodology for applying machine learning to corruption detection in government datasets, including data preprocessing and anomaly detection techniques.
Findings
Effective data cleaning and feature engineering for government contract data
Successful implementation of anomaly detection models
Potential to assist in corruption investigations
Abstract
This paper explores the possibility of using machine learning algorithms to detect cases of corruption and malpractice by governments. The dataset contains information about several government contracts in Colombia from 2007 to 2012. The authors begin by exploring and cleaning the data, then perform feature engineering, and finally implement machine learning models to detect anomalies in the dataset.
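The three stages named in the abstract (cleaning, feature engineering, anomaly detection) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the field names (`agency`, `contractor`, `value`, `year`), the sample records, the `value_ratio` feature, and the use of a median/MAD modified z-score as the detector. The abstract does not state which features or models the authors used, so this is not their method.

```python
from statistics import median

# Hypothetical contract records; field names and values are illustrative only.
RAW_CONTRACTS = [
    {"agency": "a", "contractor": " acme  ltda ", "value": "1000", "year": "2008"},
    {"agency": "A", "contractor": "Beta SA", "value": "1100", "year": "2009"},
    {"agency": "A", "contractor": "Gamma SAS", "value": "900", "year": "2010"},
    {"agency": "A", "contractor": "Delta SA", "value": "1050", "year": "2011"},
    {"agency": "A", "contractor": "Omega SA", "value": "50000", "year": "2012"},
    {"agency": "B", "contractor": "Beta SA", "value": "200", "year": "2007"},
    {"agency": "B", "contractor": "Gamma SAS", "value": "220", "year": "2008"},
    {"agency": "B", "contractor": "Delta SA", "value": "180", "year": "2009"},
    {"agency": "B", "contractor": "Acme Ltda", "value": "210", "year": "2010"},
    {"agency": "B", "contractor": "Acme Ltda", "value": "", "year": "2011"},     # missing value
    {"agency": "B", "contractor": "Acme Ltda", "value": "205", "year": "2015"},  # outside 2007-2012
]


def clean(records):
    """Drop rows with unparseable values or out-of-range years; normalize names."""
    cleaned = []
    for r in records:
        try:
            value = float(r["value"])
            year = int(r["year"])
        except (KeyError, TypeError, ValueError):
            continue
        if value <= 0 or not 2007 <= year <= 2012:
            continue
        cleaned.append({
            "agency": r["agency"].strip().upper(),
            "contractor": " ".join(r["contractor"].split()).upper(),
            "value": value,
            "year": year,
        })
    return cleaned


def add_features(records):
    """Add value_ratio: contract value divided by the median value for its agency."""
    by_agency = {}
    for r in records:
        by_agency.setdefault(r["agency"], []).append(r["value"])
    medians = {agency: median(values) for agency, values in by_agency.items()}
    return [dict(r, value_ratio=r["value"] / medians[r["agency"]]) for r in records]


def flag_anomalies(records, threshold=3.5):
    """Flag rows whose modified z-score (median/MAD based) exceeds the threshold."""
    ratios = [r["value_ratio"] for r in records]
    med = median(ratios)
    mad = median([abs(x - med) for x in ratios])
    if mad == 0:
        return []  # no spread in the data, so nothing can be scored
    return [r for r in records
            if 0.6745 * abs(r["value_ratio"] - med) / mad > threshold]


cleaned = clean(RAW_CONTRACTS)
featured = add_features(cleaned)
flagged = flag_anomalies(featured)

for row in flagged:
    print(row["agency"], row["contractor"], row["value"])
```

A flagged row in a sketch like this is only a lead for a journalist to examine, not evidence of wrongdoing. A median-based score is used here because a single very large contract would otherwise inflate a mean and standard deviation and mask itself.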
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Data-Driven Disease Surveillance · Network Security and Intrusion Detection
