Optimal Categorical Attribute Transformation for Granularity Change in Relational Databases for Binary Decision Problems in Educational Data Mining
Paulo J. L. Adeodato, Fábio C. Pereira, Rosalvo F. Oliveira Neto

TL;DR
This paper introduces a novel regression-based method for changing the granularity of categorical attributes in relational databases, improving performance on binary decision tasks in educational data mining.
Contribution
It proposes a regression-based transformation approach for categorical attributes at lower hierarchy levels, outperforming traditional mode-based methods.
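The contrast between the usual mode transformation and the regression-based one can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the child records (e.g. students) are already grouped by parent id (e.g. school), represents each parent by the relative frequency of each category (the "categories histogram"), and fits the logistic regression by plain batch gradient descent on a small labelled subset of parents. All function names and the toy data are hypothetical.

```python
import numpy as np
from collections import Counter


def category_histogram(groups, categories):
    """Map each parent id (e.g. a school) to the relative frequency of each
    category among its child records (e.g. its students).
    Assumption: relative frequencies rather than raw counts."""
    index = {c: i for i, c in enumerate(categories)}
    hist = {}
    for parent, cats in groups.items():
        h = np.zeros(len(categories))
        for c in cats:
            h[index[c]] += 1
        hist[parent] = h / max(len(cats), 1)
    return hist


def mode_transform(groups):
    """Usual baseline: keep only the most frequent category of each parent."""
    return {p: Counter(cats).most_common(1)[0][0] for p, cats in groups.items()}


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Logistic regression fitted by batch gradient descent; w[0] is the bias."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w


def regression_transform(hist, w):
    """Replace each parent's histogram by the single score the fitted model
    assigns to it; this score becomes the new parent-level attribute."""
    return {p: float(1.0 / (1.0 + np.exp(-(w[0] + h @ w[1:]))))
            for p, h in hist.items()}


if __name__ == "__main__":
    # Hypothetical toy data: student categories grouped by school.
    groups = {
        "s1": ["low", "low", "low", "mid"],
        "s2": ["high", "high", "mid", "high"],
        "s3": ["low", "mid", "low", "low"],
        "s4": ["high", "mid", "high", "high"],
        "s5": ["high", "high", "low", "mid"],
        "s6": ["low", "low", "mid", "low"],
    }
    categories = ["low", "mid", "high"]
    hist = category_histogram(groups, categories)
    # Small labelled sample of schools, kept apart from the data later used
    # by the school-level decision model; it only fits the transformation.
    train, labels = ["s1", "s2", "s3", "s4"], np.array([0.0, 1.0, 0.0, 1.0])
    w = fit_logistic(np.array([hist[s] for s in train]), labels)
    print(mode_transform(groups))
    print(regression_transform(hist, w))
```

The mode keeps one category per school and discards the rest of the distribution, whereas the fitted regression weighs every category by how it relates to the target and collapses the whole histogram into one informative score.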
Findings
Higher ranking performance than the usual mode-based transformation
Performance comparable to the expert weighting approach
Validated on public Brazilian school datasets via 10-fold cross-validation
Abstract
This paper presents an approach for transforming data granularity in hierarchical databases for binary decision problems by applying regression to categorical attributes at the lower grain levels. Instead of summarizing an attribute from a lower-level entity of the relational database by the usual mode (most frequent category) of its distribution, the approach optimizes the attribute's information content through regression on the categories histogram, trained on a small, exclusive labelled sample. The paper validates the approach on a binary decision task for assessing the quality of secondary schools, focusing on how logistic regression transforms students' and teachers' attributes into school attributes. Experiments were carried out on public datasets of Brazilian schools, comparing via 10-fold cross-validation the ranking scores, which were also produced by logistic regression. The proposed approach achieved higher performance than the usual…
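The evaluation protocol can be sketched as a k-fold comparison of ranking quality for a school-level logistic regression. The abstract does not name the ranking metric, so this sketch assumes the area under the ROC curve; stratified folds and gradient-descent fitting are also assumptions, and every function name is hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=1000):
    """Logistic regression fitted by batch gradient descent; w[0] is the bias."""
    Xb = np.hstack([np.ones((len(X), 1)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        w -= lr * Xb.T @ (sigmoid(Xb @ w) - y) / len(y)
    return w


def auc(y, s):
    """Area under the ROC curve: probability that a random positive is ranked
    above a random negative, counting ties as one half."""
    pos, neg = s[y == 1], s[y == 0]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def stratified_folds(y, k, rng):
    """Split indices into k folds that preserve the class proportions."""
    folds = [[] for _ in range(k)]
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        for i, part in enumerate(np.array_split(idx, k)):
            folds[i].extend(part.tolist())
    return [np.array(f) for f in folds]


def cross_val_auc(X, y, k=10, seed=0):
    """Per-fold AUC of a logistic regression ranking model under k-fold CV."""
    rng = np.random.default_rng(seed)
    folds = stratified_folds(y, k, rng)
    scores = []
    for i, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        w = fit_logistic(X[train], y[train])
        scores.append(auc(y[test], sigmoid(w[0] + X[test] @ w[1:])))
    return np.array(scores)
```

Calling `cross_val_auc` once on the mode-based school features and once on the regression-based ones, with the same `seed`, reuses identical folds, so the two sets of per-fold scores can be compared pairwise.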
Taxonomy
Topics: Imbalanced Data Classification Techniques · Data Mining Algorithms and Applications · Machine Learning and Data Classification
