Sélection de variables par le GLM-Lasso pour la prédiction du risque palustre (Variable selection by GLM-Lasso for malaria risk prediction)
Bienvenue Kouwayè (SAMM), Noël Fonton, Fabrice Rossi (SAMM)

TL;DR
This paper introduces an automatic variable selection method that combines the Lasso with a GLM refit to predict malaria risk. It removes the need for expert preprocessing of the data and copes with the high dimension created by generating all interactions between variables.
Contribution
It presents a novel approach combining the Lasso and a GLM for automatic variable selection in epidemiology, aimed at high-dimensional data that has not been preprocessed by experts.
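To illustrate why skipping expert preprocessing leads to high dimension, here is a minimal sketch of automatic interaction generation. It assumes numeric variables and pairwise products only; the variable names and values are hypothetical, and the summary does not say how the authors encode interactions or whether higher-order terms are included.

```python
from itertools import combinations


def add_pairwise_interactions(rows, names):
    """Append the product of every pair of distinct columns to each row."""
    pairs = list(combinations(range(len(names)), 2))
    new_names = names + [f"{names[i]}:{names[j]}" for i, j in pairs]
    new_rows = [row + [row[i] * row[j] for i, j in pairs] for row in rows]
    return new_rows, new_names


# Hypothetical explanatory variables, for illustration only.
names = ["rainfall", "temperature", "vegetation", "season"]
rows = [
    [12.0, 27.5, 0.4, 1.0],
    [0.0, 31.0, 0.1, 0.0],
]

expanded, expanded_names = add_pairwise_interactions(rows, names)
print(len(names), "->", len(expanded_names))  # 4 -> 10
```

With p raw variables this produces p + p(p-1)/2 columns, so a few dozen raw variables already yield hundreds of candidate predictors.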
Findings
A small number of climatic and environmental variables emerge as the key factors in malaria risk.
The method effectively handles high-dimensional data and reduces overfitting.
Selected variables improve malaria risk prediction accuracy.
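The abstract attributes the control of overfitting to a two-level cross-validation. One common reading of that phrase is nested cross-validation: an inner loop chooses the penalty and an outer loop measures prediction error on data the inner loop never saw. The sketch below follows that reading as an assumption. The fold scheme, the toy "shrunken mean" model, and the count data are all hypothetical stand-ins, not the authors' setup.

```python
def k_folds(n, k):
    """Split indices 0..n-1 into k interleaved folds."""
    return [list(range(i, n, k)) for i in range(k)]


def nested_cv(data, lambdas, fit, loss, k_outer=5, k_inner=4):
    """Return one (chosen lambda, held-out loss) pair per outer fold."""
    results = []
    for test_idx in k_folds(len(data), k_outer):
        held_out = set(test_idx)
        train = [i for i in range(len(data)) if i not in held_out]

        def inner_error(lam):
            # Inner level: score lam using the outer-training data only.
            total = 0.0
            for fold in k_folds(len(train), k_inner):
                val = [train[i] for i in fold]
                val_set = set(val)
                fit_idx = [i for i in train if i not in val_set]
                model = fit([data[i] for i in fit_idx], lam)
                total += loss(model, [data[i] for i in val])
            return total

        best_lam = min(lambdas, key=inner_error)
        # Outer level: refit on all outer-training data, score on the unseen fold.
        model = fit([data[i] for i in train], best_lam)
        results.append((best_lam, loss(model, [data[i] for i in test_idx])))
    return results


# Toy model (assumption): a mean shrunk toward zero by lam.
def fit(values, lam):
    return sum(values) / (len(values) + lam)


def loss(model, values):
    return sum((v - model) ** 2 for v in values) / len(values)


counts = [0, 2, 1, 5, 3, 0, 4, 1, 2, 6, 0, 3, 1, 2, 4, 7, 0, 1, 3, 2]
lambdas = [0.0, 1.0, 5.0]
results = nested_cv(counts, lambdas, fit, loss)
print(results)
```

Because the penalty is chosen separately inside each outer fold, the outer losses are not flattered by the tuning step.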
Abstract
In this study, we propose an automatic learning method for variable selection based on the Lasso in an epidemiological context. One aim of this approach is to do away with the preprocessing that experts in medicine and epidemiology apply to collected data. This preprocessing consists of recoding some variables and choosing some interactions based on expertise. The proposed approach uses all available explanatory variables without preprocessing and automatically generates all interactions between them. This leads to a high-dimensional problem. We use the Lasso, one of the robust methods for variable selection in high dimension. To avoid overfitting, a two-level cross-validation is used. Because the target variable is a count variable and the Lasso estimators are biased, the variables selected by the Lasso are debiased by a GLM and used to predict the distribution of the main malaria vector, Anopheles. Results show that…
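The select-then-debias step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a Poisson GLM with log link (a natural choice for a count target, but not confirmed by the abstract), uses a plain proximal-gradient optimizer chosen for brevity, and runs on synthetic data in which only column 0 carries signal. The function names, step size, and penalty value are hypothetical.

```python
import math
import random


def soft_threshold(z, t):
    """Proximal operator of t * |z|: shrink z toward zero by t."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def fit_poisson(X, y, lam, active, steps=400, lr=0.2):
    """Proximal-gradient fit of a Poisson GLM with L1 penalty lam.

    Only columns in `active` may be non-zero; the intercept is never
    penalized. With lam=0 this is an ordinary unpenalized GLM fit.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(steps):
        # mu_i - y_i drives the gradient of the mean negative log-likelihood.
        resid = [math.exp(b0 + sum(beta[j] * row[j] for j in active)) - yi
                 for row, yi in zip(X, y)]
        b0 -= lr * sum(resid) / n
        for j in active:
            grad_j = sum(r * row[j] for r, row in zip(resid, X)) / n
            beta[j] = soft_threshold(beta[j] - lr * grad_j, lr * lam)
    return b0, beta


def poisson_draw(rng, mean):
    """Knuth's method for sampling a Poisson count."""
    limit, k, prod = math.exp(-mean), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


# Synthetic data (assumption): only column 0 influences the count.
rng = random.Random(0)
n, p = 150, 4
X = [[rng.uniform(-1, 1) for _ in range(p)] for _ in range(n)]
y = [poisson_draw(rng, math.exp(0.3 + 1.0 * row[0])) for row in X]

# Step 1: Lasso-penalized fit selects variables but shrinks their coefficients.
_, beta_lasso = fit_poisson(X, y, lam=0.15, active=range(p))
selected = [j for j in range(p) if beta_lasso[j] != 0.0]

# Step 2: unpenalized GLM refit on the selected columns removes the shrinkage.
_, beta_glm = fit_poisson(X, y, lam=0.0, active=selected)

print("selected columns:", selected)
print("lasso coefficient for column 0:", beta_lasso[0])
print("refit coefficient for column 0:", beta_glm[0])
```

The refit coefficient on the selected column is larger in magnitude than the penalized one, which is the bias the GLM step is meant to remove. In practice the penalty would be chosen by cross-validation rather than fixed by hand.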
Taxonomy
Topics: COVID-19 epidemiological studies
