Diabetes prediction is an ongoing research problem. The sooner diabetes is
detected in a patient, the sooner lives and medical resources can be saved. Predicting
diabetes as early as possible, from easy-to-measure parameters and with high accuracy,
remains a challenge. When dealing with large data, feature selection plays an important
role. It not only reduces the computational cost but also increases the performance of a
model. This study ensembles three different types of feature selection techniques: filter,
wrapper, and embedded. Ensembling the features selected by Boruta and LASSO gives the best results.
Handling class imbalance effectively also leads to better results.
Keywords: Diabetes prediction, Ensembling features, Feature selection, SMOTE
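
As a rough illustration of what ensembling feature subsets can look like, the sketch below combines the outputs of several selectors by voting. It is a minimal sketch under stated assumptions, not the procedure used in this study: the function name `ensemble_features`, the `min_votes` parameter, and the feature names are all hypothetical, and the selector outputs are made-up placeholders rather than results.

```python
from collections import Counter


def ensemble_features(selections, min_votes=1):
    """Combine feature subsets chosen by several selectors.

    selections: dict mapping a selector name to the features it kept.
    min_votes:  keep a feature if at least this many selectors chose it
                (1 gives the union, len(selections) gives the intersection).
    """
    if not 1 <= min_votes <= len(selections):
        raise ValueError("min_votes must be between 1 and the number of selectors")
    votes = Counter()
    for features in selections.values():
        # set() ensures each selector votes at most once per feature
        votes.update(set(features))
    return sorted(f for f, n in votes.items() if n >= min_votes)


# Hypothetical selector outputs, for illustration only (not results from the study).
selections = {
    "boruta": ["glucose", "bmi", "age"],
    "lasso": ["glucose", "bmi", "insulin"],
}

union = ensemble_features(selections, min_votes=1)
intersection = ensemble_features(selections, min_votes=2)
```

With `min_votes=1` a feature is kept if any selector chose it, while raising `min_votes` demands agreement between selectors; which rule works best is an empirical question for the dataset at hand.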