Globally, groundwater plays a major role in supplying drinking water for urban and rural
populations and is used for irrigation to grow crops and in many industrial processes. A novel
self-learning random forest (SLRF) model is developed and validated for groundwater yield
zonation within the Yeondong Province in South Korea. The study used a groundwater inventory
dataset, randomly divided into 70% for training and 30% for testing, together with 13
groundwater-conditioning factors. SLRF was optimized using Bayesian optimization.
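A random 70/30 division of an inventory can be sketched as follows. This is a minimal illustration, not the study's code: the record format, the `random_split` helper, and the fixed seed are all hypothetical assumptions; only the 70%/30% proportion comes from the text.

```python
import random


def random_split(records, train_fraction=0.7, seed=42):
    """Randomly split records into training and testing subsets.

    Hypothetical helper: the 70/30 proportion follows the text, while the
    seed and the record format are illustrative assumptions.
    """
    shuffled = list(records)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical inventory: (well_id, yield_label) pairs, not data from the study.
inventory = [(f"well_{i}", i % 2) for i in range(100)]
train, test = random_split(inventory)
print(len(train), len(test))  # 70 30
```

Fixing the seed keeps the split reproducible, so different models can be compared on exactly the same testing subset.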
We also compared our method with other machine learning methods, including support
vector machine (SVM), artificial neural networks (ANN), decision trees (DT), and voting
ensemble models. Model validation used several methods, including a confusion matrix,
receiver operating characteristic (ROC) curves, cross-validation, and McNemar's test.
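To make the confusion-matrix and McNemar's-test validation concrete, the sketch below computes overall accuracy from a binary confusion matrix and applies McNemar's chi-square test (with continuity correction) to two models' predictions on the same testing set. It assumes labels coded 1/0; the example labels and predictions are hypothetical, and the exact test variant the study used is not stated in the text.

```python
import math


def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded 1 (positive) / 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def overall_accuracy(y_true, y_pred):
    """Fraction of correctly classified cases: (tp + tn) / total."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    return (tp + tn) / (tp + fp + fn + tn)


def mcnemar(y_true, pred_a, pred_b):
    """McNemar's chi-square statistic (continuity-corrected) and p-value."""
    # b: model A correct, model B wrong; c: model A wrong, model B correct.
    # Cases where both models are right (or both wrong) do not enter the test.
    b = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a == t and p != t)
    c = sum(1 for t, a, p in zip(y_true, pred_a, pred_b) if a != t and p == t)
    if b + c == 0:
        return 0.0, 1.0  # the models never disagree
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Chi-square survival function with 1 degree of freedom: erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical testing-set labels (1 = high yield) and two models' predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
pred_a = [1, 1, 1, 1, 0, 0, 0, 1, 1, 0]
pred_b = [1, 0, 1, 0, 0, 1, 0, 0, 1, 0]
print(confusion_matrix(y_true, pred_a))  # (5, 1, 0, 4)
print(overall_accuracy(y_true, pred_a))  # 0.9
stat, p = mcnemar(y_true, pred_a, pred_b)
print(stat)  # 0.25
```

A small p-value suggests the two models' error patterns differ significantly. With only ten hypothetical cases, as here, the test has little power.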
Our proposed self-learning method improves random forest (RF) generalization performance
by about 23%, with an SLRF success rate of 0.76 and a prediction rate of 0.83. In
addition, the optimized SLRF achieved a threefold cross-validated area under the curve (AUC)
of 0.75, compared with 0.57 for SLRF with randomly initialized parameters.
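A threefold cross-validated AUC averages the AUC obtained on each held-out fold. The sketch below computes AUC as the probability that a randomly chosen positive outscores a randomly chosen negative (ties count as one half), which equals the area under the ROC curve. It assumes stratified folds so that every fold contains both classes. The `fit_and_score` callback is a hypothetical stand-in for training a model such as SLRF, and the text does not say whether the study's folds were stratified.

```python
import random


def auc(labels, scores):
    """Area under the ROC curve via pairwise (Mann-Whitney) comparison."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def stratified_kfold(labels, k=3, seed=0):
    """Deal each class's shuffled indices round-robin into k folds (assumption)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for position, i in enumerate(idx):
            folds[position % k].append(i)
    return folds


def cross_validated_auc(labels, fit_and_score, k=3, seed=0):
    """Mean held-out AUC over k folds.

    fit_and_score(train_idx, test_idx) is a hypothetical callback that trains
    a model on train_idx and returns one score per index in test_idx.
    """
    folds = stratified_kfold(labels, k, seed)
    aucs = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores = fit_and_score(train_idx, test_idx)
        aucs.append(auc([labels[j] for j in test_idx], scores))
    return sum(aucs) / k


print(auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))  # 0.75

# Hypothetical data: a feature that separates the classes perfectly.
labels = [1] * 15 + [0] * 15
feature = [0.8 if y == 1 else 0.2 for y in labels]


def score_by_feature(train_idx, test_idx):
    """Toy 'model' that ignores training data and returns the feature value."""
    return [feature[j] for j in test_idx]


print(cross_validated_auc(labels, score_by_feature))  # 1.0
```

Comparing optimized and randomly initialized parameters would mean running `cross_validated_auc` once per parameter setting with the same folds.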
On the testing dataset, SLRF outperformed all of the other models (RF, SVM, ANN, DT,
and voted ANN-RF) in overall accuracy, prediction rate, and cross-validated AUC.
The SLRF also estimated the contribution of individual groundwater-conditioning
factors and showed that the three most influential factors were geology
(1.00), profile curvature (0.97), and topographic wetness index, TWI (0.95). Overall,
SLRF effectively modeled groundwater potential, even in data-scarce regions.
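The reported contributions place the top factor at exactly 1.00, which is consistent with importance scores rescaled by their maximum. The sketch below shows that rescaling under this assumption. The text does not state which importance measure (for example, impurity-based or permutation importance) or which normalization was used, and the factor names and raw values here are hypothetical.

```python
def normalize_importance(importances):
    """Rescale importance scores so the most influential factor equals 1.00."""
    top = max(importances.values())
    return {name: value / top for name, value in importances.items()}


# Hypothetical raw importance values for illustration only (not from the study).
raw = {"factor_a": 0.20, "factor_b": 0.194, "factor_c": 0.19, "factor_d": 0.10}
scaled = normalize_importance(raw)
for name, value in sorted(scaled.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {value:.2f}")
```

Max-normalization preserves the ranking and relative spacing of the factors. It only changes the scale, so the factors are easy to compare against the most influential one.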