Earth Science Frontiers ›› 2024, Vol. 31 ›› Issue (3): 371-380. DOI: 10.13745/j.esf.sf.2023.2.40


Risk assessment of groundwater arsenic in Hetao Basin based on ensemble learning optimization

FU Yu1, CAO Wengeng2,*, ZHANG Chunju3, ZHAI Wenhua1, REN Yu2, NAN Tian2, LI Zeyan2

  1. North China University of Water Resources and Electric Power, Zhengzhou 450046, China
  2. The Institute of Hydrogeology and Environmental Geology, Chinese Academy of Geological Sciences, Shijiazhuang 050061, China
  3. Hefei University of Technology, Hefei 230009, China
  • Received: 2022-10-28 Revised: 2022-12-27 Online: 2024-05-25 Published: 2024-05-25

Abstract:

Arsenic in the shallow groundwater of the Hetao Basin seriously exceeds the standard, and the resulting pollution risk poses a serious health threat to local residents. At present, the risk distribution of high-arsenic groundwater is still poorly understood at the macroscopic scale. Using 605 shallow groundwater samples, together with environmental factors covering the sedimentary environment, climate, human activities, soil physical and chemical characteristics, and hydrogeological conditions, a Stacking ensemble learning model for high-arsenic groundwater was constructed, with Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Support Vector Machine (SVM) as the base learners and Linear Discriminant Analysis (LDA) as the meta-learner. The ensemble model was used to predict the risk distribution of high-arsenic groundwater and to identify the key environmental factors affecting that distribution in the region. The results showed that the groundwater arsenic exceedance rate (>10 μg/L) was 49.59%, with exceedances mainly concentrated in the paleochannel zone and flood fans of the Yellow River. The Stacking ensemble model was more reliable than RF, the best-performing single model: its Area Under the ROC Curve (AUC) and accuracy were 1.1% and 3.2% higher, respectively. The high-risk area reached 5257 km², accounting for 38.44% of the study area. The sedimentary environment was the key environmental factor affecting the risk distribution of high-arsenic groundwater, contributing up to 25.06% to the accuracy of the model. The results provide a method and reference for mapping the spatial distribution of high-arsenic groundwater pollution and have important implications for drinking water safety and human health in the region.
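
The stacking architecture described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic placeholders generated with `make_classification` (605 rows to mirror the sample count, 10 hypothetical predictors), scikit-learn's `GradientBoostingClassifier` stands in for XGBoost to avoid an extra dependency, and all hyperparameters, the train/test split, and the 5-fold scheme are assumed, since the abstract does not report them.

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): label 1 = arsenic above 10 ug/L.
X, y = make_classification(
    n_samples=605, n_features=10, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Base learners: RF, a gradient-boosting stand-in for XGBoost, and an SVM.
# The SVM is wrapped with a scaler because it is sensitive to feature scale.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ("gb", GradientBoostingClassifier(random_state=0)),
    ("svm", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
]

# Meta-learner: LDA, trained on out-of-fold class probabilities
# produced by the base learners (5-fold cross-validation, assumed).
stack = StackingClassifier(
    estimators=base_learners,
    final_estimator=LinearDiscriminantAnalysis(),
    stack_method="predict_proba",
    cv=5,
)
stack.fit(X_train, y_train)

# Predicted probability of exceedance, usable as a risk score.
risk = stack.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, risk)
acc = accuracy_score(y_test, stack.predict(X_test))
print(f"AUC = {auc:.3f}, accuracy = {acc:.3f}")
```

Training the meta-learner on out-of-fold predictions rather than in-sample predictions keeps it from simply rewarding whichever base learner overfits most. The scores printed here describe the synthetic data only and are not comparable to the figures reported in the abstract.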

Key words: Stacking ensemble learning, groundwater, high arsenic, risk distribution, Hetao Basin
