Earth Science Frontiers ›› 2025, Vol. 32 ›› Issue (2): 456-468. DOI: 10.13745/j.esf.sf.2024.2.20

Research on identifying the outliers of the TDS in shallow groundwater based on the random forest model

CHU Yanjia1,2, HE Baonan1,2, CHEN Zhen1,2, HE Jiangtao1,2,*

  1. Key Laboratory of Groundwater Conservation of Ministry of Water Resources (in preparation), China University of Geosciences (Beijing), Beijing 100083, China
  2. School of Water Resources and Environment, China University of Geosciences (Beijing), Beijing 100083, China
  • Received: 2023-12-27 Revised: 2024-02-27 Online: 2025-03-25 Published: 2025-03-25

Abstract:

Accurately identifying groundwater hydrochemical outliers caused by human activities is crucial for determining the natural background levels of groundwater chemical components and for rationally assessing groundwater pollution. Total dissolved solids (TDS) is a comprehensive indicator of groundwater hydrochemistry, and its value directly reflects groundwater quality. The hydrochemical diagram method has achieved favorable results in identifying TDS outliers in groundwater. However, it works by reverse identification: it assumes that an anomalous hydrochemical type, as defined by the major ion components, inevitably produces a TDS anomaly, which can lead to over-identification of anomalies. Therefore, a random forest model, which starts from a forward identification of the genesis mechanisms of TDS, was combined with a statistical method to identify TDS anomalies in shallow groundwater of the Shaying River Basin. A comparison of the anomaly identification performance of several methods showed that the machine learning method effectively identified TDS outliers in groundwater, and the TDS thresholds it identified were consistent with those derived from the other methods. Compared with the hydrochemical diagram method, the machine learning method, being grounded in TDS genesis mechanisms, reduced the identification errors of hydrochemical diagrams. It also successfully distinguished between high and low outliers, offering an effective alternative for TDS anomaly identification and broadening research perspectives on groundwater environmental background values.

Key words: groundwater environmental background values, TDS, outliers, machine learning method, the Shaying River Basin

CLC Number: