Earth Science Frontiers, 2025, Vol. 32, Issue (2): 456-468. DOI: 10.13745/j.esf.sf.2024.2.20

• Selected Non-Thematic Contributions •

Research on identifying the outliers of the TDS in shallow groundwater based on the random forest model

CHU Yanjia1,2, HE Baonan1,2, CHEN Zhen1,2, HE Jiangtao1,2,*

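  1. Key Laboratory of Groundwater Conservation of Ministry of Water Resources (in preparation), China University of Geosciences (Beijing), Beijing 100083, China
    2. School of Water Resources and Environment, China University of Geosciences (Beijing), Beijing 100083, China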
  • Received: 2023-12-27 Revised: 2024-02-27 Online: 2025-03-25 Published: 2025-03-25
  • Corresponding author: *HE Jiangtao (born 1974), male, Ph.D., professor and doctoral supervisor, working mainly on contaminant hydrogeology. E-mail: jthe@cugb.edu.cn
  • First author: CHU Yanjia (born 1999), female, Ph.D. candidate, working mainly on contaminant hydrogeology. E-mail: cugbyjc@163.com
  • Funding: Land and Resources Survey Project of the China Geological Survey (1212011121170)


Abstract:

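Accurately identifying groundwater hydrochemical anomalies caused by human activities is essential for determining the background values of groundwater chemical components and for carrying out sound groundwater pollution assessments. Total dissolved solids (TDS) is a comprehensive indicator of groundwater hydrochemistry, and its value directly reflects groundwater quality. At present, the hydrochemical diagram method performs well in identifying TDS outliers in groundwater. However, it is a reverse identification approach: it rests on the assumption that an anomalous hydrochemical type, defined by the major ion components, necessarily produces anomalous TDS, and it may therefore over-identify outliers. To address this, taking shallow groundwater in the Shaying River Basin as the study object, this paper starts from the genesis mechanisms of TDS and proposes a forward identification method that combines a random forest model with mathematical statistics to identify TDS outliers in shallow groundwater of the study area. The outlier identification performance of several methods is also compared. The results show that the machine learning method effectively identifies TDS outliers in groundwater, and the TDS thresholds it yields are broadly consistent with those of the other methods. Because it identifies anomalies from the perspective of TDS genesis mechanisms, however, the machine learning method avoids the over-identification problem of the hydrochemical diagram method and can distinguish high outliers from low outliers. It thus offers an alternative, effective approach to TDS anomaly identification and broadens the research perspectives on groundwater environmental background values.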

Key words: groundwater environmental background values, TDS, outliers, machine learning method, Shaying River Basin
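The abstract does not give the model configuration, so the following is only a minimal sketch of what a forward, genesis-based identification could look like, under stated assumptions. It assumes TDS is regressed with a random forest on explanatory factors (the three feature columns below are hypothetical placeholders for genesis-related factors), that each sample is predicted only by trees that did not see it (out-of-bag prediction), and that residuals beyond Tukey's 1.5 × IQR fences count as high or low outliers. The residual rule, the fences, the synthetic data (TDS assumed in mg/L), and all function names (`flag_tds_outliers` and its helpers) are illustrative assumptions, not the authors' published procedure. The forest is written in plain standard-library Python so it runs without extra packages. In practice, scikit-learn's `RandomForestRegressor` with `oob_score=True` exposes the same out-of-bag predictions as `oob_prediction_`.

```python
import random
import statistics


def _sse(y, idx):
    """Sum of squared deviations of y[idx] from their mean."""
    m = sum(y[i] for i in idx) / len(idx)
    return sum((y[i] - m) ** 2 for i in idx)


def build_tree(X, y, idx, depth, min_leaf, n_feat, rng):
    """Grow a CART-style regression tree; a leaf is a float, a split is a tuple."""
    mean = sum(y[i] for i in idx) / len(idx)
    if depth == 0 or len(idx) < 2 * min_leaf:
        return mean
    best = None
    # Only a random subset of features is tried at each split.
    for f in rng.sample(range(len(X[0])), n_feat):
        for t in sorted({X[i][f] for i in idx}):
            left = [i for i in idx if X[i][f] <= t]
            right = [i for i in idx if X[i][f] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            sse = _sse(y, left) + _sse(y, right)
            if best is None or sse < best[0]:
                best = (sse, f, t, left, right)
    if best is None:
        return mean
    _, f, t, left, right = best
    return (f, t,
            build_tree(X, y, left, depth - 1, min_leaf, n_feat, rng),
            build_tree(X, y, right, depth - 1, min_leaf, n_feat, rng))


def predict_tree(node, x):
    """Walk from the root to a leaf and return the leaf mean."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node


def oob_predictions(X, y, n_trees=50, depth=6, min_leaf=3, seed=0):
    """Average each sample's predictions over the trees whose bootstrap omitted it."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    n_feat = max(1, p // 3)  # p/3 features per split, a common regression default
    sums, counts = [0.0] * n, [0] * n
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        tree = build_tree(X, y, boot, depth, min_leaf, n_feat, rng)
        in_bag = set(boot)
        for i in range(n):
            if i not in in_bag:
                sums[i] += predict_tree(tree, X[i])
                counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def flag_tds_outliers(X, tds, k=1.5, **forest_kw):
    """Return (high, low) sample indices whose OOB residual lies beyond Tukey fences."""
    pred = oob_predictions(X, tds, **forest_kw)
    resid = {i: tds[i] - p for i, p in enumerate(pred) if p is not None}
    q1, _, q3 = statistics.quantiles(resid.values(), n=4)
    lower, upper = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    high = sorted(i for i, r in resid.items() if r > upper)
    low = sorted(i for i, r in resid.items() if r < lower)
    return high, low


# Synthetic demo: three hypothetical factors, TDS driven by the first two.
rng = random.Random(42)
X, tds = [], []
for _ in range(80):
    f1, f2, f3 = (rng.uniform(0, 10) for _ in range(3))
    X.append([f1, f2, f3])
    tds.append(300 + 60 * f1 + 30 * f2 + rng.gauss(0, 15))
X.append([1.0, 1.0, 5.0])
tds.append(2500.0)  # injected high anomaly, index 80
X.append([9.0, 9.0, 5.0])
tds.append(150.0)  # injected low anomaly, index 81
high, low = flag_tds_outliers(X, tds)
print("high:", high, "low:", low)
```

Because outliers are judged against what the explanatory factors predict rather than against the raw TDS distribution, a sample far above its expected value lands in `high` and one far below lands in `low`, which mirrors the high/low distinction the abstract describes.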

