Earth Science Frontiers (地学前缘), 2025, Vol. 32, Issue (5): 466-483. DOI: 10.13745/j.esf.sf.2025.9.3

• Intelligent Computing in Geosciences •

Training set size takes precedence over similarity: A comparative study of machine learning models for landslide prediction in the Jishishan earthquake

LIU Meiyu1, WU Wei1,*, WANG Hui2, LUO Weier1, WU Juanjuan1, GUO Xudong1

  1. National Disaster Reduction Center of China, Ministry of Emergency Management, Beijing 100124, China
  2. School of Soil and Water Conservation, Beijing Forestry University, Beijing 100083, China
  • Received: 2025-02-20 Revised: 2025-07-30 Online: 2025-09-25 Published: 2025-10-14
  • Contact: WU Wei
  • About the author: LIU Meiyu (b. 1992), female, Ph.D., assistant research fellow, works mainly on research and applications in natural disaster monitoring and assessment. E-mail: liumeiyu@ndrccc.org.cn
  • Funding: National Key Research and Development Program of China (2024YFC3015405)

Abstract:

Machine learning is an important approach to predicting earthquake-induced landslide hazard and can markedly improve the efficiency of post-earthquake risk assessment. To examine how different machine learning models perform in this task, this study takes the intensity VII zone of the 2023 Ms 6.2 Jishishan earthquake in Gansu as the study area and uses landslide data from eight earthquakes in western China. Four training sets are built: a heterogeneous set and a traditional set drawn from all eight earthquakes, and a heterogeneous set and a traditional set drawn from the four earthquakes most similar to the Jishishan event in magnitude and other factors. Three mainstream models are compared: Random Forest (RF), Artificial Neural Network (ANN), and Extreme Gradient Boosting (XGBoost). The RF model trained on the eight-earthquake heterogeneous set performed best, with the highest AUC and the most accurate predictions. The XGBoost and ANN models trained on the eight-earthquake traditional set also performed well, with higher AUC values than on the other training sets. In contrast, models trained on the similarity-based four-earthquake sets all showed lower accuracy, indicating that sample size affects model performance more than sample similarity does. In addition, every model trained on the traditional four-earthquake set overfitted, which further underlines the importance of training set size. The study offers guidance on training data and model selection for improving the accuracy of earthquake-induced landslide prediction and can support emergency response and disaster risk management.

Key words: Jishishan earthquake, landslide, machine learning model, accuracy assessment
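
The abstract ranks models by AUC and reports overfitting for some training sets, but it does not give the evaluation code or the criterion used to flag overfitting. The following is a minimal sketch, assuming overfitting is judged by the gap between AUC on the training data and AUC on held-out data. The function names `roc_auc` and `overfitting_gap` and the toy labels and scores are hypothetical and are not taken from the study.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative counts as half a win
    (the Mann-Whitney U formulation of AUC).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def overfitting_gap(train_auc: float, test_auc: float) -> float:
    """Training AUC minus held-out AUC; a large positive gap hints at overfitting."""
    return train_auc - test_auc


# Hypothetical toy data (assumption): 1 = landslide cell, 0 = non-landslide cell.
train_labels = [1, 1, 1, 0, 0, 0]
train_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # positives all ranked above negatives
test_labels = [1, 1, 0, 0]
test_scores = [0.8, 0.4, 0.6, 0.2]  # one positive/negative pair is mis-ranked

train_auc = roc_auc(train_labels, train_scores)
test_auc = roc_auc(test_labels, test_scores)
gap = overfitting_gap(train_auc, test_auc)
```

On this toy data the training AUC is 1.0 and the held-out AUC is 0.75, so the gap is 0.25. What gap size should count as overfitting is a judgment call that the abstract does not specify.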

CLC Number: