Earth Science Frontiers, 2025, Vol. 32, Issue (5): 466-483. DOI: 10.13745/j.esf.sf.2025.9.3


Training set size takes precedence over similarity: A comparative study of machine learning models for landslide prediction in the Jishishan earthquake

LIU Meiyu1, WU Wei1,*, WANG Hui2, LUO Weier1, WU Juanjuan1, GUO Xudong1

  1. National Disaster Reduction Center of China, Ministry of Emergency Management, Beijing 100124, China
  2. School of Soil and Water Conservation, Beijing Forestry University, Beijing 100083, China
  • Received: 2025-02-20 Revised: 2025-07-30 Online: 2025-09-25 Published: 2025-10-14
  • Contact: WU Wei

Abstract:

Machine learning methods are key tools for predicting earthquake-induced landslide risk and can significantly improve post-seismic risk assessment. This study offers practical guidelines for such predictions by evaluating three prominent machine learning models, Random Forest (RF), Artificial Neural Network (ANN), and XGBoost, using data from eight earthquakes in western China. To predict landslide risk following the Ms 6.2 Jishishan earthquake in Gansu, we construct four training sets: a heterogeneous set and a traditional set built from all eight earthquakes, and a corresponding pair built from the four earthquakes most similar to Jishishan in magnitude and other factors. The RF model trained on the heterogeneous eight-earthquake set achieved the highest AUC value and the most accurate predictions. The XGBoost and ANN models trained on the traditional eight-earthquake set also performed well, with higher AUC values than when trained on the other sets. Conversely, models trained on the four-earthquake sets selected for seismic similarity all showed relatively low accuracy, indicating that sample size affects model performance more than sample similarity does. Additionally, all models trained on the traditional four-earthquake set exhibited overfitting, further underscoring the importance of training set size. This study provides insights into training data and model selection for improving the accuracy of earthquake-induced landslide predictions and supports emergency response and disaster risk management efforts.
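
The abstract ranks models by AUC and flags overfitting, so a minimal sketch of how those two quantities are commonly computed may help readers unfamiliar with them. This is not the authors' code: the function names `roc_auc` and `overfitting_gap` and the toy labels and scores are hypothetical, chosen only for illustration, and the study does not state how it quantified overfitting. The sketch uses the standard pairwise (Mann-Whitney) definition of AUC.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney U formulation).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def overfitting_gap(train_auc: float, test_auc: float) -> float:
    """Training AUC minus held-out AUC (an assumed, commonly used diagnostic).

    A large positive gap hints that the model memorised its training set.
    """
    return train_auc - test_auc


# Toy data, invented for illustration only (not taken from the study).
# Label 1 = landslide cell, 0 = non-landslide cell; scores are model outputs.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.1]
toy_auc = roc_auc(toy_labels, toy_scores)
```

In a comparison like the one described, each model would be scored this way on held-out data from the target event for every training set, and the gap between training and held-out AUC would be inspected alongside the held-out value itself.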

Key words: Jishishan earthquake, landslide, machine learning model, accuracy assessment

CLC Number: