Earth Science Frontiers ›› 2019, Vol. 26 ›› Issue (4): 45-54. DOI: 10.13745/j.esf.sf.2019.7.3

• Petrological big data research •

Predicting rare earth elements from major elements in ocean island basalts using machine learning

洪瑾,甘成势,刘洁   

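  1. School of Earth Sciences and Engineering, Sun Yat-sen University, Guangzhou, Guangdong 510275
  2. Guangdong Provincial Key Laboratory of Mineral Resources & Geological Processes, Guangzhou, Guangdong 510275
  • Received: 2018-04-12 Revised: 2018-05-14 Online: 2019-07-25 Published: 2019-07-25
  • Corresponding author: LIU Jie (born 1967), female, professor and doctoral supervisor, working mainly on rock physics and geodynamics.
  • About the first author: HONG Jin (born 1991), male, PhD candidate in structural geology.
  • Supported by:
    National Key Research and Development Program of China (2016YFC0600506); National Natural Science Foundation of China (41574087)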

Prediction of REEs in OIB by major elements based on machine learning

HONG Jin, GAN Chengshi, LIU Jie

  1. School of Earth Sciences and Engineering, Sun Yat-sen University, Guangzhou 510275, China
  2. Guangdong Provincial Key Laboratory of Mineral Resources & Geological Processes, Guangzhou 510275, China
  • Received: 2018-04-12 Revised: 2018-05-14 Online: 2019-07-25 Published: 2019-07-25

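Abstract: Shared geoscience databases such as GEOROC and PetDB provide important baseline data for Earth science research. However, these databases share an obvious shortcoming: accurate data are available for the nine major elements of each sample (SiO2, TiO2, Al2O3, CaO, MgO, MnO, K2O, Na2O and P2O5), whereas rare earth element (REE) data are largely missing. Given the important role of REEs in geochemistry, we attempt to provide a scheme for filling in the missing REE values, namely using the random forest method from machine learning to predict REE values from the nine major elements. Taking ocean island basalt (OIB) as an example, 1 283 sets of OIB data collected from GEOROC were split 8:2 into two groups, with 80% used as the training dataset for model building and 20% used as the test dataset for model validation. We compared random forest and multiple linear regression for modeling and prediction on the same data and found that random forest outperformed multiple linear regression in both regression modeling and prediction, and that this advantage became more pronounced as the relationship between input and output parameters became more complex. Random forest predicted the test dataset well overall, although prediction performance weakened gradually with increasing REE atomic number. This may be because REEs of higher atomic number are more weakly related to the major elements, or because their relationship with the major elements is more complex. In addition, the REE distribution patterns predicted by random forest agreed closely with the actual patterns, and the predicted patterns had good discriminating power, reflecting the relative differences among the actual patterns, which is particularly important for inferring geochemical processes. As the amount of training data increases, the model built by random forest becomes more stable and its predictions more accurate. Therefore, as the databases continue to improve, prediction of the REE values in them will become more reliable and feasible.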

 

Keywords: machine learning, random forest, ocean island basalt, major elements, rare earth elements

Abstract: Shared geoscience databases (GEOROC, PetDB, etc.) provide important basic data for geoscience research. However, these databases have an obvious defect: for the samples they contain, the nine major elements (SiO2, TiO2, Al2O3, CaO, MgO, MnO, K2O, Na2O and P2O5) are mostly present, but rare earth element (REE) data are often missing. In view of the important role of REEs in geochemistry, we attempt to provide a solution for supplementing the missing REE data by using the random forest method of machine learning to predict REE values from the major elements. Taking ocean island basalt (OIB) as an example, 1283 OIB samples collected from the GEOROC database were divided into two groups at a ratio of 8:2: 80% of the data were used as training data for modeling and the remaining 20% as test data for model validation. Comparing the modeling and prediction results of the random forest and multiple linear regression methods on the same data, we found that the random forest method was superior in both respects, and that its advantage became more pronounced as the relationship between input and output parameters became more complex. The random forest method predicted the test data well overall, but its predictive power decreased gradually with increasing REE atomic number, possibly because the relationship between the heavier REEs and the major elements is weaker or more complex. The REE distribution patterns predicted by the random forest method closely matched the actual REE distribution patterns and had good discriminating power, reflecting the relative differences between the actual patterns, which is particularly important for inferring geochemical processes. With more training data, the model established by the random forest method will become more stable and will thus provide more accurate predictions. Ultimately, REE value prediction will become more reliable and feasible as the databases continue to improve.

Key words: machine learning, random forest, oceanic island basalt, major elements, rare earth elements
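
The workflow summarized in the abstract (an 8:2 train/test split, a random forest regressor mapping nine inputs to one REE value, and validation on the held-out 20%) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the data are synthetic stand-ins rather than GEOROC samples, the target function is invented, all function names (`make_synthetic`, `best_split`, `build_tree`, `fit_forest`, `predict_forest`, `r2_score`) and hyperparameters are hypothetical, and R² is assumed here as the evaluation metric because the abstract does not name one.

```python
import random
import statistics

def make_synthetic(n, rng):
    """ASSUMPTION: synthetic stand-in data, not GEOROC. 9 inputs, 1 nonlinear target."""
    X = [[rng.random() for _ in range(9)] for _ in range(n)]
    y = [10 * x[1] * x[8] + 3 * (x[4] - 0.5) ** 2 + rng.gauss(0, 0.05) for x in X]
    return X, y

def best_split(X, y, feats, min_leaf):
    """Return (score, feature, threshold) of the variance-minimising split, or None."""
    n, total, best = len(y), sum(y), None
    for f in feats:
        order = sorted(range(n), key=lambda i: X[i][f])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += y[order[k - 1]]
            lo, hi = X[order[k - 1]][f], X[order[k]][f]
            if k < min_leaf or n - k < min_leaf or lo == hi:
                continue
            # Minimising the summed squared error equals maximising this score.
            score = left_sum ** 2 / k + (total - left_sum) ** 2 / (n - k)
            if best is None or score > best[0]:
                best = (score, f, lo)  # samples with value <= lo go left
    return best

def build_tree(X, y, rng, n_feats, min_leaf=3, depth=0, max_depth=8):
    """Grow one regression tree; each split considers a random subset of features."""
    if depth >= max_depth or len(y) < 2 * min_leaf:
        return statistics.fmean(y)
    split = best_split(X, y, rng.sample(range(len(X[0])), n_feats), min_leaf)
    if split is None:
        return statistics.fmean(y)
    _, f, thr = split
    left = [i for i in range(len(y)) if X[i][f] <= thr]
    right = [i for i in range(len(y)) if X[i][f] > thr]
    grow = lambda idx: build_tree([X[i] for i in idx], [y[i] for i in idx],
                                  rng, n_feats, min_leaf, depth + 1, max_depth)
    return (f, thr, grow(left), grow(right))

def predict_tree(node, x):
    """Walk from the root to a leaf; leaves are floats, internal nodes are tuples."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node

def fit_forest(X, y, n_trees=30, seed=0):
    """Bagging: every tree is trained on a bootstrap resample of the training set."""
    rng = random.Random(seed)
    n_feats = max(1, len(X[0]) // 3)  # ASSUMPTION: a common heuristic for regression
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx], rng, n_feats))
    return forest

def predict_forest(forest, x):
    """The forest prediction is the average of the individual tree predictions."""
    return statistics.fmean([predict_tree(tree, x) for tree in forest])

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

rng = random.Random(42)
X, y = make_synthetic(300, rng)
cut = int(0.8 * len(y))  # 8:2 split; rows are i.i.d., so no extra shuffle is needed here
X_train, y_train, X_test, y_test = X[:cut], y[:cut], X[cut:], y[cut:]
forest = fit_forest(X_train, y_train)
r2_test = r2_score(y_test, [predict_forest(forest, x) for x in X_test])
print(f"held-out R^2 on synthetic data: {r2_test:.3f}")
```

With real data, each REE would presumably get its own model (or a multi-output model), and the same held-out score could be computed for a multiple linear regression baseline to reproduce the kind of comparison the abstract describes; neither step is shown here.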
