Earth Science Frontiers ›› 2025, Vol. 32 ›› Issue (4): 108-121. DOI: 10.13745/j.esf.sf.2025.4.73
XIE Miao1, LIU Bingli2,3,*, LI Yunhe2,3, WANG Zhengyao2,3, CAO Changjie2,3, WU Yixiao4
Received: 2025-01-15
Revised: 2025-04-29
Online: 2025-07-25
Published: 2025-08-04
Corresponding author: *LIU Bingli (born 1981), male, associate professor, long engaged in research on mathematical geology.
About the author: XIE Miao (born 1999), female, doctoral candidate in geochemistry. E-mail: xiemiao0825@163.com
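Abstract:
Deep learning models are widely used in mineral prospectivity prediction because of their strong ability to extract features from data. However, supervised deep learning methods often suffer from insufficient training samples and an imbalance between positive and negative samples; in particular, the rarity of mineralization events tends to weaken model robustness and generalization. To address this problem, this paper applies three data augmentation methods: (1) a sliding-window method that takes known positive and negative samples as centers and augments them through repeated window shifts; (2) a generative model, the generative adversarial network (GAN); and (3) the Wasserstein generative adversarial network with gradient penalty (WGAN-GP), in which the network is trained on real samples and the fully trained generator is then used for augmentation. All three methods enlarge the sample set while preserving geological meaning as far as possible. To verify the effectiveness of the augmentation, the Fréchet inception distance (FID) between real and generated samples and a convolutional neural network (CNN) are used for evaluation. The results show that the CNN model trained on the WGAN-GP-augmented dataset has the strongest generalization ability, and the resulting gold prospectivity map of the Gannan area provides important guidance for future mineral exploration.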
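The sliding-window augmentation described in the abstract can be illustrated with a minimal sketch. It assumes the evidence layers are regular grids and that each sample is a 40×40 window (the CNN input size in Table 1); the shift step, the number of shifts, and the grid size are hypothetical values chosen for illustration, not parameters reported by the authors.

```python
def shifted_window_corners(center, window=40, step=2, n_shifts=1, grid_shape=(200, 200)):
    """Top-left corners of windows slid around a known sample location.

    window=40 follows the 40x40 CNN input in Table 1; step, n_shifts and
    grid_shape are hypothetical values chosen only for illustration.
    """
    n_rows, n_cols = grid_shape
    half = window // 2
    corners = []
    for d_row in range(-n_shifts, n_shifts + 1):
        for d_col in range(-n_shifts, n_shifts + 1):
            top = center[0] + d_row * step - half
            left = center[1] + d_col * step - half
            # Keep only windows that lie fully inside the evidence grid.
            if 0 <= top <= n_rows - window and 0 <= left <= n_cols - window:
                corners.append((top, left))
    return corners


def crop(layer, corner, window=40):
    """Cut one window out of a single 2-D evidence layer (a list of rows)."""
    top, left = corner
    return [row[left:left + window] for row in layer[top:top + window]]


# One known sample location in the middle of a toy 200x200 layer.
layer = [[r * 200 + c for c in range(200)] for r in range(200)]
corners = shifted_window_corners((100, 100))
patches = [crop(layer, corner) for corner in corners]
print(len(patches), len(patches[0]), len(patches[0][0]))  # prints: 9 40 40
```

Each shifted patch still contains the known sample location, which is one plausible reading of how the method keeps the augmented samples geologically meaningful.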
Cite this article: XIE Miao, LIU Bingli, LI Yunhe, WANG Zhengyao, CAO Changjie, WU Yixiao. Quantitative prediction method of gold deposits in Gannan area under unbalanced sample conditions[J]. Earth Science Frontiers, 2025, 32(4): 108-121.
Fig.1 Simplified geologic map of the study area (modified from the Gansu Provincial Geological Survey Institute). a: regional tectonic distribution map; b: geologic map of the study area.
Layer | Input | Output | Kernel size |
---|---|---|---|
Conv2d_1 | [m,9,40,40] | [m,32,40,40] | 5×5 |
BatchNorm2d | [m,32,40,40] | [m,32,40,40] | |
ReLU | [m,32,40,40] | [m,32,40,40] | |
MaxPool_1 | [m,32,40,40] | [m,32,20,20] | 2×2 |
Conv2d_2 | [m,32,20,20] | [m,64,20,20] | 3×3 |
BatchNorm2d | [m,64,20,20] | [m,64,20,20] | |
ReLU | [m,64,20,20] | [m,64,20,20] | |
MaxPool_2 | [m,64,20,20] | [m,64,10,10] | 2×2 |
Conv2d_3 | [m,64,10,10] | [m,128,10,10] | 3×3 |
BatchNorm2d | [m,128,10,10] | [m,128,10,10] | |
ReLU | [m,128,10,10] | [m,128,10,10] | |
Linear_1 | [m,12800] | [m,512] | |
Linear_2 | [m,512] | [m,2] | |
Table 1 CNN network structure
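The layer shapes in Table 1 can be checked with a short shape trace. Table 1 lists only kernel sizes, so the strides and paddings below are assumptions: a 5×5 convolution that feeds a [m,32,40,40] BatchNorm2d layer has to preserve the 40×40 size, which under stride 1 implies padding 2, and the 3×3 convolutions likewise imply padding 1. BatchNorm2d and ReLU do not change shapes and are left out.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output side length of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, out_channels or None for pooling, kernel, stride, padding)
# Strides and paddings are assumptions; Table 1 lists only kernel sizes.
LAYERS = [
    ("Conv2d_1", 32, 5, 1, 2),
    ("MaxPool_1", None, 2, 2, 0),
    ("Conv2d_2", 64, 3, 1, 1),
    ("MaxPool_2", None, 2, 2, 0),
    ("Conv2d_3", 128, 3, 1, 1),
]


def trace_shapes(channels=9, size=40):
    """Return [(layer name, (channels, height, width)), ...] after each layer."""
    shapes = []
    for name, out_channels, kernel, stride, padding in LAYERS:
        size = conv_out(size, kernel, stride, padding)
        if out_channels is not None:
            channels = out_channels
        shapes.append((name, (channels, size, size)))
    return shapes


shapes = trace_shapes()
for name, shape in shapes:
    print(name, shape)
channels, height, width = shapes[-1][1]
print("flattened features:", channels * height * width)
```

Under these assumptions the last convolution yields 128×10×10 = 12800 features, which matches the [m,12800] input of Linear_1.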
Class | Generator learning-rate decay | Discriminator learning-rate decay | Batch size | Iterations | Optimizer |
---|---|---|---|---|---|
Positive samples | 0.99 | 0.98 | 32 | 2500 | Adam |
Negative samples | 0.99 | 0.98 | 32 | 2500 | Adam |
Table 2 GAN parameters
Class | Penalty coefficient | Generator learning-rate decay | Discriminator learning-rate decay | Batch size | Iterations | Optimizer |
---|---|---|---|---|---|---|
Positive samples | 7.5 | 0.975 | 0.97 | 32 | 2500 | Adam |
Negative samples | 6.5 | 0.975 | 0.97 | 32 | 2500 | Adam |
Table 3 WGAN-GP parameters
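For orientation, the penalty coefficient in Table 3 presumably plays the role of the weight λ in the standard WGAN-GP critic loss. The expression below is the formulation from the original WGAN-GP paper, given on the assumption that the authors follow it; here D is the critic, P_r and P_g are the real and generated distributions, and x̂ is a random interpolate between a real and a generated sample.

```latex
L_D = \mathbb{E}_{\tilde{x}\sim P_g}\left[D(\tilde{x})\right]
    - \mathbb{E}_{x\sim P_r}\left[D(x)\right]
    + \lambda\,\mathbb{E}_{\hat{x}\sim P_{\hat{x}}}
      \left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right],
\qquad
\hat{x} = \epsilon x + (1-\epsilon)\tilde{x},\quad \epsilon \sim U[0,1]
```

Under this reading, λ = 7.5 would be used for the positive-sample model and λ = 6.5 for the negative-sample model.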
Model | FID (positive samples) | FID (negative samples) | Mean FID |
---|---|---|---|
Sliding window | 90.08 | 15.98 | 53.03 |
GAN | 279.76 | 197.32 | 238.54 |
WGAN-GP | 165.87 | 36.68 | 101.27 |
Table 4 Optimal FID values for each model
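FID compares the feature statistics of real and generated samples, and lower values mean the two distributions are closer. The sketch below illustrates the underlying Fréchet distance between two Gaussians under a simplifying assumption of diagonal covariance, so that no matrix square root is needed. It is not the authors' implementation: the full FID uses complete covariance matrices of network activations, and the source does not say how features were extracted from the 9-channel samples. The toy feature vectors are hypothetical.

```python
import math


def feature_stats(features):
    """Per-dimension mean and (population) variance of a list of feature vectors."""
    n = len(features)
    dims = len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dims)]
    variances = [sum((f[d] - means[d]) ** 2 for f in features) / n for d in range(dims)]
    return means, variances


def frechet_distance_diag(mu_r, var_r, mu_g, var_g):
    """Frechet distance between two Gaussians with diagonal covariance."""
    # Squared distance between the mean vectors.
    mean_term = sum((a - b) ** 2 for a, b in zip(mu_r, mu_g))
    # Trace term; for diagonal covariances it reduces to a per-dimension sum.
    cov_term = sum(vr + vg - 2.0 * math.sqrt(vr * vg) for vr, vg in zip(var_r, var_g))
    return mean_term + cov_term


# Toy 2-D "features"; a real FID would use network activations instead.
real = [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
fake = [[1.0, 1.0], [3.0, 3.0], [5.0, 5.0]]
score = frechet_distance_diag(*feature_stats(real), *feature_stats(fake))
print(round(score, 6))
```

In this toy case the two sets have identical variances and differ only by a shift of 1 in the first dimension, so the distance comes entirely from the mean term.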
Augmentation factor | Training accuracy/% | Test accuracy/% | Recall/% | Precision/% | Kappa coefficient/% | F1 score/% |
---|---|---|---|---|---|---|
×4 | 98.67 | 89.47 | 89.47 | 91.18 | 78.38 | 89.25 |
×8 | 98.79 | 90.64 | 90.64 | 91.72 | 80.84 | 89.87 |
×12 | 99.26 | 85.96 | 85.96 | 88.84 | 70.99 | 85.49 |
Table 5 Comparison of CNN classification performance under different augmentation factors
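The classification metrics reported in the tables can all be derived from a confusion matrix. The sketch below uses support-weighted averaging over the two classes, which is an assumption: the source does not state the averaging scheme, although a recall column that matches test accuracy is what weighted averaging produces. The confusion-matrix counts are hypothetical and not taken from the paper.

```python
def weighted_metrics(cm):
    """Accuracy, weighted recall/precision/F1 and Cohen's kappa from a confusion matrix.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n_classes = len(cm)
    total = sum(sum(row) for row in cm)
    support = [sum(row) for row in cm]  # true counts per class
    predicted = [sum(cm[i][j] for i in range(n_classes)) for j in range(n_classes)]
    correct = sum(cm[k][k] for k in range(n_classes))

    recall = precision = f1 = 0.0
    for k in range(n_classes):
        r_k = cm[k][k] / support[k] if support[k] else 0.0
        p_k = cm[k][k] / predicted[k] if predicted[k] else 0.0
        f_k = 2 * p_k * r_k / (p_k + r_k) if (p_k + r_k) else 0.0
        weight = support[k] / total  # class share of the test set
        recall += weight * r_k
        precision += weight * p_k
        f1 += weight * f_k

    accuracy = correct / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(support[k] * predicted[k] for k in range(n_classes)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {"accuracy": accuracy, "recall": recall, "precision": precision,
            "f1": f1, "kappa": kappa}


# Hypothetical counts, not taken from the paper.
metrics = weighted_metrics([[40, 10], [5, 45]])
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

With weighted averaging, recall always equals accuracy, because each class recall is weighted by that class's share of the test set.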
Model | Training accuracy/% | Test accuracy/% | Recall/% | Precision/% | Kappa coefficient/% | F1 score/% | Area under the ROC curve (AUC) |
---|---|---|---|---|---|---|---|
Sliding-window_CNN | 99.87 | 87.72 | 87.71 | 88.76 | 74.86 | 87.52 | 0.92 |
GAN_CNN | 98.12 | 89.47 | 89.47 | 91.18 | 78.38 | 87.39 | 0.93 |
WGAN-GP_CNN | 98.39 | 94.74 | 94.73 | 95.20 | 89.29 | 94.70 | 0.98 |
Table 6 Comparison of classification performance among models with 8-fold data augmentation