Quantitative prediction method of gold deposits in Gannan area under unbalanced sample conditions

Earth Science Frontiers, 2025, Vol. 32, Issue 4: 108-121. DOI: 10.13745/j.esf.sf.2025.4.73

XIE Miao¹, LIU Bingli²,³,*, LI Yunhe²,³, WANG Zhengyao²,³, CAO Changjie²,³, WU Yixiao⁴

1. Institute of Geophysical and Geochemical Exploration, Chinese Academy of Geological Sciences, Langfang 065000, China
2. Geomathematics Key Laboratory of Sichuan Province, Chengdu University of Technology, Chengdu 610059, China
3. College of Mathematics and Sciences, Chengdu University of Technology, Chengdu 610000, China
4. School of Earth Sciences and Resources, China University of Geosciences (Beijing), Beijing 100083, China

Received: 2025-01-15 Revised: 2025-04-29 Online: 2025-07-25 Published: 2025-08-04
Corresponding author: *LIU Bingli (b. 1981), male, associate professor, long engaged in research on mathematical geology. E-mail: liubingli-82@163.com
About the first author: XIE Miao (b. 1999), female, Ph.D. candidate, geochemistry. E-mail: xiemiao0825@163.com
Funding:
National Key Research and Development Program of China (2023YFC2906403); National Key Research and Development Program of China (2022YFC2905002); Natural Science Foundation of Sichuan Province (2024NSFSC0009); commissioned project of the China Geological Survey (DD20243233); contract research project commissioned by Zijin Mining Group (4502-FW-2024-00055)

Abstract

Deep learning models are widely used in mineral prospectivity mapping (MPM) because they extract features from data effectively. However, supervised deep learning methods often suffer from insufficient training samples and an imbalance between positive and negative samples; the rarity of mineralization events in particular tends to weaken model robustness and generalization. To address this problem, this study applies three data augmentation methods: (1) sliding-window augmentation, which takes known positive and negative samples as centers and generates additional samples through repeated sliding; (2) augmentation with a generative adversarial network (GAN); and (3) augmentation with a Wasserstein generative adversarial network with gradient penalty (WGAN-GP). For the two generative models, the network is trained on real samples and the fully trained generator then produces the augmented samples. All three methods enlarge the sample set while preserving geological meaning as far as possible. To verify the effectiveness of the augmentation, the Fréchet inception distance (FID) between real and generated samples and a convolutional neural network (CNN) are used for evaluation. The results show that the CNN trained on the WGAN-GP-augmented dataset generalizes better than the CNNs trained on the other two augmented datasets (test accuracy 94.74% and AUC 0.98 at 8-fold augmentation), and the resulting prospectivity map for gold deposits in the Gannan area provides useful guidance for future mineral exploration.

Key words: sample imbalance, data augmentation, convolutional neural network, quantitative prediction
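The sliding-window augmentation named in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sliding_window_patches`, the one-cell offsets, the 4×4 window, and the toy 10×10 single-channel grid are all hypothetical choices made for readability.

```python
from typing import List, Tuple

Grid = List[List[float]]


def sliding_window_patches(
    grid: Grid,
    center: Tuple[int, int],
    size: int,
    offsets: List[Tuple[int, int]],
) -> List[Grid]:
    """Cut one size x size patch per offset around a labelled cell.

    Patches that would cross the grid boundary are skipped.
    """
    rows, cols = len(grid), len(grid[0])
    half = size // 2
    patches = []
    for dr, dc in offsets:
        top = center[0] + dr - half
        left = center[1] + dc - half
        if top < 0 or left < 0 or top + size > rows or left + size > cols:
            continue  # window falls outside the study area
        patches.append([row[left:left + size] for row in grid[top:top + size]])
    return patches


# Toy grid: each cell value encodes its (row, column) position.
grid = [[r * 10 + c for c in range(10)] for r in range(10)]

# Eight one-cell shifts around a hypothetical known sample at (5, 5).
offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
patches = sliding_window_patches(grid, center=(5, 5), size=4, offsets=offsets)
print(len(patches), "augmented patches of size", len(patches[0]), "x", len(patches[0][0]))
```

Because every patch still contains the labelled cell and its real surroundings, the shifted samples keep their spatial context; the trade-off is that they overlap heavily with one another.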

CLC number: P628; TP183

XIE Miao, LIU Bingli, LI Yunhe, WANG Zhengyao, CAO Changjie, WU Yixiao. Quantitative prediction method of gold deposits in Gannan area under unbalanced sample conditions[J]. Earth Science Frontiers, 2025, 32(4): 108-121.

Figures and tables (15)

Fig.1 Simplified geologic map of the study area (modified from the Gansu Provincial Geological Survey Institute). a: regional tectonic map; b: geological map of the study area.

Fig.2 R-type cluster analysis dendrogram of 38 geochemical elements

Fig.3 Part of the data used in the study. a: Au distribution map; b: distance-to-fault map.

Fig.4 Distribution of positive and negative samples

Fig.5 Dataset splitting workflow. Adapted from [46].

Table 1 CNN network structure

Layer	Input	Output	Kernel size
Conv2d_1	[m,9,40,40]	[m,32,40,40]	5×5
BatchNorm2d	[m,32,40,40]	[m,32,40,40]
ReLU	[m,32,40,40]	[m,32,40,40]
MaxPool_1	[m,32,40,40]	[m,32,20,20]	2×2
Conv2d_2	[m,32,20,20]	[m,64,20,20]	3×3
BatchNorm2d	[m,64,20,20]	[m,64,20,20]
ReLU	[m,64,20,20]	[m,64,20,20]
MaxPool_2	[m,64,20,20]	[m,64,10,10]	2×2
Conv2d_3	[m,64,10,10]	[m,128,10,10]	3×3
BatchNorm2d	[m,128,10,10]	[m,128,10,10]
ReLU	[m,128,10,10]	[m,128,10,10]
Linear_1	[m,12800]	[m,512]
Linear_2	[m,512]	[m,2]
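The layer shapes in Table 1 can be checked with the standard output-size formula for convolution and pooling, out = floor((in + 2·padding − kernel) / stride) + 1. The sketch below is only a consistency check: the table gives kernel sizes but not strides or padding, so stride 1 with size-preserving padding for the convolutions and stride 2 for the 2×2 pooling layers are assumptions. Under those assumptions Conv2d_1 yields [m,32,40,40], which is the input the following BatchNorm2d row lists, and the flattened size matches the 12800 inputs of Linear_1.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size after a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kind, out_channels, kernel, stride, padding)
# Kernel sizes come from Table 1; stride and padding are assumed values.
LAYERS = [
    ("Conv2d_1", "conv", 32, 5, 1, 2),
    ("MaxPool_1", "pool", None, 2, 2, 0),
    ("Conv2d_2", "conv", 64, 3, 1, 1),
    ("MaxPool_2", "pool", None, 2, 2, 0),
    ("Conv2d_3", "conv", 128, 3, 1, 1),
]


def trace_shapes(channels: int = 9, size: int = 40):
    """Return (name, channels, height, width) after each shape-changing layer."""
    shapes = []
    for name, kind, out_channels, kernel, stride, padding in LAYERS:
        size = conv_out(size, kernel, stride, padding)
        if kind == "conv":
            channels = out_channels  # pooling keeps the channel count
        shapes.append((name, channels, size, size))
    return shapes


shapes = trace_shapes()
flattened = shapes[-1][1] * shapes[-1][2] * shapes[-1][3]
for name, c, h, w in shapes:
    print(f"{name}: [m,{c},{h},{w}]")
print("Linear_1 input features:", flattened)
```

BatchNorm2d and ReLU are omitted from the trace because they do not change the tensor shape.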

Fig.6 GAN network structure

Table 2 GAN network parameters

Category	Generator learning-rate decay	Discriminator learning-rate decay	Batch size	Iterations	Optimizer
Positive samples	0.99	0.98	32	2500	Adam
Negative samples	0.99	0.98	32	2500	Adam
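For context, the generator and discriminator configured in Table 2 are trained against each other. The standard minimax objective from Goodfellow et al. [47] is given below as a reminder; the page does not state whether the authors use this exact form or a variant such as the non-saturating generator loss.

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

Here $x$ is a real sample, $z$ is the noise vector fed to the generator $G$, and $D$ outputs the probability that its input is real.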

Table 3 WGAN-GP network parameters

Category	Penalty coefficient	Generator learning-rate decay	Discriminator learning-rate decay	Batch size	Iterations	Optimizer
Positive samples	7.5	0.975	0.97	32	2500	Adam
Negative samples	6.5	0.975	0.97	32	2500	Adam
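The penalty coefficient in Table 3 presumably corresponds to $\lambda$ in the critic loss of WGAN-GP as defined by Gulrajani et al. [51], reproduced here for reference. With that reading, $\lambda = 7.5$ applies to the positive-sample network and $\lambda = 6.5$ to the negative-sample network.

```latex
L_D = \mathbb{E}_{\tilde{x} \sim P_g}\big[D(\tilde{x})\big]
    - \mathbb{E}_{x \sim P_r}\big[D(x)\big]
    + \lambda\, \mathbb{E}_{\hat{x} \sim P_{\hat{x}}}
      \Big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\Big],
\qquad
\hat{x} = \epsilon x + (1 - \epsilon)\tilde{x},\quad \epsilon \sim U[0, 1]
```

$P_r$ and $P_g$ are the real and generated distributions, and $\hat{x}$ is sampled on straight lines between real and generated samples. The last term pushes the critic's gradient norm toward 1, which replaces the weight clipping of the original WGAN.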

Fig.7 Flow chart of the experiment

Fig.8 FID values of GAN and WGAN-GP at different epochs

Table 4 Optimal FID values for each model

Model	Positive-sample FID	Negative-sample FID	Mean FID
Sliding window	90.08	15.98	53.03
GAN	279.76	197.32	238.54
WGAN-GP	165.87	36.68	101.27
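For reference, the FID reported in Table 4 is defined by Heusel et al. [56] as the Fréchet distance between two Gaussians fitted to the Inception features of the real and generated samples:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \mathrm{Tr}\big(\Sigma_r + \Sigma_g - 2(\Sigma_r \Sigma_g)^{1/2}\big)
```

$\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ are the feature means and covariances of the real and generated sets. Lower values indicate generated samples that are statistically closer to the real ones, so in Table 4 the sliding-window samples are closest to the originals (mean FID 53.03) and WGAN-GP (101.27) is closer than GAN (238.54).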

Table 5 Comparison of CNN classification performance under different augmentation factors

Augmentation factor	Training accuracy/%	Test accuracy/%	Recall/%	Precision/%	Kappa coefficient/%	F1 score/%
×4	98.67	89.47	89.47	91.18	78.38	89.25
×8	98.79	90.64	90.64	91.72	80.84	89.87
×12	99.26	85.96	85.96	88.84	70.99	85.49

Table 6 Comparison of classification performance among models with 8-fold data augmentation

Model	Training accuracy/%	Test accuracy/%	Recall/%	Precision/%	Kappa coefficient/%	F1 score/%	AUC (area under the ROC curve)
Sliding window_CNN	99.87	87.72	87.71	88.76	74.86	87.52	0.92
GAN_CNN	98.12	89.47	89.47	91.18	78.38	87.39	0.93
WGAN-GP_CNN	98.39	94.74	94.73	95.20	89.29	94.70	0.98
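The classification metrics in Tables 5 and 6 can all be derived from a confusion matrix. The sketch below assumes support-weighted averaging over the two classes, which the page does not state; one hint in its favor is that weighted recall equals overall accuracy by construction, and the tables report recall and test accuracy as equal to within rounding. The function name `weighted_metrics` and the confusion matrix are made up for illustration and are not the paper's data.

```python
def weighted_metrics(cm):
    """Metrics from a confusion matrix where cm[i][j] counts samples
    of true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    support = [sum(row) for row in cm]                            # true counts
    predicted = [sum(cm[i][j] for i in range(n)) for j in range(n)]
    accuracy = sum(cm[k][k] for k in range(n)) / total

    precision = recall = f1 = 0.0
    for k in range(n):
        p_k = cm[k][k] / predicted[k] if predicted[k] else 0.0
        r_k = cm[k][k] / support[k] if support[k] else 0.0
        f_k = 2 * p_k * r_k / (p_k + r_k) if (p_k + r_k) else 0.0
        weight = support[k] / total                               # class share
        precision += weight * p_k
        recall += weight * r_k
        f1 += weight * f_k

    # Cohen's kappa: agreement beyond what the class marginals give by chance.
    expected = sum(support[k] * predicted[k] for k in range(n)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected) if expected < 1 else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1, "kappa": kappa}


# Hypothetical test-set confusion matrix: rows = true class, columns = predicted.
metrics = weighted_metrics([[45, 5], [10, 40]])
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Kappa is the stricter figure here because it discounts the agreement a classifier would reach by chance given the class proportions.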

Fig.9 Metallogenic prospectivity maps. a: Sliding window_CNN; b: GAN_CNN; c: WGAN-GP_CNN.

References (56)

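[1]	ZHANG Z J, CHENG Q M, YANG J, et al. Machine learning and mineral prospectivity prediction: a case study of iron polymetallic deposits in southwestern Fujian[J]. Earth Science Frontiers, 2021, 28(3): 221-235 (in Chinese).
[2]	ZUO R G. Exploration geochemical data mining and weak anomaly identification[J]. Earth Science Frontiers, 2019, 26(4): 67-75 (in Chinese).
[3]	ZUO R G. Exploring the theory and methods of data science-based quantitative prediction of mineral resources[J]. Earth Science Frontiers, 2021, 28(3): 49-55 (in Chinese).
[4]	ZUO R G, XIONG Y H, WANG J, et al. Deep learning and its application in geochemical mapping[J]. Earth-Science Reviews, 2019, 192: 1-14.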
[5]	XIONG Y H, ZUO R G, CARRANZA E J M. Mapping mineral prospectivity through big data analytics and a deep learning algorithm[J]. Ore Geology Reviews, 2018, 102: 811-817.
[6]	SUN T, LI H, WU K X, et al. Data-driven predictive modelling of mineral prospectivity using machine learning and deep learning methods: a case study from southern Jiangxi Province, China[J]. Minerals, 2020, 10(2): 102.
[7]	LI S, CHEN J P, XIANG J. Applications of deep convolutional neural networks in prospecting prediction based on two-dimensional geological big data[J]. Neural Computing and Applications, 2020, 32(7): 2037-2053.
[8]	CHEN G X, HUANG N, WU G P, et al. Mineral prospectivity mapping based on wavelet neural network and Monte Carlo simulations in the Nanling W-Sn metallogenic province[J]. Ore Geology Reviews, 2022, 143: 104765.
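[9]	WANG C B, WANG M G, WANG B, et al. Quantitative prediction of mineral resources integrating knowledge graphs[J]. Earth Science Frontiers, 2024, 31(4): 26-36 (in Chinese).
[10]	CAO S T, HU R Z, ZHOU Y Z, et al. Element enrichment patterns and prospecting methods for Carlin-type gold deposits based on a big-data association rule algorithm[J]. Earth Science Frontiers, 2024, 31(4): 58-72 (in Chinese).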
[11]	CHEN G X, CHENG Q M, PUETZ S. Special issue: data-driven discovery in geosciences: opportunities and challenges[J]. Mathematical Geosciences, 2023, 55(3): 287-293.
[12]	ZUO R, PENG Y, LI T, XIONG Y. Challenges of geological prospecting big data mining and integration using deep learning algorithms[J]. Earth Science, 2021, 46(1): 350-358.
[13]	FADAEE M, BISAZZA A, MONZ C. Data augmentation for low-resource neural machine translation[C]// Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). Vancouver, Canada. Stroudsburg, PA, USA: ACL, 2017: 567-573.
[14]	WANG L, JI X H, YANG M, et al. Mineral image recognition based on data augmentation and ensemble learning[J]. Earth Science Frontiers, 2024, 31(4): 87-94 (in Chinese).
[15]	YANG N, ZHANG Z K, YANG J H, et al. Applications of data augmentation in mineral prospectivity prediction based on convolutional neural networks[J]. Computers & Geosciences, 2022, 161: 105075.
[16]	HARIHARAN S, TIRODKAR S, PORWAL A, et al. Random forest-based prospectivity modelling of greenfield terrains using sparse deposit data: an example from the Tanami region, Western Australia[J]. Natural Resources Research, 2017, 26(4): 489-507.
[17]	LI T F, XIA Q L, ZHAO M Y, et al. Prospectivity mapping for tungsten polymetallic mineral resources, Nanling metallogenic belt, South China: use of random forest algorithm from a perspective of data imbalance[J]. Natural Resources Research, 2020, 29(1): 203-227.
[18]	PRADO E M G, DE SOUZA FILHO C R, CARRANZA E J M, et al. Modeling of Cu-Au prospectivity in the Carajás mineral province (Brazil) through machine learning: dealing with imbalanced training data[J]. Ore Geology Reviews, 2020, 124: 103611.
[19]	PARSA M. A data augmentation approach to XGBoost-based mineral potential mapping: an example of carbonate-hosted Zn-Pb mineral systems of Western Iran[J]. Journal of Geochemical Exploration, 2021, 228: 106811.
[20]	MA D A, TANG P, ZHAO L J. Sifting GAN: generating and sifting labeled samples to improve the remote sensing image scene classification baseline in vitro[J]. IEEE Geoscience and Remote Sensing Letters, 2019, 16(7): 1046-1050.
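[21]	ZHANG L J, LU W H, ZHANG J D, et al. Deep learning-based identification of rock and mineral thin sections under the microscope[J]. Earth Science Frontiers, 2024, 31(3): 498-510 (in Chinese).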
[22]	MORENO-BAREA F J, STRAZZERA F, JEREZ J M, et al. Forward noise adjustment scheme for data augmentation[C]// 2018 IEEE Symposium Series on Computational Intelligence (SSCI). Bangalore, India: IEEE, 2018: 728-734.
[23]	DEVRIES T, TAYLOR G W. Improved regularization of convolutional neural networks with cutout[EB/OL]. (2017-11-29)[2024-12-15]. http://arxiv.org/pdf/1708.04552.
[24]	LI S, CHEN J P, LIU C, et al. Mineral prospectivity prediction via convolutional neural networks based on geological big data[J]. Journal of Earth Science, 2021, 32(2): 327-347.
[25]	LI Q K, CHEN G X, LUO L. Mineral prospectivity mapping using attention-based convolutional neural network[J]. Ore Geology Reviews, 2023, 156: 105381.
[26]	SRIVASTAVA N, HINTON G, KRIZHEVSKY A, et al. Dropout: a simple way to prevent neural networks from overfitting[J]. Journal of Machine Learning Research, 2014, 15: 1929-1958.
[27]	WU Y X, LIU B L, GAO Y X, et al. Mineral prospecting mapping with conditional generative adversarial network augmented data[J]. Ore Geology Reviews, 2023, 163: 105787.
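[28]	DI P F, TANG Q Y, LIU C, et al. Trace element characteristics of quartz from the Zaozigou and Jiagantan gold deposits in the Xiahe-Hezuo area, West Qinling, and their significance[J]. Geoscience, 2021, 35(6): 1608-1621 (in Chinese).
[29]	LI K N, JIA R Y, LI H R, et al. Gold-copper polymetallic metallogenic system related to intermediate-acid intrusive rocks and prospecting prediction in the Xiahe-Hezuo area, West Qinling, Gansu[J]. Geological Bulletin of China, 2020, 39(8): 1191-1203 (in Chinese).
[30]	PU W F, LI H R, YUAN Z, et al. The "trinity" geological model for prospecting prediction of the Dashui gold deposit in Maqu County, Gansu Province[J]. Geological Bulletin of China, 2020, 39(8): 1163-1172 (in Chinese).
[31]	DI P F, TANG Q Y, LIU D X, et al. Trace element geochemistry of pyrite from gold deposits in the Gannan area, West Qinling, and its significance: examples from the Jiagantan and Zaozigou gold deposits[J]. Chinese Rare Earths, 2023, 44(4): 140-154 (in Chinese).
[32]	LIU J J, LIU C H, CARRANZA E J M, et al. Geological characteristics and ore-forming process of the gold deposits in the western Qinling region, China[J]. Journal of Asian Earth Sciences, 2015, 103: 40-69.
[33]	LIU J J, LIU C H, WANG J P, et al. Types of gold deposits and their mineralization in the West Qinling region[J]. Earth Science Frontiers, 2019, 26(5): 1-16 (in Chinese).
[34]	CHEN Y Y. Prospecting indicators and prospecting models for gold deposits in the Gannan area: a comparative analysis of the Dashui, Zaozigou and Laerma gold deposits[J]. Mineral Resources and Geology, 2020, 34(1): 7-18 (in Chinese).
[35]	LI K N, ZHANG J S, XU J, et al. Fluid inclusion and H-O-S-Pb isotope characteristics of the Jiagantan gold deposit, Gannan, West Qinling[J]. Geological Bulletin of China, 2023, 42(6): 941-952 (in Chinese).
[36]	ZHU L M, ZHANG G W, LI B, et al. Major geological events, deposit types and continental geodynamic setting of mineralization in the Qinling orogenic belt[J]. Bulletin of Mineralogy, Petrology and Geochemistry, 2008, 27(4): 384-390 (in Chinese).
[37]	CHEN Y J. Indosinian tectonic setting, magmatism and mineralization in the Qinling[J]. Geology in China, 2010, 37(4): 854-865 (in Chinese).
[38]	CHEN Y J, ZHANG J, ZHANG F X, et al. Carlin and Carlin-like gold deposits in the West Qinling region: metallogenic time, tectonic setting and model[J]. Geological Review, 2004, 50(2): 134-152 (in Chinese).
[39]	ZHAI Y S, YAO S Z, CAI K Q. Mineral deposits[M]. 3rd ed. Beijing: Geological Publishing House, 2011 (in Chinese).
[40]	ZHANG J R, GAO Y W, ZHANG Z P, et al. Establishment of prediction models for important gold deposits and prediction of resource potential in the West Qinling area, Gansu[J]. Northwestern Geology, 2024, 57(5): 88-105 (in Chinese).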
[41]	XIE X J, MU X Z, REN T X. Geochemical mapping in China[J]. Journal of Geochemical Exploration, 1997, 60(1): 99-113.
[42]	XIE X J, WANG X Q, ZHANG Q, et al. Multi-scale geochemical mapping in China[J]. Geochemistry: Exploration, Environment, Analysis, 2008, 8(3/4): 333-341.
[43]	WANG X Q, ZHANG Q, ZHOU G H. National-scale geochemical mapping projects in China[J]. Geostandards and Geoanalytical Research, 2007, 31(4): 311-320.
[44]	AITCHISON J. The statistical analysis of compositional data[M]. London: Chapman and Hall, 1986.
[45]	ZUO R G, WANG Z Y. Effects of random negative training samples on mineral prospectivity mapping[J]. Natural Resources Research, 2020, 29(6): 3443-3455.
[46]	LU Y, TAO X P, ZENG N Y, et al. Enhanced CNN classification capability for small rice disease datasets using progressive WGAN-GP: algorithms and applications[J]. Remote Sensing, 2023, 15(7): 1789.
[47]	GOODFELLOW I, POUGET-ABADIE J, MIRZA M, et al. Generative adversarial nets[J]. Advances in Neural Information Processing Systems, 2014, 27:2672-2680.
[48]	GOODFELLOW I, BENGIO Y, COURVILLE A. Deep learning[M]. Cambridge, MA: MIT Press, 2016.
[49]	RADFORD A, METZ L, CHINTALA S. Unsupervised representation learning with deep convolutional generative adversarial networks[EB/OL]. (2016-01-07)[2024-12-15]. http://arxiv.org/pdf/1511.06434.
[50]	SALIMANS T, GOODFELLOW I, ZAREMBA W, et al. Improved techniques for training GANs[J]. Advances in Neural Information Processing Systems, 2016, 29. DOI: 10.48550/arXiv.1606.03498.
[51]	GULRAJANI I, AHMED F, ARJOVSKY M, et al. Improved training of Wasserstein GANs[J]. Advances in Neural Information Processing Systems, 2017, 30: 5767-5777. Preprint [EB/OL]. (2017-03-31)[2025-04-26]. https://arxiv.org/abs/1704.00028v3.
[52]	LECUN Y, BOTTOU L, BENGIO Y, et al. Gradient-based learning applied to document recognition[J]. Proceedings of the IEEE, 1998, 86(11): 2278-2324.
[53]	HINTON G E, SALAKHUTDINOV R R. Reducing the dimensionality of data with neural networks[J]. Science, 2006, 313(5786): 504-507.
[54]	KRIZHEVSKY A, SUTSKEVER I, HINTON G. ImageNet classification with deep convolutional neural networks[J]. Advances in Neural Information Processing Systems, 2012, 25(2): 1097-1105.
[55]	IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift[EB/OL]. (2015-03-02)[2025-01-20]. http://arxiv.org/pdf/1502.03167.
[56]	HEUSEL M, RAMSAUER H, UNTERTHINER T, et al. GANs trained by a two time-scale update rule converge to a local Nash equilibrium[J]. Advances in Neural Information Processing Systems, 2017, 30: 6626-6637.

