Earth Science Frontiers, 2025, Vol. 32, Issue (4): 95-107. DOI: 10.13745/j.esf.sf.2025.4.55
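XU Kai1,2,3,4, XU Chengyang1, WU Chonglong1,2,3,4, CAI Jingyun1, KONG Chunfang1,2,3,4,*
Received: 2025-01-16
Revised: 2025-04-23
Online: 2025-07-25
Published: 2025-08-04
Corresponding author:
*KONG Chunfang (b. 1973), female, Ph.D., associate professor, works mainly on applications of remote sensing and geographic information systems.
About the author:
XU Kai (b. 1972), male, Ph.D., associate professor, teaches and researches data mining and knowledge discovery, big-data-based intelligent mineral prospecting, quantitative remote sensing, and geoscience information engineering. E-mail: xukai@cug.edu.cn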
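Abstract:
Northwestern Guizhou is rich in lead-zinc resources, but the ore bodies are deeply buried, which makes prospecting difficult. Data-driven metallogenic prediction based on machine learning is becoming a powerful tool for exploring deep, concealed lead-zinc deposits. Such prediction, however, faces several common problems, in particular insufficient and imbalanced training samples caused by the small number of known mineralized samples. To address these problems, this paper proposes KC-CTGAN, a method for expanding ore-bearing (positive) samples in which K-means clustering is used to improve the Conditional Tabular Generative Adversarial Network (CTGAN). Specifically, after K-means clustering, the density of each cluster is first judged from the Euclidean distances between its samples, and more samples are generated in sparse clusters to raise their density, thereby expanding the ore-bearing sample set. The adversarial network then generates new, highly abstract class labels, which are used for conditional generation to improve the quality of the expanded samples. Finally, the expanded positive samples and randomly undersampled negative samples are combined into a labeled sample set that is both sufficiently large and balanced. This set is used to train and validate a Category Boosting (CatBoost) classifier, which yields the KC-CTGAN-CatBoost metallogenic prediction model. Experimental results show that, compared with the prediction model built on the dataset without KC-CTGAN sample expansion, accuracy, recall, precision, and F1-score increase by 8.7, 7.4, 10.2, and 8.8 percentage points, respectively. This demonstrates the effectiveness of the KC-CTGAN sample expansion method and the resulting gain in model performance. The prediction results are expected to provide more precise targets for the exploration of deep, concealed lead-zinc ore bodies.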
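The cluster-density step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how density is measured or how the extra samples are divided, so the sketch assumes that sparsity is the mean pairwise Euclidean distance within a cluster and that the number of new samples per cluster is proportional to that sparsity. The function names `mean_pairwise_distance` and `allocate_by_sparsity` are hypothetical, the cluster labels are taken as given (for example from a prior K-means run), and the CTGAN generation step itself is not shown.

```python
import math
from collections import defaultdict


def mean_pairwise_distance(points):
    """Mean Euclidean distance over all point pairs (0.0 for fewer than 2 points)."""
    n = len(points)
    if n < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += math.dist(points[i], points[j])
    return total / (n * (n - 1) / 2)


def allocate_by_sparsity(points, labels, n_new):
    """Split n_new synthetic samples across clusters, favouring sparse clusters.

    Assumption (not from the paper): the quota of a cluster is proportional
    to its mean pairwise distance.
    """
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    sparsity = {lab: mean_pairwise_distance(pts) for lab, pts in clusters.items()}
    total = sum(sparsity.values())
    if total == 0:
        # Every cluster is a single point or a set of identical points: split evenly.
        sparsity = {lab: 1.0 for lab in clusters}
        total = float(len(clusters))

    # Largest-remainder rounding so that the quotas sum exactly to n_new.
    raw = {lab: n_new * s / total for lab, s in sparsity.items()}
    quota = {lab: int(r) for lab, r in raw.items()}
    leftover = n_new - sum(quota.values())
    by_remainder = sorted(raw, key=lambda lab: raw[lab] - quota[lab], reverse=True)
    for lab in by_remainder[:leftover]:
        quota[lab] += 1
    return quota


# Toy data: cluster 0 is tight, cluster 1 is spread out.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 14), (13, 10)]
labels = [0, 0, 0, 1, 1, 1]
quota = allocate_by_sparsity(points, labels, n_new=100)
print(quota)  # {0: 22, 1: 78}
```

In this toy case the sparse cluster receives 78 of the 100 new samples. In a full pipeline the quotas would tell a conditional generator how many samples to draw for each cluster label.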
XU Kai, XU Chengyang, WU Chonglong, CAI Jingyun, KONG Chunfang. Metallogenic prediction of lead-zinc ore based on sample expansion in Yadu-Mangdong of Northwestern Guizhou[J]. Earth Science Frontiers, 2025, 32(4): 95-107.
Method | Overall quality score/% | Column shape score/% | Column pair trend score/% |
---|---|---|---|
SMOTE | 85.38 | 83.68 | 84.12 |
ADASYN | 69.75 | 75.52 | 61.97 |
TVAE | 86.75 | 90.75 | 82.76 |
CTGAN | 75.11 | 85.66 | 64.56 |
KC-CTGAN | 87.89 | 91.3 | 84.49 |
Table 1 Synthetic data quality indicators for different oversampling methods
Method | Accuracy | Recall | Precision | F1-score |
---|---|---|---|---|
Original data | 0.809 | 0.822 | 0.795 | 0.808 |
SMOTE | 0.889 | 0.878 | 0.879 | 0.878 |
ADASYN | 0.895 | 0.894 | 0.897 | 0.895 |
TVAE | 0.884 | 0.878 | 0.890 | 0.883 |
CTGAN | 0.891 | 0.896 | 0.887 | 0.891 |
KC-CTGAN | 0.896 | 0.896 | 0.897 | 0.896 |
Table 2 Performance indicators of the CatBoost model with different oversampling methods
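The improvements quoted in the abstract can be checked against the first and last rows of Table 2. The short sketch below copies those two rows and takes their difference. It shows that the quoted figures are absolute differences in percentage points rather than relative changes. The variable names are illustrative only.

```python
# Metric values copied from Table 2: CatBoost trained on the original data
# versus the KC-CTGAN-expanded data.
original = {"accuracy": 0.809, "recall": 0.822, "precision": 0.795, "f1": 0.808}
kc_ctgan = {"accuracy": 0.896, "recall": 0.896, "precision": 0.897, "f1": 0.896}

# Absolute gain in percentage points, rounded to one decimal place.
gains_pp = {
    name: round((kc_ctgan[name] - original[name]) * 100, 1) for name in original
}
print(gains_pp)  # {'accuracy': 8.7, 'recall': 7.4, 'precision': 10.2, 'f1': 8.8}
```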
Classifier | Accuracy | Recall | Precision | F1-score |
---|---|---|---|---|
LR | 0.767 | 0.767 | 0.772 | 0.769 |
SVM | 0.818 | 0.828 | 0.821 | 0.819 |
MLP | 0.77 | 0.77 | 0.776 | 0.772 |
RF | 0.788 | 0.788 | 0.789 | 0.788 |
AdaBoost | 0.776 | 0.763 | 0.761 | 0.781 |
CatBoost | 0.809 | 0.822 | 0.795 | 0.808 |
Table 3 Comparison of the performance of different classifiers before KC-CTGAN oversampling
Classifier | Accuracy | Recall | Precision | F1-score |
---|---|---|---|---|
LR | 0.788 | 0.788 | 0.793 | 0.790 |
SVM | 0.762 | 0.762 | 0.774 | 0.767 |
MLP | 0.813 | 0.813 | 0.815 | 0.813 |
RF | 0.847 | 0.847 | 0.849 | 0.847 |
AdaBoost | 0.839 | 0.828 | 0.790 | 0.808 |
CatBoost | 0.896 | 0.896 | 0.897 | 0.896 |
Table 4 Comparison of the performance of different classifiers after KC-CTGAN oversampling
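Tables 3 and 4 are easier to compare once the accuracy columns are placed side by side. The sketch below copies the accuracy values from both tables and computes the change per classifier. It is a reading aid for the reported numbers, not a re-run of the experiments, and the variable names are illustrative.

```python
# Accuracy before (Table 3) and after (Table 4) KC-CTGAN oversampling.
before = {"LR": 0.767, "SVM": 0.818, "MLP": 0.770,
          "RF": 0.788, "AdaBoost": 0.776, "CatBoost": 0.809}
after = {"LR": 0.788, "SVM": 0.762, "MLP": 0.813,
         "RF": 0.847, "AdaBoost": 0.839, "CatBoost": 0.896}

# Change in accuracy for each classifier, rounded to three decimals.
delta = {name: round(after[name] - before[name], 3) for name in before}
for name, change in delta.items():
    print(f"{name:9s} {change:+.3f}")

largest_gain = max(delta, key=delta.get)
declined = [name for name, change in delta.items() if change < 0]
print(largest_gain, declined)  # CatBoost ['SVM']
```

On these figures CatBoost gains the most in accuracy, and SVM is the only classifier whose accuracy is lower after oversampling.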