Metallogenic prediction of lead-zinc ore based on sample expansion in Yadu-Mangdong of Northwestern Guizhou

doi:10.13745/j.esf.sf.2025.4.55

Abstract

Northwestern Guizhou is rich in lead-zinc mineral resources, but the deep burial of its ore bodies makes prospecting difficult. Data-driven mineral prospectivity prediction using machine learning (ML) is becoming a powerful tool for exploring deep, concealed lead-zinc deposits. However, ML-based prospectivity prediction faces several common issues, particularly insufficient training samples and class imbalance caused by the scarcity of mineralized samples. To address these problems, this paper proposes a K-means clustering-improved conditional tabular generative adversarial network (KC-CTGAN) method for mineralized sample augmentation. Specifically, after K-means clustering, the density of each cluster is first assessed from the Euclidean distances between its samples, and more samples are generated in the sparse clusters to increase their density, thereby expanding the mineralized sample set. Then, the generative adversarial network (GAN) generates new, highly abstract category labels and uses them for conditional generation, thus improving the quality of the augmented samples. Finally, the augmented positive samples and randomly undersampled negative samples are used to construct a sufficiently large and balanced labeled dataset to train a Category Boosting (CatBoost) classifier, establishing a mineral prospectivity prediction model based on KC-CTGAN-CatBoost. The performance of the proposed model was verified through comparative tests using accuracy, recall, precision, and F1-score. Experimental results show that, compared with the prediction model constructed without KC-CTGAN-based sample augmentation, the proposed model improves accuracy, recall, precision, and F1-score by 8.7, 7.4, 10.2, and 8.8 percentage points, respectively, demonstrating the effectiveness of the KC-CTGAN augmentation method in enhancing the performance of the mineral prospectivity prediction model. The prediction results will provide more precise target areas for the exploration of deep-seated concealed lead-zinc ore bodies.
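
The density-guided expansion step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact density measure or allocation rule, so two assumptions are made here: sparsity is taken as the mean pairwise Euclidean distance within a cluster, and the number of new samples per cluster is proportional to that sparsity. The function names are hypothetical, the cluster labels are assumed to come from a K-means run performed elsewhere, and the CTGAN generation step itself is not shown.

```python
import math
from collections import defaultdict


def mean_pairwise_distance(points):
    """Mean Euclidean distance over all pairs in one cluster (0.0 for fewer than 2 points)."""
    n = len(points)
    if n < 2:
        return 0.0
    total = sum(
        math.dist(points[i], points[j])
        for i in range(n)
        for j in range(i + 1, n)
    )
    return total / (n * (n - 1) / 2)


def allocate_synthetic_counts(samples, labels, n_new):
    """Split n_new synthetic samples across clusters in proportion to sparsity.

    Assumption (not from the paper): sparsity = mean pairwise distance,
    so sparser clusters receive a larger share of the new samples.
    """
    clusters = defaultdict(list)
    for point, label in zip(samples, labels):
        clusters[label].append(point)

    keys = sorted(clusters)
    sparsity = {c: mean_pairwise_distance(clusters[c]) for c in keys}
    total = sum(sparsity.values())
    if total == 0:
        # Degenerate case: no cluster has any spread, so split evenly.
        shares = {c: n_new / len(keys) for c in keys}
    else:
        shares = {c: n_new * sparsity[c] / total for c in keys}

    # Largest-remainder rounding so the integer counts sum exactly to n_new.
    counts = {c: int(shares[c]) for c in keys}
    leftover = n_new - sum(counts.values())
    by_remainder = sorted(keys, key=lambda c: shares[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1
    return counts


# Toy data: cluster 0 is tight, cluster 1 is spread out.
samples = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 14), (13, 10)]
labels = [0, 0, 0, 1, 1, 1]
counts = allocate_synthetic_counts(samples, labels, n_new=10)
print(counts)
```

In this toy case the sparse cluster receives most of the ten new samples. A real pipeline would pass each cluster's quota to the conditional generator instead of printing it.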

Key words: sample augmentation, conditional tabular generative adversarial network, lead-zinc ore, mineralization prediction

XU Kai, XU Chengyang, WU Chonglong, CAI Jingyun, KONG Chunfang. Metallogenic prediction of lead-zinc ore based on sample expansion in Yadu-Mangdong of Northwestern Guizhou[J]. Earth Science Frontiers, 2025, 32(4): 95-107.

Figures/Tables 12

Fig.1 Geographical location map of the study area

Fig.2 Distribution pattern of mine drilling data

Fig.3 The framework of the KC-CTGAN

Fig.4 Flowchart of the KC-CTGAN-CatBoost

Fig.5 K-means clustering effect evaluation. a—Change plot of SSE; b—Change plot of SC.

Fig.6 Synthetic data indicators of the expanded number of mine samples

Table 1 Synthetic data indicators for different oversampling methods

	Overall quality score/%	Column shape score/%	Column pair trend score/%
SMOTE	85.38	83.68	84.12
ADASYN	69.75	75.52	61.97
TVAE	86.75	90.75	82.76
CTGAN	75.11	85.66	64.56
KC-CTGAN	87.89	91.30	84.49

Fig.7 The metallogenic prediction process based on KC-CTGAN-CatBoost

Table 2 The performance indicators of CatBoost based on different oversampling methods

	Accuracy	Recall	Precision	F1-score
Original data	0.809	0.822	0.795	0.808
SMOTE	0.889	0.878	0.879	0.878
ADASYN	0.895	0.894	0.897	0.895
TVAE	0.884	0.878	0.890	0.883
CTGAN	0.891	0.896	0.887	0.891
KC-CTGAN	0.896	0.896	0.897	0.896
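
The gains quoted in the abstract can be checked against the first and last rows of Table 2. The short sketch below transcribes those two rows and computes both the absolute gain (in percentage points) and the relative gain. The dictionary and function names are illustrative only.

```python
# Metric values transcribed from Table 2 (CatBoost on original vs. KC-CTGAN-augmented data).
original = {"accuracy": 0.809, "recall": 0.822, "precision": 0.795, "f1": 0.808}
kc_ctgan = {"accuracy": 0.896, "recall": 0.896, "precision": 0.897, "f1": 0.896}


def point_gains(before, after):
    """Absolute gain per metric, expressed in percentage points."""
    return {m: round((after[m] - before[m]) * 100, 1) for m in before}


def relative_gains(before, after):
    """Gain per metric relative to the baseline value, in percent."""
    return {m: round((after[m] - before[m]) / before[m] * 100, 1) for m in before}


gains = point_gains(original, kc_ctgan)
rel = relative_gains(original, kc_ctgan)
print("percentage points:", gains)
print("relative percent: ", rel)
```

The absolute differences reproduce the 8.7, 7.4, 10.2, and 8.8 figures, which indicates that the abstract reports percentage-point gains, not relative improvements.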

Table 3 Comparison of performance of different classifiers before KC-CTGAN oversampling

	Accuracy	Recall	Precision	F1-score
LR	0.767	0.767	0.772	0.769
SVM	0.818	0.828	0.821	0.819
MLP	0.770	0.770	0.776	0.772
RF	0.788	0.788	0.789	0.788
AdaBoost	0.776	0.763	0.761	0.781
CatBoost	0.809	0.822	0.795	0.808

Table 4 Comparison of performance of different classifiers after KC-CTGAN oversampling

	Accuracy	Recall	Precision	F1-score
LR	0.788	0.788	0.793	0.790
SVM	0.762	0.762	0.774	0.767
MLP	0.813	0.813	0.815	0.813
RF	0.847	0.847	0.849	0.847
AdaBoost	0.839	0.828	0.790	0.808
CatBoost	0.896	0.896	0.897	0.896
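
Tables 3 and 4 can be compared row by row to see how each classifier responds to KC-CTGAN oversampling. The sketch below uses only the F1-score columns as transcribed from the two tables and ranks the classifiers by change. The variable and function names are illustrative.

```python
# F1-scores transcribed from Table 3 (before) and Table 4 (after KC-CTGAN oversampling).
f1_before = {"LR": 0.769, "SVM": 0.819, "MLP": 0.772,
             "RF": 0.788, "AdaBoost": 0.781, "CatBoost": 0.808}
f1_after = {"LR": 0.790, "SVM": 0.767, "MLP": 0.813,
            "RF": 0.847, "AdaBoost": 0.808, "CatBoost": 0.896}


def f1_changes(before, after):
    """Return (classifier, F1 change) pairs sorted from largest gain to largest loss."""
    changes = {name: round(after[name] - before[name], 3) for name in before}
    return sorted(changes.items(), key=lambda item: item[1], reverse=True)


ranked = f1_changes(f1_before, f1_after)
for name, delta in ranked:
    print(f"{name:<9} {delta:+.3f}")
```

On these figures CatBoost gains the most, while SVM is the only classifier whose F1-score drops after oversampling.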

Fig.8 Distribution of metallogenic potential in Yadu-Mangdong area
