An Improved Light GBM using Bayesian Optimization for Vulnerability Exploitation Prediction

Authors

  • Boryanka T. Mashi, Federal University Dutsin-ma
  • Ibrahim S. Ahmad, Bayero University Kano
  • Habeebah A. Kakudi, Bayero University Kano
  • Jesse J. Tanimu, Bayero University Kano

DOI:

https://doi.org/10.70882/josrar.2024.v1i1.17

Keywords:

Exploitability, Prediction, Bayesian Optimization, Machine Learning, Light Gradient Boosting Machine

Abstract

Despite significant advances in software security research, exploitability prediction remains difficult because it is uncertain which vulnerabilities should be prioritized. Although many studies have addressed vulnerability prediction, some problems persist, such as efficient parameter optimization, which has a significant effect on an algorithm's performance and efficiency. To address these challenges, we propose an improved Light Gradient Boosting Machine (LGBM) model tuned with Bayesian Optimization (BO). Three experiments were conducted to compare the prediction accuracy and the computational cost, in time and memory, of three models: LGBM, LGBM with Grid Search, and LGBM with Bayesian Optimization. The results show that the improved BO-LGBM model has better prediction accuracy and lower computational cost than the comparison models. BO-LGBM achieved an AUC of 83% and an accuracy of 81%, and it also led in time and memory consumption, with an execution time of 0.23 min and 32 MiB of system memory. These results suggest promising applications of the improved BO-LGBM model for predicting vulnerability exploitation, which could be relevant to IT organizations, vendors, or any organization with limited on-premises computational resources.
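The comparison described in the abstract turns on how the parameter search is carried out. As a minimal sketch, and not the authors' implementation, the following standard-library Python illustrates the general idea of Bayesian optimization: a Gaussian-process surrogate is fitted to the scores observed so far, and an expected-improvement rule chooses the next trial. The one-dimensional toy_score objective, the kernel length scale, the jitter value, and the candidate grid are all assumptions made for illustration; in the paper's setting the objective would be an LGBM validation score over several parameters.

```python
import math

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between two scalar inputs."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def posterior(xs, ys, x, jitter=1e-4):
    """Gaussian-process posterior mean and standard deviation at x."""
    mean_y = sum(ys) / len(ys)
    K = [[rbf(a, b) + (jitter if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    k = [rbf(a, x) for a in xs]
    alpha = solve(K, [y - mean_y for y in ys])
    v = solve(K, k)
    mu = mean_y + sum(ki * ai for ki, ai in zip(k, alpha))
    var = max(1.0 - sum(ki * vi for ki, vi in zip(k, v)), 1e-12)
    return mu, math.sqrt(var)

def expected_improvement(mu, sigma, best):
    """Expected improvement over the best score seen so far (maximization)."""
    z = (mu - best) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mu - best) * cdf + sigma * pdf

def toy_score(x):
    """Hypothetical stand-in for a validation score; it peaks at x = 0.3."""
    return -(x - 0.3) ** 2

def bayes_opt(objective, n_iter=8):
    """Run n_iter surrogate-guided trials after two initial evaluations."""
    candidates = [i / 100 for i in range(101)]  # assumed search grid on [0, 1]
    xs = [0.0, 1.0]
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        best = max(ys)
        pool = [c for c in candidates if c not in xs]

        def ei(c):
            mu, sigma = posterior(xs, ys, c)
            return expected_improvement(mu, sigma, best)

        nxt = max(pool, key=ei)  # trial with the highest expected improvement
        xs.append(nxt)
        ys.append(objective(nxt))
    i = max(range(len(ys)), key=ys.__getitem__)
    return xs[i], ys[i], list(zip(xs, ys))

best_x, best_y, history = bayes_opt(toy_score)
print(f"best x = {best_x:.2f}, score = {best_y:.4f}, evaluations = {len(history)}")
```

In this sketch the objective is evaluated 10 times, whereas an exhaustive grid search over the same 101 candidates would evaluate it 101 times. That difference in evaluation count is the kind of cost saving a surrogate-guided search aims for, although the figures reported in the abstract come from the authors' own experiments and not from this toy example.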

Author Biographies

  • Boryanka T. Mashi, Federal University Dutsin-ma

    Department of Computer Science, MSc, Assistant Lecturer

  • Ibrahim S. Ahmad, Bayero University Kano

    Faculty of Computing, PhD.

  • Habeebah A. Kakudi, Bayero University Kano

    Faculty of Computing, PhD.

  • Jesse J. Tanimu, Bayero University Kano

    Faculty of Computing, MSc.

Published

2024-11-22

How to Cite

Mashi, B. T., Ahmad, I. S., Kakudi, H. A., & Tanimu, J. J. (2024). An Improved Light GBM using Bayesian Optimization for Vulnerability Exploitation Prediction. Journal of Science Research and Reviews, 1(1), 49-62. https://doi.org/10.70882/josrar.2024.v1i1.17