Linear Regression Approach to Solving Multicollinearity and Overfitting in Predictive Analysis

Authors

  • Edeh John Otse, Department of Computer Science, Federal University Dutsin-Ma, Katsina State
  • Georgina N. Obunadike, Federal University Dutsin-Ma, Katsina State
  • Ahmad Abubakar, Department of Software Engineering, Faculty of Computing, FUDMA

DOI:

https://doi.org/10.70882/josrar.2025.v2i1.35

Keywords:

Linear Regression, Multicollinearity, Overfitting, Predictive Analysis, Exploratory Data Analysis

Abstract

Multicollinearity and overfitting are common problems in predictive analysis, especially in linear regression models, where they reduce the precision and interpretability of the predictions that inform data-driven decision-making across industries. This research examines a linear regression approach to addressing both problems. The dataset, sourced from the National Center for Disease Control (NCDC), was analyzed using four regression techniques: Linear Regression, Ridge Regression, LASSO Regression, and Elastic Net Regression. The study assessed and compared how well these methods mitigate multicollinearity, measured by the Variance Inflation Factor (VIF), and reduce overfitting, measured by the Mean Squared Error (MSE) and Root Mean Squared Error (RMSE). The data were analyzed both with all features and after feature selection. The results showed that while all models addressed multicollinearity and overfitting, Elastic Net Regression performed best, generalizing most reliably with the smallest MSE and RMSE discrepancies between internal and external data. These findings highlight the potential of regularization techniques to improve predictive accuracy and interpretability, particularly for high-dimensional data such as those describing COVID-19 outcomes. The study underscores the need for further research into enhanced machine learning techniques and for broader datasets to refine predictive models for practical decision-making across sectors.
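
As an illustration of the evaluation described in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes synthetic data in place of the NCDC dataset, uses only NumPy, and shows Ridge Regression alone as a representative regularized model. The helper names vif, ridge_fit and rmse, the penalty values, and the 150/50 split are hypothetical choices made for the example. VIF is computed as 1 / (1 - R^2) from regressing each feature on the others, and overfitting is inspected by comparing RMSE on the training (internal) and held-out (external) data.

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor per column: 1 / (1 - R^2_j)."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress feature j on the remaining features (with intercept).
        A = np.column_stack([np.ones(len(target)), others])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - resid.var() / target.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

def ridge_fit(X, y, alpha):
    """Closed-form ridge on centered data; the intercept is not penalized.
    alpha=0.0 reduces to ordinary least squares."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def rmse(X, y, w, b):
    """Root mean squared error of the linear predictions X @ w + b."""
    return float(np.sqrt(np.mean((X @ w + b - y) ** 2)))

# Synthetic data (assumption): x2 is nearly a copy of x1, so both are collinear.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2.0 * x1 + 1.0 * x3 + rng.normal(scale=0.5, size=n)

# Internal (training) and external (held-out) portions.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

vifs = vif(X_train)
print("VIF per feature:", vifs)

results = {}
for alpha in (0.0, 1.0, 10.0):
    w, b = ridge_fit(X_train, y_train, alpha)
    results[alpha] = {
        "w": w,
        "train_rmse": rmse(X_train, y_train, w, b),
        "test_rmse": rmse(X_test, y_test, w, b),
    }
    print(alpha, results[alpha]["train_rmse"], results[alpha]["test_rmse"])
```

A small gap between the training and held-out RMSE suggests that a model generalizes well, which is the criterion the abstract uses to compare models. LASSO and Elastic Net have no closed-form solution and would normally be fitted with an iterative solver from a library; they are omitted here to keep the sketch short.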

Published

2025-03-14

How to Cite

Otse, E. J., Obunadike, G. N., & Abubakar, A. (2025). Linear Regression Approach to Solving Multicollinearity and Overfitting in Predictive Analysis. Journal of Science Research and Reviews, 2(1), 108-117. https://doi.org/10.70882/josrar.2025.v2i1.35