Linear Regression Approach to Solving Multicollinearity and Overfitting in Predictive Analysis
DOI:
https://doi.org/10.70882/josrar.2025.v2i1.35
Keywords:
Linear Regression, Multicollinearity, Overfitting, Predictive Analysis, Exploratory Data Analysis
Abstract
Multicollinearity and overfitting are pervasive problems in predictive analysis, especially in linear regression models, where they reduce the precision and interpretability of the predictions that inform data-driven decision-making across industries. This research examines a linear regression approach to addressing both problems. The dataset, sourced from the National Center for Disease Control (NCDC), was analyzed using several regression techniques, including Linear Regression, Ridge Regression, LASSO Regression, and Elastic Net Regression. The study aimed to assess and compare how well these methods mitigate multicollinearity, measured by the Variance Inflation Factor (VIF), and reduce overfitting, assessed through the Mean Squared Error (MSE) and Root Mean Squared Error (RMSE). The data were analyzed both with all features and after applying feature selection. Results showed that while all models addressed multicollinearity and overfitting effectively, Elastic Net Regression performed best, generalizing most reliably with the smallest MSE and RMSE discrepancies between internal and external data. These findings highlight the potential of regularization techniques to improve predictive accuracy and interpretability, particularly in high-dimensional settings such as those involving COVID-19 outcomes. The study underscores the need for further research into enhanced machine learning techniques and for broader datasets to refine predictive models for practical decision-making across sectors.
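The workflow summarized in the abstract (diagnose multicollinearity with VIF, then compare unregularized and regularized linear models by their error on internal versus external data) can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic rather than the NCDC dataset, the penalty strengths are arbitrary, and the helper names (`vif`, `soft_threshold`, `elastic_net`, `rmse`) are hypothetical. Ridge, LASSO, and Elastic Net are all fitted here with one simple coordinate-descent routine, chosen only to keep the example dependency-light.

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor per column: 1 / (1 - R^2), where R^2 comes
    from regressing that column on all other columns plus an intercept."""
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        A = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

def soft_threshold(z, t):
    """Shrink z toward zero by t; values within [-t, t] become zero."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def elastic_net(X, y, alpha, l1_ratio, n_iter=500):
    """Coordinate descent for the objective
    (1/2n)||y - Xb||^2 + alpha*l1_ratio*||b||_1 + 0.5*alpha*(1-l1_ratio)*||b||^2.
    Assumes X and y are already centered (no intercept term).
    l1_ratio=0 gives ridge, l1_ratio=1 gives LASSO."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()  # residual y - X @ b, with b starting at zero
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]            # take feature j out of the fit
            rho = X[:, j] @ r / n
            b[j] = soft_threshold(rho, alpha * l1_ratio) / (
                col_sq[j] + alpha * (1.0 - l1_ratio))
            r -= X[:, j] * b[j]            # put the updated feature back
    return b

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Synthetic data (an assumption for illustration): the first two features
# are near-copies of the same signal, the third is independent.
rng = np.random.default_rng(0)
n = 200
z = rng.normal(size=n)
X = np.column_stack([z + 0.05 * rng.normal(size=n),
                     z + 0.05 * rng.normal(size=n),
                     rng.normal(size=n)])
y = 3.0 * z + 2.0 * X[:, 2] + rng.normal(size=n)

# "Internal" (training) and "external" (held-out) portions.
X_tr, X_te, y_tr, y_te = X[:140], X[140:], y[:140], y[140:]
mu, sd, y_mu = X_tr.mean(axis=0), X_tr.std(axis=0), y_tr.mean()
Z_tr, Z_te = (X_tr - mu) / sd, (X_te - mu) / sd  # scale with training stats only

vifs = vif(X_tr)
print("VIF per feature:", np.round(vifs, 1))

# (alpha, l1_ratio) pairs are arbitrary illustrative choices, not tuned values.
settings = {"Linear": (0.0, 0.0), "Ridge": (0.1, 0.0),
            "LASSO": (0.1, 1.0), "Elastic Net": (0.1, 0.5)}
results = {}
for name, (alpha, l1_ratio) in settings.items():
    if alpha == 0.0:
        b, *_ = np.linalg.lstsq(Z_tr, y_tr - y_mu, rcond=None)
    else:
        b = elastic_net(Z_tr, y_tr - y_mu, alpha, l1_ratio)
    results[name] = (b, rmse(y_tr, Z_tr @ b + y_mu), rmse(y_te, Z_te @ b + y_mu))
    print(f"{name:12s} train RMSE={results[name][1]:.3f} "
          f"test RMSE={results[name][2]:.3f}")
```

A small gap between the training and held-out RMSE is the kind of evidence the abstract refers to when it compares errors on internal and external data. In practice the penalty strength and mixing ratio would be chosen by cross-validation rather than fixed by hand as they are here.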
License
Copyright (c) 2025 Edeh John Otse, Georgina N. Obunadike, Ahmad Abubakar (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
- Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NonCommercial — You may not use the material for commercial purposes.
- No additional restrictions — You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.