
Machine Learning Models Based on Random Forest Feature Selection and Bayesian Optimization for Predicting Daily Global Solar Radiation

1Moulay Ismail University, Faculty of Science, Department of Physics, Team of Renewable energy and energy efficiency, BP 11201, Zitoune, Meknes, Morocco

2Moulay Ismail University, Faculty of Science and Technique, Mining, Water and Environmental Engineering Laboratory, BP 509, Boutalamine, Errachidia, Morocco

3Moulay Ismail University, ENSAM, Laboratory of Mathematical and Computational Modeling, Marjane II, BP 15290, Al Mansour, 50000, Meknes, Morocco

4Moulay Ismail University, Faculty of Science, Department of Geology, Laboratory of Water Sciences and environmental engineering, BP 11201, Zitoune, Meknes, Morocco

Received: 16 Sep 2021; Revised: 15 Nov 2021; Accepted: 28 Nov 2021; Available online: 6 Dec 2021; Published: 1 Feb 2022.
Editor(s): H Hadiyanto
Open Access Copyright (c) 2022 The Authors. Published by Centre of Biomass and Renewable Energy (CBIORE)
Creative Commons License This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Prediction of daily global solar radiation (H) with simple and highly accurate models would benefit solar energy conversion systems. In this paper, we propose a hybrid machine learning methodology that integrates two feature selection methods and a Bayesian optimization algorithm to predict H in the city of Fez, Morocco. First, we identified the most significant predictors using two Random Forest feature importance methods: Mean Decrease in Impurity (MDI) and Mean Decrease in Accuracy (MDA). Then, based on the feature selection results, ten models were developed and compared: (1) five standalone machine learning (ML) models, namely Classification and Regression Trees (CART), Random Forests (RF), Bagged Trees Regression (BTR), Support Vector Regression (SVR), and Multi-Layer Perceptron (MLP); and (2) the same models tuned by the Bayesian optimization (BO) algorithm: CART-BO, RF-BO, BTR-BO, SVR-BO, and MLP-BO. Both the MDI and MDA techniques revealed that extraterrestrial solar radiation and sunshine duration fraction were the most influential features. The BO approach improved the predictive accuracy of the MLP, CART, SVR, and BTR models and prevented the CART model from overfitting. The largest improvement was obtained with the MLP model, where RMSE and MAE were reduced by 17.6% and 17.2%, respectively. Among the studied models, the SVR-BO algorithm provided the best trade-off between prediction accuracy (RMSE = 0.4473 kWh/m²/day, MAE = 0.3381 kWh/m²/day, and R² = 0.9465), stability (with a 0.0033 kWh/m²/day increase in RMSE), and computational cost.
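To illustrate the two Random Forest importance measures named in the abstract, the following is a minimal sketch using scikit-learn, not the authors' code. It assumes that MDI corresponds to the impurity-based `feature_importances_` attribute and that MDA can be approximated by permutation importance on held-out data. The data are synthetic, and the feature names (`H0`, `sunshine_fraction`, `noise_a`, `noise_b`) and the target formula are hypothetical, chosen only so that two features are informative.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): two informative features, two noise features.
rng = np.random.default_rng(0)
n_samples = 600
feature_names = ["H0", "sunshine_fraction", "noise_a", "noise_b"]
X = rng.uniform(0.0, 1.0, size=(n_samples, len(feature_names)))
# Hypothetical target: depends only on the first two columns, plus small noise.
y = 5.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(0.0, 0.1, size=n_samples)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X_train, y_train)

# MDI: impurity-based importances computed from the training data (sum to 1).
mdi = forest.feature_importances_

# MDA (approximated here by permutation importance): mean drop in the
# test-set score when each feature column is shuffled in turn.
mda = permutation_importance(
    forest, X_test, y_test, n_repeats=10, random_state=0
).importances_mean

# Rank features from most to least important under each measure.
mdi_ranking = [feature_names[i] for i in np.argsort(mdi)[::-1]]
mda_ranking = [feature_names[i] for i in np.argsort(mda)[::-1]]

print("MDI ranking:", mdi_ranking)
print("MDA ranking:", mda_ranking)
```

Comparing the two rankings is a common sanity check: MDI is computed from the training data alone, whereas permutation importance is evaluated on held-out data, so agreement between them gives more confidence in the selected predictors.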
Keywords: Feature selection; Mean Decrease in Accuracy; Mean Decrease in Impurity; Bayesian optimization; Solar radiation

