Proceedings of International Conference on Applied Innovation in IT
2020/03/10, Volume 8, Issue 1, pp.55-61
Prediction of Air Pollution Concentration Using Weather Data and Regression Models
Aleksandar Trenchevski, Marija Kalendar, Hristijan Gjoreski, Danijela Efnusheva
Abstract: Air pollution is becoming a global environmental problem in both developed and developing countries. It has greatly affected the health and lives of millions of people, increasing mortality rates and reports of pollution-induced diseases. This paper proposes machine learning methods for predicting elevated air pollution levels in several areas by processing data gathered from multiple weather and air quality monitoring stations. The data were gathered over a period of several years and include air quality and pollution measurements as well as weather data: temperature, humidity and wind characteristics. The development process included feature extraction, feature selection for removing redundancy, and finally training multiple regression models with hyperparameter optimization. Pollutants and the air quality index (AQI) were used as target variables, and appropriate regression models were trained for each. The performed experiments show that XGBoost is the most accurate, achieving an MAE of 8.9 for Center, 8.9 for Karpos and 7.3 for Kumanovo municipality for the PM10 pollutant. The improvements over the baseline, a Dummy regressor, are significant, reducing the MAE by 12 on average.
Keywords: Air Pollution, Feature Selection, Machine Learning, Prediction, Regression Models
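The evaluation idea in the abstract, comparing a trained regressor against a Dummy baseline by mean absolute error (MAE), can be illustrated with a minimal sketch. It uses only the Python standard library and synthetic data; it is not the authors' pipeline and does not use XGBoost. A one-feature least-squares fit stands in for the trained model, and all function names, the wind-speed feature and the generated values are assumptions made purely for illustration.

```python
import random
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def fit_mean_baseline(y_train):
    """Dummy-style baseline: always predict the training-set mean."""
    mu = mean(y_train)
    return lambda x: mu


def fit_simple_linear(x_train, y_train):
    """One-feature ordinary least squares, a stand-in for a real model."""
    mx, my = mean(x_train), mean(y_train)
    sxx = sum((x - mx) ** 2 for x in x_train)
    sxy = sum((x - mx) * (y - my) for x, y in zip(x_train, y_train))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


# Synthetic data (NOT from the paper): PM10 falls as wind speed rises.
rng = random.Random(0)
wind = [rng.uniform(0.0, 10.0) for _ in range(200)]
pm10 = [80.0 - 6.0 * w + rng.gauss(0.0, 5.0) for w in wind]

# Hold-out split: first 150 rows for training, last 50 for testing.
x_train, x_test = wind[:150], wind[150:]
y_train, y_test = pm10[:150], pm10[150:]

baseline = fit_mean_baseline(y_train)
model = fit_simple_linear(x_train, y_train)

mae_baseline = mean_absolute_error(y_test, [baseline(x) for x in x_test])
mae_model = mean_absolute_error(y_test, [model(x) for x in x_test])
print(f"baseline MAE: {mae_baseline:.2f}, model MAE: {mae_model:.2f}")
```

The gap between the two MAE values is the kind of improvement over the baseline that the abstract reports; in practice the stand-in model would be replaced by the tuned regressors and the synthetic series by the station measurements.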