Abstract
Air quality monitoring provides important guidance for pollution assessment, harm reduction, and environmental governance. However, because the number of air quality monitoring stations is very limited and air quality varies non-linearly with location, spatial estimation of air quality (i.e., estimating the air quality at any location that has no monitoring station) is a challenging task. State-of-the-art spatial estimation methods take factors such as traffic, human mobility, and points of interest (POI) into account and build estimation models with machine learning techniques. These methods still have two limitations: 1) the factors they consider mainly reflect the characteristics of urban areas, so the methods can only be applied within urban areas; 2) they build models directly on the features extracted from these factors, without refining the features any further. To address these problems, this paper proposes a spatial estimation method for air quality based on terrain factors. The method first builds a terrain database and extracts terrain features, then applies a deep transformation to the terrain features using an ensemble decision tree model, and finally trains a regression model based on a factorization machine. Experiments on real-world data show that the method has a clear advantage in estimating air quality over areas of natural terrain (e.g., plateaus, forests, and water bodies).
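The last two steps of the pipeline (tree-based feature transformation followed by factorization machine regression) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that the "deep transformation" is the leaf-index encoding often paired with boosted trees, in which each sample is represented by a one-hot vector marking the leaf it reaches in every tree. The helper names (`one_hot_leaves`, `fm_predict`, `fm_predict_naive`), the leaf indices, and all parameter values are hypothetical and chosen only for illustration. The prediction uses the standard second-order factorization machine equation, with the pairwise term computed in linear time.

```python
from itertools import combinations


def one_hot_leaves(leaf_indices, leaves_per_tree):
    """Concatenate one one-hot block per tree, marking the leaf a sample reached.

    Assumption: leaf_indices[t] is the (0-based) leaf reached in tree t.
    """
    x = []
    for leaf, n_leaves in zip(leaf_indices, leaves_per_tree):
        block = [0.0] * n_leaves
        block[leaf] = 1.0
        x.extend(block)
    return x


def fm_predict(x, w0, w, V):
    """Second-order factorization machine prediction.

    y = w0 + sum_i w[i]*x[i] + sum_{i<j} <V[i], V[j]> * x[i]*x[j]
    The pairwise sum is evaluated with the O(k*n) identity
    0.5 * sum_f ((sum_i V[i][f]*x[i])^2 - sum_i (V[i][f]*x[i])^2).
    """
    n, k = len(x), len(V[0])
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = 0.0
    for f in range(k):
        s = sum(V[i][f] * x[i] for i in range(n))
        s_sq = sum((V[i][f] * x[i]) ** 2 for i in range(n))
        pairwise += 0.5 * (s * s - s_sq)
    return w0 + linear + pairwise


def fm_predict_naive(x, w0, w, V):
    """Same model, with the pairwise term written out directly (O(k*n^2))."""
    n = len(x)
    linear = sum(w[i] * x[i] for i in range(n))
    pairwise = sum(
        sum(a * b for a, b in zip(V[i], V[j])) * x[i] * x[j]
        for i, j in combinations(range(n), 2)
    )
    return w0 + linear + pairwise


# Hypothetical example: two trees with 3 and 2 leaves; the sample lands in
# leaf 1 of the first tree and leaf 0 of the second.
x = one_hot_leaves([1, 0], [3, 2])

# Hypothetical (untrained) parameters: bias, linear weights, 2-dim factors.
w0 = 0.5
w = [0.1, 0.2, 0.3, 0.4, 0.5]
V = [[0.3, 0.1], [1.0, 0.0], [0.2, 0.2], [0.5, 2.0], [0.7, 0.4]]

print(x)
print(fm_predict(x, w0, w, V))
```

Because the encoded vector has exactly one active entry per tree, the pairwise term of the factorization machine models interactions between leaves of different trees, which is one plausible reason for combining the two models.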