Air Quality Forecasting Using Mutual Information and the Random Forest Algorithm in the Context of Big Data
  • English title: AIR QUALITY FORECASTING WITH MUTUAL INFORMATION AND RANDOM FORESTS BASED ON BIG
  • Authors: YANG Zheng-li (杨正理); SHI Wen (史文); CHEN Hai-xia (陈海霞); WANG Chang-peng (王长鹏)
  • Affiliation: School of Mechanical and Electrical Engineering, Sanjiang University (三江学院机械与电气工程学院)
  • Keywords: big data; correlation factor identification; mutual information; random forests; air quality forecast; cloud computing
  • Journal: Environmental Engineering (环境工程)
  • Publication date: 2019-03-15
  • Publisher: Environmental Engineering (环境工程)
  • Year: 2019
  • Issue: 03
  • Funding: General Program of Natural Science Research of Jiangsu Higher Education Institutions (17KJB470011)
  • Language: Chinese
  • Pages: 183-188
  • Page count: 6
  • CN: 11-2097/X
  • ISSN: 1000-8942
  • Classification code: X51
Abstract
To forecast urban air quality accurately, this paper addresses the characteristics of the big data involved in urban air quality forecasting: many data types, large scale, high dimensionality, and rapid generation. Building on a study of the air quality evaluation indexes of different regions of a city, it proposes a subspace clustering analysis method for regional air quality that mines the air quality characteristics of each region. The regions are divided into groups, and a mutual information matrix is used to identify the factors associated with each region's air quality in terms of urban function, terrain, meteorological conditions, and similar aspects. An urban air quality forecasting model based on the random forest algorithm is then built. The method effectively identifies the factors strongly associated with air quality in different regions of a city, and it avoids the adverse effect that differences in these factors would otherwise have on the forecast. Simulation results show that the method is suitable for analyzing and processing big data and achieves high prediction accuracy.
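This record does not state how the mutual information values are estimated, so the following is only a minimal sketch, assuming equal-width binning and a plug-in estimator. The factor names (`wind`, `traffic`, `unrelated`), the synthetic data, and the helper functions are hypothetical and are not taken from the paper. It ranks candidate factors by their mutual information with a pollutant series, which is the kind of screening the abstract attributes to the mutual information matrix.

```python
import math
import random
from collections import Counter


def discretize(values, bins=8):
    """Map each value to an equal-width bin index in [0, bins - 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in nats for two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # I(X; Y) = sum over (x, y) of p(x, y) * log(p(x, y) / (p(x) * p(y)))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


# Hypothetical synthetic data: the pollutant depends on traffic and wind,
# while the third factor is independent of it by construction.
rng = random.Random(0)
n = 2000
wind = [rng.uniform(0, 10) for _ in range(n)]
traffic = [rng.uniform(0, 1) for _ in range(n)]
unrelated = [rng.uniform(0, 1) for _ in range(n)]
pm25 = [80 * t - 5 * w + rng.gauss(0, 5) for w, t in zip(wind, traffic)]

target = discretize(pm25)
factors = [("wind", wind), ("traffic", traffic), ("unrelated", unrelated)]
scores = {name: mutual_information(discretize(col), target) for name, col in factors}

# Factors sorted from most to least informative about the pollutant.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

In a pipeline like the one the abstract outlines, only the top-ranked factors for each group of regions would be passed to the forecasting model. The bin count trades bias against variance and would need tuning on real data.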
