Hard Disk Failure Prediction Model Validation in Large Data Center Environment
Details
  • Original title: 硬盘故障预测模型在大型数据中心环境下的验证
  • Authors: Jia Yuhan (贾宇晗); Li Jing (李静); Jia Runying (贾润莹); Li Zhongwei (李忠伟); Wang Gang (王刚); Liu Xiaoguang (刘晓光); Xiao Kang (肖康)
  • Affiliations: College of Computer and Control Engineering, Nankai University; Beijing Qihoo Technology Co. Ltd; College of Software, Nankai University
  • Keywords: hard drive failure prediction; decision tree; BP neural network; self-monitoring, analysis and reporting technology (SMART); big data center
  • Journal code: JFYZ
  • Journal: Journal of Computer Research and Development (计算机研究与发展)
  • Publication date: 2015-12-15
  • Publisher: Journal of Computer Research and Development (计算机研究与发展)
  • Year: 2015
  • Issue: v.52
  • Funding: National Natural Science Foundation of China (61373018; 11301288); Program for New Century Excellent Talents in University, Ministry of Education (NCET130301); Fundamental Research Funds for the Central Universities (65141021)
  • Language: Chinese
  • Page: JFYZ2015S2009
  • Page count: 8
  • CN: S2
  • ISSN: 11-1777/TP
  • Classification number: 61-68
Abstract
With the growth of the Internet and the sharp increase in storage scale, data loss caused by frequent hard disk failures in large data centers has become a major problem that enterprises cannot ignore. Previous work has built a variety of hard disk failure prediction models on SMART (self-monitoring, analysis and reporting technology) attributes, using statistical and machine learning methods. Although these models achieve good results, their data collection and processing have shortcomings. Based on a real, large Internet data center, this paper extracts SMART attribute data and proposes an attribute selection method based on the neural network weight matrix, combined with three non-parametric statistical methods: the rank-sum test, the reverse arrangement test (RAT), and z-scores. Hard disk failure prediction models are then built with two machine learning methods, the CART decision tree and the BP neural network. Experiments show that both models perform very well, which demonstrates that machine learning algorithms can work well in a practical application scenario. In addition, the experiments and their analysis and interpretation yield some useful conclusions, which lay a foundation for further research.
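The three non-parametric statistics named in the abstract can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the two-sample form of the z-score, the normal approximations, the 1.96 threshold, and the rule "keep an attribute if either statistic exceeds the threshold" are all assumptions made here, and the neural-network weight-matrix step is not shown.

```python
import math
from statistics import mean, variance


def rank_sum_z(failed, healthy):
    """Wilcoxon rank-sum statistic for the failed group, as a z value
    (normal approximation, without the tie correction to the variance)."""
    pooled = sorted((v, g) for g, vals in ((0, failed), (1, healthy)) for v in vals)
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied values share the average 1-based rank
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    w = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    n1, n2 = len(failed), len(healthy)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mu) / sigma


def reverse_arrangements(series):
    """Count pairs (i, j) with i < j and series[i] > series[j]."""
    n = len(series)
    return sum(1 for i in range(n) for j in range(i + 1, n) if series[i] > series[j])


def reverse_arrangement_z(series):
    """z value of the reverse-arrangement count under the no-trend hypothesis."""
    n = len(series)
    mu = n * (n - 1) / 4
    sigma = math.sqrt(n * (2 * n + 5) * (n - 1) / 72)
    return (reverse_arrangements(series) - mu) / sigma


def two_sample_z(failed, healthy):
    """Difference of group means scaled by its standard error (assumed form)."""
    se = math.sqrt(variance(failed) / len(failed) + variance(healthy) / len(healthy))
    return 0.0 if se == 0 else (mean(failed) - mean(healthy)) / se


def select_attributes(samples, threshold=1.96):
    """Keep attributes whose rank-sum or z-score magnitude exceeds the
    (hypothetical) threshold. `samples` maps name -> (failed, healthy)."""
    return [
        name
        for name, (failed, healthy) in samples.items()
        if abs(rank_sum_z(failed, healthy)) > threshold
        or abs(two_sample_z(failed, healthy)) > threshold
    ]


# Made-up toy values for illustration only; they are not data from the paper.
toy = {
    "reallocated_sectors": ([10, 12, 15, 18, 20], [1, 2, 3, 4, 5, 6]),
    "spin_up_time": ([5, 6, 7, 5, 6], [5, 6, 7, 6, 5, 6]),
}
selected = select_attributes(toy)
```

In this toy input the first attribute separates the two groups clearly and the second does not, so only the first is kept. The reverse arrangement statistic applies to the time series of a single drive rather than to two groups, so it is shown here as a standalone function and is not part of the selection rule.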
References
[1]Schroeder B,Gibson G A.Disk failures in the real world:What does an MTTF of 1,000,000 hours mean to you?//Proc of the 5th USENIX Conf on File and Storage Technologies(FAST).Berkeley,CA:USENIX Association,2007:7-1-7-16
    [2]Bairavasundaram L N,Goodson G R,Pasupathy S,et al.An analysis of latent sector errors in disk drives//Proc of the Int Conf on Measurements and Modeling of Computer Systems.New York:ACM,2007:289-300
    [3]Pinheiro E,Weber W D,Barroso L A.Failure trends in a large disk drive population//Proc of the 5th USENIX Conf on File and Storage Technologies(FAST).Berkeley,CA:USENIX Association,2007:17-29
    [4]Murray J F,Hughes G F,Kreutz-Delgado K.Machine learning methods for predicting failures in hard drives:A multiple-instance application.Journal of Machine Learning Research,2005,6(5):783-816
    [5]Hughes G F,Murray J F,Kreutz-Delgado K,et al.Improved disk-drive failure warnings.IEEE Trans on Reliability,2002,51(3):350-357
    [6]Hamerly G,Elkan C.Bayesian approaches to failure prediction for disk drives//Proc of the 18th Int Conf on Machine Learning.San Francisco,CA:ICML,2001:202-209
    [7]Murray J F,Hughes G F,Kreutz-Delgado K.Hard drive failure prediction using non-parametric statistical methods//Proc of the Int Conf on Artificial Neural Networks(ICANN)/ICONIP 2003.Berlin:Springer,2003
    [8]Zhao Y,Liu X,Gan S,et al.Predicting disk failures with HMM-and HSMM-based approaches//Proc of the 10th Industrial Conf on Advances in Data Mining:Applications and Theoretical Aspects.Berlin:Springer,2010:390-404
    [9]Zhu B,Wang G,Liu X,et al.Proactive drive failure prediction for large scale storage systems//Proc of the 29th IEEE Conf on Massive Storage Systems and Technologies(MSST).Piscataway,NJ:IEEE,2013:1-5
    [10]Wang Y,Miao Q,Pecht M.Health monitoring of hard disk drive based on Mahalanobis distance//Proc of Conf in Prognostics and System Health Management Conf(PHM2011).Piscataway,NJ:IEEE,2011:1-8
    [11]Wang Y,Miao Q,Ma E W,et al.Online anomaly detection for hard disk drives based on Mahalanobis distance.IEEE Trans on Reliability,2013,62(1):136-145
    [12]Li J,Ji X,Jia Y,et al.Hard drive failure prediction using classification and regression trees//Proc of the 44th Annual IEEE/IFIP Int Conf on Dependable Systems and Networks(DSN).Los Alamitos,CA:IEEE Computer Society,2014:383-394
    [13]Allen B.Monitoring hard disks with SMART.Linux Journal,2004,2004(117):74-77
    [14]Strom B D,Lee S C,Tyndall G W,et al.Hard disk drive reliability modeling and failure prediction.IEEE Trans on Magnetics,2007,43(9):3676-3684
    [15]Ma A,Douglis F,Lu G,et al.RAIDShield:Characterizing,monitoring,and proactively protecting against disk failures//Proc of the 13th USENIX Conf on File and Storage Technologies(FAST'15).Berkeley,CA:USENIX Association,2015:16-19
    [16]Williams G.Data Mining with Rattle and R:The Art of Excavating Data for Knowledge Discovery(Use R!).Berlin:Springer,2011
