Fast Fault-Tolerant Recovery Technique for Networks (面向网络的快速容错恢复技术)
  • English title: Fast fault tolerant recovery technique for network
  • Authors: ZHANG Zhi-min (张志敏); WU Jun (吴军); YAN Ming-yu (严明玉)
  • Affiliations: Institute of Computing Technology, Chinese Academy of Sciences; State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences; Beijing Institute of Control Engineering; School of Computer and Control Engineering, University of Chinese Academy of Sciences
  • Keywords: multi-computer system; fault-tolerant design; task migration; SpaceWire network; VxWorks operating system
  • Journal code: SJSJ
  • Journal: Computer Engineering and Design (计算机工程与设计)
  • Publication date: 2018-09-16
  • Publisher: Computer Engineering and Design
  • Year: 2018
  • Issue: v.39; No.381
  • Language: Chinese
  • Pages: SJSJ201809001
  • Page count: 6
  • CN: 09
  • ISSN: 11-1775/TP
  • Classification: 9-14
Abstract
Fault-tolerant design for multi-node systems is challenging, and the main research goals are to reduce the cost of fault tolerance and to improve its real-time performance. This paper presents multi-node fault-tolerant recovery techniques for SpaceWire networks and introduces a dynamic task migration mechanism. From an architectural perspective, it studies checkpoint setting, fault detection, real-time task migration, and task recovery, and it addresses the key techniques of checkpoint setting and rollback, dynamic task recovery, message redirection, and task migration policy. Measured results show that the average time from fault detection to recovery through task migration is 278 ms, an order-of-magnitude improvement over the traditional approach, providing a fast approach to fault-tolerant recovery in multi-node systems.
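The recovery flow named in the abstract (checkpointing, fault detection, task migration with rollback, and message redirection) can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the record does not show their SpaceWire or VxWorks code, so heartbeat timeouts are assumed as the detection method, a least-loaded choice is assumed as the migration policy, and every name here (`Cluster`, `Node`, `recover`, `send`) is hypothetical. The time values are arbitrary and do not model the reported 278 ms.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    last_heartbeat: float = 0.0
    tasks: dict = field(default_factory=dict)  # task name -> live task state


class Cluster:
    """Toy multi-node system (hypothetical model, not the paper's design)."""

    def __init__(self, node_names, heartbeat_timeout=0.1):
        self.nodes = {n: Node(n) for n in node_names}
        self.heartbeat_timeout = heartbeat_timeout
        self.checkpoints = {}  # task name -> copy of last saved state
        self.routes = {}       # task name -> name of the node hosting it

    def start_task(self, task, node_name, state):
        self.nodes[node_name].tasks[task] = dict(state)
        self.routes[task] = node_name
        self.checkpoint(task)

    def checkpoint(self, task):
        # Checkpoint setting: save a copy of the task state off-node.
        node = self.nodes[self.routes[task]]
        self.checkpoints[task] = dict(node.tasks[task])

    def heartbeat(self, node_name, now):
        self.nodes[node_name].last_heartbeat = now

    def failed_nodes(self, now):
        # Fault detection (assumed): a node is failed once its heartbeat is stale.
        return [n.name for n in self.nodes.values()
                if now - n.last_heartbeat > self.heartbeat_timeout]

    def recover(self, now):
        """Migrate tasks off failed nodes; return a list of (task, new node)."""
        failed = set(self.failed_nodes(now))
        healthy = [n for n in self.nodes.values() if n.name not in failed]
        if not healthy:
            raise RuntimeError("no healthy node left to host tasks")
        moved = []
        for name in sorted(failed):
            node = self.nodes[name]
            for task in sorted(node.tasks):
                # Migration policy (assumed): least-loaded healthy node.
                target = min(healthy, key=lambda n: (len(n.tasks), n.name))
                # Rollback: restart from the last checkpoint, not the lost state.
                target.tasks[task] = dict(self.checkpoints[task])
                # Message redirection: later messages follow the new route.
                self.routes[task] = target.name
                moved.append((task, target.name))
            node.tasks.clear()
        return moved

    def send(self, task, key, value):
        # Deliver a message to whichever node currently hosts the task.
        node = self.nodes[self.routes[task]]
        node.tasks[task][key] = value
        return node.name
```

In this model, progress made after the last checkpoint is lost when a node fails, and senders need no knowledge of the migration because delivery goes through the routing table.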
