Research on Key Techniques of Bufferless Routers for Network-on-Chip
Abstract
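The rapid development of microelectronics has pushed chip design into the multicore era. As the number of cores integrated on a chip keeps growing, on-chip inter-core communication has become the performance bottleneck of the multicore System-on-Chip (SoC). Network-on-Chip (NoC) replaces the traditional bus and crossbar interconnects with a scalable, high-bandwidth communication architecture, which solves the global communication problem in large-scale multicore SoCs and improves on-chip communication performance. As integration density rises, however, growing power consumption and area have become major constraints on multicore SoC development, and shrinking feature sizes, lower supply voltages and higher clock frequencies seriously affect NoC reliability. Research on energy-efficient, low-overhead and highly reliable NoCs is therefore important for the design of large-scale multicore SoCs.
     Bufferless routers offer a low-overhead solution for NoC. A bufferless router needs no buffers beyond its pipeline registers, which greatly reduces router power and area and simplifies router design. In existing bufferless routers, however, the serialized switch allocator limits performance, and the lack of support for reliability design makes it hard to cope with faults in complex environments. This dissertation therefore studies performance optimization and reliability design of the bufferless router microarchitecture. The main work covers the following four aspects: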
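     1. Performance analysis of deflection routing and a bufferless router based on a permutation network
     The dissertation designs deflection routing algorithms for several NoC topologies and analyzes and evaluates their performance under several synthetic traffic patterns. The evaluation shows that both the network topology and the traffic pattern strongly affect deflection routing performance, so a designer of a bufferless NoC can choose a topology suited to the target application. For the widely used 2D Mesh NoC, the dissertation proposes a single-cycle high-performance bufferless router based on a permutation network (BLESS_PERM). It replaces the serialized switch allocator and the crossbar of the original router with a simple two-stage permutation network, which shortens the logic depth of the critical path, reduces design complexity and raises the achievable clock frequency. Simulation results show that under synthetic traffic the average packet latency of the BLESS_PERM router is 70%, 65%, 56% and 41% lower than that of the VC, BLESS_BASE, BLESS_PL and CHIPPER routers respectively; under real application traffic it is 80%, 72%, 66% and 38% lower respectively.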
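     2. Fault-tolerant architecture for bufferless routers
     For the reliability design of bufferless routers, the dissertation proposes a complete fault-tolerant architecture that detects and handles both transient and permanent link faults. The architecture includes:
     An online fault detection mechanism based on block SECDED coding, which detects transient and permanent faults and distinguishes between them without disturbing the transmission of normal packets.
     A fault-tolerant flow-control scheme that combines automatic repeat request (ARQ) with forward error correction (FEC) to handle, at the link level, transient faults that occur during packet transmission.
     Two fault-tolerant deflection routing algorithms that route around permanently faulty links at the network layer. The Fault-on-Neighbor aware deflection routing algorithm (FoN) makes routing decisions based on a 2-hop fault-information propagation model and the shape of the fault region, and handles regular convex and concave fault regions that have no consecutive concave points. The reinforcement-learning-based reconfigurable fault-tolerant deflection routing algorithm (FTDR) targets irregular fault regions and uses reinforcement learning to reconfigure the routing table. To reduce the implementation overhead of FTDR, an algorithm based on a hierarchical routing table (FTDR-H) is also proposed.
     A fault-tolerant deflection router based on configurable bidirectional links (BiFTDR), which sets the direction of the bidirectional links between neighboring routers according to link fault status and arriving-packet information, and can therefore handle unidirectional link faults without detour routing.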
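     3. High-performance, fault-tolerant multicast schemes based on deflection routing
     The dissertation proposes three high-performance multicast schemes based on deflection routing (DRM). The DRM_noPR scheme is simple to implement: while a multicast packet is being routed, the best routing direction is chosen according to the best candidate destination, and the packet travels along a dynamically changing path to each destination in turn. The DRM_PR_src and DRM_PR_all schemes replicate multicast packets at the source node or at intermediate nodes, according to a region partition rule and the busy or idle state of the router ports, which diversifies the multicast paths and reduces multicast latency. To improve the reliability of multicast transmission, fault-tolerant DRM schemes (FT_DRM) are built on top of the three DRM schemes. FT_DRM reconfigures the routing table with a reinforcement-learning-based method, so multicast packets are routed around permanently faulty links without packet loss. Experimental results show that in a fault-free network the average packet latency of DRM_PR_src is 18% lower than that of DRM_noPR, and that of DRM_PR_all is 40% and 27% lower than that of DRM_noPR and DRM_PR_src respectively. With 5% and 10% of the links faulty, the average packet latency of DRM_PR_src is 17% lower than that of DRM_noPR, and that of DRM_PR_all is 38% lower than that of DRM_noPR.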
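     4. Bufferless router for 3D Network-on-Chip
     When a bufferless router is extended from two to three dimensions, serialized output port allocation degrades router performance even further. To address this, the dissertation proposes a single-cycle high-performance 3D bufferless router based on a three-stage permutation network (3D_PERM). It replaces the serialized switch allocator and the 7×7 crossbar with a three-stage permutation network and applies simple permutation rules during packet switching to avoid livelock, improving performance while reducing hardware cost. Simulation results show that under synthetic traffic the average packet latency of the 3D_PERM router is 73% and 14% lower than that of the 3D_BASE and 3D_CHIPPER routers respectively; under real application traffic it is 78% and 14% lower respectively. To address the low yield of TSV manufacturing in 3D integrated circuits, the dissertation proposes a low-overhead fault-tolerant deflection router (FTDR-3D_OPT) for 3D Mesh NoC. FTDR-3D_OPT replaces the global routing table with one layer routing table and two TSV state vectors, and routes around both horizontal and vertical faulty links. Synthesis results show that, compared with a 3D fault-tolerant deflection router that uses a global routing table, FTDR-3D_OPT reduces area and power by 40% and 49% respectively.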
With the rapid development of microelectronic technology, chip design has entered the multicore era. As the number of cores on a single chip increases, communication between cores has become the performance bottleneck of the multicore System-on-Chip (SoC). Network-on-Chip (NoC), as an alternative to the classical bus or crossbar interconnection architecture, has become a scalable and high-bandwidth communication paradigm that solves the global communication problem for large-scale multicore SoCs and effectively improves the performance of on-chip communication. However, as the degree of integration rises, power consumption and area have become limiting constraints in the design of multicore SoCs. In addition, shrinking feature sizes, lower supply voltages and higher clock frequencies have a negative impact on the reliability of NoC. Thus, an energy-efficient, low-overhead and highly reliable NoC is especially desirable for large-scale multicore SoCs.
     The bufferless router provides a low-overhead solution for NoC. A bufferless router needs no buffers other than its pipeline registers, which significantly reduces power consumption and area overhead and also simplifies the design. However, the serialized switch allocator in existing bufferless routers limits their performance. Furthermore, the lack of reliability design in bufferless routers makes it difficult to handle faults in complicated situations. This dissertation therefore investigates performance optimization and reliability design for the bufferless router microarchitecture. The main contributions of this dissertation are as follows:
     1. Performance analysis of deflection routing and a bufferless router based on a permutation network
     The thesis designs deflection routing algorithms for various NoC topologies and evaluates their performance under different synthetic traffic patterns. The evaluation results show that the performance of deflection routing is sensitive to both the network topology and the traffic pattern, so the NoC architect should choose a topology suited to the specific application when designing a bufferless NoC. For the widely used 2D Mesh NoC topology, the thesis proposes a 1-cycle high-performance bufferless router based on a permutation network (called BLESS_PERM). The BLESS_PERM router replaces the serialized switch allocator and the crossbar with a simple 2-level permutation network, which reduces the number of logic levels on the critical path, simplifies the design and raises the clock frequency. Simulation results show that the BLESS_PERM router achieves 70%, 65%, 56% and 41% lower average packet latency than the VC, BLESS_BASE, BLESS_PL and CHIPPER routers respectively under synthetic traffic workloads, and 80%, 72%, 66% and 38% lower average packet latency than those four routers respectively under real application workloads.
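To make the idea of replacing a crossbar and allocator with a permutation network more concrete, the following is a minimal behavioural sketch of a two-stage network of 2×2 arbiter blocks for a four-port mesh router. The abstract does not give BLESS_PERM's actual wiring or priority rule, so everything here is an assumption made for illustration: the `Flit` type, the oldest-first priority, the port numbering and the stage wiring are hypothetical.

```python
from typing import NamedTuple, Optional

N, E, S, W = range(4)  # assumed port numbering


class Flit(NamedTuple):
    age: int   # assumed priority: larger age wins arbitration
    want: int  # output port the flit would like to leave on


def arbiter(a: Optional[Flit], b: Optional[Flit], top_ports: set):
    """2x2 block: return (top_output, bottom_output).

    The higher-priority flit is steered to the side that leads to its
    desired port; the other flit takes whatever output is left.
    """
    if a is None and b is None:
        return None, None
    if b is None or (a is not None and a.age >= b.age):
        win, lose = a, b
    else:
        win, lose = b, a
    if win.want in top_ports:
        return win, lose
    return lose, win


def route(inputs):
    """Map four input slots to four output slots in two arbiter stages."""
    # Stage 1: "top" leads to the block driving N/E, "bottom" to S/W.
    a_top, a_bot = arbiter(inputs[N], inputs[E], {N, E})
    b_top, b_bot = arbiter(inputs[S], inputs[W], {N, E})
    # Stage 2: each block picks between its two physical outputs.
    out_n, out_e = arbiter(a_top, b_top, {N})
    out_s, out_w = arbiter(a_bot, b_bot, {S})
    return [out_n, out_e, out_s, out_w]


if __name__ == "__main__":
    flits = [Flit(5, E), Flit(3, E), Flit(9, E), Flit(1, N)]
    for port, flit in zip("NESW", route(flits)):
        print(port, flit)
```

In this sketch every arriving flit leaves on some output in the same cycle, so no buffering is needed, and the oldest flit wins both arbitrations and therefore always gets its desired port. The other flits may be deflected, which is the price paid for the short, fixed-depth logic path.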
     2. Fault-tolerant architecture for bufferless router
     For the reliability design of the bufferless router, the thesis proposes a complete fault-tolerant architecture, which can detect and handle both transient and permanent link faults. The fault-tolerant architecture includes:
     An on-line fault detection mechanism using SECDED block coding, which can detect transient and permanent faults and distinguish between them without interfering with the transmission of normal packets.
     A hybrid automatic repeat request (ARQ) and forward error correction (FEC) fault-tolerant flow-control scheme to handle, at the link level, transient faults occurring during packet transmission.
     Two fault-tolerant deflection routing algorithms to route packets around permanent link faults at the network layer. The Fault-on-Neighbor (FoN) aware deflection routing algorithm, which can tolerate convex and concave fault regions without two consecutive concave points, makes routing decisions based on the 2-hop fault information transmission model and the fault region shape, without deadlock or livelock. The reconfigurable fault-tolerant deflection routing algorithm (FTDR) based on reinforcement learning, which can handle irregular fault regions, uses a reinforcement learning method to reconfigure the routing table to achieve fault tolerance. A hierarchical-routing-table-based algorithm (FTDR-H) is also presented to reduce the area overhead of the FTDR router.
     A fault-tolerant deflection router with reconfigurable bidirectional links (called BiFTDR). The BiFTDR router reconfigures the direction of the bidirectional links between neighboring routers according to the link status and incoming packet information, and can thus handle unidirectional link faults without detour routing.
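As an illustration of how block SECDED coding can drive a hybrid FEC/ARQ decision, the sketch below protects each 4-bit block of a flit with an extended Hamming (8,4) code: a single-bit error in a block is corrected in place (FEC), and a double-bit error is only detected and triggers a retransmission request (ARQ). The block size, the 16-bit flit width, the function names and the accept/retransmit policy are assumptions chosen for the example, not parameters taken from the thesis, and the thesis's method for separating transient from permanent faults is not modelled.

```python
DATA_POS = (3, 5, 6, 7)  # Hamming positions that carry the 4 data bits


def encode_block(nibble):
    """Encode 4 data bits into an 8-bit extended Hamming (SECDED) codeword."""
    bits = [0] * 8  # bits[0] is the overall parity bit
    for i, pos in enumerate(DATA_POS):
        bits[pos] = (nibble >> i) & 1
    for p in (1, 2, 4):  # check bit p covers every position with bit p set
        bits[p] = sum(bits[k] for k in range(1, 8) if k & p) & 1
    bits[0] = sum(bits[1:]) & 1
    return bits


def decode_block(bits):
    """Return (nibble, status); status is 'ok', 'corrected' or 'double_error'."""
    bits = list(bits)
    syndrome = 0
    for k in range(1, 8):
        if bits[k]:
            syndrome ^= k
    parity_odd = sum(bits) & 1  # 1 means an odd number of bits flipped
    if syndrome == 0 and not parity_odd:
        status = "ok"
    elif parity_odd:
        bits[syndrome] ^= 1  # syndrome 0 means the overall parity bit itself
        status = "corrected"
    else:
        return None, "double_error"
    nibble = sum(bits[pos] << i for i, pos in enumerate(DATA_POS))
    return nibble, status


def encode_flit(value, width=16):
    """Split a flit into 4-bit blocks and encode each block separately."""
    return [encode_block((value >> s) & 0xF) for s in range(0, width, 4)]


def receive_flit(blocks):
    """Assumed hybrid policy: correct single errors, request resend on doubles."""
    value = 0
    for i, block in enumerate(blocks):
        nibble, status = decode_block(block)
        if status == "double_error":
            return None, "retransmit"
        value |= nibble << (4 * i)
    return value, "accept"


if __name__ == "__main__":
    flit = encode_flit(0xBEEF)
    flit[2][5] ^= 1            # one flipped bit: corrected locally
    print(receive_flit(flit))
    flit[0][1] ^= 1
    flit[0][6] ^= 1            # two flipped bits in one block: resend
    print(receive_flit(flit))
```

Coding each block separately means that several single-bit errors spread over different blocks of the same flit can still be corrected without a retransmission.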
     3. High-performance and fault-tolerant deflection-routing-based multicast schemes
     The thesis proposes three high-performance deflection-routing-based multicast (DRM) schemes. The DRM_noPR scheme is a simple multicast scheme that selects the productive direction based on the best candidate destination; the multicast packet is routed to each destination along a dynamic path. The DRM_PR_src and DRM_PR_all schemes replicate multicast packets according to a region partition rule and the busy or free status of the output ports, which increases the diversity of the multicast paths and reduces the multicast latency. Furthermore, to improve the reliability of multicast communication, fault-tolerant DRM schemes (FT_DRM) are proposed on top of the three DRM schemes. The FT_DRM schemes reconfigure the routing table with a reinforcement learning method and route multicast packets around permanent link faults without packet loss. Experimental results show that in a network without faulty links the DRM_PR_src scheme achieves 18% lower average packet latency than the DRM_noPR scheme, and the DRM_PR_all scheme achieves 40% and 27% lower average packet latency than the DRM_noPR and DRM_PR_src schemes respectively. In networks where 5% and 10% of the links are faulty, the DRM_PR_src scheme achieves 17% lower average packet latency than the DRM_noPR scheme, and the DRM_PR_all scheme achieves 38% lower average packet latency than the DRM_noPR scheme.
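The reinforcement-learning reconfiguration of routing tables used by FTDR and FT_DRM is not specified in the abstract. One well-known technique of this kind is Q-routing, and the sketch below uses it as a stand-in: each node keeps, per destination and per neighbor, an estimate of the remaining hop count and repeatedly nudges it toward "one hop plus the neighbor's best estimate", while a faulty link is given a large penalty. The 3×3 mesh, the learning rate, the penalty value and the function names are all hypothetical, and deflection caused by port contention is ignored.

```python
import itertools

SIZE = 3          # assumed 3x3 mesh
PENALTY = 100.0   # assumed cost assigned to a permanently faulty link


def neighbors(node):
    """Yield the mesh neighbors of a node given as (x, y)."""
    x, y = node
    for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nx, ny = x + dx, y + dy
        if 0 <= nx < SIZE and 0 <= ny < SIZE:
            yield (nx, ny)


def train(faulty, alpha=0.5, sweeps=50):
    """Learn q[node][dest][next_hop], an estimated hop count to dest."""
    nodes = list(itertools.product(range(SIZE), repeat=2))
    q = {n: {d: {m: 1.0 for m in neighbors(n)} for d in nodes if d != n}
         for n in nodes}
    for _ in range(sweeps):
        for n in nodes:
            for d in q[n]:
                for m in q[n][d]:
                    if frozenset((n, m)) in faulty:
                        target = PENALTY
                    elif m == d:
                        target = 1.0
                    else:
                        # One hop plus the neighbor's own best estimate.
                        target = 1.0 + min(q[m][d].values())
                    q[n][d][m] += alpha * (target - q[n][d][m])
    return q


def route(q, src, dst, max_hops=20):
    """Follow the lowest estimate at every hop."""
    path = [src]
    while path[-1] != dst and len(path) <= max_hops:
        table = q[path[-1]][dst]
        path.append(min(table, key=table.get))
    return path


if __name__ == "__main__":
    broken = frozenset({(0, 0), (1, 0)})
    tables = train({broken})
    print(route(tables, (0, 0), (2, 0)))
```

Because the tables are learned from neighbor feedback rather than derived from the shape of the fault region, the same update rule applies to irregular fault patterns, which is the property the text attributes to the reinforcement-learning approach.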
     4. Bufferless router for 3D NoC
     As the bufferless router is extended from 2D to 3D, serialized output port allocation degrades router performance even further. The thesis proposes a 1-cycle high-performance 3D bufferless router based on a 3-level permutation network (called 3D_PERM). The 3D_PERM router uses a 3-level permutation network to replace the serialized switch allocator and the 7×7 crossbar, which improves performance and reduces hardware overhead. Simulation results demonstrate that the 3D_PERM router achieves 73% and 14% lower average packet latency than the 3D_BASE and 3D_CHIPPER routers respectively under synthetic traffic workloads, and 78% and 14% lower average packet latency than these two 3D bufferless routers respectively under real application workloads. To address the low yield of TSV manufacturing technology in 3D ICs, the thesis proposes a low-overhead fault-tolerant deflection router (called FTDR-3D_OPT) for 3D Mesh NoC. The FTDR-3D_OPT router uses a layer routing table and two TSV state vectors to make efficient routing decisions that avoid both horizontal and vertical link faults. Synthesis results demonstrate that the area and power consumption of the FTDR-3D_OPT router are 40% and 49% lower than those of a 3D fault-tolerant deflection router with a global routing table.
    [132] Chaix F, Avresky D, Zergainoh N-E, et al. Fault-tolerant Deadlock-free AdaptiveRouting for Any Set of Link and Node Failures in Multi-Cores Systems [C]. InProceedings of the9th IEEE International Symposium on Network Computingand Applications. Cambridge, MA, USA,2010:52–59.
    [133] SuttonRS,BartoAG.ReinforcementLearning:AnIntroduction[M].Cambridge,Massachusetts: MIT Press,2005.
    [134] Kaelbling L P, Littman M L, Moore A W. Reinforcement Learning: A Survey [J].Journal of Artificial Intelligence Research.1996,4:237–285.
    [135] Boyan J A, Littman M L. Packet Routing in Dynamically Changing Networks: AReinforcement Learning Approach [J]. Advances in Neural Information Process-ing Systems.1994,6(1994):671–678.
    [136] Chen X, Lu Z, Jantsch A, et al. Supporting Distributed Shared Memory on Multi-core Network-on-Chips Using a Dual Microcoded Controller [C]. In Proceedingsof the Design, Automation and Test in Europe Conference and Exhibition. Dres-den, Germany,2010:39–44.
    [137] Piedra R M. Parallel1-D FFT Implementation with TMS320C4x DSPs [R].1994.
    [138] Kim J, Park D, Nicopoulos C, et al. Design and Analysis of an NoC ArchitecturefromPerformance,ReliabilityandEnergyPerspective[C].InProceedingsofACMSymposium on Architecture for Networking and Communication Systems. NewYork, NY, USA,2005:173–182.
    [139] Lillis J, Cheng C-K. Timing Optimization for Multisource Nets: CharacterizationandOptimalRepeaterInsertion[J].IEEETransactionsonComputer-AidedDesignof Integrated Circuits and Systems.1999,18(3):322–331.
    [140] Lan Y-C, Lo S-H, Lin Y-C, et al. BiNoC: A Bidirectional NoC Architecture withDynamic Self-reconfigurable Channel [C]. In Proceedings of the3rd ACM/IEEEInternational Symposium on Networks-on-Chip. San Diego, CA, USA,2009:266–275.
    [141] Tsai W-C, Zheng D-Y, Chen S-J, et al. A Fault-tolerant NoC Scheme Using Bidi-rectional Channel [C]. In Proceedings of the48th Design Automation Conference.San Diego, CA, USA,2011:918–923.
    [142] Cho M H, Lis M, Shim K S, et al. Oblivious Routing in On-Chip Bandwidth-adaptive Networks [C]. In Proceedings of the18th International Conference onParallel Architectures and Compilation Techniques. Raleigh, North Carolina, US-A,2009:181–190.
    [143] Chaiken D, Field C, Kurihara K, et al. Directory-based Cache Coherence in Large-scale Multiprocessors [J]. Computer.1990,23(6):49–58.
    [144] MartinMD,MiloM KandHill,WoodDA.TokenCoherence:DecouplingPerfor-mance and Correctness [C]. In Proceedings of the30th International Symposiumon Computer Architecture. San Diego, CA, USA,2003:182–193.
    [145] ChiangC-M,NiLM.Multi-addressEncodingforMulticast[C].InProceedingsofthe1stInternationalWorkshoponParallelComputerRoutingandCommunication.London, UK,1994:146–160.
    [146] Patti R S. Three-Dimensional Integrated Circuits and the Future of System-on-Chip Designs [J]. Proceedings of the IEEE.2006,94(6):1214–1224.
    [147] Joyner J W, Venkatesan R, Zarkesh-Ha P, et al. Impact of Three-dimensional Ar-chitectures on Interconnects in Gigascale Integration [J]. IEEE Transactions onVery Large Scale Integration (VLSI) Systems.2001,9(6):922–928.
    [148] Loi I, Angiolini F, Benini L. Supporting Vertical Links for3D Networks-on-Chip:Toward an Automated Design and Analysis Flow [C]. In Proceedings of the2ndInternational Conference on Nano-Networks. Brussels, Belgium,2007.
    [149] Pavlidis V F, Friedman E G.3-D Topologies for Networks-on-Chip [J]. IEEETransactions on Very Large Scale Integration (VLSI) Systems.2007,15(10):1081–1090.
    [150] Weldezion A Y, Grange M, Pamunuwa D, et al. Scalability of Network-on-Chip Communication Architecture for3D Meshes [C]. In Proceedings of the3rdACM/IEEE International Symposium on Networks-on-Chip. La Jolla, CA, USA,2009:114–123.
    [151] ParkD,EachempatiS,DasR,etal.MIRA:AMulti-LayeredOn-ChipInterconnectRouter Architecture [C]. In Proceedings of the35th International Symposium onComputer Architecture.2008:251–261.
    [152] Loi I, Mitra S, Lee T H, et al. A Low-overhead Fault Tolerance Scheme for TSV-based3D Network on Chip Links [C]. In Proceedings of IEEE/ACM InternationalConference on Computer-Aided Design. San Jose, CA, USA,2008:598–602.
    [153] Miyakawa N, Maebashi T, Nakamura N, et al. New Multi-Layer Stacking Tech-nology and Trial Manufacture [R].2007.
    [154] Swinnen B, Ruythooren W, De Moor P, et al.3D Integration by Cu-Cu Thermo-compression Bonding of Extremely Thinned Bulk-Si Die Containing10μm PitchThrough-Si Vias [C]. In Proceedings of International Electron Devices Meeting.San Francisco, CA, USA,2006:1–4.
    [155] Topol A W, La Tulipe D C, Shi L, et al. Enabling SOI based Assembly Tech-nology for Three Dimensional Integrated Circuits [C]. In Proceedings of IEEEInternational Electron Devices Meeting. Washington DC, USA,2005:352–355.
    [156] Qian Y, Lu Z, Dou W. From2D to3D NoCs: A Case Study on Worst-Case Com-munication Performance [C]. In Proceedings of IEEE/ACM International Confer-ence on Computer-Aided Design. San Jose, CA, USA,2009:555–562.
    [157] Addo-QuayeC.Thermal-awareMappingandPlacementfor3-DNoCDesigns[C].In Proceedings of IEEE International SOC Conference. Washington DC, USA,2005:25–28.
    [158] Bartzas A, Skalis N, Siozios K, et al. Exploration of Alternative Topologies forApplication-Specific3D Networks-on-Chip [C]. In Proceedings of the5th Work-shop on Application Specific Processors. Salzburg, Austria,2007.
    [159] Wu J. A Simple Fault-Tolerant Adaptive and Minimal Routing Approach in3DMeshes [J]. Journal of Computer Science and Technology.2003,18(1):1–13.
    [160] Lu Z, Jantsch A, Salminen E, et al. Network-on-Chip Benchmarking SpecificationPart2: Micro-Benchmark Specification [C]. In OCP-IP.2008.
    [161] Hu J, Marculescu R. Energy-aware Mapping for Tile-based NoC Architecturesunder Performance Constraints [C]. In Proceedings of the Asia and South PacificDesign Automation Conference. Kitakyushu, Japan,2003:233–239.
    [162] Marcon C A M, Moreno E I, Calazans N L V, et al. Comparison of Network-on-Chip Mapping Algorithms Targeting Low Energy Consumption [J]. IET Comput-ers and Digital Techniques.2008,2(6):471–482.
