Research on the Hardware Architecture of an RSA-Oriented Cryptographic Chip
Abstract
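Information security is a growing concern in modern society, and encryption and decryption technology, the core of information security, is a major research focus in the field. Implementing the widely used RSA algorithm efficiently and quickly is key to ensuring information security. Using the residue number system (RNS) number representation, and building on the research group's long-standing design methodology for configurable and extensible processors based on the transport triggered architecture (TTA), this paper implements an RSA encryption/decryption co-processor that combines high data throughput, low hardware resource overhead, and support for multiple key lengths.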
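     First, this paper discusses and optimizes the specific algorithms that implement the RSA encryption and decryption flow. An RNS-based RSA implementation is adopted, and the data parallelism of the RNS Montgomery modular multiplication algorithm is analyzed. The RNS-based Montgomery modular multiplication and modular exponentiation algorithms are then rescheduled according to the hardware characteristics of the configurable and extensible processor, exposing as much of their data-level parallelism as possible. On this basis, a special RNS base is chosen that greatly reduces the complexity of operations such as modular multiplication and modular addition in the RNS domain. For this special base form, algorithms for converting between the RNS representation and the conventional binary representation are proposed, solving another key problem that has hampered RNS implementations.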
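To make the RNS idea concrete, the following is a minimal Python sketch of the two things this paragraph relies on: channel-wise arithmetic in the RNS domain and conversion between RNS and ordinary binary integers. The base `BASE`, the helper names, and the use of the textbook Chinese Remainder Theorem for the reverse conversion are all assumptions made for illustration; the abstract does not state which base or which conversion algorithm the author actually uses.

```python
from math import gcd, prod

# Hypothetical illustrative base: small pairwise-coprime moduli near 2**8.
# The thesis's actual "special" base is not given in the abstract.
BASE = [256, 255, 253, 251]

def check_base(base):
    """Return True if all moduli in the base are pairwise coprime."""
    return all(gcd(a, b) == 1 for i, a in enumerate(base) for b in base[i + 1:])

def to_rns(x, base):
    """Binary integer -> RNS: one small residue per modulus."""
    return [x % m for m in base]

def from_rns(residues, base):
    """RNS -> binary integer via the Chinese Remainder Theorem."""
    M = prod(base)
    x = 0
    for r, m in zip(residues, base):
        Mi = M // m                      # product of all other moduli
        x += r * Mi * pow(Mi, -1, m)     # pow(..., -1, m) is the inverse mod m
    return x % M

def rns_add(a, b, base):
    """Channel-wise modular addition; every channel is independent."""
    return [(x + y) % m for x, y, m in zip(a, b, base)]

def rns_mul(a, b, base):
    """Channel-wise modular multiplication; every channel is independent."""
    return [(x * y) % m for x, y, m in zip(a, b, base)]
```

Because no carries cross between channels, each residue can in principle be handled by its own small arithmetic unit, which is the data-level parallelism the paragraph refers to. Results are only valid while the true value stays below the product of the moduli.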
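     Second, the architecture of the TTA-based RSA cryptographic co-processor is designed and its functional units are customized. Because TTA makes fine-grained data transports visible at the assembly-code level, some write-after-read and write-after-write data dependencies are eliminated and instruction-level parallelism is fully exploited. A reconfigurable modular multiply-accumulate array is designed to accelerate RNS Montgomery modular multiplication, the core operation of RSA encryption and decryption; a reconfigurable dedicated datapath within the array reduces the bus load and improves hardware resource utilization. The co-processor's interconnection network is also pruned, keeping only the interconnect resources between registers that actually need to exchange data, which lowers hardware resource overhead.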
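     The final experimental results show that the RSA cryptographic co-processor combines high data throughput, low hardware resource overhead, and support for multiple key lengths. At a clock frequency of 100 MHz, 1024-bit RSA decryption reaches a data throughput of 106 Kbps. With the SMIC 0.18 μm CMOS process library, the co-processor's equivalent logic gate count is only 101 Kgates. It supports RSA encryption and decryption with the current mainstream key lengths from 512 bits to 4096 bits, achieving good performance.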
Information security is gaining increasing importance in modern society. Public-key cryptosystems such as the widely used RSA are essential tools in the information security field. In this paper, the residue number system (RNS) is used to implement RSA. Based on a design methodology for configurable and scalable processors using the transport triggered architecture (TTA), a cryptographic co-processor with high data throughput, low hardware overhead, and support for a wide range of RSA key lengths is implemented.
     First, this paper discusses and optimizes the algorithms used to implement RSA encryption and decryption. An RNS-based implementation method is adopted, and the data parallelism of the RNS Montgomery modular multiplication algorithm is analyzed. Furthermore, the RNS Montgomery modular multiplication algorithm is rescheduled according to the available hardware resources to expose the maximum data-level parallelism. A particular RNS base, which simplifies modular multiplication and modular addition in the RNS domain, is chosen to reduce chip area cost. Based on this particular RNS base, an easily implemented method for converting between the RNS representation and the binary representation is proposed.
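As background for the Montgomery modular multiplication mentioned above, here is a minimal Python sketch of the classic integer (non-RNS) Montgomery product and a left-to-right square-and-multiply exponentiation built on it. This is not the RNS variant the paper describes; it only shows the underlying principle of replacing division by the modulus with a reduction by a convenient radix `R`. The function names and the choice `R = 2**k` are assumptions for illustration.

```python
def montgomery_setup(n, k):
    """Precompute R = 2**k and n_prime with n * n_prime ≡ -1 (mod R); n must be odd."""
    R = 1 << k
    n_prime = (-pow(n, -1, R)) % R
    return R, n_prime

def mont_mul(a, b, n, k, n_prime):
    """Return a * b * R^{-1} mod n for 0 <= a, b < n, without dividing by n."""
    mask = (1 << k) - 1
    t = a * b
    q = ((t & mask) * n_prime) & mask    # q = t * n_prime mod R
    u = (t + q * n) >> k                 # exact division by R, result < 2n
    return u - n if u >= n else u

def mont_pow(base, exp, n):
    """Compute base**exp mod n (n odd) with Montgomery products only."""
    k = n.bit_length()                   # guarantees R = 2**k > n
    R, n_prime = montgomery_setup(n, k)
    x = mont_mul(base % n, (R * R) % n, n, k, n_prime)   # base in Montgomery form
    acc = R % n                                          # 1 in Montgomery form
    for bit in bin(exp)[2:]:             # scan exponent bits, most significant first
        acc = mont_mul(acc, acc, n, k, n_prime)
        if bit == "1":
            acc = mont_mul(acc, x, n, k, n_prime)
    return mont_mul(acc, 1, n, k, n_prime)               # leave Montgomery form
```

With the toy key `n = 3233`, `e = 17`, `d = 2753`, `mont_pow(mont_pow(65, e, n), d, n)` recovers the message 65. In an RNS formulation the power-of-two radix is replaced by a product of base moduli so that the multiplications can run channel by channel, at the cost of base-extension steps that this sketch does not attempt to show.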
     Second, an RSA cryptographic co-processor is designed and its functional units are customized. Some name dependencies are eliminated by taking advantage of the fine-grained data moves of TTA, so that instruction-level parallelism is fully exploited. A reconfigurable modular multiply-accumulate array is designed to speed up RNS Montgomery modular multiplication, and reconfigurable dedicated datapaths within the array reduce the bus load and improve hardware resource utilization. The interconnections of the co-processor are customized to reduce hardware cost.
     The results show that the design achieves high performance. At a clock frequency of 100 MHz, 1024-bit RSA decryption reaches a data throughput of up to 106 Kbps, and the logic occupies only 101 Kgates in SMIC 0.18 μm CMOS technology.
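The reported figures can be turned into a rough per-operation latency. The sketch below assumes that throughput means key length divided by the time of one decryption and that 1 Kbps equals 1000 bits per second; neither definition is stated in the abstract, so the numbers it yields are only an estimate.

```python
def decryption_latency(key_bits, throughput_bps, clock_hz):
    """Estimate time and clock cycles per decryption.

    Assumes throughput = key_bits / time per decryption (not confirmed by the source).
    """
    seconds = key_bits / throughput_bps
    cycles = seconds * clock_hz
    return seconds, cycles

# Figures quoted in the abstract; Kbps taken as 10**3 bits per second (assumption).
seconds, cycles = decryption_latency(key_bits=1024,
                                     throughput_bps=106e3,
                                     clock_hz=100e6)
```

Under these assumptions one 1024-bit decryption takes a little under 10 ms, which is just under one million clock cycles at 100 MHz.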
