Vector microprocessors for cryptography
Abstract
Embedded security devices such as 'Trusted Platforms' require both scalability (of power, performance and area) and flexibility (of software and countermeasures). This thesis illustrates how data-parallel techniques can be used to implement scalable architectures for cryptography. Vector processing is used to provide high-performance, power-efficient and scalable processors. A programmable, 4-stage pipelined vector co-processor, controlled by a scalar MIPS-compatible processor, is described. The instruction set of the co-processor is defined for cryptographic algorithms such as AES, and Montgomery modular multiplication for RSA and ECC. The instructions are assessed using an instruction set simulator based on the ArchC tool. The simulator is used to evaluate the impact of varying the vector register depth (p) and the number of vector processing units (r). Simulations indicate that for vector versions of AES, RSA and ECC the performance improves in O(log(r)). A cycle-accurate, synthesisable Verilog model of the system (VeMICry) is implemented in TSMC's 90nm technology and used to show that the best area/power/performance trade-off is reached for r = p. This highly scalable design also allows area/power/performance trade-offs to be made for applications ranging from smart-cards to servers. To the best of my knowledge, this thesis is the first attempt to implement embedded cryptography using vector processing techniques.
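As background for the Montgomery modular multiplication mentioned above, the following is a minimal scalar sketch of the textbook algorithm in Python. It is not the vectorised implementation or the instruction set described in the thesis. The function names (`montgomery_setup`, `redc`, `mont_mul`, `mod_mul`) and the parameter `k` (the bit width chosen for R = 2^k) are hypothetical, introduced only for illustration. It assumes Python 3.8 or later for the modular inverse via `pow`.

```python
def montgomery_setup(n, k):
    """Precompute constants for an odd modulus n < 2**k, with R = 2**k.

    Names and interface are illustrative assumptions, not taken from the thesis.
    """
    if n % 2 == 0 or n >= (1 << k):
        raise ValueError("n must be odd and smaller than 2**k")
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # n * n_prime == -1 (mod R)
    r2 = (r * r) % n                # R^2 mod n, used to enter Montgomery form
    return n_prime, r2


def redc(t, n, k, n_prime):
    """Montgomery reduction: return t * R^-1 mod n for 0 <= t < n * R."""
    mask = (1 << k) - 1
    m = ((t & mask) * n_prime) & mask  # m = (t mod R) * n' mod R
    u = (t + m * n) >> k               # exact division by R
    return u - n if u >= n else u      # single conditional subtraction


def mont_mul(a_bar, b_bar, n, k, n_prime):
    """Multiply two values already in Montgomery form."""
    return redc(a_bar * b_bar, n, k, n_prime)


def mod_mul(a, b, n, k):
    """Compute a * b mod n via Montgomery form, for 0 <= a, b < n."""
    n_prime, r2 = montgomery_setup(n, k)
    a_bar = redc(a * r2, n, k, n_prime)  # a * R mod n
    b_bar = redc(b * r2, n, k, n_prime)  # b * R mod n
    c_bar = mont_mul(a_bar, b_bar, n, k, n_prime)
    return redc(c_bar, n, k, n_prime)    # leave Montgomery form
```

The reduction replaces division by n with masking and shifting by a power of two. In word-level variants of the algorithm the multiplications are split into many independent word products, which is the kind of regular, repeated work that data-parallel hardware can exploit.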