Research Programme in Assuring Hardware Security by Design in Systems on Chip


Research thrusts


Leveraging the physical protection techniques investigated in thrust 1, thrust 2 will explore secure architectures with solidly grounded assumptions on physical security. Nowadays, energy efficiency and low power consumption are major market differentiators at all scales, from MCUs to SoCs. This motivates the focus of thrust 2 on low-overhead solutions for secure architectures. To this aim, the physical security countermeasures of thrust 1 and the architectural features are synergistically exploited to maximize energy and area efficiency. Firstly, the architecture investigated in this thrust introduces selective physical protection in the essential components that maintain the security state, while using architectural security approaches (i.e., access control, encrypted on-chip communication) to provide security guarantees in the physically unprotected areas of the chip. In this way, data can be transmitted over long distances inside the chip while avoiding the large cost of physical protection. Secondly, a key attribute of the architectural security component is the physical separation of secure and insecure components, together with strict security control orchestrated by a trusted CPU in terms of isolation (i.e., NoC access to prevent information leakage) and access protection (to verify correct operation). The trusted CPU is the gatekeeper for all interactions between the secure and the insecure layers. Thirdly, the trusted CPU is protected against protocol- and software-level attacks via innovative low-overhead, hardware-based monitoring. Fourthly, a cross-level “vertical” approach is adopted that spans the circuit level (for key generation and physical protection in thrust 1), the architectural level and the intra-chip communication protocol level, as opposed to more conventional approaches that focus on one or a few levels of abstraction.

Thrust 2 will leverage the unique expertise and capabilities of the SOCure team, spanning from Networks on Chip to secure communication protocols and architectures. A team member from NUS (previously with MIT) has been leading the field of NoCs for almost two decades [PEH], and has developed several NoC design methodologies that will be leveraged in the architectural exploration in this thrust. Another team member has wide expertise in SoC architectures and unique experience in the development of high-level simulators [TRV], such as the popular Sniper simulator [SNP], as needed for the architectural exploration and security-overhead tradeoff analysis in this thrust. Another team member is an expert in lightweight protocols and anomaly detection for distributed networks, as needed for the NoC-centric approach discussed below [BPS]. Another team member has expertise in architecture-level security and Trojan detection [AC]. Various industrial partners with strong expertise in the domain covered by thrust 2 are in the process of joining the team. Our international partners from the US (Princeton) [RBL] and the UK (Cambridge) [SMN] are world-renowned experts in secure architectures.

Regarding the state of the art in secure architectures, a variety of secure systems have been proposed, each restricted to a specific class of systems (general-purpose vs. embedded) or threat target. However, proposals at the system and the Network-on-Chip (NoC) level tend to be complex, and hence power and area hungry, or do not provide adequate whole-platform security guarantees. For example, ARM TrustZone [A09] and Intel SGX [I14] are commercially available general-purpose systems that aim to increase the security and isolation of applications. TrustZone relies on secure/insecure access being controlled by the main processor, and assumes that confidential on-chip data cannot be accessed externally (an assumption that does not hold, even under non-invasive attacks). Intel SGX encrypts off-chip data at a large hardware overhead, as it creates specialized hardware enclaves that isolate software even from the OS and hypervisor, mostly targeting cloud providers. These threat models clearly target application-level isolation and off-chip data protection, but not the protection of application data from inside the chip. Embedded solutions such as Fulmine [CSS17] continue this trend, but only provide secure storage outside the chip. As many-core chips interconnected by NoCs (e.g., Intel Xeon Phi) emerged in the server and datacenter domain, the state of the art in secure NoCs naturally targeted high-performance chips, with highly sophisticated NoC designs comprising many virtual channels and buffers. For instance, SurfNoC [WGO13] ensures non-interference between different domains by partitioning and scheduling the virtual channels of links across domains, with a large number of virtual channels per input port (e.g., 16 to 32) and per NoC router (e.g., 80 to 160).
Such highly complex NoCs entail a very large overhead that can be justified only in massively parallel server architectures, and they cannot be efficiently scaled down to lower classes of computing. For example, in Fort-NoCs [ACR14], a secure NoC that counteracts NoC-level information leakage, a router with 5 virtual channels per port (25 virtual channels per router) consumes 706mW in a 45nm TSMC process, which exceeds the power budget of secure MCUs by orders of magnitude and is comparable to the entire power consumption of smartphone processors.

In SOCure, we will investigate ultra-lightweight secure NoCs that provide secure connectivity across the SoC, while ensuring very low power and area overhead. In general, NoCs comprise the datapath (wires and switches that actually transport the bits), control (for handling datapath sharing between packets), and buffering (for temporary storage to deal with contention by other packets, e.g., virtual channel queues). Control and buffering are essentially overheads in NoCs, and prior NoC designs have reduced them, down to NoCs that are completely buffer-less. Without buffering in the NoC, flits that contend with others can no longer be temporarily stored in the routers. As a compromise, several proposed NoCs deflect contending flits to other ports, misrouting them [MM09], [DPC16], drop the contending flits [HJL09], introduce a regular NoC as a backup [LMJ16], or buffer and retransmit the dropped flits at the network interface [HJL09]. Control, in turn, can be offloaded either to software, where the compiler or scheduler determines a contention-free schedule [TKM02], [KMM17], or to an ultra-lightweight control network [DPC17]. Nevertheless, these prior ultra-lightweight NoCs do not handle security. In SOCure, we will leverage our broad experience in ultra-lightweight NoC design [DPC16], [KMM17], [DPC17] to introduce low-overhead security mechanisms. Further details on our proposed NoC architecture are provided below.
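To make the deflection option concrete, the following is a minimal Python sketch of one arbitration step in a buffer-less 2D-mesh router. It assumes oldest-first priority (a policy used in several deflection-routing designs), minimal-direction preferences, and that flits addressed to the local node have already been ejected; the names `Flit`, `productive_ports` and `deflect_route` are hypothetical and do not describe any specific cited design.

```python
from dataclasses import dataclass

PORTS = ("N", "E", "S", "W")  # output ports of a 2D-mesh router


@dataclass
class Flit:
    flit_id: int
    age: int     # cycles since injection; older flits win arbitration
    dest: tuple  # (x, y) coordinates of the destination router


def productive_ports(cur, dest):
    """Output ports that bring a flit closer to its destination."""
    (cx, cy), (dx, dy) = cur, dest
    ports = []
    if dx > cx:
        ports.append("E")
    if dx < cx:
        ports.append("W")
    if dy > cy:
        ports.append("N")
    if dy < cy:
        ports.append("S")
    return ports


def deflect_route(cur, flits):
    """Give every incoming flit a distinct output port in the same cycle.

    Nothing is buffered: a flit that loses arbitration for all of its
    productive ports is deflected (misrouted) to any port that is still free.
    """
    if len(flits) > len(PORTS):
        raise ValueError("a buffer-less router cannot accept more flits than output ports")
    free = list(PORTS)
    assignment = {}
    for flit in sorted(flits, key=lambda f: f.age, reverse=True):  # oldest first
        wanted = [p for p in productive_ports(cur, flit.dest) if p in free]
        port = wanted[0] if wanted else free[0]  # deflect if no productive port is free
        free.remove(port)
        assignment[flit.flit_id] = port
    return assignment


# Two flits both want to go East; the younger one is deflected.
print(deflect_route((1, 1), [Flit(1, age=5, dest=(3, 1)), Flit(2, age=2, dest=(2, 1))]))
# {1: 'E', 2: 'N'}
```

The key property is that every arriving flit leaves in the same cycle, which is what removes the need for buffers, at the cost of extra hops for deflected flits.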

Regarding the state of the art in secure NoCs, existing security mechanisms come at a large area/energy penalty and primarily focus on access control based on monitoring and analyzing data transfers. Starting with the Security Enhanced Communication Architecture (SECA) [CRR05], [PGS11], such mechanisms use stateful and stateless policies that rely on addresses and/or data values to determine access rights. These approaches have high computational complexity and do not scale well as IP blocks are added. Policy-based access control mechanisms for NoC security have been enhanced with encryption techniques for confidentiality and integrity [CCG11], [CCG12]. The main drawback of existing approaches in this direction is their area, energy, and latency overhead. The encryption and integrity monitoring techniques proposed in [WB11], [CL10] target code and data in main memory, and do not address access control or attacks on availability. In the context of bus-based architectures, mechanisms that exploit the broadcast nature of the bus to detect non-conformant data transfers have also been proposed [KV11]. These techniques do not extend to NoC-based architectures such as those considered in this proposal. Finally, the existing literature has paid little attention to the run-time authentication of IP identities.
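The following minimal Python sketch illustrates what a SECA-style policy check involves for every transaction: stateless rules match the initiator and the address range, while a stateful rule (here, a run-time write lock on an address range) depends on earlier events. The rule format, the lock mechanism and all names are hypothetical simplifications, not the SECA design itself; the linear scan over rules hints at why such checks grow costly as IP blocks are added.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    initiator: str
    base: int         # first address covered by the rule
    limit: int        # first address past the covered range
    perms: frozenset  # subset of {"r", "w"}


class PolicyEnforcer:
    def __init__(self, rules):
        self.rules = list(rules)
        self.locked = []  # run-time state: write-locked (base, limit) ranges

    def lock(self, base, limit):
        self.locked.append((base, limit))

    def check(self, initiator, addr, op):
        """Return True if `initiator` may perform `op` ("r" or "w") at `addr`."""
        # Stateful policy: once a range is locked, nobody may write to it.
        if op == "w" and any(lo <= addr < hi for lo, hi in self.locked):
            return False
        # Stateless policy: the first rule matching initiator and address decides.
        for rule in self.rules:
            if rule.initiator == initiator and rule.base <= addr < rule.limit:
                return op in rule.perms
        return False  # default deny


enforcer = PolicyEnforcer([
    Rule("dma", 0x1000, 0x2000, frozenset("rw")),
    Rule("gpu", 0x1000, 0x2000, frozenset("r")),
])
print(enforcer.check("gpu", 0x1800, "w"))  # False: the GPU only has read rights
```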

Regarding malicious RTL modifications and bugs, numerous policy- and runtime-based approaches to detect them have been proposed [BBT17], [WS10], [HSK15], although they introduce additional hardware complexity and overhead. In addition, many previously reported processor bugs lead to incorrect privilege escalation [HSK15], which can occur when general-purpose processors are repurposed for security-related tasks. To address this challenge, pre-silicon software tools that check for security-related bugs or backdoors will be adopted in the digital design flow, leveraging the unique capabilities of one of our industrial partners [SIC]. Also, SOCure completely eliminates the need for escalation bug checking, as the privileged processor only runs secure software, as explained in the following. Traditional secure architectures either combine trusted and untrusted software on custom hardware [A09], or create large trusted IP regions that can be as large as the entire chip [CSS17], which would require extensive physical protection to prevent eavesdropping and tampering. The approach introduced in SOCure is to minimize the surface area of the secure and insecure worlds, and to isolate untrusted IPs (Fig. 15, blue boxes) from all other IPs (both trusted and untrusted). The resulting architecture minimizes the attack vectors from untrusted IPs and software, and provides a low-overhead and clear security framework that controls access to the different components of the system. By isolating privileged operations on a separate, small and efficient core, we avoid the potential for privilege-escalation bugs that arise when CPUs or IPs operate in both secure and insecure modes [HSK15]. Examples of access control by the trusted CPU include router configuration (interference control) and restricting access to shared resources, such as external memory or internal sensors.
The secure network interfaces encrypt all data leaving the trusted IPs, and can directly access the outside world (SRAM, Flash) without the need to re-process or re-encrypt the data. In other words, secure intra-chip communication automatically ensures data security off chip as well.

As for the detection of untrusted IPs, existing work on the detection and mitigation of threats from hardware Trojans in untrusted IPs is primarily based on evaluating their design and activation characteristics [HFK10], [TK10], [CNB09], [WS11]. Techniques such as those proposed in [WMS13], [BS10], [ZT11] and [RCK15] are based on static validation of the IP cores, with emphasis on the detection of suspicious regions, nodes or unused circuits. While these techniques may be effective in certain scenarios, they have high time complexity and cost, and frequently exhibit significant false negatives/positives depending on the choice of test sets, threshold values, design type, etc. Run-time monitors for detecting hardware Trojans have been proposed in [WS10], [DDK13]. However, these solutions are specific to microprocessor cores and are not applicable to scenarios with arbitrary IPs interconnected in a SoC.

The above challenges and limitations of prior art are addressed in SOCure by adopting the architecture illustrated in Fig. 15. As shown in this figure, the SOCure architecture comprises trusted and untrusted entities, and the NoC handles traffic within the trusted region, between the trusted region and untrusted IPs, as well as with off-chip memories through untrusted memory controllers. These security properties are supported by an ultra-lightweight NoC design, given the tight power and area constraints at the low end of the computing spectrum. These tradeoffs prompt us to propose a bare-metal buffer-less NoC architecture where scheduling and control are offloaded to the compiler and to the OS scheduler running on the trusted CPUs, which lie within the trusted region, where the applications' communication patterns are known in advance. The software-scheduled NoC will essentially consist of just the datapath (wires and crossbar switches), with switch settings configured by the scheduler for each set of applications. This allows the schedule to push the NoC to maximum throughput, while the NoC remains buffer-less and free of control logic. The configuration of the NoC switches is initiated by the trusted CPUs, and the data are encrypted at the NoC interfaces and wrappers by a lightweight block cipher engine with single-cycle latency and ultra-low power consumption (see below). Accordingly, the data remain encrypted throughout the NoC transmission. The datapath wires and switches can be readily partitioned in the floorplan, and are physically protected to prevent temperature snooping, eavesdropping and tampering.
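As an illustration of what "a contention-free schedule computed in software" can mean for a buffer-less NoC, the following Python sketch greedily assigns each known flow an injection slot in a time-division (TDM) frame so that no directed link carries two flits in the same cycle. It assumes a 2D mesh, dimension-ordered (XY) routing, and one hop per cycle; the frame length, the greedy policy and all names are hypothetical and are not the scheduler SOCure will adopt.

```python
def xy_path(src, dst):
    """Directed links traversed by dimension-ordered (XY) routing."""
    x, y = src
    links = []
    while x != dst[0]:
        nx = x + (1 if dst[0] > x else -1)
        links.append(((x, y), (nx, y)))
        x = nx
    while y != dst[1]:
        ny = y + (1 if dst[1] > y else -1)
        links.append(((x, y), (x, ny)))
        y = ny
    return links


def schedule(flows, period):
    """Greedily give each flow an injection slot in a TDM frame of `period`
    cycles. A flit injected at slot s uses its h-th link in cycle
    (s + h) mod period, and no link may be reserved twice in one cycle."""
    busy = set()  # (link, cycle modulo period) pairs already reserved
    slots = {}
    for name, src, dst in flows:
        path = xy_path(src, dst)
        for start in range(period):
            uses = {(link, (start + hop) % period) for hop, link in enumerate(path)}
            if not uses & busy:
                busy |= uses
                slots[name] = start
                break
        else:
            raise ValueError(f"no contention-free slot for {name} in a {period}-cycle frame")
    return slots


flows = [("A", (0, 0), (2, 0)), ("B", (0, 0), (1, 0)), ("C", (0, 1), (0, 0))]
print(schedule(flows, period=4))  # {'A': 0, 'B': 1, 'C': 0}
```

Flows A and B share the first link out of router (0, 0), so B is shifted by one slot; since the switch settings follow directly from these reservations, the routers need neither buffers nor arbitration logic.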

The encryption across the NoC and for off-chip communication needs to be performed by ultra-lightweight crypto-engines, as they are present in each NoC router and always sit between the sender and the receiver in any on-chip transaction. To this aim, we will use the recently proposed Simon block cipher [BSS13], which is relatively simple and can easily fit within a single clock cycle in all practical architectures. Its area and energy efficiency is substantially better than that of AES [DR02]. Based on our recent estimates in 40nm CMOS with a set of novel energy reduction techniques, an energy of 0.1pJ/bit is achievable, leading to a power consumption on the order of 1mW at 500MHz, which is 10-100x lower than the power target for the low-power trusted processor. This methodology will enable large numbers of IPs to be connected without a significant power overhead.
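As a functional reference for the cipher discussed above, the following Python sketch implements Simon32/64 (16-bit words, four key words, 32 rounds) following the round function and key schedule published with Simon [BSS13]; with key 1918 1110 0908 0100 and plaintext 6565 6877 it should reproduce the published ciphertext c69b e9bb. Simon32/64 is chosen here only because it is the smallest variant; the variant SOCure would adopt is not specified, and a single-cycle hardware engine would unroll the 32 rounds into combinational logic rather than loop as this model does.

```python
MASK = 0xFFFF  # Simon32/64 works on 16-bit words
ROUNDS = 32
# Constant sequence z0 from the Simon specification; bit i is Z0[i].
Z0 = "11111010001001010110000111001101111101000100101011000011100110"


def rol(x, r):
    return ((x << r) | (x >> (16 - r))) & MASK


def ror(x, r):
    return ((x >> r) | (x << (16 - r))) & MASK


def expand_key(key_words):
    """key_words = [k0, k1, k2, k3], k0 being the least significant word."""
    k = list(key_words)
    c = MASK ^ 3  # the constant 2^16 - 4
    for i in range(ROUNDS - 4):
        tmp = ror(k[i + 3], 3) ^ k[i + 1]
        tmp ^= ror(tmp, 1)
        k.append(c ^ int(Z0[i]) ^ k[i] ^ tmp)
    return k


def f(x):
    """Simon round function: (S^1 x AND S^8 x) XOR S^2 x."""
    return (rol(x, 1) & rol(x, 8)) ^ rol(x, 2)


def encrypt(x, y, round_keys):
    for k in round_keys:
        x, y = y ^ f(x) ^ k, x
    return x, y


def decrypt(x, y, round_keys):
    for k in reversed(round_keys):
        x, y = y, x ^ f(y) ^ k
    return x, y


rk = expand_key([0x0100, 0x0908, 0x1110, 0x1918])
print(" ".join(f"{w:04x}" for w in encrypt(0x6565, 0x6877, rk)))
```

Each round needs only rotations, one AND and XORs, which is why the cipher is cheap in hardware. For the energy figure quoted above, assuming one 32-bit block is processed per cycle, 0.1 pJ/bit × 32 bit × 500 MHz = 1.6 mW, i.e., on the order of 1 mW.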

While applications are typically known beforehand on a secure platform, so that their communication flows can be pre-characterized, interactions between IP blocks and dynamic aspects of off-chip memory traffic will make a portion of on-chip traffic unpredictable at compile time. We will thus explore a buffer-less control network where ordering and dependencies between traffic flows are captured as tokens, with the tokens triggering switch configurations across the NoC. The control network will similarly be tamper-proofed, and will leverage our prior work on buffer-less ordering NoCs [DPC17]. In addition to system-level protection with encryption and lightweight access control, the security of the lightweight trusted processors will be hardened through lightweight monitoring, access protection, and authenticated encrypted software. As these processors act as the interface between the trusted and untrusted components, additional measures are needed to protect them against replay attacks, buffer overflows, and other software vulnerabilities. Hardware-based monitoring of software is continuous, providing cycle-level protection, and is much more efficient than software-only techniques. Also, the integrity of the NoC and the trusted processors after manufacturing will be checked through the common practice of reverse engineering, i.e., by delayering and imaging the chip to verify its exact correspondence to the netlist of the original design. The cost of reverse engineering is now relatively low, thanks to the availability of low-cost SEM microscopes (see the “Landscape, trends and motivation” section), and is typically on the order of a few tens of k$/mm² or lower.
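The token-based ordering idea for the control network described above can be modeled in software as follows: each flow's switch configuration fires only once completion tokens from all the flows it depends on have arrived, so configurations are released in dependency "waves" (essentially Kahn's topological sort). This is a hypothetical Python model for reasoning about orderings, not a description of the [DPC17] hardware; flow names are illustrative.

```python
from collections import defaultdict


def token_waves(deps):
    """Return the order in which switch configurations are triggered.

    `deps` maps every flow to the flows whose completion tokens it must
    receive first. Flows in the same wave can be configured concurrently.
    """
    pending = {flow: set(pre) for flow, pre in deps.items()}
    waiters = defaultdict(list)  # flow -> flows waiting for its token
    for flow, pre in pending.items():
        for p in pre:
            waiters[p].append(flow)
    ready = sorted(flow for flow, pre in pending.items() if not pre)
    waves = []
    while ready:
        waves.append(ready)
        nxt = []
        for done in ready:  # each completed flow emits its token
            for w in waiters[done]:
                pending[w].discard(done)
                if not pending[w]:
                    nxt.append(w)
        ready = sorted(nxt)
    if any(pending.values()):
        raise ValueError("cyclic or missing token dependencies")
    return waves


deps = {"mem_rd": [], "dma": [], "accel": ["mem_rd", "dma"], "mem_wr": ["accel"]}
print(token_waves(deps))  # [['dma', 'mem_rd'], ['accel'], ['mem_wr']]
```

A cyclic dependency is reported as an error, since in hardware it would leave some switches waiting forever for a token.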

In SOCure, robustness against attacks from untrusted on-chip IPs is based on the assumption that keys are generated locally in each router (using a physically secure PUF, see thrust 1), and that the trusted CPU is able to securely manage the exchange of temporary session keys among IPs at run time or at chip boot time. This requires the adoption of secure communication protocols over the non-secured NoC, as discussed in the following. As shown in Fig. 15, the trusted CPU establishes trust in the presence of untrusted IPs. The trusted CPU manages keys, facilitates secure communication between IPs, implements security policies, and also serves as a detection agent for attacks launched by the IPs (e.g., DoS attacks on the NoC or on an IP). To assist the trusted CPU in its operations, trusted NoC routers with encryption capability are adopted at the interface between each IP and the switch connecting it to the NoC. Since IPs may be untrusted, security functions such as encryption and key exchange for each IP will be handled by the trusted router in the NoC switch to which the IP is connected. The routers are connected to the trusted CPU through physically secure point-to-point links (the secure NoC, shown as a red line in Fig. 15). Since changes in the control policy are infrequent, these links may simply be serial, keeping their area/energy cost insignificant. The functions of the architecture and the related protocol are as follows:

1. KEY MANAGEMENT. All cryptographic keys are handled only by the trusted routers and CPU(s). The initial key exchange between the trusted CPU and each trusted router will be done at testing time. Each router is equipped with a one-time readable PUF (see thrust 1) that is read by the trusted CPU to set up a challenge-response pair (CRP) associated with the router. This initial CRP will be used by the router to facilitate the setup or update of cryptographic keys during the operational phase. When two IPs wish to communicate during the operational phase, a session key will be set up between them with the help of the trusted CPU. The IP initiating the communication requests a session key from the trusted CPU (through the trusted router it is connected to). The trusted CPU then processes the request according to the requester's privileges and the security policies.

2. SECURE COMMUNICATION. All intra-chip communications are encrypted using a lightweight Simon crypto core, ensuring the confidentiality of the messages and counteracting eavesdropping, man-in-the-middle and replay attacks. In addition, time- and space-based partitioning on the secure NoC will ensure that IPs that are not party to an ongoing message exchange have no access to any content of the messages, including the headers.

3. ACCESS CONTROL. Access to resources (e.g., registers, memory locations) requested by any IP will be routed through the trusted CPU to ensure conformance with security policies. The trusted CPU also sets up the policies for the routing tables in the NoC in order to isolate data transfers.

4. REAL-TIME PROTECTION. The trusted routers and CPU(s) have features that facilitate monitoring the network activity of each IP, in order to detect attacks and policy violations. For example, the routers monitor and report traffic metrics (e.g., delays experienced by packets) to the CPU to detect and counteract DoS attacks.
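The session-key setup described above resembles a key-distribution-center exchange, and can be sketched in Python as follows. Assumptions: the PUF response is modeled as a stable byte string (real PUF responses are noisy and require error correction), HMAC-SHA256 from the Python standard library stands in for whatever primitive the hardware would actually use, and the wrapped keys travel over the physically secure router-to-CPU links. The policy format, message layout and all names are hypothetical.

```python
import hashlib
import hmac
import secrets


def prf(key: bytes, data: bytes) -> bytes:
    """HMAC-SHA256 used as a pseudo-random function (a stand-in primitive)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def long_term_key(puf_response: bytes) -> bytes:
    # Both sides derive the same router key from the CRP recorded at test time.
    return prf(puf_response, b"router-long-term-key")


class TrustedCPU:
    def __init__(self, policy):
        self.policy = policy   # allowed (initiator, target) router pairs
        self.router_keys = {}  # router id -> long-term key

    def enroll(self, router_id, puf_response: bytes):
        """Test time: record the one-time PUF readout of a router."""
        self.router_keys[router_id] = long_term_key(puf_response)

    def _wrap(self, router_id, session_key, nonce):
        k = self.router_keys[router_id]
        ct = bytes(a ^ b for a, b in zip(session_key, prf(k, b"enc" + nonce)))
        return nonce, ct, prf(k, b"mac" + nonce + ct)  # encrypt-then-MAC

    def request_session(self, initiator, target):
        """Operational phase: issue a fresh session key if the policy allows it."""
        if (initiator, target) not in self.policy:
            raise PermissionError(f"{initiator} may not open a session with {target}")
        session_key = secrets.token_bytes(16)
        nonce = secrets.token_bytes(12)
        return self._wrap(initiator, session_key, nonce), self._wrap(target, session_key, nonce)


class TrustedRouter:
    def __init__(self, router_id, puf_response: bytes):
        self.router_id = router_id
        self.key = long_term_key(puf_response)

    def unwrap(self, nonce, ct, tag):
        if not hmac.compare_digest(tag, prf(self.key, b"mac" + nonce + ct)):
            raise ValueError("session-key message failed authentication")
        return bytes(a ^ b for a, b in zip(ct, prf(self.key, b"enc" + nonce)))


resp_a, resp_b = secrets.token_bytes(32), secrets.token_bytes(32)  # simulated PUF readouts
cpu = TrustedCPU(policy={("r_a", "r_b")})
cpu.enroll("r_a", resp_a)
cpu.enroll("r_b", resp_b)
msg_a, msg_b = cpu.request_session("r_a", "r_b")
ra, rb = TrustedRouter("r_a", resp_a), TrustedRouter("r_b", resp_b)
print(ra.unwrap(*msg_a) == rb.unwrap(*msg_b))  # True: both routers hold the same key
```

Only the trusted CPU and the two trusted routers ever see the session key in clear, matching the rule that keys are handled only by trusted routers and CPU(s); the attached authentication tag lets a router reject a tampered key-delivery message.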

The fundamental novelty of the proposed intra-chip communication scheme lies in the distributed nature of the hardware security primitives, coupled with a software-defined, centrally controlled communication architecture, to ensure security in the presence of untrusted IPs. A second element of novelty of this communication scheme is that the proposed architecture always operates in secure mode (e.g., all packets are encrypted and turning encryption off is not an option), unlike existing solutions where the SoC may switch between secure and non-secure modes. Thirdly, the physically-based authentication and communication mechanism can be modified during the lifecycle of the SoC, thus enabling upgradeability. In other words, if a hardware vulnerability is discovered, or a security policy turns out to be too restrictive, the security policy defined in the trusted CPU can be modified over time.

The trusted CPU in Fig. 15 is the other pillar of the proposed architecture, and its security assurance is a major challenge, owing to the various forms of vulnerabilities that a system can be exposed to across design layers. In state-of-the-art designs, the trusted CPU usually implements a Trusted Execution Environment (TEE), i.e., a hardened, tightly controlled and usually limited execution environment in the processor, designed to run critical secure services and protect critical assets. The TEE protects the confidentiality and the integrity of the code and data loaded into it, so that applications running in the Rich Execution Environment (REE) cannot tamper with them. Hardening the TEE and separating it from the REE is a daunting task, given that multiple applications request cryptographic services and resources are therefore inevitably shared. Existing TEE solutions can be classified into three categories. First, the application runs in an encrypted enclave (e.g., Intel SGX) that shares the secure hardware with insecure applications. Second, virtual machines run in encrypted memory (e.g., AMD SEV). Third, a virtual CPU is used to clearly separate the operations of a secure CPU and a normal CPU (e.g., ARM TrustZone). The hard separation advocated by the last approach is clearly advantageous, but still stops short of absolute security due to the semantic gap between the two modes of operation. Besides, to cater to lightweight design segments, ARM TrustZone does not include any built-in cryptographic capabilities or secure non-volatile memory, although they would be required for services such as secure boot, key and data sealing, and remote attestation. Ideally, the trusted CPU should either provide a clear semantic translation or, even better, support the same Instruction-Set Architecture (ISA) for running both the TEE and the REE.
Secondly, full support for cryptographic acceleration with minimal performance/power overhead is very important. Thirdly, the TEE needs to be connected to the root of trust through a secure and robust chain of trusted operations. Fourthly, the trusted CPU needs to be protected against passive and active side-channel attacks.

The above four capabilities will be pursued for the trusted CPU in SOCure by introducing new methods. In particular, an open-source processor based on an open ISA (e.g., RISC-V) will be used as a test vehicle, due to its widespread adoption in recent years and the strong interest of industry. An example of the system-level view of the trusted CPU operations is shown in Fig. 16. Besides side-channel attack resistance (which is addressed in thrust 1), two directions will be explored in this thrust, as discussed in the following. First, new techniques to protect the trusted CPU against malicious activity arising from the debug interface will be investigated, since debug ports act as backdoors that allow intrusion during the lifetime of the device. This will be achieved through a novel protocol involving an authenticated debugger and built-in key management schemes. Second, the trusted CPU will provide low-level resistance against malware/ransomware by utilizing the hardware performance counters (HPCs) of the REE and TEE. The dataset recorded from the HPCs during normal application execution will be used to train an artificial neural network (ANN), which will then be used to identify anomalies and malicious applications at run time. Depending on the results of the benchmark analysis, on-chip acceleration will be considered if the performance overhead exceeds expectations (e.g., on the order of a few percentage points of the nominal throughput). The overall aim of this exploration is to achieve a targeted level of trustworthiness while minimizing silicon area and performance/energy overhead. The design will be benchmarked against the targeted attack scenarios (for security) and the baseline design (for overhead quantification).
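The train-on-normal, flag-at-run-time workflow described above can be illustrated with the following Python sketch. To stay self-contained it does not train an ANN: it swaps in a per-counter z-score profile, which plays the same role (a model of normal HPC behaviour learned from benign runs) in a much cruder way. The counter choices, the threshold and the sample values are all hypothetical.

```python
import statistics


class HpcBaseline:
    """Per-counter statistical profile of normal execution.

    A deliberately simple stand-in for the ANN: it learns the mean and
    standard deviation of each hardware performance counter from normal
    runs and flags samples deviating by more than `z_limit` deviations.
    """

    def __init__(self, normal_samples, z_limit=4.0):
        columns = list(zip(*normal_samples))
        self.mean = [statistics.fmean(c) for c in columns]
        # A counter that never varied gets unit scale to avoid division by zero.
        self.std = [statistics.pstdev(c) or 1.0 for c in columns]
        self.z_limit = z_limit

    def score(self, sample):
        return max(abs(v - m) / s for v, m, s in zip(sample, self.mean, self.std))

    def is_anomalous(self, sample):
        return self.score(sample) > self.z_limit


# Hypothetical per-interval readings: (kilo-instructions, cache misses, branch misses)
normal = [(100, 10, 2), (102, 11, 2), (98, 9, 2), (101, 10, 2)]
detector = HpcBaseline(normal)
print(detector.is_anomalous((100, 10, 2)))  # False
print(detector.is_anomalous((100, 40, 2)))  # True: burst of cache misses
```

An ANN replaces the per-counter thresholds with a learned model of how counters co-vary, which is what lets it separate subtle malware behaviour from benign workload changes; the surrounding data flow stays the same.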

The design of the trusted CPU brings forth the following novel propositions:
• Protecting the scan chain and debug interface from malicious attackers through cryptographic protocols.
• Utilizing hardware performance counters for malware/ransomware detection has recently been proposed by our team [ABM18], with excellent preliminary results. This new idea will be extensively explored in SOCure, in the context of HPC-trained ANNs in lightweight processors.
• The entire chain of trust through the secure boot operation will be holistically investigated and designed, along with the assurance of memory integrity and confidentiality. Accordingly, a comprehensive analysis of the entire protocol will be performed, instead of relying mainly on root-of-trust provisioning through key management. This will leverage the synergy between the creation of the root of trust in thrust 1 and the architecture-protocol innovation in thrust 2.
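The chain-of-trust idea in the last item can be sketched minimally in Python: an immutable ROM digest vouches for the first boot stage, each verified stage vouches for the next, and a running hash chain records what was booted (for later attestation). This assumes plain SHA-256 digests embedded in each stage; a real design would typically verify signatures with a key anchored in the thrust-1 root of trust, and the memory integrity and confidentiality aspects mentioned above are not modeled. Names and image contents are hypothetical.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def secure_boot(rom_digest: bytes, stages):
    """Verify and measure a boot chain.

    `stages` is a list of (image, digest_of_next_image) pairs; the last entry
    has None as its next digest. The first image must match the digest held
    in immutable ROM, and every verified stage vouches for the next one.
    Returns a hash-chain measurement of everything that was booted.
    """
    expected = rom_digest
    measurement = bytes(32)
    for index, (image, next_digest) in enumerate(stages):
        if sha256(image) != expected:
            raise RuntimeError(f"boot stage {index} failed verification; halting")
        measurement = sha256(measurement + sha256(image))  # extend the chain
        expected = next_digest
    return measurement


bootloader, tee_os = b"bootloader v1", b"trusted OS v1"
chain = [(bootloader, sha256(tee_os)), (tee_os, None)]
print(secure_boot(sha256(bootloader), chain).hex())
```

Any modified stage breaks the chain at the point of modification, and because the final measurement depends on every image in order, it can serve as the input to a remote-attestation report.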

Finally, architectural support for run-time hardware-based memory isolation enforcement will also be investigated, in order to prevent software side-channel attacks on on-chip memories, a software-level threat known to require hardware solutions [P05], [OST06], [B05]. These attacks aim to retrieve confidential data from a memory area that a malicious app is not supposed to access (as enabled by fault attacks, among others). As the main research direction, low-power content-addressable memories (CAMs) will be explored to introduce spatial randomization of on-chip memory accesses, along with SRAMs empowered with single-cycle flush capability (i.e., erasure of entire banks in a single cycle). The first capability breaks the deterministic relation between data and their physical address, thus preventing attackers from locating sensitive information in the memory. The second capability makes it possible to quickly release unused portions of memory, so that malicious SW applications cannot read data left by previously executed applications, while avoiding the prohibitive time penalty of sequentially erasing traditional memories. Together, these capabilities prevent the attacker from locating sensitive memory data and from accessing confidential data previously generated by other applications, as required to counteract SW side-channel attacks.
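A behavioural Python sketch of the two capabilities above, under stated assumptions: the CAM is modeled as a dictionary from logical address to a randomly assigned physical row, and `flush` stands for the single-cycle bank erase (the Python model of course takes time proportional to the bank size). Class and method names are hypothetical, and the sketch says nothing about the circuit-level CAM or SRAM design.

```python
import random


class RandomizedBank:
    """Behavioural model of an on-chip memory bank with CAM-based remapping
    and bank-wide flush. Each logical address is mapped to a randomly chosen
    physical row, so the physical location of data is not a fixed function
    of its address."""

    def __init__(self, rows, seed=None):
        self.rows = rows
        self.rng = random.Random(seed)  # hardware would use a true random source
        self.flush()

    def flush(self):
        # Models the single-cycle erase: all data and CAM entries are dropped
        # together and a fresh random row order is drawn.
        self.data = [0] * self.rows
        self.cam = {}  # logical address -> physical row
        self.free = list(range(self.rows))
        self.rng.shuffle(self.free)

    def write(self, addr, value):
        if addr not in self.cam:
            if not self.free:
                raise MemoryError("bank full")
            self.cam[addr] = self.free.pop()
        self.data[self.cam[addr]] = value

    def read(self, addr):
        row = self.cam.get(addr)
        return 0 if row is None else self.data[row]


bank = RandomizedBank(rows=8, seed=1)
bank.write(0x10, 42)
print(bank.read(0x10))  # 42
bank.flush()            # e.g., when the owning application terminates
print(bank.read(0x10))  # 0: nothing left for the next application to read
```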

Regarding the deliverables (see details in Section 5), thrust 2 pursues the exploration, design, refinement and evaluation of the architectural system design through cycle-level simulations, from the block level to the entire system. This is a routine approach in the validation of architecture-level security, as it makes it possible to simulate large systems in a reasonable time (e.g., at an equivalent clock rate of 50MHz) by leveraging massively parallel cloud computing services (e.g., Amazon AWS). In particular, simulation on servers and FPGAs will be the primary means to understand the system properties of interest, including robustness against the various types of attacks at the architectural and protocol level. Simulation on traditional servers provides performance, power and area results and is well suited to the development phase. For the evaluation phase, as well as for software development, FPGA hardware is significantly faster and allows for rapid development and evaluation of the system. As detailed in Section 5, the system will be simulated at increasing levels of accuracy, starting at the architecture level, progressively moving down to the cycle level, and finally producing an RTL design that is usable in the chip design flow. With feedback from lower-level simulations in the chip design environment, the timing and power models of both the non-secure and secure designs will be evaluated, in order to quantify the overhead imposed by each of the proposed solutions and explore the related tradeoffs. This thrust will also contribute RTL designs for chip-level demonstrators, in particular for critical blocks that need to be experimentally characterized on silicon (e.g., the NoC, crypto-engine modules that are robust against side-channel attacks, a core such as MSP430 or lowRISC, and a core robust against Trojan injection).

Regarding the collaboration with RISE, the research work related to thrust 2 will focus on re-engineering the hardware fundamentals of IoT processor design. To this aim, we will exploit prior DARPA-funded work by Cambridge/SRI/Arm on CHERI for 64-bit cores. Scaling down to 32-bit processors, we will explore how fine-grained CHERI memory protection composes with microcontroller-facing Memory Protection Units (MPUs). This contrasts with 64-bit CHERI + Memory Management Units (MMUs) in application-class processors, which support complex virtualisation of the address space (relocating accesses as well as controlling their use) that is not used in microcontrollers. Throughout this work, we will collaborate closely with IoT-facing vendors, including ARM Research, based in Cambridge.

In regard to the collaboration with Technion, the joint research activity in this thrust will be centered on using debug/test interfaces as side channels. Reverse engineering a VLSI device is a complex task that traditionally requires tedious work and expensive equipment. The ultimate goal of the reverse engineering process is to discover the device's underlying algorithm. The scope of this joint research covers the extraction of the circuit from the physical device, including package removal, cross-sectioning, delayering, and nanoscale imaging. Then, techniques will be explored in the context of IP theft prevention, the detection of hardware Trojan horses (HTHs), malware and ransomware, and the use of scan side channels and machine learning techniques to detect unique signatures of malicious circuit activity.