Chip gallery



Reconfigurable microcontroller/memory organization for energy-performance extension beyond voltage scaling

This work introduces reconfigurable thread count augmentation for existing microcontroller architectures, and row aggregation for their dedicated SRAM memory, to extend their energy-performance tradeoff beyond traditional voltage scaling at minimal design effort (“drop-in”). The proposed techniques are architecture-agnostic, as the added reconfigurability does not modify the original instruction execution down to the cycle level. Reconfiguration makes it possible to occasionally boost the throughput of simple architectures that were not originally conceived for multi-thread operation, while retaining the original single-thread operation for less performance-critical tasks. From a design viewpoint, thread count augmentation is fully automated and directly manipulates the gate-level netlist of an existing single-thread processor, allowing its application to commercial Intellectual Property (IP) cores (even if obfuscated by the IP vendor). Similarly, SRAM row aggregation can be applied to commercially compiled 6T SRAM arrays with minor modifications to the row decoder. A 40nm ARM Cortex-M0 testchip shows 1.8X (1.4X) core (memory) performance boost beyond a baseline at nominal voltage, a 1.4X lower minimum energy point at only 16% (4%) area (timing) overhead, and the lowest energy/cycle to date.



First Physically Unclonable Function with design margin reduction via in-situ and PVT sensor fusion for low-cost hardware security

This work introduces a Physically Unclonable Function (PUF)-based key generation scheme with run-time in-situ instability detection and process/voltage/temperature (PVT) sensors. These sensors are fused to estimate the number of correction bits NECC that the Error Correcting Code (ECC) needs to make the PUF output stable and meet a given bit error rate target. Run-time sensing overcomes the substantial ECC energy penalty associated with the traditional design-time margining of NECC for the worst-case word, die, voltage and temperature. ECC with tunable NECC is introduced to enable energy savings in typical cases where NECC is lower than its worst-case value. Sensor fusion via simple linear regression estimates the required NECC at run time. A testchip in 40nm demonstrates the concept, based on a static monostable current mirror PUF with NECC = 0…4. An average energy reduction of 1.8X is shown compared to a traditional margined design, at an area overhead of less than 20%. As an additional benefit of adjustable NECC, these energy savings can be further increased in applications with less stringent stability requirements.
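The run-time estimation step can be illustrated with a minimal sketch, assuming a linear model whose output is rounded up and clamped to the NECC = 0…4 range stated above. The feature names, coefficients and sensor readings below are hypothetical and are not taken from the testchip; in practice the coefficients would be fitted offline against measured data.

```python
import math

# Hypothetical regression coefficients (NOT from the testchip).
INTERCEPT = -0.5
WEIGHTS = {
    "unstable_bits": 0.5,  # count flagged by the in-situ instability detector
    "delta_v": 4.0,        # |V - V_nominal| in volts, from the voltage sensor
    "delta_t": 0.02,       # |T - T_nominal| in deg C, from the temperature sensor
}
NECC_MAX = 4  # the text states NECC = 0...4


def estimate_necc(readings):
    """Fuse sensor readings with a linear model and return an integer NECC."""
    raw = INTERCEPT + sum(WEIGHTS[name] * readings[name] for name in WEIGHTS)
    # Round up so the estimate errs toward more correction, then clamp.
    return max(0, min(NECC_MAX, math.ceil(raw)))


# Illustrative (made-up) operating points.
typical = {"unstable_bits": 1, "delta_v": 0.0, "delta_t": 5.0}
worst = {"unstable_bits": 6, "delta_v": 0.1, "delta_t": 60.0}
print(estimate_necc(typical), estimate_necc(worst))
```

Rounding up rather than to the nearest integer is a design choice of this sketch: it trades a little energy for a lower risk of under-correcting.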



First standard cell-based DAC architecture with 16-bit resolution for ultra-compact and technology-portable on-chip calibration

Ultra-compact, high-resolution, standard cell-based DACs based on Dyadic Digital Pulse Modulation (DDPM) are presented. As a fundamental contribution, an optimal sampling condition is analytically derived to enhance conversion with inherent suppression of spurious harmonics. Operation under this optimal condition is experimentally demonstrated to assure resolution up to 16 bits, with 9.4-239X area reduction compared to prior art. The digital nature of the circuits also allows extremely low design effort on the order of 10 man-hours, portability across CMOS generations, and operation at the lowest supply voltage reported to date. A DAC for DC calibration achieving 16-bit resolution with 3.1-LSB INL, 2.5-LSB DNL, and 45µW power, at only 530µm² area, is demonstrated in 40nm CMOS.
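As a rough illustration of how a DDPM bit stream can be generated, the sketch below assumes the priority-multiplexer formulation often used to describe DDPM: a free-running counter selects which input bit is output in each clock slot, so that bit i of the code appears in 2^i slots per period. This formulation and the function name `ddpm_stream` are assumptions, not details given in the text, and the analog low-pass filtering and the optimal sampling condition are outside the sketch.

```python
def ddpm_stream(code, n_bits):
    """Return one period (2**n_bits samples) of a DDPM-style bit stream."""
    if not 0 <= code < (1 << n_bits):
        raise ValueError("code out of range for n_bits")
    stream = []
    for k in range(1 << n_bits):
        if k == 0:
            stream.append(0)  # no input bit is selected in slot 0
            continue
        j = (k & -k).bit_length() - 1  # index of the lowest set bit of k
        bit_index = n_bits - 1 - j     # j = 0 (odd k) selects the MSB
        stream.append((code >> bit_index) & 1)
    return stream


period = ddpm_stream(11, 4)
print(period, sum(period) / len(period))
```

The number of ones in a period equals the input code, so the average of the stream is code / 2**n_bits, and the pulses of the most significant bits are spread across the period rather than grouped as in conventional PWM.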


First energy-quality scalable Network on Chip with best-in-class energy (down to 6.9fJ/bit), while still using conventional low-swing transmitter/receiver circuits

A new class of ultra-low energy on-chip links is introduced. Through the use of sub-word ranking and non-uniform swing, the proposed links allow graceful energy-quality tradeoff in intra-chip communication links for noise-resilient applications such as machine learning and video processing. The proposed techniques are demonstrated in a 28nm testchip that achieves up to 4.5X energy saving over conventional full-quality links, and up to 2.2X over approximate links at iso-quality. Conventional operation with no quality degradation is also allowed for data packets that require full quality.


First system architecture for widely adaptive and power-scalable MCU/PMU from sub-mW to nW, enabling battery-less and battery-indifferent operation

The proposed integrated system architecture consists of a power management unit (PMU) driving a microcontroller, and controlling a novel power knob that enables adaptation to the sensed power availability over an ultra-wide range, well beyond voltage scaling and down to the nW level. Conventional battery-powered operation is augmented with pure harvesting. Wide power adaptation is enabled by comparator delay self-biasing and a zero-current switching scheme shared among all power modes with single-cycle convergence.

DAC with graceful degradation

First DAC architecture with digital-like shrinking under scaled technologies, and exhibiting graceful degradation under voltage/frequency overscaling

The proposed DAC allows very low design effort, enables digital-like shrinkage across CMOS generations, low area at down-scaled technologies, and operation down to near-threshold voltages. The proposed DAC can operate at supply voltages that are significantly lower and/or at clock frequencies that are significantly greater than the intended design point, at the expense of moderate resolution degradation. In a 12-bit 40-nm testchip, graceful degradation of 0.3bit/100mV is achieved when the supply voltage is over-scaled down to 0.8V, and 1.4bit/100mV when further scaled down to 0.6V.
The proposed DAC enables dynamic power-resolution tradeoff with 3X (2X) power saving for 1-bit resolution degradation at iso-sample rate (iso-resolution).

Relaxation oscillator for sensor nodes with lowest power to date (pW-range), operating under 0.3V-1.8V unregulated supply without any reference/bias circuitry

A pW-power versatile relaxation oscillator operating from sub-threshold (0.3V) to nominal voltage (1.8V) is presented, with Hz-range frequency using a sub-pF capacitor. The wide voltage range and the low sensitivity of frequency and absorbed current to the supply allow the voltage regulator to be removed, with direct powering from harvesters (e.g., solar cell, thermal from machines) or 1.2-1.5V batteries. A 180nm testchip exhibits a frequency of 4Hz, 10%/V supply sensitivity at 0.3-1.8V, 8-18pA current, and 4%/°C thermal drift from -20°C to 40°C.
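The quoted ranges can be turned into simple bounds with a short arithmetic sketch. It assumes only that the current stays within 8-18pA across the 0.3-1.8V supply range, and that the 10%/V sensitivity is roughly constant over that range; neither the pairing of current and voltage values nor the linearity is stated in the text.

```python
V_MIN, V_MAX = 0.3, 1.8        # supply range in volts (from the text)
I_MIN, I_MAX = 8e-12, 18e-12   # absorbed current range in amperes (from the text)
SENS_PER_V = 0.10              # 10 %/V supply sensitivity (from the text)

# Power cannot be lower than the smallest voltage times the smallest current,
# nor higher than the largest voltage times the largest current.
p_low = V_MIN * I_MIN
p_high = V_MAX * I_MAX

# Frequency spread over the full supply range, assuming constant sensitivity.
freq_spread = SENS_PER_V * (V_MAX - V_MIN)

print(f"power bounds: {p_low * 1e12:.1f} pW to {p_high * 1e12:.1f} pW")
print(f"frequency spread over supply range: {freq_spread * 100:.0f} %")
```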


The first microcontroller (MSP430) that can operate at the minimum-energy or the minimum-power point, with minimum power of 595pW (purely harvested in minimum-power mode)

This work presents an MSP430-compatible microcontroller in 180nm with dual-mode standard cells enabling minimum-power and minimum-energy modes. Minimum-power mode with sub-leakage power (595pW) allows purely energy-harvested operation with a sub-mm² harvester. Minimum-energy mode (14-33pJ/cycle) maximizes battery lifetime when battery-powered. Power management with ripple power gating self-startup allows cold start with an on-chip 0.54mm² solar cell under 55lux illumination.


First sub-mW feature extraction engine for ubiquitous computer vision and IoT

An energy-quality scalable (EQSCALE) feature extraction accelerator for IoT vision applications is presented. Knobs are introduced to dynamically adjust the tradeoff between energy and feature extraction quality, leveraging the intrinsic redundancy in video frames and the robustness of object recognition against missing features. The active area of the accelerator is 0.55mm². EQSCALE enables at least 5.7X energy improvement and 1.8X area reduction over state-of-the-art accelerators. To the best of our knowledge, EQSCALE is the first feature extraction accelerator operating in the sub-mW range (0.51mW at VGA resolution and 30 fps, and 0.19mW at 5 fps).
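Dividing the quoted power by the frame rate gives the energy spent per frame. The sketch below is only an arithmetic check on the two operating points above; the per-frame figures are derived here and are not reported in the text, and the helper name is hypothetical.

```python
def energy_per_frame_uj(power_mw, fps):
    """Convert average power (mW) and frame rate (frames/s) to microjoules per frame."""
    return power_mw * 1e3 / fps  # mW -> uW, then uW / (frames/s) = uJ/frame


e_30fps = energy_per_frame_uj(0.51, 30)
e_5fps = energy_per_frame_uj(0.19, 5)
print(f"30 fps: {e_30fps:.0f} uJ/frame, 5 fps: {e_5fps:.0f} uJ/frame")
```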


First fully-synthesizable PUF ("PUF design in a day") and active temperature compensation with native 2.8% BER, 1.02fJ/b at 0.8-1.0V in 40nm

A fully-synthesizable Physically Unclonable Function (PUF) with hysteresis-enhanced stability and active compensation of temperature variations is proposed. To reduce undesired bit flips, hysteretic behavior is obtained through the insertion of a Muller C-element output stage. A feedback scheme is also introduced to compensate for the effect of temperature variations at run time. A native worst-case BER of 2.8% is measured under 0.8-1.0V and 25-85°C, with temperature-induced instability degradation of 0.15% per 10°C. The PUF bitcell consumes 1.02fJ/b at 0.9V. This PUF can be designed with fully automated standard cell-based flows, thus enabling substantial design effort reduction compared to prior art based on custom design styles.
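The hysteresis provided by a Muller C-element can be seen in a small behavioral model: the output follows the inputs only when both agree, and otherwise holds its previous value. This is the standard two-input C-element behavior; how the two inputs are derived from the PUF cell is not described in the text, so the input sequence below is purely illustrative.

```python
class CElement:
    """Behavioral model of a two-input Muller C-element."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        # The output changes only when both inputs agree; otherwise it is held.
        if a == b:
            self.out = a
        return self.out


c = CElement(initial=0)
# Illustrative input pairs: a disagreement between inputs never flips the output.
sequence = [(1, 0), (1, 1), (0, 1), (1, 1), (0, 0)]
trace = [c.update(a, b) for a, b in sequence]
print(trace)
```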


First reconfigurable microarchitecture down to the pipestage level for wide energy/voltage scaling (demonstration on FFT engine)

Dynamically adaptable pipelines, fully integrated with automated digital flows at design time and with dynamic voltage scaling schemes at run time, are demonstrated with a 256-point radix-4 fixed-point FFT engine on a 40-nm test chip. Measurements show energy savings of up to 30% (38%) at iso-throughput (iso-voltage). Area and worst-case performance penalties are 5% and 11%, respectively.


First demonstration of reconfigurable clock networks for adaptation under wide voltage scaling

A reconfigurable clock network design for operation from sub-threshold to nominal voltage is presented. The number of levels is adjusted, with more levels at nominal voltage to mitigate the impact of wire delay, and fewer in sub-threshold to mitigate the dominant random skew due to repeaters. Compared to traditional clock networks, clock skew is reduced by up to 2.5 standard deviations, enabling a 110mV Vmin reduction at 1.8% area penalty in a 40nm FFT testchip.


PUF chip (2014)

15-fJ/bit Static Physically Unclonable Functions for Secure Chip Identification with <2% Native Bit Instability and 140X Intra/Inter PUF Hamming Distance Separation in 65nm

A static class of Physically Unclonable Functions for secure key generation and chip identification is presented. Energy down to 15 fJ/bit is achieved, key reproducibility and uniqueness meet an inter/intra-PUF Hamming distance separation of 140X or greater, and randomness passes all NIST tests. Native unstable bits are less than 2% at nominal conditions and less than 5% in the 0.7-1 V voltage and 25-85°C temperature range, before applying any further post-silicon technique for stability enhancement.
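The inter/intra-PUF Hamming distance metrics can be sketched as follows. Intra-PUF distance compares repeated reads of the same chip (reproducibility), while inter-PUF distance compares keys from different chips (uniqueness). The 8-bit keys are made-up toy data, and taking the separation as the ratio of the two means is an assumption: the text does not give the exact definition behind the 140X figure.

```python
from itertools import combinations


def fractional_hd(a, b):
    """Fraction of bit positions in which two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("keys must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_intra_hd(reads):
    """Mean distance between the first (reference) read and later reads of one chip."""
    ref = reads[0]
    return sum(fractional_hd(ref, r) for r in reads[1:]) / (len(reads) - 1)


def mean_inter_hd(keys):
    """Mean pairwise distance between reference keys of different chips."""
    pairs = list(combinations(keys, 2))
    return sum(fractional_hd(a, b) for a, b in pairs) / len(pairs)


# Toy data (not measured): three reads of chip A, and one key per chip A, B, C.
chip_a_reads = ["10110010", "10110010", "10110011"]
chip_keys = ["10110010", "01100110", "11011000"]

intra = mean_intra_hd(chip_a_reads)
inter = mean_inter_hd(chip_keys)
print(f"intra = {intra:.4f}, inter = {inter:.4f}, separation = {inter / intra:.1f}X")
```

An ideal PUF has an intra-PUF distance near 0 and an inter-PUF distance near 0.5, so a larger ratio indicates that keys are easier to tell apart across chips than across repeated reads.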