Chip gallery


(posted after ISSCC 2021) 
Unified In-Memory Dynamic (TRNG) and Multi-Bit Static (PUF) Entropy Generation for Ubiquitous Hardware Security

(this will be fully disclosed at ISSCC 2021)

(posted after ISSCC 2021) 

Capacitance-to-Digital Converter for Operation under Uncertain Harvested Voltage down to 0.3V with No Trimming, Reference and Voltage Regulation

(this will be fully disclosed at ISSCC 2021)

Fully-Digital Rail-to-Rail OTA with Sub-1,000 µm2 Area, 250-mV Minimum Supply and nW Power at 150-pF Load in 180nm

A fully-digital operational transconductance amplifier (DIGOTA) architecture for tightly energy-constrained low-cost systems is presented. A 180nm DIGOTA testchip exhibits an area below the 1,000-µm2 wall, 2.4-nW power under a 150-pF load, and a minimum supply voltage Vmin of 0.25 V. In the 0.3-0.5 V supply range, DIGOTA improves the area-normalized small (large) signal energy FoM by at least 836X (267X) over prior sub-500mV OTAs, while reducing area by 27-85X. The low-Vmin and nW-power features are shown to enable direct harvesting at the mm scale.

level shifter

A Robust, High-Speed and Energy-Efficient Ultralow-Voltage Level Shifter

This work presents a robust level shifter design able to convert input voltages from the deep sub-threshold regime (about 100 mV) up to the nominal supply voltage (1.8 V). The proposed circuit is based on a self-biased low-voltage cascode current mirror (CM) topology that features diode-connected PMOS and NMOS transistors to drive the split-input inverting buffer used as output stage with high energy efficiency. Experimental results across corner wafers demonstrate the effectiveness of the proposed level shifter as compared to prior art. The proposed circuit allows a voltage up-conversion from a 0.4-V 100-kHz input pulse to 1.8 V with an average switching delay of 7.6 ns and an average energy per transition of only 69 fJ. This is achieved at an area of 82 µm2 for a standard cell-based design.



Fully-Synthesizable All-Digital Unified Dynamic Entropy Generation, Extraction and Utilization within the Same Cryptographic Core

This work introduces a novel class of fully-synthesizable all-digital True Random Number Generators (TRNGs) using the same private-key cryptographic core for raw dynamic entropy generation, its extraction via post-processing, and its utilization as crypto-key for constrained secure systems. Endogenous random bit generation is achieved via clock pulsewidth overstretching in the digital implementation of private-key cryptographic algorithms using pulsed-latch pipelines, leveraging inherent Shannon confusion and diffusion.
Demonstration on a 40-nm testchip based on a SIMON cryptographic core shows 64-bit key encryption down to 0.25 pJ/bit at 0.45 V, and random number generation with cryptographic-grade entropy at 2.5 pJ/bit across manufacturing lots, dice, voltages and temperature corners. The overall area is kept well below the 1E6 F2 area wall (F = minimum feature size).
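The confusion and diffusion that the scheme leverages can be illustrated with the round function of the SIMON cipher itself. The following is a minimal sketch, assuming the published SIMON Feistel round on 16-bit words; the word size, the test values, and the round keys in `PLACEHOLDER_KEYS` are illustrative assumptions (the real key schedule is omitted), and the sketch does not model the clock pulsewidth overstretching used on the testchip.

```python
# Sketch of the SIMON round structure (not the testchip implementation).
WORD_BITS = 16                      # assumed word size, for illustration only
MASK = (1 << WORD_BITS) - 1

def rotl(x, r):
    """Rotate a WORD_BITS-wide word left by r positions."""
    return ((x << r) | (x >> (WORD_BITS - r))) & MASK

def f(x):
    """SIMON round function: the AND adds confusion, rotations and XOR add diffusion."""
    return (rotl(x, 1) & rotl(x, 8)) ^ rotl(x, 2)

def round_fwd(x, y, k):
    """One forward Feistel round with round key k."""
    return (y ^ f(x) ^ k, x)

def round_inv(x, y, k):
    """Inverse of round_fwd for the same round key k."""
    return (y, x ^ f(y) ^ k)

def run_rounds(x, y, keys):
    for k in keys:
        x, y = round_fwd(x, y, k)
    return x, y

def undo_rounds(x, y, keys):
    for k in reversed(keys):
        x, y = round_inv(x, y, k)
    return x, y

# Arbitrary placeholder round keys; a real core derives them via the key schedule.
PLACEHOLDER_KEYS = [0x0100, 0x0908, 0x1110, 0x1918] * 8

def flipped_bits(x, y, keys):
    """Count the output bits that change when one input bit is flipped."""
    a = run_rounds(x, y, keys)
    b = run_rounds(x ^ 1, y, keys)
    return bin(a[0] ^ b[0]).count("1") + bin(a[1] ^ b[1]).count("1")

if __name__ == "__main__":
    print(flipped_bits(0x6565, 0x6877, PLACEHOLDER_KEYS))
```

Calling `flipped_bits` with more or fewer round keys gives a rough feel for how quickly a single-bit change spreads across the state.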


Broad-Purpose In-Memory Computing for Signal Monitoring and Machine Learning Workloads

In this work, a broad-purpose compute-in-memory solution (±CIM) able to handle arbitrary sign in both inputs/features and weights/coefficients is introduced. The ability to operate on arbitrary sign and under variable precision on both operands enables a wide range of applications, ranging from conventional neural networks to digital signal processing and monitoring. The ±CIM pipelined architecture, the reconfigurable row encoder and the adoption of a commercial 2-port bitcell allow uninterrupted memory availability for conventional read/write, even when performing in-memory computations. A 40nm testchip shows the ability of the ±CIM architecture to perform both neural network computations and classical signal processing. At 6-bit precision, the measured worst-case mismatch (noise) is 0.38 (0.62) LSB. The achieved accuracy when executing a LeNet-5 neural net workload is 98.3%, which is within 1.3% of state-of-the-art software implementations. As an example of a signal processing workload, 91.7% accuracy is achieved in voice activity detection, which is within 2.8% of a software implementation. Overall, an energy efficiency (throughput) of 41 TOPS/W (122 GOPS) is achieved at 38% area overhead over a conventional SRAM with the same 4-KB capacity.



Voice Activity Detection with >83% Accuracy under SNR down to -3dB at 1.19µW and 0.07mm2 in 40nm

This work presents a voice activity detector for keyword spotting in self-powered speech interfaces with sub-syllable latency. A simple decision stump classifier and time averaging are introduced to provide >83% accuracy in noisy environments with SNR down to -3dB for reliable operation under a wide range of usage contexts (8-15dB lower than prior art). 1.19-µW power and 0.07mm2 area are shown in 40nm.
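The combination of a decision stump and time averaging can be sketched as follows. This is a minimal illustration, assuming a single scalar feature per audio frame that is averaged over a sliding window before being compared with one threshold; the class name `StumpVAD`, the feature values, the threshold, and the window length are all hypothetical and are not taken from the testchip.

```python
from collections import deque

class StumpVAD:
    """Decision stump (single threshold) applied to a time-averaged feature."""

    def __init__(self, threshold, window):
        self.threshold = threshold            # hypothetical stump threshold
        self.history = deque(maxlen=window)   # last `window` frame features

    def step(self, feature):
        """Consume one frame feature and return True if speech is detected."""
        self.history.append(feature)
        average = sum(self.history) / len(self.history)
        return average > self.threshold

if __name__ == "__main__":
    vad = StumpVAD(threshold=0.5, window=4)
    frames = [0.1, 0.1, 1.0, 0.1, 0.9, 0.9]   # made-up per-frame features
    print([vad.step(x) for x in frames])
```

In this sketch the averaging suppresses the isolated spike in the third frame, while the sustained activity in the last two frames trips the stump.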

moonlight harvested sensor node

Multi-Sensor Platform with Five-Order-of-Magnitude System Power Adaptation down to 3.1nW and Sustained Operation under Moonlight Harvesting (Always-On even without Battery)

A sensor node with system power tuning is presented for 5-order-of-magnitude adaptation to harvested power. Coordinated tuning of unified voltage/capacitive/light sensor interface, MCU and direct MPPT with no intermediate power conversion scales system power to 3.1nW at 0.3V. Operation at 1lux (moonlight) with 4.1×4.1mm2 light harvester is shown.

The power dynamic range is 110,000×, down to 480 pW, and the frequency dynamic range is 50,000×, up to 2 MHz. Power-speed scaling of the sensor interface is similar to the MCU across 5 orders of magnitude, across which no sub-system sets a rigid power floor down to 3.1nW. The proposed platform operates under direct harvesting with a solar cell with 4.1mm×4.1mm active area down to 1lux, corresponding to moonlight harvesting for the first time. This paves the way for next-generation battery-light and battery-less always-on systems that do not miss any physical event in spite of the highly-fluctuating nature of energy harvesting.
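The quoted dynamic ranges can be unpacked with a short calculation. This is only a sketch, assuming that the power range extends upward from the 480 pW minimum and that the frequency range extends downward from the 2 MHz maximum; the variable names are illustrative.

```python
import math

def orders_of_magnitude(ratio):
    """Express a dynamic-range ratio in decades."""
    return math.log10(ratio)

# Figures quoted in the text.
min_power_w = 480e-12
power_ratio = 110_000
max_freq_hz = 2e6
freq_ratio = 50_000

# Implied opposite ends of each range (under the assumption stated above).
max_power_w = min_power_w * power_ratio
min_freq_hz = max_freq_hz / freq_ratio

if __name__ == "__main__":
    print(f"power range: {orders_of_magnitude(power_ratio):.2f} decades")
    print(f"implied maximum power: {max_power_w * 1e6:.1f} uW")
    print(f"implied minimum frequency: {min_freq_hz:.0f} Hz")
```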


battery-less voltage reference (0.25 V, 5.3 pW)

Voltage Reference with Lowest Operating Voltage down to 0.25 V and pW Power for Direct Harvesting and Battery-Less Systems

This work introduces a compact voltage reference operating at pW-power and 250-mV supply (e.g., direct harvester-powered). Body biasing assisted by replica biasing enables 25µV/°C temperature coefficient, 140µV/V line sensitivity, and 0.42mV process sensitivity in 180nm. 2.55-mV overall accuracy is achieved at 2,200µm2 area, without trimming. Operation at such low voltage and power makes it possible to suppress the power-hungry intermediate DC-DC conversion stage of conventional sensor node architectures, and even the battery altogether.


Ultra-Compact Current- and Voltage-Input Analog-to-Digital Converters with Minimal Design Effort ("ADC in a day")

Fully-synthesizable Successive Approximation Register (SAR) Analog-to-Digital Converters (ADCs) suitable for low-cost integrated systems are proposed both for voltage and current input. The proposed fully-digital ADC architectures enable low-effort design, silicon area reduction, and voltage scaling down to the near-threshold region. Compared to traditional analog-intensive designs, their digital nature allows easy technology and design porting, digital-like area shrinking across CMOS technology generations, and also drastically reduced system integration effort through immersed-in-logic ADC design.
The voltage-input ADC architecture is demonstrated with a 40-nm testchip showing 3,000-μm2 area, 6.4-bit ENOB, 2.8kS/s sampling rate, 40.4dB SNDR, 49.7dB SFDR, and 3.1μW power at 1V. A current-input ADC is also demonstrated for direct current readout without requiring a trans-resistance stage. 40-nm testchip measurements show a 5-nA to 1-μA input range, 4,970μm2 area, 6.7-bit ENOB and 2.2-kS/s sample rate, at 0.94-μW power. Compared to the state of the art, the proposed ADC architecture exhibits the highest level of design automation (standard cell), lowest area, and the unique ability to cover direct acquisition of both voltage and current inputs, suppressing the need for transresistance amplifier in current readout.
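The reported SNDR and ENOB of the voltage-input ADC can be cross-checked with the standard conversion between the two. This is a minimal sketch using the textbook relation ENOB = (SNDR - 1.76 dB) / 6.02 dB; the function name is illustrative and the relation is a general one, not a formula stated by the authors.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB (ideal-quantizer relation)."""
    return (sndr_db - 1.76) / 6.02

if __name__ == "__main__":
    # 40.4 dB is the SNDR quoted for the voltage-input ADC.
    print(f"ENOB = {enob_from_sndr(40.4):.1f} bits")
```

With the quoted 40.4 dB SNDR, the relation gives roughly 6.4 bits, consistent with the ENOB reported above.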


Low-Energy Voice Activity Detection via Energy-Quality Scaling from Data Conversion to Machine Learning

In this work, voice activity detection (VAD) systems with system-level energy-quality (EQ) scaling have been demonstrated. Compared to prior single-knob EQ scaling, multiple EQ knobs are selectively inserted into the entire signal chain from end to end (i.e., from data conversion to classification). EQ knobs are dynamically co-optimized to minimize energy for a given quality target. Multi-knob energy-quality scaling makes quality degradation more graceful than single-knob, allowing for more aggressive energy reduction under a given quality target, while retaining the ability to operate at full quality. Also, proper system-level EQ optimization enhances fitting in machine learning-based systems (e.g., decision tree-based), suppressing both underfitting and overfitting. Measurements on a 28nm testchip show that system-level EQ scaling can reduce energy by up to 3.5X at 2% accuracy degradation in 10-dB noise, compared to full quality. Iso-technology comparison shows that the minimum energy of 51.9 nJ/frame is lower than prior art by 1.9-74.4X at comparable speech/non-speech hit rates.



Deep sub-pJ/bit sub-10^6 F^2 energy-security scalable SIMON crypto-core

This work introduces an energy-security scalable crypto-core for private-key cryptography in low-end sensor nodes based on SIMON cipher. Energy and area footprints are reduced through techniques at the algorithm, microarchitectural and gate level.
The 40 nm testchip shows energy down to 0.31 pJ/bit at 0.45 V with 64-bit key and 0.79E6 F2 area (F = process minimum feature size). The proposed crypto-core is well suited for ubiquitous security in energy/area-constrained platforms (e.g., low-end sensor nodes, RFIDs), while preserving full 256-bit security when necessary.
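Area expressed in F2 is normalized to the square of the minimum feature size, which makes designs in different technologies comparable. As a rough sketch of the conversion back to physical area, assuming F equals the 40 nm node name (an assumption; the exact value of F used by the authors is not given here):

```python
def area_um2_from_f2(area_f2, feature_nm):
    """Convert a technology-normalized area in F^2 to square micrometres."""
    feature_um = feature_nm * 1e-3
    return area_f2 * feature_um ** 2

if __name__ == "__main__":
    # 0.79E6 F2 is the area quoted in the text; F = 40 nm is assumed.
    print(f"{area_um2_from_f2(0.79e6, 40):.0f} um^2")
```

Under that assumption, 0.79E6 F2 corresponds to about 1,264 µm2.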



Reconfigurable microcontroller/memory organization for energy-performance extension beyond voltage scaling

This work introduces reconfigurable thread count augmentation for existing microcontroller architectures, and row aggregation for their dedicated SRAM memory, to extend their energy-performance tradeoff beyond traditional voltage scaling at minimal design effort (“drop-in”). The proposed techniques are architecture-agnostic, as the added reconfigurability does not modify the original instruction execution down to the cycle level. Reconfiguration makes it possible to occasionally boost the throughput of simple architectures that were not originally conceived for multi-thread operation, while retaining the original single-thread operation in less performance-critical tasks. From a design viewpoint, thread count augmentation is fully automated and directly manipulates the gate-level netlist of an existing single-thread processor, allowing its application to commercial Intellectual Property cores (even if obfuscated by the IP vendor). Similarly, SRAM row aggregation can be applied to commercially compiled 6T SRAM arrays with minor modifications to the row decoder. A 40nm ARM Cortex-M0 testchip shows 1.8X (1.4X) core (memory) performance boost beyond a baseline at nominal voltage, 1.4X lower minimum energy point at only 16% (4%) area (timing) overhead, and lowest energy/cycle to date.



First Physically Unclonable Function with design margin reduction via in-situ and PVT sensor fusion for low-cost hardware security

This work introduces a Physically Unclonable Function-based (PUF) key generation scheme with run-time in-situ instability detection and process/voltage/temperature (PVT) sensors. Such sensors are fused to evaluate the number of correction bits NECC required by the Error Correcting Code (ECC) to make the PUF output stable and meet a given bit error rate target. Run-time sensing overcomes the substantial ECC energy penalty associated with the traditional design-time margin of NECC for the worst-case word, die, voltage and temperature. ECC with tunable NECC is introduced to enable energy saving in typical cases where NECC is lower than its worst-case value. Sensor fusion via simple linear regression estimates the required NECC at run-time. A testchip in 40nm demonstrates the concept, based on a static monostable current mirror PUF with NECC = 0…4. Average energy reduction by 1.8X is shown compared to a traditional margined design, at an area overhead of less than 20%. As an additional benefit of adjustable NECC, such energy savings can be further expanded in applications with less stringent stability requirements.
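The run-time estimation of NECC by linear regression over sensor readings can be sketched as below. This is a minimal illustration under stated assumptions: the sensor values, weights, and bias are hypothetical placeholders (not the testchip's regression coefficients), and rounding the estimate up and clamping it to the 0…4 range is a conservative choice made for this sketch.

```python
import math

def estimate_necc(sensor_readings, weights, bias, necc_max=4):
    """Fuse sensor readings with a linear model and return a correction-bit count.

    The raw regression output is rounded up (to stay on the safe side) and
    clamped to the supported range 0..necc_max.
    """
    raw = bias + sum(w * s for w, s in zip(weights, sensor_readings))
    return max(0, min(necc_max, math.ceil(raw)))

if __name__ == "__main__":
    # Hypothetical normalized readings: in-situ instability, voltage, temperature.
    readings = [1.0, 0.5, 0.2]
    print(estimate_necc(readings, weights=[1.0, 1.0, 1.0], bias=0.1))
```

A typical-case reading then selects a smaller NECC than the worst-case margin, which is where the energy saving comes from.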



First standard cell-based DAC architecture with 16-bit resolution for ultra-compact and technology-portable on-chip calibration

Ultra-compact, high-resolution, standard cell-based DACs based on Dyadic Digital Pulse Modulation (DDPM) are presented. As a fundamental contribution, an optimal sampling condition is analytically derived to enhance conversion with inherent suppression of spurious harmonics. Operation under this optimal condition is experimentally demonstrated to assure resolution up to 16 bits, with 9.4-239X area reduction compared to prior art. The digital nature of the circuits also allows extremely low design effort on the order of 10 man-hours, portability across CMOS generations, and operation at the lowest supply voltage reported to date. A DAC for DC calibration achieving 16-bit resolution with 3.1-LSB INL, 2.5-LSB DNL, and 45µW power at only 530µm2 area is demonstrated in 40nm CMOS.
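The idea behind a dyadic pulse stream can be sketched in a few lines. This is a minimal illustration, assuming the usual dyadic construction in which bit i of the input code is replicated 2^i times in an interleaved pattern selected by a free-running counter; the function name and the exact pulse ordering are assumptions of this sketch and may differ from the testchip.

```python
def ddpm_stream(code, n_bits):
    """Return a 2**n_bits-long bit stream whose number of ones equals `code`.

    At each counter value t, the number of trailing ones of t selects which
    input bit is sent: the MSB on every other cycle, the next bit on every
    fourth cycle, and so on. The all-ones counter value outputs 0.
    """
    stream = []
    for t in range(1 << n_bits):
        trailing_ones = 0
        while trailing_ones < n_bits and (t >> trailing_ones) & 1:
            trailing_ones += 1
        if trailing_ones == n_bits:
            stream.append(0)
        else:
            stream.append((code >> (n_bits - 1 - trailing_ones)) & 1)
    return stream

if __name__ == "__main__":
    bits = ddpm_stream(code=5, n_bits=3)
    print(bits, sum(bits) / len(bits))   # stream and its average (duty cycle)
```

Low-pass filtering such a stream recovers a DC level proportional to the code, and spreading the pulses across the period pushes most of the ripple energy to higher frequencies than a plain PWM pattern would.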


First energy-quality scalable Network on Chip with best-in-class energy (down to 6.9fJ/bit), while still using conventional low-swing transmitter/receiver circuits

A new class of ultra-low energy on-chip links is introduced. Through the use of sub-word ranking and non-uniform swing, the proposed links allow graceful energy-quality tradeoff in intra-chip communication links for noise-resilient applications such as machine learning and video processing. The proposed techniques are demonstrated in a 28nm testchip that achieves up to 4.5X energy saving over conventional full-quality links, and up to 2.2X over approximate links at iso-quality. Conventional operation with no quality degradation is also allowed for data packets that require full quality.


First always-on system architecture for widely adaptive and power-scalable MCU/PMU from sub-mW to true nW, enabling battery-less and battery-indifferent operation

The proposed integrated system architecture consists of a power management unit (PMU) driving a microcontroller, and controlling a novel power knob that enables adaptation to the sensed power availability over an ultra-wide range, well beyond voltage scaling and down to nW level. Conventional battery-powered operation is augmented with pure harvesting. Wide power adaptation is enabled by comparator delay self-biasing and zero-current switching scheme shared among all power modes with single-cycle convergence.

DAC with graceful degradation

First DAC architecture with digital-like shrinking under scaled technologies, and exhibiting graceful degradation under voltage/frequency overscaling

The proposed DAC allows very low design effort, enables digital-like shrinkage across CMOS generations, low area at down-scaled technologies, and operation down to near-threshold voltages. The proposed DAC can operate at supply voltages that are significantly lower and/or at clock frequencies that are significantly greater than the intended design point, at the expense of moderate resolution degradation. In a 12-bit 40-nm testchip, graceful degradation of 0.3bit/100mV is achieved when the supply voltage is over-scaled down to 0.8V, and 1.4bit/100mV when further scaled down to 0.6V.
The proposed DAC enables dynamic power-resolution tradeoff with 3X (2X) power saving for 1-bit resolution degradation at iso-sample rate (iso-resolution).

Relaxation oscillator for sensor nodes with lowest power to date (pW-range), operating under 0.3V-1.8V unregulated supply without any reference/bias circuitry

A pW-power versatile relaxation oscillator operating from sub-threshold (0.3V) to nominal voltage (1.8V) is presented, having Hz-range frequency under sub-pF capacitor. The wide voltage and low sensitivity of frequency/absorbed current to the supply allow the suppression of the voltage regulator, and direct powering from harvesters (e.g., solar cell, thermal from machines) or 1.2-1.5V batteries. A 180nm testchip exhibits a frequency of 4Hz, 10%/V supply sensitivity at 0.3-1.8V, 8-18pA current, 4%/°C thermal drift from -20°C to 40°C.


The first microcontroller (MSP430) that can operate at the minimum-energy or the minimum-power point, with minimum power of 595pW (purely harvested in minimum-power mode)

This work presents an MSP430-compatible microcontroller with dual-mode standard cells enabling minimum-power and minimum-energy mode in 180nm. Minimum-power mode with sub-leakage power (595pW) allows purely energy harvested operation with sub-mm2 harvester. Minimum-energy mode (14-33pJ/cycle) maximizes battery lifetime, when battery-powered. Power management with ripple power gating self-startup allows cold start with on-chip 0.54mm2 solar cell at 55lux light condition.


First sub-mW feature extraction engine for ubiquitous computer vision and IoT

An energy-quality scalable (EQSCALE) feature extraction accelerator for IoT vision applications is presented. Knobs are introduced to dynamically adjust the tradeoff between energy and feature extraction quality, leveraging the intrinsic redundancy in video frames and the robustness of object recognition against missing features. The active area of the accelerator is 0.55mm2. EQSCALE enables at least 5.7X energy improvement and 1.8X area reduction over state-of-the-art accelerators. To the best of our knowledge, EQSCALE is the first feature extraction accelerator operating in the sub-mW range (0.51mW at VGA resolution and 30 fps, and 0.19mW at 5 fps).


First fully-synthesizable PUF ("PUF design in a day") and active temperature compensation with native 2.8% BER, 1.02fJ/b at 0.8-1.0V in 40nm

A fully-synthesizable Physically Unclonable Function (PUF) with hysteresis-enhanced stability and active compensation of temperature variations is proposed. To reduce undesired bit flips, hysteretic behavior is obtained through the insertion of a Muller C-element output stage. A feedback scheme is also introduced to compensate the effect of temperature variations at run time. Native worst-case BER of 2.8% is measured under 0.8-1.0V and 25-85°C, with instability degradation with temperature being 0.15% per 10°C. The PUF bitcell consumes 1.02fJ/b at 0.9V. This PUF can be designed with fully automated standard cell-based flows, thus enabling substantial design effort reduction compared to prior art based on custom design styles.


First reconfigurable microarchitecture down to the pipestage level for wide energy/voltage scaling (demonstration on FFT engine)

Dynamically adaptable pipelines, fully integrated with automated digital flows at design time and with dynamic voltage scaling schemes at run time, are demonstrated with a 256-point radix-4 fixed-point FFT engine on a 40-nm test chip. Measurements show energy savings up to 30% (38%) at iso-throughput (iso-voltage). The area and worst-case performance penalties are 5% and 11%, respectively.


First demonstration of reconfigurable clock networks for adaptation under wide voltage scaling

A reconfigurable clock network design for operation from sub-threshold to nominal voltage is presented. The number of clock network levels is adjusted, with more levels at nominal voltage to mitigate the impact of wire delay, and fewer in sub-threshold to mitigate the dominant random skew due to repeaters. Compared to traditional clock networks, clock skew is reduced by up to 2.5 standard deviations, enabling a 110mV Vmin reduction at 1.8% area penalty in a 40nm FFT testchip.


PUF chip (2014)

15-fJ/bit Static Physically Unclonable Functions for Secure Chip Identification with <2% Native Bit Instability and 140X Intra/Inter PUF Hamming Distance Separation in 65nm

A static class of Physically Unclonable Functions for secure key generation and chip identification is presented. Energy down to 15 fJ/bit is achieved, key reproducibility and uniqueness meet an inter/intra-PUF Hamming distance separation of 140X or greater, and randomness passes all NIST tests. Native unstable bits are less than 2% at nominal conditions and less than 5% in the 0.7-1 V voltage and 25-85 °C temperature range, before applying any further post-silicon technique for stability enhancement.