Chip gallery

2024

CogniVision: End-to-End SoC for Always-on Smart Vision with mW Power in 40nm (IEEE manuscript, Scholar Bank draft)

(details coming as soon as this technology is presented in the public domain)

2024

122.7 TOPS/W Stdcell-Based DNN Accelerator Based on Transition Density Data Representation, Clock-Less MAC Operation, Pseudo-Sparsity Exploitation in 40 nm (IEEE manuscript, Scholar Bank draft)

(details coming as soon as this technology is presented in the public domain)

2024

E-Textile Battery-Less Walking Step Counting System with <23 pW Power, Dual-Function Harvesting from Breathing, and No High-Voltage CMOS Process (IEEE manuscript, Scholar Bank draft)

(details coming as soon as this technology is presented in the public domain)

2024

Imager with In-Sensor Event Detection and Morphological Transformations with 2.9 pJ/pixel×frame Object Segmentation FOM for Always-On Surveillance in 40 nm (IEEE manuscript, Scholar Bank draft)

(details coming as soon as this technology is presented in the public domain)

2023

Single-Antenna Backscattered BLE5 Transmitter with up to 97m Range, 10.6 µW Peak Power for Purely-Harvested Green Systems (IEEE manuscript, Scholar Bank draft)

This paper introduces a backscattered BLE5 transmitter for low-cost single-antenna green systems solely powered by mm-scale harvesters. Peak power reduction to 10.6 µW is achieved while enabling BLE-compliant spectral mask up to the maximum allowed backscattered power for range extension. Peak power is reduced via an approximate GFSK modulator architecture based on a non-uniform self-sampling digitally controlled oscillator (DCO) with period pruning/clustering, in place of a power-hungry Gaussian filter and PLL used in conventional GFSK modulators. A 180-nm testchip shows 97-m range with a commodity receiver, at 4X power and 3X range improvement with respect to prior art.

2023

 

A 0.4-V 12-bit Self-Calibrated SAR ADC with Offset Injection Assist Achieving 0.43 fJ/conv-step (IEEE manuscript, Scholar Bank draft)

A 12-bit hybrid SAR ADC with SAR search assisted by comparator offset injection is presented. The proposed ADC architecture reuses the offset calibration circuitry for conversion of the last two LSBs, reducing the capacitive DAC dynamic range requirement for a given unit capacitance, and hence pushing the total capacitance down towards its thermal noise limit. The proposed ADC is self-calibrated with no need for accurate input generation for ubiquitous adoption. The 40-nm testchip achieves a Walden energy FoM of 0.43 fJ/conv-step (lowest in SAR ADCs with ENOB>10 bit), while delivering an SNDR>64.7dB and SFDR>74.3dB.

2023

Super-Cutoff Analog Building Blocks for pW/Stage Operation and Demonstration of 78-pW Battery-Less Light-Harvested Wake-Up Receiver down to Moonlight (IEEE manuscript, Scholar Bank draft)

Techniques to lower power in analog and sensor interfaces well below regular transistor leakage (VGS=0V) are introduced. The proposed circuit techniques enable pW power in super-cutoff (VGS<0V) building blocks from current mirrors to OTAs, pseudo-resistors and bias. A LiFi optical wake-up receiver with 78-pW power is demonstrated with continuous operation solely powered by an unregulated 1-mm2 solar cell down to 1 lux (moonlight).

2023

 

38.4-pW, 0.14-mm2 Body-Driven Temperature-to-Digital Converter and Voltage Reference with 0.6-1.6-V Unregulated Supply for Battery-Less Systems (IEEE manuscript, Scholar Bank draft)

A temperature-to-digital converter for low-cost purely-harvested systems is introduced. Its architecture is based on an oscillator pair with PTAT and CTAT frequency via body-driven control, and compact temperature sensors with implicit self-regulation. Supply regulation is eliminated for true sub-100 pW power. State-of-the-art power of 38.4 pW is achieved in 180 nm at 0.6 V with 0.49-°C resolution at 0.14-mm2 area.

2023

 

Self-Referenced Design-Agnostic Laser Voltage Probing Attack Detection with 100% Protection Coverage, 58% Area Overhead for Automated Design (IEEE manuscript, Scholar Bank draft)

A self-referenced distributed on-chip scheme is introduced to achieve continuous detection of laser voltage probing (LVP) attacks against digital IPs with full-area coverage via temperature sensing. Calibration-free, automated and design-agnostic adoption are enabled by a stdcell-based approach, offering a 2.5X area overhead reduction compared to prior art.

2023

 

ECC-Less Multi-Level SRAM Physically Unclonable Function and 127% PUF-to-Memory Capacity Ratio with No Bitcell Modification in 28nm (IEEE manuscript, Scholar Bank draft)

A multi-level (2 bits/bitcell) SRAM PUF is introduced to uniquely enable ECC-less operation with PUF capacity exceeding storage capacity at no cell modification. The first PUF bit is generated from steady-state post-reset bitcell value with >4X higher stability than conventional power-up. The second is simultaneously extracted from the transient response. Above-storage capacity and improved stability eliminate ECC down to the SRAM V_min (0.6 V) at 75-fJ/bit energy and 3.3% area overhead in 28 nm.

2023

55-pW/pixel Peak Power Imager with Near-Sensor Novelty/Edge Detection and DC-DC Converter-Less MPPT for Purely-Harvested Sensor Nodes (IEEE manuscript, Scholar Bank draft)

A μW-range low-cost imager with on-chip detection of Regions of Interest (ROI) with novel content and power adaptation to purely-harvested sources is proposed. Novel content detection is achieved via inexpensive edge detection and edge point counting. Frame rate adaptation to light is enabled by a DC-DC converter-less MPPT loop for reduced system complexity. 55-pW/pixel always-on power is achieved in a 180-nm testchip powered by a 3.3×3.3-mm2 solar cell down to 100 lux.

2023 

Dual-Mode Conversion Gating, Comparator Merging and Reference-Less Calibration for 2.7X Energy Reduction in SAR ADCs under Low-Activity Inputs (IEEE manuscript, Scholar Bank draft)

This work introduces a SAR ADC architecture that reduces energy by skipping conversion whenever samples lie within a pre-defined activity window Δ of previous data conversion(s). At the same time, it provides an uncommonly flexible window for signal specificity exploitation and ample design reuse, remains minimally invasive at system level, and avoids any additional accurate circuitry for windowing. In detail, the activity window of the proposed ADC is adjustable in both center and width, instead of having center/width rigidly set at design time and/or by fixed absolute thresholds. Also, the proposed ADC is minimally invasive at system level as 1) it guarantees uniform sampling and conversion completion at every sample without missing samples, and 2) it does not require any additional accurate circuitry such as a voltage reference or DAC. A reference-less calibration based on pulse counting is introduced to accurately set the activity window threshold with sub-LSB granularity. Comparator merging inherently compensates offset in normal conversions, reusing calibrated comparators used for ±Δ windowing. A 12-bit 40nm ADC testchip with an energy FOM of 0.95 fJ/convstep shows 2.7X energy saving over a SAR baseline at 22% area overhead.
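
The gating idea above can be illustrated with a minimal behavioral sketch in Python. It is not the authors' circuit: it only models the decision to reuse the previous output code when a new sample falls within ±Δ of the last converted value. The function name `gated_conversions`, the integer sample codes, and the choice to center the window on the last converted value are assumptions made for illustration.

```python
def gated_conversions(samples, delta):
    """Behavioral model of activity-window gating (illustrative assumption).

    Returns (outputs, full_conversions): one output code per input sample,
    and the number of samples that required a full conversion.
    """
    outputs = []
    full_conversions = 0
    last_code = None  # window center: the last fully converted value
    for sample in samples:
        if last_code is not None and abs(sample - last_code) <= delta:
            # Sample lies inside the activity window: reuse the previous code.
            outputs.append(last_code)
        else:
            # Sample lies outside the window: run a full conversion.
            last_code = sample
            full_conversions += 1
            outputs.append(last_code)
    return outputs, full_conversions


if __name__ == "__main__":
    codes, n_full = gated_conversions([100, 101, 102, 110, 111, 90], delta=2)
    print(codes, n_full)
```

Every input sample still produces an output code, so the output stream stays uniformly sampled; only the number of full conversions shrinks when the input shows low activity.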

2022

 

 

Picowatt-Power Analog Gain Stages in Super-Cutoff Region with Purely-Harvested Demonstration (IEEE manuscript, Scholar Bank draft)

In this work, gain stages with power down to the sub-pW/stage range are introduced to enable always-on mm-scale systems based on either pure harvesting across all practical environmental conditions, or a micro-battery with near-shelf-life lifetime (e.g., 20 years). The proposed circuit techniques suppress the need for supply voltage regulation, allowing direct harvesting (i.e., no intermediate DC-DC conversion). A CMOS 180-nm fully-differential OTA is shown to consume 0.43-1.26 pW at 0.4-0.6 V, the voltage range harvested across all practical conditions from a 2.16-mm2 solar cell down to 1 lux. The OTA shows 0.8-mV input offset and 18-µV input-referred noise. As an example of its application, a pW-power human grip/touch detection system for always-on event monitoring is demonstrated at light harvesting down to moonlight.

2022

 

 

Capacitance-Based Voltage Regulation- and Reference-Free Temperature-to-Digital Converter down to 0.3 V and 2.5 nW for Direct Harvesting (IEEE manuscript, Scholar Bank draft)

A temperature-to-digital converter for direct harvesting is proposed, where no DC-DC conversion is required between the DC harvester and the system. Temperature-induced capacitance differences are read out through ring oscillator frequency. PVT variations are suppressed by the differential nature of the temperature sensor architecture, whereas mismatch is compensated via a self-referenced calibration procedure. No reference, regulator, digital post-processing and digital direct temperature readout is needed to retain true-nW and low-Vmin operation. A 180-nm testchip tested across corner wafers shows 7bit ENOB, 2.5-4.5nW from solar and thermal direct harvesting at 0.3-0.5 V, as representative of a very wide range of environmental conditions.

2022

 

 

 

Imager with Dynamic LSB Adaptation and Ratiometric Readout for Low-Bit Depth 5-µW Peak Power in Purely-Harvested Systems (IEEE manuscript, Scholar Bank draft)

An imager with µW peak power is introduced for purely-harvested operation. The LSB is dynamically adapted to the light intensity of the scene for aggressive bit depth down-scaling, avoiding traditional dynamic range over-margining across practical light intensities under fixed LSB. Ratiometric readout of pixel current cancels threshold voltage mismatch. A 256×256-pixel 180-nm imager shows 5-µW power at 1 fps and 4 bits, while keeping ImageNet classification accuracy drop to percentage points under 75-dB ambient light range, across original and brightness-adjusted images.

2022 

 

On-Chip Laser Voltage Probing Attack Detection with 100% Area Coverage at Above/Below the Bandgap Wavelength and Fully-Automated Design (IEEE manuscript, Scholar Bank draft)

An on-chip detection scheme against Laser Voltage Probing (LVP) attacks is introduced. It enables digital IP full-area coverage, and its architecture preserves automated design and stdcell layout discipline (including restricted design rules). Stdcells with laser sensing are proposed with photocurrent sensitivity up to 10 pA/m2 both above/below the bandgap wavelength, inherent PVT resilience, and no calibration.

2022 

 

 

Fully-Digital Broadband Calibration-Less Impedance Monitor for Probe Insertion Detection against Power Analysis Attacks (IEEE manuscript, Scholar Bank draft)

In this work, a broadband supply impedance monitor is proposed to detect insertion of probing devices or package/PCB modifications in secure systems. The fully-digital architecture allows automated and portable design, and its compact area allows under-pad placement for inexpensive adoption. Ratiometric measurements suppress variations with no calibration, and temporal zooming enhances sensitivity to reactance. Detection of several attacks with different probing devices is demonstrated with a 28-nm testchip up to 2 GHz.

2022

 

 

DDPMnet: All-Digital Pulse Density-Based DNN Architecture with 228 Gate Equivalents/MAC Unit, 28-TOPS/W and 1.5-TOPS/mm2 in 40nm (IEEE manuscript, Scholar Bank draft)

Relentless advances in DNN accelerator energy and area efficiency are demanded in low-cost edge devices. Both directly benefit from the reduction in the complexity of MAC units (neurons), thanks to the reduction in area and energy of computations and the interconnect fabric. In this work, the all-digital DDPMnet architecture for DNN acceleration based on a pulse density data representation is introduced to reduce the gate count/MAC unit from the thousand range to a few hundred. The proposed architecture removes any arithmetic block from MAC units (e.g., multipliers), while retaining the advantages of standard cell based design.

2022

 

Side-Channel Attack Counteraction via Machine Learning Targeted Power Compensation for Post-Silicon HW Security Patching (IEEE manuscript, Scholar Bank draft)

Machine learning-based side-channel attack counteraction is presented for security upgradeability via retraining, upon vulnerability discovery after deployment. Based on stdcell design, direct compensation of information-leaking power contributions reduces the power overhead over conventional indiscriminate compensation of total power fluctuations. A 40nm chip demonstrates design reuse across crypto algorithms, patching against a new attack on PRESENT, and AES protection under >1.2B traces.

2021

 

A 109TOPS/mm2 and 749-1,459TOPS/W SRAM Buffer with In-Memory Inference and Prediction-Less Bitline Activity Reduction in 28nm (IEEE manuscript, Scholar Bank draft)

This work presents an SRAM macro for continuous data buffering, simultaneous (always-on) in-memory computing for area- and energy-efficient event detection, and energy-efficient bulk read upon event occurrence for off-memory deeper data insights. The bitcell with non-precharged operation enables signed current summing for energy-efficient inference, and 90% bitline activity reduction in conventional reads without any data prediction circuitry. Hence, the proposed SRAM uniquely enhances energy efficiency both in read access and in-memory compute operation. The proposed architecture allows simultaneous inference and buffering (write) of incoming data for continuous-sensing applications with uninterrupted input data streams or samples (e.g., for computer vision). Circuit reuse and sharing make the area overhead low (17.6%) over the same array without in-memory compute capabilities. The proposed SRAM achieves 109TOPS/mm2 area efficiency and 749-1,459TOPS/W energy efficiency in 28nm under neural net workloads.

2021

 

A 1448-Mpixel/s, 84-pJ/pixel Display Stream Compression Encoder in 28 nm for 4K Video Resolution (IEEE manuscript, Scholar Bank draft)

A VESA Display Stream Compression (DSC) video encoder architecture in 28 nm for 4K-resolution virtual reality headsets and smartphones is presented. The pipelined prediction loop shortens the critical path and enables time-interleaving, same-row serial slice processing and logic reuse across frame slices. A component-wise memory architecture with dynamic allocation reduces the required buffer capacity by 48%. Energy and area efficiency improvements of up to 2.5X and 1.75X are achieved compared to conventional parallel multi-slice architectures. This first published DSC encoder chip achieves 1,448 Mpixels/s and down to 84 pJ/pixel at 4K UHDTV, enabling integration in battery-powered portable and wearable systems.

2021

 

Trimming-Less 0.2-V, 3.2-pW Voltage Reference Based on Corner-Aware Replica Combination with 1.6% Process Sensitivity, 1.4-mV Accuracy across PVT and Wafers (IEEE manuscript, Scholar Bank draft)

This work introduces a class of voltage references able to operate down to 3.2 pW and 0.2-V supply for energy harvesting with relaxed or suppressed voltage regulation (direct harvesting). Inherent wafer-to-wafer process sensitivity limitations and effect of process corners in deep sub-threshold are mitigated via a selection/combination of circuit replicas driven by a process sensor, at zero testing effort and trimming. A 180-nm testchip shows 1.6% process sensitivity (including wafer-to-wafer variations), 60.7-µV/V line sensitivity, and 34.9-µV/°C temperature coefficient, leading to 1.4-mV overall accuracy across corner wafers.

2021

 

Battery-Less IoT Sensor Node with PLL-Less WiFi Backscattering Communications in a 2.5-µW Peak Power Envelope (YouTube video demo, IEEE manuscript, Scholar Bank draft)

A system on chip including 802.11b WiFi communications is introduced to demonstrate battery-less operation for low-cost mm-scale sensor nodes. µW peak power is enabled by PLL-less WiFi backscattering communications and event-driven frequency regulation to compensate environmental variations. A 180nm testchip integrating the entire signal chain from any of four sensor interfaces to wireless communications with a commercial WiFi router exhibits 2.5µW total power.

2021

 

Fully-Digital Self-Calibrating Decoder with Sub-µW, 1.6fJ/convstep and 0.0075mm2 per Receptor for Scaling to Human-Like Tactile Sensing Density (YouTube video demo, IEEE manuscript, Scholar Bank draft)

This work presents an area- and energy-efficient decoder for tactile e-skin sensing encoding to scale up receptor density to the human scale. A fully-digital signal-adaptive receptor interface and event decoder architecture are introduced, leveraging temporal/spatial tactile signal sparsity to dynamically reduce activity and time resolution at negligible accuracy degradation. A novel reference-less self-calibrating senseamp is introduced to cancel offset by exploiting the statistical balance of spread-spectrum tactile pulses and noise. The 40nm testchip shows 1.6-fJ/convstep energy (0.0075mm2 area) per receptor with 50X (5X) improvement over prior art, and 80-receptor e-skin aggregation on a single pad.

2021

 

 

Rail-to-Rail Dynamic Voltage Comparator Scalable down to pW-Range Power and 0.15-V Supply (IEEE manuscript, Scholar Bank draft)

An ultra-low voltage, ultra-low power rail-to-rail dynamic voltage comparator solely based on digital standard cells is presented. Thanks to its digital nature, the comparator can be designed and integrated with fully-automated digital design flows and can operate at very low voltages down to deep sub-threshold. Measurements on a 180nm testchip show correct operation under rail-to-rail common-mode input at a supply voltage ranging from 0.6V down to 0.15V. The minimum supply voltage and power are the lowest reported to date, and make the circuit suitable for direct powering from mm-scale harvesters.

2021

 

A 0.6-to-1.8V Trimming-Less CMOS Current Reference with Near-100% Power Utilization (IEEE manuscript, Scholar Bank draft)

In this work, a current reference is proposed to introduce the new capability of operating under wide supply voltage ranges and at near-100% power utilization, as necessary in resource-constrained systems such as IoT sensor nodes. Operation from near-threshold (0.6 V) to nominal voltage (1.8 V) is demonstrated. The proposed reference uniquely limits the power absorbed by the peripheral circuitry to only 0.1% of the overall power, thus utilizing 99.9% of it for the intended output current. As demonstrated in a 180-nm testchip (15 tested dice from the same manufacturing lot), the near-100% power utilization with its compact area of 4,000 µm2 allows power- and area-frugal reference current generation.

 

2021

 

A 300mV-Supply, sub-nW-Power Digital-Based Operational Transconductance Amplifier (IEEE manuscript, Scholar Bank draft)

An ultra-low voltage and ultra-low power Digital-Based Operational Transconductance Amplifier (DB-OTA) is presented and demonstrated on silicon in 180 nm CMOS. The DB-OTA is designed using digital standard cells, hence benefitting from technology scaling as much as digital circuits, while also being technology- and design-portable, and requiring minimal design and integration effort compared to conventional analog-intensive OTAs. The fabricated DB-OTA testchip occupies a compact area of 1,426 μm2, operates at supply voltages down to 300 mV, and consumes only 590 pW while driving a capacitive load of 80pF. Its measured Total Harmonic Distortion (THD) is lower than 5% at a 100-mV input signal swing. Based on these results, the proposed DB-OTA achieves 2,101 V-1 small-signal figure of merit (FOMS) and 1,070 large-signal figure of merit (FOML). To the best of the authors’ knowledge, the power is the lowest reported to date in an OTA, and the achieved figures of merit are the best in sub-500 mV OTAs reported to date. The low cost, the low design effort and the high power efficiency of DB-OTA make it well suited for purely harvested low-frequency analog interfaces in sensor nodes.

2021

 

Unified In-Memory Dynamic (TRNG) and Multi-Bit Static (PUF) Entropy Generation for Ubiquitous Hardware Security (YouTube video demo, IEEE manuscript, Scholar Bank draft)

This work introduces the first unified in-memory True Random Number Generator (TRNG) and Physically Unclonable Function (PUF) for complete and inexpensive key generation within an SRAM array. TRNG is based on time-to-digital conversion of jitter accumulated at bitline discharge from leakage. Multi-bit per bitcell PUF is achieved by binning the discharge rate difference of bitline pairs. A 28 nm testchip shows TRNG at 16,000 F2 area per output stream, and 2-bit/bitcell PUF with 6.4 Gbps, 78 fJ/bit energy.

 

2021

 

Capacitance-to-Digital Converter for Operation under Uncertain Harvested Voltage down to 0.3V with No Trimming, Reference and Voltage Regulation (YouTube video demo, IEEE manuscript, Scholar Bank draft)

This work introduces the first capacitance-to-digital converter (CDC) for low-cost systems that are directly powered by a harvester. The CDC is based on swappable oscillators and does not require any additional circuitry, suppressing any reference and voltage regulation. Load-agnostic self-calibration eliminates the need for a specific test load and testing-time trimming. A 180nm testchip shows 7-bit ENOB down to 0.3V and 1.37-nW overall power, when powered by a 1-mm2 indoor solar cell.

2020

 

Fully-Digital Rail-to-Rail OTA with Sub-1,000 µm2 Area, 250-mV Minimum Supply and nW Power at 150-pF Load in 180nm (IEEE manuscript, Scholar Bank draft)

A fully-digital operational transconductance amplifier (DIGOTA) architecture for tightly energy-constrained low-cost systems is presented. A 180nm DIGOTA testchip exhibits an area below the 1,000-µm2 wall, 2.4-nW power under 150pF load, and a minimum supply voltage Vmin of 0.25 V. In the 0.3-0.5 V supply range, DIGOTA improves the area-normalized small (large) signal energy FoM by at least 836X (267X) over prior sub-500mV OTAs, while reducing area by 27-85X. The low-Vmin and nW-power features are shown to enable direct harvesting at the mm scale.

2020

A Robust, High-Speed and Energy-Efficient Ultralow-Voltage Level Shifter (IEEE manuscript, Scholar Bank draft)

This work presents a robust level shifter design able to convert input voltages from the deep sub-threshold regime (about 100 mV) up to the nominal supply voltage (1.8 V). The proposed circuit is based on a self-biased low-voltage cascode current mirror (CM) topology that features diode-connected PMOS and NMOS transistors to drive the split-input inverting buffer used as output stage with high energy efficiency. Experimental results across corner wafers demonstrate the effectiveness of the proposed level shifter as compared to prior art. The proposed circuit allows a voltage up-conversion from a 0.4-V 100-kHz input pulse to 1.8 V with an average switching delay of 7.6 ns and an average energy per transition of only 69 fJ. This is achieved at an area of 82 µm2 for a standard cell-based design.

2020

 

Fully-Synthesizable All-Digital Unified Dynamic Entropy Generation, Extraction and Utilization within the Same Cryptographic Core (IEEE manuscript, Scholar Bank draft)

This work introduces a novel class of fully-synthesizable all-digital True Random Number Generators (TRNGs) using the same private-key cryptographic core for raw dynamic entropy generation, its extraction via post-processing, and its utilization as crypto-key for constrained secure systems. Endogenous random bit generation is achieved via clock pulsewidth overstretching in the digital implementation of private-key cryptographic algorithms using pulsed-latch pipelines, leveraging inherent Shannon confusion and diffusion. Demonstration on a 40-nm testchip based on a SIMON cryptographic core shows 64-bit key encryption down to 0.25 pJ/bit at 0.45 V, random number generation with cryptographic-grade entropy at 2.5 pJ/bit across manufacturing lots, dice, voltages and temperature corners. The overall area is kept well below the 1E6 F2 area wall (F = minimum feature size).

2020

 

Broad-Purpose In-Memory Computing for Signal Monitoring and Machine Learning Workloads (IEEE manuscript, Scholar Bank draft)

In this work, a broad-purpose compute-in-memory solution (±CIM) able to handle arbitrary sign in both inputs/features and weights/coefficients is introduced. The ability to operate on arbitrary sign and under variable precision on both operands enables a wide range of applications, ranging from conventional neural networks to digital signal processing and monitoring. The ±CIM pipelined architecture, the reconfigurable row encoder and the adoption of a commercial 2-port bitcell allow uninterrupted memory availability for conventional read/write, even when performing in-memory computations. A 40nm testchip shows the ability of the ±CIM architecture to perform both neural network computations and classical signal processing. At 6-bit precision, the measured worst-case mismatch (noise) is 0.38 (0.62) LSB. The achieved accuracy when executing a LeNet-5 neural net workload is 98.3%, which is within 1.3% of state-of-the-art software implementations. As an example of a signal processing workload, 91.7% accuracy is achieved in voice activity detection, which is within 2.8% of a software implementation. Overall, an energy efficiency (throughput) of 41 TOPS/W (122 GOPS) is achieved at 38% area overhead over a conventional SRAM with the same 4-KB capacity.

 

2020

 

Voice Activity Detection with >83% Accuracy under SNR down to -3dB at 1.19µW and 0.07mm2 in 40nm (IEEE manuscript, Scholar Bank draft)

This work presents a voice activity detector for keyword spotting in self-powered speech interfaces with sub-syllable latency. A simple decision stump classifier and time averaging are introduced to provide >83% accuracy in noisy environments with SNR down to -3dB for reliable operation under a wide range of usage contexts (8-15dB lower than prior art). 1.19-µW power and 0.07mm2 area are shown in 40nm.
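
As a rough illustration of how a decision stump combined with time averaging can stabilize frame-level decisions, the sketch below thresholds one feature per frame and then applies a sliding majority vote. The feature values, the threshold, the window length and the majority-vote form of averaging are all hypothetical choices for illustration; the abstract does not specify them.

```python
def stump(feature, threshold):
    """Decision stump: a single-threshold classifier on one feature."""
    return 1 if feature > threshold else 0


def vad_decisions(features, threshold, window):
    """Per-frame speech flags after a sliding majority vote.

    `window` is the number of most recent stump outputs that are averaged
    (an illustrative assumption, not a value from the source).
    """
    raw = [stump(f, threshold) for f in features]
    smoothed = []
    for i in range(len(raw)):
        recent = raw[max(0, i - window + 1): i + 1]
        # Declare speech only if more than half of the recent frames agree.
        smoothed.append(1 if 2 * sum(recent) > len(recent) else 0)
    return smoothed


if __name__ == "__main__":
    print(vad_decisions([0, 5, 0, 5, 5, 5, 0], threshold=2, window=3))
```

The averaging suppresses isolated single-frame detections and dropouts, at the cost of a delay of a few frames before a change is reported.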

 

2020

 

Multi-Sensor Platform with Five-Order-of-Magnitude System Power Adaptation down to 3.1nW and Sustained Operation under Moonlight Harvesting (Always-On even without Battery) (YouTube video demo, IEEE manuscript, Scholar Bank draft)

A sensor node with system power tuning is presented for 5-order-of-magnitude adaptation to harvested power. Coordinated tuning of unified voltage/capacitive/light sensor interface, MCU and direct MPPT with no intermediate power conversion scales system power to 3.1nW at 0.3V. Operation at 1lux (moonlight) with 4.1×4.1mm2 light harvester is shown.

The power dynamic range is 110,000×, down to 480 pW, and the frequency dynamic range is 50,000×, up to 2 MHz. Power-speed scaling of the sensor interface is similar to the MCU across orders of magnitude, across which no sub-system sets a rigid power floor down to 3.1nW. The proposed platform operates under direct harvesting with a solar cell with 4.1mm×4.1mm active area down to 1lux, demonstrating moonlight harvesting for the first time. This paves the way for next-generation battery-light and battery-less always-on systems that do not miss any physical event in spite of the highly-fluctuating nature of energy harvesting.

2020

 

Voltage Reference with Lowest Operating Voltage down to 0.25 V and pW Power for Direct Harvesting and Battery-Less Systems (IEEE manuscript, Scholar Bank draft)

This work introduces a compact voltage reference operating at pW-power and 250-mV supply (e.g., direct harvester-powered). Body biasing assisted by replica biasing enables 25µV/°C temperature coefficient, 140µV/V line sensitivity, and 0.42mV process sensitivity in 180nm. 2.55-mV overall accuracy is achieved at 2,200µm2 area, without trimming. Operation at such low voltage and power introduces the capability to suppress the power-hungry intermediate DC-DC conversion stage of conventional sensor node architectures, and the battery altogether.

 

2020

Ultra-Compact Current- and Voltage-Input Analog-to-Digital Converters with Minimal Design Effort (“ADC in a day”) (IEEE manuscript, Scholar Bank draft)

Fully-synthesizable Successive Approximation Register (SAR) Analog-to-Digital Converters (ADCs) suitable for low-cost integrated systems are proposed both for voltage and current input. The proposed fully-digital ADC architectures enable low-effort design, silicon area reduction, and voltage scaling down to the near-threshold region. Compared to traditional analog-intensive designs, their digital nature allows easy technology and design porting, digital-like area shrinking across CMOS technology generations, and also drastically reduced system integration effort through immersed-in-logic ADC design. The voltage-input ADC architecture is demonstrated with a 40-nm testchip showing 3,000-μm2 area, 6.4-bit ENOB, 2.8kS/s sampling rate, 40.4dB SNDR, 49.7dB SFDR, and 3.1μW power at 1V. A current-input ADC is also demonstrated for direct current readout without requiring a trans-resistance stage. 40-nm testchip measurements show a 5-nA to 1-μA input range, 4,970μm2 area, 6.7-bit ENOB and 2.2-kS/s sample rate, at 0.94-μW power. Compared to the state of the art, the proposed ADC architecture exhibits the highest level of design automation (standard cell), lowest area, and the unique ability to cover direct acquisition of both voltage and current inputs, suppressing the need for transresistance amplifier in current readout.
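
The figures quoted above can be cross-checked with two textbook relations: ENOB = (SNDR − 1.76 dB) / 6.02 dB, and the Walden energy figure of merit FoM = P / (2^ENOB · fs). The sketch below applies them to the numbers in the abstract. The helper names are illustrative, and the resulting FoM values are derived here rather than quoted from the source.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB (standard sine-wave relation)."""
    return (sndr_db - 1.76) / 6.02


def walden_fom(power_w, enob_bits, fs_hz):
    """Walden figure of merit in joules per conversion step."""
    return power_w / (2 ** enob_bits * fs_hz)


if __name__ == "__main__":
    # Voltage-input ADC: 40.4 dB SNDR, 3.1 uW, 2.8 kS/s (from the abstract).
    enob_v = enob_from_sndr(40.4)
    fom_v = walden_fom(3.1e-6, 6.4, 2.8e3)
    # Current-input ADC: 6.7-bit ENOB, 0.94 uW, 2.2 kS/s (from the abstract).
    fom_i = walden_fom(0.94e-6, 6.7, 2.2e3)
    print(f"ENOB from SNDR: {enob_v:.2f} bit")
    print(f"Voltage-input FoM: {fom_v * 1e12:.1f} pJ/conv-step")
    print(f"Current-input FoM: {fom_i * 1e12:.1f} pJ/conv-step")
```

The ENOB computed from the 40.4-dB SNDR agrees with the stated 6.4 bits, which is a quick consistency check on the reported numbers.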

2020

 

Low-Energy Voice Activity Detection via Energy-Quality Scaling from Data Conversion to Machine Learning (IEEE manuscript, Scholar Bank draft)

In this work, voice activity detection (VAD) systems with system-level energy-quality (EQ) scaling have been demonstrated. Compared to prior single-knob EQ scaling, multiple EQ knobs are selectively inserted into the entire signal chain from end to end (i.e., from data conversion to classification). EQ knobs are dynamically co-optimized to minimize energy for a given quality target. Multi-knob energy-quality scaling makes quality degradation more graceful than single-knob, allowing for more aggressive energy reduction under a given quality target, while retaining the ability to operate at full quality. Also, proper system-level EQ optimization enhances fitting in machine learning-based systems (e.g., decision tree-based), suppressing both underfitting and overfitting. Measurements on a 28nm testchip show that system-level EQ scaling can reduce energy by up to 3.5X at 2% accuracy degradation in 10-dB noise, compared to full quality. Iso-technology comparison shows that the minimum energy of 51.9 nJ/frame is lower than prior art by 1.9-74.4X at comparable speech/non-speech hit rates.

 

2020

 

Deep sub-pJ/bit sub-10^6 F^2 energy-security scalable SIMON crypto-core (IEEE manuscript, Scholar Bank draft)

This work introduces an energy-security scalable crypto-core for private-key cryptography in low-end sensor nodes based on SIMON cipher. Energy and area footprints are reduced through techniques at the algorithm, microarchitectural and gate level. The 40 nm testchip shows energy down to 0.31 pJ/bit at 0.45 V with 64-bit key and 0.79E6 F2 area (F = process minimum feature size). The proposed crypto-core is well suited for ubiquitous security in energy/area-constrained platforms (e.g., low-end sensor nodes, RFIDs), while preserving full 256-bit security when necessary.
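
Areas expressed in F2 (squares of the minimum feature size F) can be converted to absolute area once F is fixed. The sketch below does this conversion under the assumption that F equals the nominal 40 nm of the process node; the helper name `f2_to_um2` is illustrative.

```python
def f2_to_um2(area_f2, feature_nm):
    """Convert an area given in F^2 units to square micrometers."""
    feature_um = feature_nm / 1000.0
    return area_f2 * feature_um ** 2


if __name__ == "__main__":
    # 0.79E6 F2 at an assumed F = 40 nm.
    print(f"{f2_to_um2(0.79e6, 40):.0f} um2")
    # The 1E6 F2 "area wall" at the same assumed F, for comparison.
    print(f"{f2_to_um2(1e6, 40):.0f} um2")
```

Normalizing to F2 makes areas comparable across technology nodes, since the same design ported to a different node keeps roughly the same F2 count while its absolute area changes.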

2019

Reconfigurable microcontroller/memory organization for energy-performance extension beyond voltage scaling (IEEE manuscript, Scholar Bank draft)

This work introduces reconfigurable thread count augmentation for existing microcontroller architectures, and row aggregation for their dedicated SRAM memory, to extend their energy-performance tradeoff beyond traditional voltage scaling at minimal design effort (“drop-in”). The proposed techniques are architecture-agnostic as the added reconfigurability does not modify the original instruction execution down to the cycle level. Reconfiguration allows occasional throughput boosting in simple architectures that were originally not conceived to allow multi-thread operation, while allowing the original single-thread operation in less performance-critical tasks. From a design viewpoint, thread count augmentation is fully automated and directly manipulates the gate-level netlist of an existing single-thread processor, allowing its application to commercial Intellectual Property cores (even if obfuscated by the IP vendor). Similarly, SRAM row aggregation can be applied on commercially compiled 6T SRAM arrays with minor modification in the row decoder. A 40nm ARM Cortex-M0 testchip shows 1.8X (1.4X) core (memory) performance boost beyond a baseline at nominal voltage, 1.4X lower minimum energy point at only 16% (4%) area (timing) overhead, and lowest energy/cycle to date.

2019

First Physically Unclonable Function with design margin reduction via in-situ and PVT sensor fusion for low-cost hardware security (IEEE manuscript, Scholar Bank draft)

This work introduces a Physically Unclonable Function (PUF)-based key generation scheme with run-time in-situ instability detection and process/voltage/temperature (PVT) sensors. These sensors are fused to evaluate the number of correction bits NECC that the Error Correcting Code (ECC) needs to make the PUF output stable and meet a given bit error rate target. Run-time sensing overcomes the substantial ECC energy penalty associated with the traditional design-time margin of NECC for the worst-case word, die, voltage and temperature. ECC with tunable NECC is introduced to enable energy saving in typical cases where NECC is lower than its worst-case value. Sensor fusion via simple linear regression estimates the required NECC at run time. A testchip in 40nm demonstrates the concept, based on a static monostable current mirror PUF with NECC = 0…4. Average energy reduction by 1.8X is shown compared to a traditional margined design, at an area overhead of less than 20%. As an additional benefit of adjustable NECC, such energy savings can be further expanded in applications with less stringent stability requirements.
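The run-time estimate of NECC by linear regression can be pictured with the minimal sketch below. Only the 0…4 range comes from the abstract; the choice of sensor readings, the weights and the round-up rule are assumptions made for illustration, not the chip's fitted model.

```python
import math

N_ECC_MAX = 4  # from the abstract: NECC = 0...4


def estimate_necc(readings, weights, bias):
    """Map fused sensor readings to a number of ECC correction bits.

    readings: sensor values (hypothetical, e.g. in-situ unstable-bit
              count, supply deviation, temperature deviation).
    weights, bias: linear-regression coefficients (hypothetical values
              would come from an offline fit against measured errors).
    The score is rounded up (an assumed, conservative choice) and
    clamped to the range supported by the tunable ECC.
    """
    if len(readings) != len(weights):
        raise ValueError("readings and weights must have the same length")
    score = bias + sum(w * r for w, r in zip(weights, readings))
    return max(0, min(N_ECC_MAX, math.ceil(score)))
```

For example, `estimate_necc((2, 0.1, 30), (0.5, 2.0, 0.01), 0.0)` selects a mid-range correction strength instead of always paying for the worst-case NECC.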

2019

Standard Cell-Based Ultra-Compact DACs in 40-nm CMOS (IEEE manuscript, Scholar Bank draft)

Ultra-compact, high-resolution, standard cell-based DACs based on Dyadic Digital Pulse Modulation (DDPM) are presented. As a fundamental contribution, an optimal sampling condition is analytically derived to enhance conversion with inherent suppression of spurious harmonics. Operation under this optimal condition is experimentally demonstrated to assure resolution up to 16 bits, with 9.4-239X area reduction compared to prior art. The digital nature of the circuits also allows extremely low design effort on the order of 10 man-hours, portability across CMOS generations, and operation at the lowest supply voltage reported to date. A DAC for DC calibration achieving 16-bit resolution with 3.1-LSB INL, 2.5-LSB DNL and 45µW power, at only 530µm² area, is demonstrated in 40nm CMOS.
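The abstract does not spell out the modulation itself. The sketch below follows the DDPM construction as it is commonly described: each input bit is routed to the output in a dyadic set of time slots, so that the number of high slots per period equals the input code. The slot-assignment convention (counting trailing ones of the slot index) and the function name `ddpm_stream` are assumptions for illustration, not taken from the paper.

```python
def ddpm_stream(code, n_bits):
    """Return one period (2**n_bits slots) of a DDPM bit stream.

    Slot k outputs bit (n_bits - 1 - j) of the code, where j is the
    number of trailing ones in k. The MSB therefore appears in every
    other slot, the next bit in every fourth slot, and so on; the
    single all-ones slot is always low.
    """
    if not 0 <= code < (1 << n_bits):
        raise ValueError("code out of range for n_bits")
    stream = []
    for k in range(1 << n_bits):
        j = 0
        while j < n_bits and (k >> j) & 1:   # count trailing ones of k
            j += 1
        if j == n_bits:
            stream.append(0)
        else:
            stream.append((code >> (n_bits - 1 - j)) & 1)
    return stream
```

Averaging the stream with a low-pass filter then yields an analog level proportional to the code, while the pulses stay spread across the period rather than lumped together as in plain PWM.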

2019

First energy-quality scalable Network on Chip with best-in-class energy (down to 6.9fJ/bit), while still using conventional low-swing transmitter/receiver circuits (IEEE manuscript, Scholar Bank draft)

A new class of ultra-low energy on-chip links is introduced. Through the use of sub-word ranking and non-uniform swing, the proposed links allow graceful energy-quality tradeoff in intra-chip communication links for noise-resilient applications such as machine learning and video processing. The proposed techniques are demonstrated in a 28nm testchip that achieves up to 4.5X energy saving over conventional full-quality links, and up to 2.2X over approximate links at iso-quality. Conventional operation with no quality degradation is also allowed for data packets that require full quality.

2019

First always-on system architecture for widely adaptive and power-scalable MCU/PMU from sub-mW to true nW, enabling battery-less and battery-indifferent operation (IEEE manuscript, Scholar Bank draft)

The proposed integrated system architecture consists of a power management unit (PMU) driving a microcontroller, and controlling a novel power knob that enables adaptation to the sensed power availability over an ultra-wide range, well beyond voltage scaling and down to nW level. Conventional battery-powered operation is augmented with pure harvesting. Wide power adaptation is enabled by comparator delay self-biasing and a zero-current switching scheme shared among all power modes with single-cycle convergence.

2019

First DAC architecture with digital-like shrinking under scaled technologies, and exhibiting graceful degradation under voltage/frequency overscaling (IEEE manuscript, Scholar Bank draft)

The proposed DAC allows very low design effort, enables digital-like shrinkage across CMOS generations, low area at down-scaled technologies, and operation down to near-threshold voltages. The proposed DAC can operate at supply voltages that are significantly lower and/or at clock frequencies that are significantly greater than the intended design point, at the expense of moderate resolution degradation. In a 12-bit 40-nm testchip, graceful degradation of 0.3bit/100mV is achieved when the supply voltage is over-scaled down to 0.8V, and 1.4bit/100mV when further scaled down to 0.6V. The proposed DAC enables dynamic power-resolution tradeoff with 3X (2X) power saving for 1-bit resolution degradation at iso-sample rate (iso-resolution).

2018

Relaxation oscillator for sensor nodes with lowest power to date (pW-range), operating under 0.3V-1.8V unregulated supply without any reference/bias circuitry (IEEE manuscript, Scholar Bank draft)

A versatile pW-power relaxation oscillator operating from sub-threshold (0.3V) to nominal voltage (1.8V) is presented, with Hz-range frequency obtained from a sub-pF capacitor. The wide voltage range and the low sensitivity of frequency and absorbed current to the supply allow the voltage regulator to be eliminated, with direct powering from harvesters (e.g., solar cell, thermal from machines) or 1.2-1.5V batteries. A 180nm testchip exhibits a frequency of 4Hz, 10%/V supply sensitivity at 0.3-1.8V, 8-18pA current, and 4%/°C thermal drift from -20°C to 40°C.

2018

The first microcontroller (MSP430) that can operate at the minimum-energy or the minimum-power point, with minimum power of 595pW (purely harvested in minimum-power mode) (IEEE manuscript, Scholar Bank draft)

This work presents an MSP430-compatible microcontroller in 180nm with dual-mode standard cells enabling minimum-power and minimum-energy modes. Minimum-power mode with sub-leakage power (595pW) allows purely energy-harvested operation with a sub-mm² harvester. Minimum-energy mode (14-33pJ/cycle) maximizes battery lifetime when battery-powered. Power management with ripple power gating self-startup allows cold start with an on-chip 0.54mm² solar cell under 55-lux illumination.

2017

First sub-mW feature extraction engine for ubiquitous computer vision and IoT (IEEE manuscript, Scholar Bank draft)

An energy-quality scalable (EQSCALE) feature extraction accelerator for IoT vision applications is presented. Knobs are introduced to dynamically adjust the tradeoff between energy and feature extraction quality, leveraging the intrinsic redundancy in video frames and the robustness of object recognition against missing features. The active area of the accelerator is 0.55mm². EQSCALE enables at least 5.7X energy improvement and 1.8X area reduction over state-of-the-art accelerators. To the best of our knowledge, EQSCALE is the first feature extraction accelerator operating in the sub-mW range (0.51mW at VGA resolution and 30 fps, and 0.19mW at 5 fps).

2017

First fully-synthesizable PUF (“PUF design in a day”) and active temperature compensation with native 2.8% BER, 1.02fJ/b at 0.8-1.0V in 40nm (IEEE manuscript, Scholar Bank draft)

A fully-synthesizable Physically Unclonable Function (PUF) with hysteresis-enhanced stability and active compensation of temperature variations is proposed. To reduce undesired bit flips, hysteretic behavior is obtained through the insertion of a Muller C-element output stage. A feedback scheme is also introduced to compensate the effect of temperature variations at run time. Native worst-case BER of 2.8% is measured under 0.8-1.0V and 25-85°C, with instability degradation with temperature being 0.15% per 10°C. The PUF bitcell consumes 1.02fJ/b at 0.9V. This PUF can be designed with fully automated standard cell-based flows, thus enabling substantial design effort reduction compared to prior art based on custom design styles.
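The hysteresis contributed by a Muller C-element can be seen in a minimal behavioral model. The gate's truth table is standard; how the element is wired into the PUF bitcell is not given in the abstract, so the input names `a` and `b` are only placeholders.

```python
def c_element(a, b, prev):
    """Behavioral model of a two-input Muller C-element.

    The output switches to the common input value only when both
    inputs agree; when they disagree it holds its previous value.
    """
    return a if a == b else prev


def run_c_element(input_pairs, initial=0):
    """Apply a sequence of (a, b) input pairs and return the outputs."""
    out, trace = initial, []
    for a, b in input_pairs:
        out = c_element(a, b, out)
        trace.append(out)
    return trace
```

A momentary disagreement between the inputs therefore does not flip the output, which is the holding behavior that suppresses undesired bit flips.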


2017

First reconfigurable microarchitecture down to the pipestage level for wide energy/voltage scaling (demonstration on FFT engine) (IEEE manuscript, Scholar Bank draft)

Dynamically adaptable pipelines, fully integrated with automated digital flows at design time and with dynamic voltage scaling schemes at run time, are demonstrated with a 256-point radix-4 fixed-point FFT engine on a 40-nm test chip. Measurements show energy savings up to 30% (38%) at iso-throughput (iso-voltage). Area and worst-case performance penalties are 5% and 11%, respectively.

2017

First demonstration of reconfigurable clock networks for adaptation under wide voltage scaling (IEEE manuscript, Scholar Bank draft)

A reconfigurable clock network design for operation from sub-threshold to nominal voltage is presented. The number of levels is adjusted, with more levels at nominal voltage to mitigate the impact of wire delay, and fewer in sub-threshold to mitigate the dominant random skew due to repeaters. Compared to traditional clock networks, clock skew is reduced by up to 2.5 standard deviations, enabling a 110mV Vmin reduction at 1.8% area penalty in a 40nm FFT testchip.

2015

15-fJ/bit Static Physically Unclonable Functions for Secure Chip Identification with <2% Native Bit Instability and 140X Intra/Inter PUF Hamming Distance Separation in 65nm (IEEE manuscript, Scholar Bank draft)

A static class of Physically Unclonable Functions for secure key generation and chip identification is presented. Energy down to 15 fJ/bit is achieved, key reproducibility and uniqueness meet an inter/intra-PUF Hamming distance separation of 140X or greater, and randomness passes all NIST tests. Native unstable bits are less than 2% at nominal conditions and less than 5% in the 0.7-1 V voltage and 25-85 °C temperature range, before applying any further post-silicon technique for stability enhancement.
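The inter/intra-PUF separation can be made concrete with the sketch below, which computes mean fractional Hamming distances over repeated reads of one PUF (intra) and over keys from different PUFs (inter). Reading the separation as the ratio of these two means is one plausible interpretation, not necessarily the paper's exact definition, and the bit strings are made-up toy data.

```python
from itertools import combinations


def hamming_fraction(a, b):
    """Fraction of bit positions in which two equal-length words differ."""
    if len(a) != len(b):
        raise ValueError("words must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_pairwise_distance(words):
    """Mean fractional Hamming distance over all pairs of words."""
    pairs = list(combinations(words, 2))
    return sum(hamming_fraction(a, b) for a, b in pairs) / len(pairs)


# Toy data (hypothetical): three reads of the same PUF, and one key
# from each of three different PUF instances.
same_puf_reads = ["10110010", "10110010", "10110011"]
different_puf_keys = ["10110010", "01101100", "11010001"]

intra = mean_pairwise_distance(same_puf_reads)      # reproducibility
inter = mean_pairwise_distance(different_puf_keys)  # uniqueness
separation = inter / intra                          # assumed ratio metric
```

A reproducible and unique PUF shows an intra distance near 0 and an inter distance near 0.5, so a larger ratio means keys from different chips are easier to tell apart from noisy reads of the same chip.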