Chip gallery

2026

One-Wire Architecture for Chiplet Reuse and Integration in Low-Dimensional Systems (IEEE manuscript, Scholar Bank draft)

(details will be shared in the public domain after the presentation of this work at CICC 2026)

2026

Smart Imager with Object Detection Exploiting Edge-Frame-Base Processing and Bounding Box Extraction for μW Power Purely-Harvested Sensor Nodes (IEEE manuscript, Scholar Bank draft)

(details will be shared in the public domain after the presentation of this work at DATE 2026)

2026

Multi-Spectral Filter-Less Computational Imager in Standard CMOS for Beyond-Visible Ubiquitous Machine Vision and Flexible Adaptation to Application (IEEE manuscript, Scholar Bank draft)

(details will be shared in the public domain after the presentation of this work at CICC 2026)

2026

Fully-Integrated mm-Scale 5G RF MIMO Harvesting with -40-dBm Sensitivity and Spatial MPPT via Hybrid Transformer-Based Combining/Shifting (YouTube video demo, IEEE manuscript, Scholar Bank draft)

A MIMO RF harvester in the 5G 28-GHz band with spatial MPPT is introduced to self-align high-gain harvesting direction with maximum power availability. Spatial scanning is enabled by a hybrid 2-level RF/DC antenna power combining and phase shifting based on transformers, while eliminating conventional off-chip capacitor(s). A 22nm testchip shows -40 dBm harvesting sensitivity, and power conversion efficiency of 56.7% at 0 dBm (1.3X better than prior art at 28 GHz).
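As a quick check of the figures above, the dBm values can be converted to absolute power. The sketch below uses only the standard dBm definition (0 dBm = 1 mW). The function names are illustrative and not taken from the manuscript.

```python
def dbm_to_watts(p_dbm: float) -> float:
    """Convert a power level in dBm to watts (0 dBm corresponds to 1 mW)."""
    return 1e-3 * 10 ** (p_dbm / 10)


def harvested_power_w(p_in_dbm: float, pce: float) -> float:
    """DC output power for a given RF input level and power conversion efficiency."""
    return dbm_to_watts(p_in_dbm) * pce


# -40 dBm sensitivity corresponds to 100 nW of available RF power.
sensitivity_w = dbm_to_watts(-40.0)

# 56.7% efficiency at 0 dBm (1 mW) corresponds to 0.567 mW of DC output.
output_w = harvested_power_w(0.0, 0.567)

print(f"sensitivity: {sensitivity_w * 1e9:.1f} nW")
print(f"DC output at 0 dBm: {output_w * 1e3:.3f} mW")
```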

2026

Highly-Integrated Light Sensing System with RF Harvesting and Transmission in Commercial N-Type IGZO Flexible Technology (YouTube video demo, IEEE manuscript, Scholar Bank draft)

A sticker-like light sensing system is demonstrated in flexible N-type-only IGZO TFT technology. It uniquely reuses RF signals for harvested battery charging and backscattered communications, and repurposes resistors as light sensors. Low-power operation is enabled by techniques for state-of-the-art efficiency in voltage regulation, signal amplification, data conversion and pseudo-dynamic logic. Indoor light sensing and 0.9-GHz data transmission are demonstrated with 2.02-mm2 area at 22 μW (<1% duty cycle).

2026

Fully-Integrated Backscattered WiFi 802.11b Transmitter with Active Harmonics and Image Rejection for 30dB IRR and 36dB HRR at 0.88µW (YouTube video demo, IEEE manuscript, Scholar Bank draft)

A sub-μW fully-integrated backscattered 802.11b WiFi transmitter is introduced with active image and harmonics rejection for common dense wireless environments. Rejection is achieved via time-domain complex filters, and oversampling with sequential RF switch enablement (no digital filter). Temperature event-driven calibration adjusts the underlying pulse basis and programmable complex impedance. A 22nm testchip shows 30-dB IRR (12 dB better than prior art) and 36-dB HRR (not previously demonstrated).

2025

802.11b/g backscattered transmitter enabling OFDM and highly-scalable 1-18 Mbps datarate and 0.34-3.67 pJ/bit energy (IEEE manuscript, Scholar Bank draft)

In this work, a standard-compliant WiFi backscattered transmitter supporting multi-carrier Orthogonal Frequency Division Multiplexing (OFDM) is presented for the first time. Highly-scalable datarate (1-18 Mbps) and energy (0.34-3.67 pJ/bit, down to battery-less) are demonstrated in 22-nm FD-SOI. Improvements of 9X and 7.4X are achieved over the prior best datarate and energy/bit, respectively.
The proposed architecture is based on the common passive backscattered communication scheme. The incident wave is sent by a tone generator (shared by many edge devices), and reflected to the receiver after proper antenna impedance modulation. The reflected signal is centered at the desired channel and is directly demodulated by commodity receivers.
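The antenna impedance modulation mentioned above can be illustrated with the textbook power-wave reflection coefficient, which sets how much of the incident tone is reflected for a given load. This is a minimal sketch under assumed values (a purely resistive 50-ohm antenna and ideal short/matched loads). It does not describe the impedance states actually used in the testchip.

```python
def reflection_coefficient(z_load: complex, z_ant: complex) -> complex:
    """Textbook power-wave reflection coefficient at the antenna/load interface."""
    return (z_load - z_ant.conjugate()) / (z_load + z_ant)


# Assumed antenna impedance (hypothetical, purely resistive).
Z_ANT = complex(50.0, 0.0)

# Conjugate-matched load: nothing is reflected (absorbing state).
gamma_matched = reflection_coefficient(Z_ANT.conjugate(), Z_ANT)

# Short-circuited load: full reflection with inverted phase.
gamma_short = reflection_coefficient(complex(0.0, 0.0), Z_ANT)

# Switching between two load states modulates the reflected wave;
# the difference between the two coefficients sets the modulation depth.
modulation_depth = abs(gamma_short - gamma_matched) / 2

print(gamma_matched, gamma_short, modulation_depth)
```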

2025

SRAM Compute-in-Memory Macro with Dual-Dataflow Architecture for Efficient Support of Multi-Modal Transformers and CNNs (IEEE manuscript, Scholar Bank draft)

This work presents an SRAM compute-in-memory macro with a dual-dataflow architecture that enables both uninterrupted static and dynamic-input matrix multiplication and accumulation (MAC) for the first time. This uniquely eliminates the traditionally heavy inter-bank load-store overhead of dynamic MACs in all prior SRAM art, where one of the two operands is required to reside statically in the array. The dual-dataflow architecture efficiently supports both CNNs and (multi-modal) transformers.
Unified and efficient static and dynamic MACs are enabled by the proposed 1) reset-less in-bitcell Boolean computations, 2) hybrid-domain accumulator, 3) concurrent write/compute. A 28-nm 160-kb SRAM testchip shows a competitive energy efficiency of up to 138.2 (30.5) TOPS/W for 8-bit integer S-MAC (D-MAC) at full output precision.

2025

Online Learning-Based Countermeasure Against Power Analysis Attacks (IEEE manuscript, Scholar Bank draft)

An online learning-based countermeasure against power analysis attacks is presented. The proposed countermeasure digitizes power and learns on chip, training a power compensation machine learning model. For the first time, this enhances security in a die-specific manner (e.g., adaptation to mismatch) and over time (e.g., adaptation to aging). On-chip learning avoids disastrous (18,000X worse) aging-induced security deterioration, achieving billion-scale MTD.

2025

Sub-μW Battery- and Crystal-Free Tag featuring 802.11ba/b-Compliant Wake-up Receiver, Backscattered Transmitter and 3D Localization (YouTube video demo, IEEE manuscript, Scholar Bank draft)

A sub-μW tag is introduced for battery-less operation, reusing 2-tone incident wave for 1) RF harvesting, 2) IM2 extraction of WiFi-compliant clock, 3) 802.11ba wake-up receiver, 4) WiFi 802.11b backscattered transmitter, 5) 3D localization via signal strength. Smart label demonstration for fulfillment centers shows cm-range accuracy in 180 nm.

→ TECHNOLOGY HIGHLIGHT IN IEEE Journal of Solid-State Circuits in Q3 2025

2025

On-Chip Circuit Harness Enabling Probe-Less, Position-Invariant and Massive Testing of Chiplets via Die Front/Back-Side Capacitive Coupling (IEEE manuscript, Scholar Bank draft)

This work introduces the first on-chip testing harness circuit and chiplet arrangement enabling probe-less and position-invariant testing. The proposed approach enables simultaneous power, two-way data, and clock capacitive transfer, enabling low-cost massive simultaneous testing of low-cost low-power chiplets. The proposed testing harness circuit supports touch-less chiplet power measurements for power binning, and full reuse across different chiplets (i.e., no need for chiplet-specific probe cards). The chip under test can be positioned arbitrarily on the XY plane and even flipped upside down without affecting the test results. This allows for a fully 3D invariant testing approach.

2025

Sensor-Less Laser Voltage Probing Attack Detection via Run-Time Leakage Shift Monitoring with 4.35% Area Overhead (IEEE manuscript, Scholar Bank draft)

A novel architecture to detect laser voltage probing (LVP) attacks is introduced to make silicon systems secure against such threats, while pushing the area overhead to a level that is compatible with low-cost chip products. The inherent and sharp temperature rise due to the presence of a laser beam (i.e., an attack) is detected by sensing the resulting exponential leakage increase. In turn, this is achieved by sensing the leakage of the logic under protection via intermittent power gating, eliminating altogether the area-hungry explicit sensors of prior art. In particular, the decay rate of the virtual supply is sensed via simple and local voltage comparison.
The proposed approach is seamlessly incorporated into an automated digital design flow, and the inherent resilience against process/voltage/temperature variations suppresses any post-silicon calibration. Extensive LVP attacks on a 28 nm testchip demonstrate attack decision margin well above 6 standard deviations, insignificant power overhead, percentage point-range performance degradation, and 4.35% area overhead (13.3× better than prior best [31]). This enables for the first time ubiquitous inclusion of LVP protection even in low-cost consumer electronics.
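The detection principle (a temperature rise increases leakage, which makes the power-gated virtual supply decay faster) can be illustrated with a first-order model. Everything in the sketch below is an assumption made for illustration: the rule of thumb that leakage doubles every 10 °C, the constant-current decay approximation, the component values and the fixed time threshold. The testchip relies on local voltage comparison, whose details are not reproduced here.

```python
def leakage_current_a(temp_c: float, i0_a: float = 1e-6,
                      t0_c: float = 25.0, doubling_c: float = 10.0) -> float:
    """Assumed exponential leakage model: doubles every `doubling_c` degrees."""
    return i0_a * 2 ** ((temp_c - t0_c) / doubling_c)


def decay_time_s(temp_c: float, c_virtual_f: float = 1e-9,
                 delta_v: float = 0.1) -> float:
    """Time for the gated virtual supply to droop by delta_v (t = C * dV / I)."""
    return c_virtual_f * delta_v / leakage_current_a(temp_c)


def attack_detected(temp_c: float, t_threshold_s: float = 50e-6) -> bool:
    """Flag an attack when the virtual supply decays faster than the threshold."""
    return decay_time_s(temp_c) < t_threshold_s


# Nominal temperature: slow decay, no alarm.
print(decay_time_s(25.0), attack_detected(25.0))

# Local heating by 20 degrees: 4X higher leakage, 4X faster decay, alarm raised.
print(decay_time_s(45.0), attack_detected(45.0))
```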

→ INVITED PAPER ON IEEE Journal of Solid-State Circuits

2024

SRAM Physically Unclonable Function Extracting Static Entropy from Every Bitcell Transistor for 6 bit/bitcell and Data Fingerprinting Capability for Provenance Assurance (YouTube video demo, IEEE manuscript, Scholar Bank draft)

An SRAM macro uniquely extracting static entropy from every transistor in unmodified 6T bitcells is presented, achieving for the first time 6 bit/bitcell entropy. When operating as a conventional physically unclonable function (PUF), it achieves state-of-the-art 296% PUF-to-SRAM capacity ratio without any error correcting code (ECC), retaining its energy and area efficiency at system level. In addition, the PUF output has native cryptographic-grade quality after one-time self-calibration, uniquely suppressing any entropy post-processing circuitry.
As a further operating mode, the proposed SRAM macro performs data fingerprinting by exploiting its unique data-dependent response. Data fingerprinting represents an additional layer of security supporting provenance assurance of data, and user authentication in real time or in retrospect. Competitive 134 F2/bit area efficiency is demonstrated in 28 nm with minor modification of conventional SRAM periphery.

→ INVITED PAPER ON IEEE Journal of Solid-State Circuits

2024

Backscattered Software-Defined Radio for Flexible, Reusable and Upgradeable Transmitters with 34-58 pJ/bit Energy Across Common Standards (YouTube video demo, IEEE manuscript, Scholar Bank draft)

A software-defined backscattered transmitter is introduced for die reuse across applications and upgrade over device lifespan. World record-breaking energy down to a few tens of pJ/bit for pervasive connectivity is achieved via a DAC-less architecture comprising 1) time-based jitter/power-tunable symbol generation for PSK/FSK/ASK modulations, 2) eFPGA supporting full and flexible PHY and filter-less over-sampling for pulse shaping (e.g., GFSK in BLE). 180-nm demonstration of communications with commodity hardware in several standards is shown. This is the first transmitter bridging the gap between backscattering and software-defined radios.

2024

Event-Driven Voltage Reference with Bandgap-Class Temperature Coefficient Across-Corner with 100-pW Power at 0.8 V (YouTube video demo, IEEE manuscript, Scholar Bank draft)

A voltage reference with event-driven temperature compensation is proposed to reduce power under common-case moderate fluctuations. An accurate reference is occasionally turned on to fine-tune a low-power clone when temperature changes by more than a threshold. As opposed to sample-and-hold references whose period is limited by capacitor discharge due to leakage, the proposed reference has unrestrictedly long off-periods and reduces power below 100 pW. At the same time, bandgap-class temperature coefficient and low process sensitivity across corners are achieved in 180nm.
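The event-driven policy described above can be sketched behaviorally: the accurate reference wakes up only when temperature has drifted beyond a threshold since the last calibration, and the low-power clone drifts in between. The threshold, the clone temperature coefficient and the temperature trace below are hypothetical, and the way the chip detects temperature events is not modeled.

```python
def simulate_event_driven_reference(temps_c, threshold_c=2.0,
                                    v_ref=0.5, clone_tc_v_per_c=1e-3):
    """Return (number of accurate-reference wake-ups, clone output per sample)."""
    t_cal = temps_c[0]   # temperature at the last calibration
    wakeups = 1          # initial calibration at start-up
    outputs = []
    for t in temps_c:
        if abs(t - t_cal) > threshold_c:
            # Temperature event: wake the accurate reference and re-tune the clone.
            t_cal = t
            wakeups += 1
        # Between events the clone drifts with its own (assumed linear) coefficient.
        outputs.append(v_ref + clone_tc_v_per_c * (t - t_cal))
    return wakeups, outputs


trace_c = [25.0, 25.5, 26.0, 27.5, 28.0, 30.0, 30.5, 25.0]
wakeups, outputs = simulate_event_driven_reference(trace_c)
worst_error_v = max(abs(v - 0.5) for v in outputs)
print(wakeups, worst_error_v)
```

In this sketch the output error stays bounded by the threshold times the clone temperature coefficient, while the accurate reference is active only on a fraction of the samples.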

2024

CogniVision: End-to-End SoC for Always-on Smart Vision with mW Power in 40nm (YouTube video demo, IEEE manuscript, Scholar Bank draft)

A full vision system on chip with hierarchical execution from sensor to AI and communications is presented. Always-on system power is aggressively reduced to the mW level through activity reduction, gating any subsequent vision pipeline stage starting from the lowest semantic level of the scene. The system comprises an imager with dual-architecture in/near-sensor saliency detection, on-the-fly novelty detection, DNN with on-chip scheduler for weight memory reduction, WiFi transmitter, wake-up receiver for cloud-pushed DNN model update and software programmable orchestration via RISC-V.

Compared to prior art, the proposed work is the only full imager-to-communication system. Peak throughput in DNN is 2.25 TOPS at 1.1 V, which is 4.4-1785.7X better than prior art, and 7.6X lower than [ISSCC 2022] which has 6.2X larger die size. The DNN energy efficiency is 29.45 TOPS/W at 8 bit, which is 1.1-19.6X better than recent prior art, and 1.6X worse than the in-memory accelerator-only without vision capabilities in [VLSI 2023]. State-of-the-art average (worst-case) full system energy/frame.pixel of 1.16 nJ (1.6 nJ) at 30fps is 1.4-10.8X better than recent prior art, while still having always-on visual monitoring.

Average 2.1-mW power at 30 fps is achieved with a SqueezeNet V1.0 model running entirely on chip. This demonstrates for the first time the ability to execute the entire vision pipeline from sensing to analytics in an always-on manner (no missed events), while keeping the power in the mW range.

2024

122.7 TOPS/W Stdcell-Based DNN Accelerator Based on Transition Density Data Representation, Clock-Less MAC Operation, Pseudo-Sparsity Exploitation in 40 nm (YouTube video demo, IEEE manuscript, Scholar Bank draft)

A DNN whose activation magnitude is represented by digital transition density is introduced for low energy, under the proposed Dyadic Digital Transition Modulation (DDTM). MAC operations are simplified into transition counting, enabling 1) activation pseudo-sparsity for lower energy, 2) clock-less neuron operation via simple up-down asynchronous counters.

Compared to prior art, the 40-nm testchip achieves the state-of-the-art energy efficiency of 122.7 TOPS/W in 4 bit using SqueezeNet v1.0. In absolute terms, this represents a 4.4X improvement at 4 bit over prior time-domain data representations, and 111.5X over conventional digital representation in comparable technologies. Energy efficiency is also improved by 1.28X at 4 bit over the prior best in 5 nm, even though the latter uses a much more advanced technology (6 CMOS generations ahead). As a key finding, macro energy efficiency reaches (or exceeds) the 100 TOPS/W range and is hence equivalent to recent state-of-the-art in-memory DNN accelerators.

2024

E-Textile Battery-Less Walking Step Counting System with <23 pW Power, Dual-Function Harvesting from Breathing, and No High-Voltage CMOS Process (YouTube video demo, IEEE manuscript, Scholar Bank draft)

An e-textile walking step counting full system integrated on a smart T-shirt is presented. Harvesting from co-designed low-voltage triboelectric nanogenerator (TENG) pushes over-voltage protection/rectification on chip. The TENG has been kindly fabricated at Prof. Lee Pooi See’s lab (NTU). Conformability and minimal off-chip components are achieved via dual-function harvester/sensor reuse and battery/passive elimination. Always-on power reduction to pWs enables uninterrupted operation while solely powered by breathing harvesting, suppressing the need for energy storage altogether.

Compared with prior art, the proposed architecture has the lowest peak power of 23 pW (3.4 pW always-on) during walking events sensing (data retention). This is >230X improvement over all prior demonstrations, which include only part of the system. The minimum harvesting frequency is the lowest by one to two orders of magnitude over prior art, as it is not limited by an LC resonator. The level of integration is the highest for true system conformability and low cost for embedment into textiles.

2024

Imager with In-Sensor Event Detection and Morphological Transformations with 2.9 pJ/pixel×frame Object Segmentation FOM for Always-On Surveillance in 40 nm (IEEE manuscript, Scholar Bank draft)

An imager with in-sensor event detection and object segmentation with 2.9 pJ/pixel×frame energy is presented. Tile-level analog circuitry for background subtraction and event detection enables readout suppression in uninteresting regions, and includes erosion for activity reduction. Conventional object integrity degradation due to erosion is counteracted by introducing dilation in the same tile, restoring missing parts to preserve accuracy in segmentation and subsequent system-level recognition.

Compared with the state of the art, the proposed imager is the only one that includes dilation, whose combination with erosion allows image opening and closing transformations. Such morphological transformations reduce redundant activations due to background motion and enable more accurate event detection, while keeping the object size intact. This enables a saliency-gated object-segmented readout operation reducing the conversion activity by 6.1X, hence reducing power. The resulting object segmentation FOM is 2.9 pJ/pixel×frame, which represents a 3.9-16.4X improvement over the two lowest in prior art, and orders of magnitude better than all others.
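The effect of combining erosion and dilation can be illustrated with a small digital example of image opening (erosion followed by dilation) on a binary event map. This is a software sketch with an assumed 3×3 structuring element and zero-padded borders. The imager performs these operations in tile-level analog circuitry, which is not modeled here.

```python
def _window(img, r, c):
    """3x3 neighborhood around (r, c); pixels outside the frame count as 0."""
    rows, cols = len(img), len(img[0])
    return [img[rr][cc] if 0 <= rr < rows and 0 <= cc < cols else 0
            for rr in range(r - 1, r + 2) for cc in range(c - 1, c + 2)]


def erode(img):
    """A pixel survives only if its whole 3x3 neighborhood is active."""
    return [[1 if all(_window(img, r, c)) else 0
             for c in range(len(img[0]))] for r in range(len(img))]


def dilate(img):
    """A pixel becomes active if any pixel in its 3x3 neighborhood is active."""
    return [[1 if any(_window(img, r, c)) else 0
             for c in range(len(img[0]))] for r in range(len(img))]


def opening(img):
    """Erosion removes small spurious events, dilation restores object size."""
    return dilate(erode(img))


# A 3x3 object plus one isolated spurious event (e.g., background motion).
events = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 1, 0],
    [0, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]
cleaned = opening(events)
for row in cleaned:
    print(row)
```

In this example the isolated event is removed while the 3×3 object keeps its original size, which is the behavior that erosion alone cannot provide.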

2023

Single-Antenna Backscattered BLE5 Transmitter with up to 97m Range, 10.6 µW Peak Power for Purely-Harvested Green Systems (YouTube video demo, IEEE manuscript, Scholar Bank draft)

This paper introduces a backscattered BLE5 transmitter for low-cost single-antenna green systems solely powered by mm-scale harvesters. Peak power reduction to 10.6 µW is achieved while enabling BLE-compliant spectral mask up to the maximum allowed backscattered power for range extension. Peak power is reduced via an approximate GFSK modulator architecture based on a non-uniform self-sampling digitally controlled oscillator (DCO) with period pruning/clustering, in place of a power-hungry Gaussian filter and PLL used in conventional GFSK modulators. A 180-nm testchip shows 97-m range with commodity receiver at 4X power and 3X range improvement with respect to prior art.

2023

Voltage Scaling-Agnostic Counteraction of Side-Channel Neural Net Reverse Engineering via Machine Learning Compensation and Multi-Level Shuffling (IEEE manuscript, Scholar Bank draft)

This work proposes a voltage scaling-agnostic counteraction against neural network weight reverse engineering via side-channel attacks. Multi-level shuffling and machine learning-based dual power compensation are introduced. State-of-the-art protection with >200 million MTD represents an improvement by 42,600× and 52,600× over unprotected baseline, and 100× better than prior best. The power overhead of 1.76× is also improved by 3.7× over prior best, thanks to the synergistic and relatively simple nature of the proposed techniques (in addition to lower power, thanks to the enabled voltage scaling support).

The same level of security was achieved under voltage scaling at very different dynamic/static power ratios caused by changes in clock frequency and supply voltage. Instead, conventional single power compensation expectedly fails under voltage scaling, with MTD degradation by 1,000×. As further benefit, the proposed protection technique comes with zero latency overhead.

→ TOP ACADEMIC CONTRIBUTOR AWARD AT VLSI SYMPOSIUM

2023

A 0.4-V 12-bit Self-Calibrated SAR ADC with Offset Injection Assist Achieving 0.43 fJ/conv-step (YouTube video demo, IEEE manuscript, Scholar Bank draft)

A 12-bit hybrid SAR ADC with SAR search assisted by comparator offset injection is presented. The proposed ADC architecture reuses the offset calibration circuitry for conversion of the last two LSBs, reducing the capacitive DAC dynamic range requirement for a given unit capacitance, and hence pushing the total capacitance down towards its thermal noise limit. The proposed ADC is self-calibrated with no need for accurate input generation for ubiquitous adoption. The 40-nm testchip achieves a Walden energy FoM of 0.43 fJ/conv-step (lowest in SAR ADCs with ENOB>10 bit), while delivering an SNDR>64.7dB and SFDR>74.3dB.
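The relation between the reported SNDR, the effective number of bits and the Walden FoM follows from the standard definitions, sketched below. The sampling rate used to turn the FoM into a power figure is a hypothetical value chosen for illustration, as it is not stated above, and the FoM may have been measured at a different SNDR than the quoted lower bound.

```python
def enob_from_sndr(sndr_db: float) -> float:
    """Standard conversion: ENOB = (SNDR - 1.76 dB) / 6.02 dB."""
    return (sndr_db - 1.76) / 6.02


def walden_fom_j(power_w: float, enob: float, fs_hz: float) -> float:
    """Walden figure of merit in joules per conversion step."""
    return power_w / (2 ** enob * fs_hz)


def power_from_fom_w(fom_j: float, enob: float, fs_hz: float) -> float:
    """Invert the Walden FoM to estimate power at a given sampling rate."""
    return fom_j * 2 ** enob * fs_hz


# SNDR of 64.7 dB corresponds to an ENOB of about 10.46 bit (above 10 bit).
enob = enob_from_sndr(64.7)

# Hypothetical sampling rate, used only to illustrate the power scale.
FS_HZ_ASSUMED = 10e3
power_w = power_from_fom_w(0.43e-15, enob, FS_HZ_ASSUMED)

print(f"ENOB = {enob:.2f} bit, power at assumed fs = {power_w * 1e9:.2f} nW")
```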

2023

Super-Cutoff Analog Building Blocks for pW/Stage Operation and Demonstration of 78-pW Battery-Less Light-Harvested Wake-Up Receiver down to Moonlight (YouTube video demo, IEEE manuscript, Scholar Bank draft)

Techniques to lower power in analog and sensor interfaces well below regular transistor leakage (VGS=0V) are introduced. The proposed circuit techniques enable pW power in super-cutoff (VGS<0V) building blocks from current mirrors to OTAs, pseudo-resistors and bias. A LiFi optical wake-up receiver with 78-pW power is demonstrated with continuous operation solely powered by an unregulated 1-mm2 solar cell down to 1 lux (moonlight).

→ INVITED PAPER ON IEEE Journal of Solid-State Circuits

→ TOP ACADEMIC CONTRIBUTOR AWARD AT VLSI SYMPOSIUM

2023

38.4-pW, 0.14-mm2 Body-Driven Temperature-to-Digital Converter and Voltage Reference with 0.6-1.6-V Unregulated Supply for Battery-Less Systems (IEEE manuscript, Scholar Bank draft)

A temperature-to-digital converter for low-cost purely-harvested systems is introduced. Its architecture is based on an oscillator pair with PTAT and CTAT frequency via body-driven control, and compact temperature sensors with implicit self-regulation. Supply regulation is eliminated for true sub-100 pW power. State-of-the-art power of 38.4 pW is achieved in 180 nm at 0.6 V with 0.49-°C resolution at 0.14-mm2 area.

→ TOP ACADEMIC CONTRIBUTOR AWARD AT VLSI SYMPOSIUM

2023

Self-Referenced Design-Agnostic Laser Voltage Probing Attack Detection with 100% Protection Coverage, 58% Area Overhead for Automated Design (IEEE manuscript, Scholar Bank draft)

A self-referenced distributed on-chip scheme is introduced to achieve continuous detection of laser voltage probing (LVP) attacks against digital IPs with full-area coverage via temperature sensing. Calibration-free, automated and design-agnostic adoption are enabled by a stdcell-based approach, offering a 2.5X area overhead reduction compared to prior art.

→ TOP ACADEMIC CONTRIBUTOR AWARD AT VLSI SYMPOSIUM

2023

ECC-Less Multi-Level SRAM Physically Unclonable Function and 127% PUF-to-Memory Capacity Ratio with No Bitcell Modification in 28nm (YouTube video demo, IEEE manuscript, Scholar Bank draft)

A multi-level (2 bits/bitcell) SRAM PUF is introduced to uniquely enable ECC-less operation with PUF capacity exceeding storage capacity at no cell modification. The first PUF bit is generated from steady-state post-reset bitcell value with >4X higher stability than conventional power-up. The second is simultaneously extracted from the transient response. Above-storage capacity and improved stability eliminate ECC down to the SRAM V_min (0.6 V) at 75-fJ/bit energy and 3.3% area overhead in 28 nm.

→ TOP ACADEMIC CONTRIBUTOR AWARD AT VLSI SYMPOSIUM

2023

Visual Content-Agnostic Novelty Detection Engine with 2.4 pJ/pixel Energy and Two-Order of Magnitude DNN Activity Reduction in 40 nm (IEEE manuscript, Scholar Bank draft)

An engine to identify frames with novel content in a video stream is proposed as additional vision pipeline stage following conventional saliency detection. Based on connected component analysis with mean-center tracking, its complexity is reduced to linear compared to quadratic in prior art. This introduces frame-level temporal sparsity for subsequent DNN activity/power reduction (177X beyond saliency detection). 2.4 pJ/pixel energy is achieved in 40 nm.

Compared to prior art in near-sensor architectures, the 40-nm novelty detection engine reduces the memory requirement by 22.8-99X over prior best event detectors. The low memory requirement stems from the on-the-fly architecture arrangement with minimal storage among stages, and the linear computational complexity. The 60-fps max frame rate (3 pJ/pixel energy of saliency+novelty) improves over prior art by 2.4-60X (7.6X or better). Activity reduction via NDE cumulates with saliency for total 465X reduction in a typical face recognition application.

→ TOP ACADEMIC CONTRIBUTOR AWARD AT VLSI SYMPOSIUM

2023

55-pW/pixel Peak Power Imager with Near-Sensor Novelty/Edge Detection and DC-DC Converter-Less MPPT for Purely-Harvested Sensor Nodes (IEEE manuscript, Scholar Bank draft)

A μW-range low-cost imager with on-chip detection of Regions of Interest (ROI) with novel content and power adaptation to purely-harvested sources is proposed. Novel content detection is achieved via inexpensive edge detection and edge point counting. Frame rate adaptation to light is enabled by a DC-DC converter-less MPPT loop for reduced system complexity. 55-pW/pixel always-on power is achieved in a 180-nm testchip powered by a 3.3×3.3-mm2 solar cell down to 100 lux.

2023

Dual-Mode Conversion Gating, Comparator Merging and Reference-Less Calibration for 2.7X Energy Reduction in SAR ADCs under Low-Activity Inputs (YouTube video demo, IEEE manuscript, Scholar Bank draft)

This work introduces a SAR ADC architecture that reduces energy by skipping conversion whenever samples lie within a pre-defined activity window Δ of previous data conversion(s). At the same time, it provides an uncommonly flexible window for signal specificity exploitation and ample design reuse, remains minimally invasive at system level, and suppresses any additional accurate circuitry for windowing. In detail, the activity window of the proposed ADC is uniquely flexible, as both its center and width are tunable instead of being rigidly set at design time and/or by fixed absolute thresholds. Also, the proposed ADC is minimally invasive at system level as 1) it guarantees uniform sampling and conversion completion at every sample without missing samples, 2) it does not require any additional accurate circuitry such as voltage reference or DAC. A reference-less calibration based on pulse counting is introduced to accurately set the activity window threshold with sub-LSB granularity. Comparator merging inherently compensates offset in normal conversions, reusing calibrated comparators used for ±Δ windowing. A 12-bit 40-nm ADC testchip with an energy FOM of 0.95 fJ/conv-step shows 2.7X energy saving over a SAR baseline at 22% area overhead.
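The conversion gating idea can be sketched behaviorally: a full SAR search runs only when the new sample leaves the ±Δ window around the last converted sample, otherwise the previous code is reused so that an output is still produced at every sample. The window-centering policy, the ideal quantizer and the sample values below are assumptions made for illustration, not the circuit implementation.

```python
def quantize(x: float, bits: int = 12, full_scale: float = 1.0) -> int:
    """Ideal quantizer standing in for a full SAR search."""
    code = int(x / full_scale * (2 ** bits))
    return max(0, min(2 ** bits - 1, code))


def gated_sar(samples, delta: float, bits: int = 12):
    """Return (output codes, number of full conversions actually performed)."""
    codes = []
    conversions = 0
    center = None  # assumed window center: the last fully converted sample
    for x in samples:
        if center is not None and abs(x - center) <= delta:
            # Sample within the activity window: gate the conversion, reuse the code.
            codes.append(codes[-1])
        else:
            # Sample outside the window: run a full conversion and re-center.
            codes.append(quantize(x, bits))
            conversions += 1
            center = x
    return codes, conversions


samples = [0.500, 0.501, 0.499, 0.600, 0.601, 0.500]
codes, conversions = gated_sar(samples, delta=0.005)
print(codes, f"{conversions} full conversions out of {len(samples)} samples")
```

Under low-activity inputs most samples fall inside the window, so the number of full conversions (and hence the energy) drops while the output stream remains uniformly sampled.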

2022

Picowatt-Power Analog Gain Stages in Super-Cutoff Region with Purely-Harvested Demonstration (YouTube video demo, IEEE manuscript, Scholar Bank draft)

In this work, gain stages with power down to the sub-pW/stage range are introduced to enable always-on mm-scale systems based on either pure harvesting across all practical environmental conditions, or micro-battery with near-shelf-life lifetime (e.g., 20 years). The proposed circuit techniques suppress the need for supply voltage regulation, allowing direct harvesting (i.e., no intermediate DC-DC conversion). A CMOS 180-nm fully-differential OTA is shown to consume 0.43-1.26 pW at 0.4-0.6 V as voltage harvested across all practical conditions from 2.16-mm2 solar cell down to 1 lux. The OTA shows 0.8-mV input offset and 18-µV input-referred noise. As an example of its application, a pW-power human grip/touch detection system for always-on event monitoring is demonstrated at light harvesting down to moonlight.

2022

Capacitance-Based Voltage Regulation- and Reference-Free Temperature-to-Digital Converter down to 0.3 V and 2.5 nW for Direct Harvesting (YouTube video demo, IEEE manuscript, Scholar Bank draft)

A temperature-to-digital converter for direct harvesting is proposed, where no DC-DC conversion is required between the DC harvester and the system. Temperature-induced capacitance differences are read out through ring oscillator frequency. PVT variations are suppressed by the differential nature of the temperature sensor architecture, whereas mismatch is compensated via a self-referenced calibration procedure. No reference, regulator, digital post-processing and digital direct temperature readout is needed to retain true-nW and low-Vmin operation. A 180-nm testchip tested across corner wafers shows 7bit ENOB, 2.5-4.5nW from solar and thermal direct harvesting at 0.3-0.5 V, as representative of a very wide range of environmental conditions.

2022

Imager with Dynamic LSB Adaptation and Ratiometric Readout for Low-Bit Depth 5-µW Peak Power in Purely-Harvested Systems (YouTube video demo, IEEE manuscript, Scholar Bank draft)

An imager with µW peak power is introduced for purely-harvested operation. The LSB is dynamically adapted to the light intensity of the scene for aggressive bit depth down-scaling, avoiding traditional dynamic range over-margining across practical light intensities under fixed LSB. Ratiometric readout of pixel current cancels threshold voltage mismatch. A 256×256-pixel 180-nm imager shows 5-µW power at 1 fps and 4 bits, while keeping ImageNet classification accuracy drop to percentage points under 75-dB ambient light range, across original and brightness-adjusted images.
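The benefit of adapting the LSB to the scene can be illustrated with a simple 4-bit quantization example. With a fixed LSB sized for the brightest expected scene, a dim scene collapses into very few codes, whereas an LSB scaled to the scene uses the full code range. The pixel values, the full-scale assumption and the choice of the scene maximum as adaptation criterion are hypothetical. The on-chip adaptation mechanism is not modeled.

```python
def quantize(pixels, lsb: float, bits: int = 4):
    """Quantize pixel intensities with a given LSB, clipping at the top code."""
    top = 2 ** bits - 1
    return [min(top, round(p / lsb)) for p in pixels]


def adaptive_lsb(pixels, bits: int = 4) -> float:
    """Assumed adaptation rule: map the brightest pixel to the top code."""
    return max(pixels) / (2 ** bits - 1)


# Dim scene, in arbitrary units where 1.0 is the brightest expected intensity.
dim_scene = [0.01, 0.03, 0.05, 0.08]

fixed_codes = quantize(dim_scene, lsb=1.0 / 15)                 # LSB margined for full range
adaptive_codes = quantize(dim_scene, lsb=adaptive_lsb(dim_scene))

print("fixed LSB:   ", fixed_codes)
print("adaptive LSB:", adaptive_codes)
```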

2022

On-Chip Laser Voltage Probing Attack Detection with 100% Area Coverage at Above/Below the Bandgap Wavelength and Fully-Automated Design (YouTube video demo, IEEE manuscript, Scholar Bank draft)

An on-chip detection scheme against Laser Voltage Probing (LVP) attacks is introduced. It enables digital IP full-area coverage, and its architecture preserves automated design and stdcell layout discipline (including restricted design rules). Stdcells with laser sensing are proposed with photocurrent sensitivity up to 10 pA/m2 both above/below the bandgap wavelength, inherent PVT resilience, and no calibration.

2022

Fully-Digital Broadband Calibration-Less Impedance Monitor for Probe Insertion Detection against Power Analysis Attacks (YouTube video demo, IEEE manuscript, Scholar Bank draft)

In this work, a broadband supply impedance monitor is proposed to detect insertion of probing devices or package/PCB modifications in secure systems. The fully-digital architecture allows automated and portable design, and its compact area allows under-pad placement for inexpensive adoption. Ratiometric measurements suppress variations at no calibration, while temporal zooming enhances sensitivity to reactance. Detection of several attacks with different probing devices is demonstrated with a 28-nm testchip up to 2 GHz.

2022

DDPMnet: All-Digital Pulse Density-Based DNN Architecture with 228 Gate Equivalents/MAC Unit, 28-TOPS/W and 1.5-TOPS/mm2 in 40nm (YouTube video demo, IEEE manuscript, Scholar Bank draft)

Relentless advances in DNN accelerator energy and area efficiency are demanded in low-cost edge devices. Both directly benefit from the reduction in the complexity of MAC units (neurons), thanks to the reduction in area and energy of computations and the interconnect fabric. In this work, the all-digital DDPMnet architecture for DNN acceleration based on a pulse density data representation is introduced to reduce the gate count/MAC unit from the thousand range to a few hundred. The proposed architecture removes any arithmetic block from MAC units (e.g., multipliers), while retaining the advantages of standard cell based design.

2022

Side-Channel Attack Counteraction via Machine Learning Targeted Power Compensation for Post-Silicon HW Security Patching (YouTube video demo, IEEE manuscript, Scholar Bank draft)

Machine learning-based side-channel attack counteraction is presented for security upgradeability via retraining, upon vulnerability discovery after deployment. Based on stdcell design, direct compensation of information-leaking power contributions reduces the power overhead over conventional indiscriminate compensation of total power fluctuations. A 40nm chip demonstrates design reuse across crypto algorithms, patching against a new attack on PRESENT, and AES protection under >1.2B traces.

→ BEST PAPER AWARD AT ISSCC 2023

2021

A 109TOPS/mm2 and 749-1,459TOPS/W SRAM Buffer with In-Memory Inference and Prediction-Less Bitline Activity Reduction in 28nm (YouTube video demo, IEEE manuscript, Scholar Bank draft)

This work presents an SRAM macro for continuous data buffering, simultaneous (always-on) in-memory computing for area- and energy-efficient event detection, and energy-efficient bulk read upon event occurrence for off-memory deeper data insights. The bitcell with non-precharged operation enables signed current summing for energy-efficient inference, 90% bitline activity reduction in conventional reads without any data prediction circuitry. Hence, the proposed SRAM uniquely enhances energy efficiency both in read access and in-memory compute operation. The proposed architecture allows simultaneous inference and buffering (write) of incoming data for continuous-sensing applications with uninterrupted input data streams or samples (e.g., for computer vision). Circuit reuse and sharing make the area overhead low (17.6%) over the same array without in-memory compute capabilities. The proposed SRAM achieves 109TOPS/mm2 area efficiency and 749-1,459TOPS/W energy efficiency in 28nm under neural net workloads.

2021

A 1448-Mpixel/s, 84-pJ/pixel Display Stream Compression Encoder in 28 nm for 4K Video Resolution (IEEE manuscript, Scholar Bank draft)

A VESA Display Stream Compression (DSC) video encoder architecture in 28 nm for 4K-resolution virtual reality headsets and smartphones is presented. The pipelined prediction loop shortens the critical path and enables time-interleaving, same-row serial slice processing and logic reuse across frame slices. A component-wise memory architecture with dynamic allocation reduces the required buffer capacity by 48%. Energy and area efficiency improvements of up to 2.5X and 1.75X are achieved compared to conventional parallel multi-slice architectures. This first published DSC encoder chip achieves 1,448 Mpixels/s and down to 84 pJ/pixel at 4K UHDTV, enabling integration in battery-powered portable and wearable systems.

2021

Trimming-Less 0.2-V, 3.2-pW Voltage Reference Based on Corner-Aware Replica Combination with 1.6% Process Sensitivity, 1.4-mV Accuracy across PVT and Wafers (IEEE manuscript, Scholar Bank draft)

This work introduces a class of voltage references able to operate down to 3.2 pW and 0.2-V supply for energy harvesting with relaxed or suppressed voltage regulation (direct harvesting). Inherent wafer-to-wafer process sensitivity limitations and effect of process corners in deep sub-threshold are mitigated via a selection/combination of circuit replicas driven by a process sensor, at zero testing effort and trimming. A 180-nm testchip shows 1.6% process sensitivity (including wafer-to-wafer variations), 60.7-µV/V line sensitivity, and 34.9-µV/°C temperature coefficient, leading to 1.4-mV overall accuracy across corner wafers.

2021

Battery-Less IoT Sensor Node with PLL-Less WiFi Backscattering Communications in a 2.5-µW Peak Power Envelope (YouTube video demo, IEEE manuscript, Scholar Bank draft)

A system on chip including 802.11b WiFi communications is introduced to demonstrate battery-less operation for low-cost mm-scale sensor nodes. µW peak power is enabled by PLL-less WiFi backscattering communications and event-driven frequency regulation to compensate environmental variations. A 180nm testchip integrating the entire signal chain from any of four sensor interfaces to wireless communications with a commercial WiFi router exhibits 2.5µW total power.

2021

Fully-Digital Self-Calibrating Decoder with Sub-µW, 1.6fJ/convstep and 0.0075mm2 per Receptor for Scaling to Human-Like Tactile Sensing Density (YouTube video demo, IEEE manuscript, Scholar Bank draft)

This work presents an area- and energy-efficient decoder for tactile e-skin sensing encoding to scale up receptor density to the human scale. A fully-digital signal-adaptive receptor interface and event decoder architecture are introduced, leveraging temporal/spatial tactile signal sparsity to dynamically reduce activity and time resolution at negligible accuracy degradation. A novel reference-less self-calibrating senseamp is introduced to cancel offset by exploiting the statistical balance of spread-spectrum tactile pulses and noise. The 40nm testchip shows 1.6-fJ/convstep energy (0.0075mm2 area) per receptor with 50X (5X) improvement over prior art, and 80-receptor e-skin aggregation on a single pad.

2021

Rail-to-Rail Dynamic Voltage Comparator Scalable down to pW-Range Power and 0.15-V Supply (IEEE manuscript, Scholar Bank draft)

An ultra-low voltage, ultra-low power rail-to-rail dynamic voltage comparator solely based on digital standard cells is presented. Thanks to its digital nature, the comparator can be designed and integrated with fully-automated digital design flows and can operate at very low voltages down to deep sub-threshold. Measurements on a 180nm testchip show correct operation under rail-to-rail common-mode input at a supply voltage ranging from 0.6V down to 0.15V. The minimum supply voltage and power are the lowest reported to date, and make the circuit suitable for direct powering from mm-scale harvesters.

2021

A 0.6-to-1.8V Trimming-Less CMOS Current Reference with Near-100% Power Utilization (IEEE manuscript, Scholar Bank draft)

In this work, a current reference is proposed to introduce the new capability of operating under wide supply voltage ranges and at near-100% power utilization, as necessary in resource-constrained systems such as IoT sensor nodes. Operation from near-threshold (0.6 V) to nominal voltage (1.8 V) is demonstrated. The proposed reference uniquely limits the power absorbed by the peripheral circuitry to only 0.1% of the overall power, thus utilizing 99.9% of it for the intended output current. As demonstrated in a 180-nm testchip (15 tested dice from the same manufacturing lot), the near-100% power utilization with its compact area of 4,000 µm2 allows power- and area-frugal reference current generation.

2021

A 300mV-Supply, sub-nW-Power Digital-Based Operational Transconductance Amplifier (IEEE manuscript, Scholar Bank draft)

An ultra-low voltage and ultra-low power Digital-Based Operational Transconductance Amplifier (DB-OTA) is presented and demonstrated on silicon in 180 nm CMOS. The DB-OTA is designed using digital standard cells, hence benefitting from technology scaling as much as digital circuits, while also being technology- and design-portable, and requiring minimal design and integration effort compared to conventional analog-intensive OTAs. The fabricated DB-OTA testchip occupies a compact area of 1,426 μm2, operates at supply voltages down to 300 mV, and consumes only 590 pW while driving a capacitive load of 80pF. Its measured Total Harmonic Distortion (THD) is lower than 5% at a 100-mV input signal swing. Based on these results, the proposed DB-OTA achieves 2,101 V-1 small-signal figure of merit (FOMS) and 1,070 large-signal figure of merit (FOML). To the best of the authors’ knowledge, the power is the lowest reported to date in an OTA, and the achieved figures of merit are the best in sub-500 mV OTAs reported to date. The low cost, the low design effort and the high power efficiency of DB-OTA make it well suited for purely harvested low-frequency analog interfaces in sensor nodes.

2021

Unified In-Memory Dynamic (TRNG) and Multi-Bit Static (PUF) Entropy Generation for Ubiquitous Hardware Security (YouTube video demo, IEEE manuscript, Scholar Bank draft)

This work introduces the first unified in-memory True Random Number Generator (TRNG) and Physically Unclonable Function (PUF) for complete and inexpensive key generation within an SRAM array. TRNG is based on time-to-digital conversion of jitter accumulated at bitline discharge from leakage. Multi-bit per bitcell PUF is achieved by binning the discharge rate difference of bitline pairs. A 28 nm testchip shows TRNG at 16,000 F2 area per output stream, and 2-bit/bitcell PUF with 6.4 Gbps, 78 fJ/bit energy.

→ INVITED PAPER ON IEEE Journal of Solid-State Circuits

2021

Capacitance-to-Digital Converter for Operation under Uncertain Harvested Voltage down to 0.3V with No Trimming, Reference and Voltage Regulation (YouTube video demo, IEEE manuscript, Scholar Bank draft)

This work introduces the first capacitance-to-digital converter (CDC) for low-cost systems that are directly powered by a harvester. The CDC is based on swappable oscillators and does not require any additional circuitry, suppressing any reference and voltage regulation. Load-agnostic self-calibration eliminates the need for a specific test load and testing-time trimming. A 180nm testchip shows 7-bit ENOB down to 0.3V and 1.37-nW overall power, when powered by a 1-mm2 indoor solar cell.

2020

Fully-Digital Rail-to-Rail OTA with Sub-1,000 µm2 Area, 250-mV Minimum Supply and nW Power at 150-pF Load in 180nm (IEEE manuscript, Scholar Bank draft)

A fully-digital operational transconductance amplifier (DIGOTA) architecture for tightly energy-constrained low-cost systems is presented. A 180nm DIGOTA testchip exhibits an area below the 1,000-µm2 wall, and 2.4-nW power under 150pF load, and a minimum supply voltage Vmin of 0.25 V. In the 0.3-0.5 V supply range, DIGOTA improves the area-normalized small (large) signal energy FoM by at least 836X (267X) over prior sub-500mV OTAs, while reducing area by 27-85X. The low-Vmin and nW-power features are shown to enable direct harvesting at the mm scale.

2020

A Robust, High-Speed and Energy-Efficient Ultralow-Voltage Level Shifter (IEEE manuscript, Scholar Bank draft)

This work presents a robust level shifter design able to convert input voltages from the deep sub-threshold regime (about 100 mV) up to the nominal supply voltage (1.8 V). The proposed circuit is based on a self-biased low-voltage cascode current mirror (CM) topology that features diode-connected PMOS and NMOS transistors to drive the split-input inverting buffer used as output stage with high energy efficiency. Experimental results across corner wafers demonstrate the effectiveness of the proposed level shifter as compared to prior art. The proposed circuit allows a voltage up-conversion from a 0.4-V 100-kHz input pulse to 1.8 V with an average switching delay of 7.6 ns and an average energy per transition of only 69 fJ. This is achieved at an area of 82 µm2 for a standard cell-based design.

2020

Fully-Synthesizable All-Digital Unified Dynamic Entropy Generation, Extraction and Utilization within the Same Cryptographic Core (IEEE manuscript, Scholar Bank draft)

This work introduces a novel class of fully-synthesizable all-digital True Random Number Generators (TRNGs) using the same private-key cryptographic core for raw dynamic entropy generation, its extraction via post-processing, and its utilization as crypto-key for constrained secure systems. Endogenous random bit generation is achieved via clock pulsewidth overstretching in the digital implementation of private-key cryptographic algorithms using pulsed-latch pipelines, leveraging inherent Shannon confusion and diffusion. Demonstration on a 40-nm testchip based on a SIMON cryptographic core shows 64-bit key encryption down to 0.25 pJ/bit at 0.45 V, random number generation with cryptographic-grade entropy at 2.5 pJ/bit across manufacturing lots, dice, voltages and temperature corners. The overall area is kept well below the 1E6 F2 area wall (F = minimum feature size).

→ INVITED PAPER ON IEEE Journal of Solid-State Circuits

2020

Broad-Purpose In-Memory Computing for Signal Monitoring and Machine Learning Workloads (IEEE manuscript, Scholar Bank draft)

In this work, a broad-purpose compute-in-memory solution (±CIM) able to handle arbitrary sign in both inputs/features and weights/coefficients is introduced. The ability to operate on arbitrary sign and under variable precision on both operands enables a wide range of applications, ranging from conventional neural networks to digital signal processing and monitoring. The ±CIM pipelined architecture, the reconfigurable row encoder and the adoption of a commercial 2-port bitcell allow uninterrupted memory availability for conventional read/write, even when performing in-memory computations. A 40nm testchip shows the ability of the ±CIM architecture to perform both neural network computations and classical signal processing. At 6-bit precision, the measured worst-case mismatch (noise) is 0.38 (0.62) LSB. The achieved accuracy when executing a LeNet-5 neural net workload is 98.3%, which is within 1.3% of state-of-the-art software implementations. As an example of a signal processing workload, 91.7% accuracy is achieved in voice activity detection, which is within 2.8% of a software implementation. Overall, an energy efficiency (throughput) of 41 TOPS/W (122 GOPS) is achieved at 38% area overhead, over a conventional SRAM with the same 4-KB capacity.

→ INVITED PAPER ON IEEE Journal of Solid-State Circuits

2020

Voice Activity Detection with >83% Accuracy under SNR down to -3dB at 1.19µW and 0.07mm2 in 40nm (IEEE manuscript, Scholar Bank draft)

This work presents a voice activity detector for keyword spotting in self-powered speech interfaces with sub-syllable latency. A simple decision stump classifier and time averaging are introduced to provide >83% accuracy in noisy environments with SNR down to -3dB for reliable operation under a wide range of usage contexts (8-15dB lower than prior art). 1.19-µW power and 0.07mm2 area are shown in 40nm.
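
As a rough illustration of how a decision stump combined with time averaging can reject short noise bursts, the following Python sketch thresholds a single per-frame feature and then averages the resulting decisions over a sliding window. The feature, the threshold, the window length and the class name StumpVAD are all hypothetical, and averaging the decisions (rather than the feature itself) is an assumption; the testchip's actual classifier details are not given here.

```python
from collections import deque


class StumpVAD:
    """Decision stump on one feature, smoothed by averaging recent decisions."""

    def __init__(self, threshold: float, window: int):
        self.threshold = threshold            # hypothetical stump threshold
        self.decisions = deque(maxlen=window)  # sliding window of 0/1 decisions

    def update(self, feature: float) -> bool:
        # Decision stump: a single comparison against a threshold.
        self.decisions.append(1 if feature > self.threshold else 0)
        # Time averaging: report speech only if most recent frames agree.
        return sum(self.decisions) / len(self.decisions) > 0.5


def run(features, threshold=1.0, window=3):
    vad = StumpVAD(threshold, window)
    return [vad.update(f) for f in features]


if __name__ == "__main__":
    print(run([0.2, 5.0, 0.2, 0.2]))  # isolated spike
    print(run([0.2, 5.0, 5.0, 5.0]))  # sustained activity
```

With these illustrative settings an isolated high-energy frame never wins the majority of the window, while sustained activity does after a couple of frames, which is the latency/robustness tradeoff that averaging introduces.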

2020

Multi-Sensor Platform with Five-Order-of-Magnitude System Power Adaptation down to 3.1nW and Sustained Operation under Moonlight Harvesting (Always-On even without Battery) (YouTube video demo, IEEE manuscript, Scholar Bank draft)

A sensor node with system power tuning is presented for 5-order-of-magnitude adaptation to harvested power. Coordinated tuning of unified voltage/capacitive/light sensor interface, MCU and direct MPPT with no intermediate power conversion scales system power to 3.1nW at 0.3V. Operation at 1lux (moonlight) with 4.1×4.1mm2 light harvester is shown.

The power (frequency) dynamic range is 110,000×, down to 480 pW (50,000×, up to 2 MHz). Power-speed scaling of the sensor interface is similar to the MCU across 5 orders of magnitude, across which no sub-system sets a rigid power floor down to 3.1nW. The proposed platform operates under direct harvesting with a solar cell with 4.1mm×4.1mm active area down to 1lux, corresponding to moonlight harvesting for the first time. This paves the way for next-generation battery-light and battery-less always-on systems that do not miss any physical event in spite of the highly-fluctuating nature of energy harvesting.

2020

Voltage Reference with Lowest Operating Voltage down to 0.25 V and pW Power for Direct Harvesting and Battery-Less Systems (IEEE manuscript, Scholar Bank draft)

This work introduces a compact voltage reference operating at pW-power and 250-mV supply (e.g., direct harvester-powered). Body biasing assisted by replica biasing enables 25µV/°C temperature coefficient, 140µV/V line sensitivity, and 0.42mV process sensitivity in 180nm. 2.55-mV overall accuracy is achieved at 2,200µm2 area, without trimming. Operation at such low voltage and power introduces the capability to suppress the power-hungry intermediate DC-DC conversion stage of conventional sensor node architectures, and to suppress the battery altogether.

2020

Ultra-Compact Current- and Voltage-Input Analog-to-Digital Converters with Minimal Design Effort (“ADC in a day”) (IEEE manuscript, Scholar Bank draft)

Fully-synthesizable Successive Approximation Register (SAR) Analog-to-Digital Converters (ADCs) suitable for low-cost integrated systems are proposed both for voltage and current input. The proposed fully-digital ADC architectures enable low-effort design, silicon area reduction, and voltage scaling down to the near-threshold region. Compared to traditional analog-intensive designs, their digital nature allows easy technology and design porting, digital-like area shrinking across CMOS technology generations, and also drastically reduced system integration effort through immersed-in-logic ADC design. The voltage-input ADC architecture is demonstrated with a 40-nm testchip showing 3,000-μm2 area, 6.4-bit ENOB, 2.8kS/s sampling rate, 40.4dB SNDR, 49.7dB SFDR, and 3.1μW power at 1V. A current-input ADC is also demonstrated for direct current readout without requiring a trans-resistance stage. 40-nm testchip measurements show a 5-nA to 1-μA input range, 4,970μm2 area, 6.7-bit ENOB and 2.2-kS/s sample rate, at 0.94-μW power. Compared to the state of the art, the proposed ADC architecture exhibits the highest level of design automation (standard cell), lowest area, and the unique ability to cover direct acquisition of both voltage and current inputs, suppressing the need for transresistance amplifier in current readout.

2020

Low-Energy Voice Activity Detection via Energy-Quality Scaling from Data Conversion to Machine Learning (IEEE manuscript, Scholar Bank draft)

In this work, voice activity detection (VAD) systems with system-level energy-quality (EQ) scaling have been demonstrated. Compared to prior single-knob EQ scaling, multiple EQ knobs are selectively inserted into the entire signal chain from end to end (i.e., from data conversion to classification). EQ knobs are dynamically co-optimized to minimize energy for a given quality target. Multi-knob energy-quality scaling makes quality degradation more graceful than single-knob, allowing for more aggressive energy reduction under a given quality target, while retaining the ability to operate at full quality. Also, proper system-level EQ optimization enhances fitting in machine learning-based systems (e.g., decision tree-based), suppressing both underfitting and overfitting. Measurements on a 28nm testchip show that system-level EQ scaling can reduce energy by up to 3.5X at 2% accuracy degradation in 10-dB noise, compared to full quality. Iso-technology comparison shows that the minimum energy of 51.9 nJ/frame is lower than prior art by 1.9-74.4X at comparable speech/non-speech hit rates.

2020

Deep sub-pJ/bit sub-10^6 F^2 energy-security scalable SIMON crypto-core (IEEE manuscript, Scholar Bank draft)

This work introduces an energy-security scalable crypto-core for private-key cryptography in low-end sensor nodes based on SIMON cipher. Energy and area footprints are reduced through techniques at the algorithm, microarchitectural and gate level. The 40 nm testchip shows energy down to 0.31 pJ/bit at 0.45 V with 64-bit key and 0.79E6 F2 area (F = process minimum feature size). The proposed crypto-core is well suited for ubiquitous security in energy/area-constrained platforms (e.g., low-end sensor nodes, RFIDs), while preserving full 256-bit security when necessary.
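
For readers unfamiliar with the cipher, the sketch below shows the publicly specified SIMON Feistel round and its inverse in Python. It only illustrates why the round maps to very few gates (rotations, one AND, XORs); the word size, the placeholder round key and the function names are assumptions made for illustration, the key schedule is omitted, and nothing here reflects the microarchitecture of the testchip.

```python
def rol(x: int, r: int, n: int) -> int:
    """Rotate the n-bit word x left by r positions."""
    mask = (1 << n) - 1
    return ((x << r) | (x >> (n - r))) & mask


def simon_f(x: int, n: int) -> int:
    """SIMON round function: (x<<<1 AND x<<<8) XOR x<<<2."""
    return (rol(x, 1, n) & rol(x, 8, n)) ^ rol(x, 2, n)


def simon_round(x: int, y: int, k: int, n: int = 16):
    """One Feistel round: (x, y) -> (y ^ f(x) ^ k, x)."""
    return y ^ simon_f(x, n) ^ k, x


def simon_round_inv(x: int, y: int, k: int, n: int = 16):
    """Inverse round: recovers the (x, y) that simon_round consumed."""
    return y, x ^ simon_f(y, n) ^ k


if __name__ == "__main__":
    state = (0x1234, 0xABCD)
    key = 0x0F0F  # placeholder round key, not produced by the real key schedule
    enc = simon_round(*state, key)
    print(enc, simon_round_inv(*enc, key) == state)
```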

2019

Reconfigurable microcontroller/memory organization for energy-performance extension beyond voltage scaling (IEEE manuscript, Scholar Bank draft)

This work introduces reconfigurable thread count augmentation for existing microcontroller architectures, and row aggregation for their dedicated SRAM memory, to extend their energy-performance tradeoff beyond traditional voltage scaling at minimal design effort (“drop-in”). The proposed techniques are architecture-agnostic as the added reconfigurability does not modify the original instruction execution down to the cycle level. Reconfiguration makes it possible to occasionally boost the throughput of simple architectures that were originally not conceived to allow multi-thread operation, while allowing the original single-thread operation in less performance-critical tasks. From a design viewpoint, thread count augmentation is fully automated and directly manipulates the gate-level netlist of an existing single-thread processor, allowing its application to commercial Intellectual Property cores (even if obfuscated by the IP vendor). Similarly, SRAM row aggregation can be applied on commercially compiled 6T SRAM arrays with minor modification in the row decoder. A 40nm ARM Cortex-M0 testchip shows 1.8X (1.4X) core (memory) performance boost beyond a baseline at nominal voltage, 1.4X lower minimum energy point at only 16% (4%) area (timing) overhead, and lowest energy/cycle to date.

→ INVITED PAPER ON IEEE Journal of Solid-State Circuits

2019

First Physically Unclonable Function with design margin reduction via in-situ and PVT sensor fusion for low-cost hardware security (IEEE manuscript, Scholar Bank draft)

This work introduces a Physically Unclonable Function-based (PUF) key generation scheme with run-time in-situ instability detection and process/voltage/temperature (PVT) sensors. Such sensors are fused to evaluate the sufficient number of correction bits NECC required by Error Correcting Code (ECC) to make the PUF output stable, and meet a given bit error rate target. Run-time sensing overcomes the substantial ECC energy penalty associated with the traditional design-time margin of NECC for worst-case word, die, voltage and temperature. ECC with tunable NECC is introduced to enable energy saving in typical cases where NECC is lower than its worst-case value. Sensor fusion via simple linear regression estimates the required NECC at run-time. A testchip in 40nm demonstrates the concept, based on a static monostable current mirror PUF with NECC = 0…4. Average energy reduction by 1.8X is shown compared to a traditional margined design, at an area overhead of less than 20%. As an additional benefit of adjustable NECC, such energy savings can be further expanded under applications having less stringent stability requirements.
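
The run-time estimation step can be pictured with a minimal Python sketch: a linear combination of sensor readings is rounded up and clamped to the supported NECC range of 0 to 4. The sensor names, the regression coefficients and the choice to round up are hypothetical placeholders chosen for illustration, not values or design decisions taken from the testchip.

```python
import math

# Hypothetical regression coefficients, for illustration only (not measured).
INTERCEPT = 0.0
W_UNSTABLE = 0.8   # weight of the in-situ unstable-bit count
W_VOLTAGE = 2.0    # weight of the supply deviation from nominal (V)
W_TEMP = 0.01      # weight of the temperature deviation from nominal (deg C)

NECC_MIN, NECC_MAX = 0, 4  # tunable ECC range stated for the testchip


def estimate_necc(unstable_bits: int, voltage_dev: float, temp_dev: float) -> int:
    """Fuse sensor readings into a number of ECC correction bits."""
    score = (INTERCEPT
             + W_UNSTABLE * unstable_bits
             + W_VOLTAGE * abs(voltage_dev)
             + W_TEMP * abs(temp_dev))
    # Round up (an assumption) so the correction strength is not underestimated,
    # then clamp to the range the tunable ECC supports.
    return max(NECC_MIN, min(NECC_MAX, math.ceil(score)))


if __name__ == "__main__":
    print(estimate_necc(0, 0.0, 0.0))    # typical conditions
    print(estimate_necc(10, 0.1, 60.0))  # harsh conditions
```

In typical conditions the estimate stays below the worst-case value, which is where the energy saving over a design-time margin would come from.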

2019

Standard Cell-Based Ultra-Compact DACs in 40-nm CMOS (IEEE manuscript, Scholar Bank draft)

Ultra-compact, high-resolution, standard cell-based DACs based on Dyadic Digital Pulse Modulation (DDPM) are presented. As a fundamental contribution, an optimal sampling condition is analytically derived to enhance conversion with inherent suppression of spurious harmonics. Operation under such optimal condition is experimentally demonstrated to assure resolution up to 16 bits, with 9.4-239X area reduction compared to prior art. The digital nature of the circuits also allows extremely low design effort on the order of 10 man-hours, portability across CMOS generations, and operation at the lowest supply voltage reported to date. A DAC for DC calibration achieving 16-bit resolution with 3.1-LSB INL, 2.5-LSB DNL, 45µW power, at only 530µm2 area is demonstrated in 40nm CMOS.
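
To give a feel for the modulation these DACs build on, the Python sketch below generates one period of a dyadic pulse stream. It follows one commonly described formulation of DDPM, which is an assumption here and not taken from this summary: over 2^N clock cycles, input bit i is routed to the output in the 2^i cycles whose counter value has exactly N-1-i trailing ones, so the pulses of each bit are spread evenly over the period. The function names are hypothetical.

```python
def trailing_ones(k: int) -> int:
    """Number of consecutive 1 bits at the least-significant end of k."""
    count = 0
    while k & 1:
        count += 1
        k >>= 1
    return count


def ddpm_stream(code: int, n_bits: int) -> list[int]:
    """Return one period (2**n_bits cycles) of a dyadic pulse stream for code."""
    if not 0 <= code < (1 << n_bits):
        raise ValueError("code out of range")
    stream = []
    for k in range(1 << n_bits):
        # The trailing-ones count of the cycle counter selects the input bit
        # that owns this time slot (MSB on every even cycle, and so on).
        bit_index = n_bits - 1 - trailing_ones(k)
        # The final all-ones slot is owned by no bit and stays 0.
        stream.append((code >> bit_index) & 1 if bit_index >= 0 else 0)
    return stream


if __name__ == "__main__":
    s = ddpm_stream(5, 4)
    print(s, sum(s) / len(s))
```

Under this formulation the number of ones per period equals the input code, so low-pass filtering the stream yields an average proportional to code / 2^N, using only a counter and a selector rather than any analog-intensive block.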

2019

First energy-quality scalable Network on Chip with best-in-class energy (down to 6.9fJ/bit), while still using conventional low-swing transmitter/receiver circuits (IEEE manuscript, Scholar Bank draft)

A new class of ultra-low energy on-chip links is introduced. Through the use of sub-word ranking and non-uniform swing, the proposed links allow graceful energy-quality tradeoff in intra-chip communication links for noise-resilient applications such as machine learning and video processing. The proposed techniques are demonstrated in a 28nm testchip that achieves up to 4.5X energy saving over conventional full-quality links, and up to 2.2X over approximate links at iso-quality. Conventional operation with no quality degradation is also allowed for data packets that require full quality.

2019

First always-on system architecture for widely adaptive and power-scalable MCU/PMU from sub-mW to true nW, enabling battery-less and battery-indifferent operation (IEEE manuscript, Scholar Bank draft)

The proposed integrated system architecture consists of a power management unit (PMU) driving a microcontroller, and controlling a novel power knob that enables adaptation to the sensed power availability over an ultra-wide range, well beyond voltage scaling and down to nW level. Conventional battery-powered operation is augmented with pure harvesting. Wide power adaptation is enabled by comparator delay self-biasing and zero-current switching scheme shared among all power modes with single-cycle convergence.

→ INVITED PAPER ON IEEE Journal of Solid-State Circuits

2019

First DAC architecture with digital-like shrinking under scaled technologies, and exhibiting graceful degradation under voltage/frequency overscaling (IEEE manuscript, Scholar Bank draft)

The proposed DAC allows very low design effort, enables digital-like shrinkage across CMOS generations, low area at down-scaled technologies, and operation down to near-threshold voltages. The proposed DAC can operate at supply voltages that are significantly lower and/or at clock frequencies that are significantly greater than the intended design point, at the expense of moderate resolution degradation. In a 12-bit 40-nm testchip, graceful degradation of 0.3bit/100mV is achieved when the supply voltage is over-scaled down to 0.8V, and 1.4bit/100mV when further scaled down to 0.6V.
The proposed DAC enables dynamic power-resolution tradeoff with 3X (2X) power saving for 1-bit resolution degradation at iso-sample rate (iso-resolution).

2018

Relaxation oscillator for sensor nodes with lowest power to date (pW-range), operating under 0.3V-1.8V unregulated supply without any reference/bias circuitry (IEEE manuscript, Scholar Bank draft)

A pW-power versatile relaxation oscillator operating from sub-threshold (0.3V) to nominal voltage (1.8V) is presented, having Hz-range frequency under sub-pF capacitor. The wide voltage and low sensitivity of frequency/absorbed current to the supply allow the suppression of the voltage regulator, and direct powering from harvesters (e.g., solar cell, thermal from machines) or 1.2-1.5V batteries. A 180nm testchip exhibits a frequency of 4Hz, 10%/V supply sensitivity at 0.3-1.8V, 8-18pA current, 4%/°C thermal drift from -20°C to 40°C.

2018

The first microcontroller (MSP430) that can operate at the minimum-energy or the minimum-power point, with minimum power of 595pW (purely harvested in minimum-power mode) (IEEE manuscript, Scholar Bank draft)

This work presents an MSP430-compatible microcontroller with dual-mode standard cells enabling minimum-power and minimum-energy mode in 180nm. Minimum-power mode with sub-leakage power (595pW) allows purely energy harvested operation with sub-mm2 harvester. Minimum-energy mode (14-33pJ/cycle) maximizes battery lifetime, when battery-powered. Power management with ripple power gating self-startup allows cold start with on-chip 0.54mm2 solar cell at 55lux light condition.

2017

First sub-mW feature extraction engine for ubiquitous computer vision and IoT (IEEE manuscript, Scholar Bank draft)

An energy-quality scalable (EQSCALE) feature extraction accelerator for IoT vision applications is presented. Knobs are introduced to dynamically adjust the tradeoff between energy and feature extraction quality, leveraging the intrinsic redundancy in video frames and the robustness of object recognition against missing features. The active area of the accelerator is 0.55mm2. EQSCALE enables at least 5.7X energy improvement and 1.8X area reduction over state-of-the-art accelerators. To the best of our knowledge, EQSCALE is the first feature extraction accelerator operating in the sub-mW range (0.51mW at VGA resolution and 30 fps, and 0.19mW at 5 fps).

2017

First fully-synthesizable PUF (“PUF design in a day”) and active temperature compensation with native 2.8% BER, 1.02fJ/b at 0.8-1.0V in 40nm (IEEE manuscript, Scholar Bank draft)

A fully-synthesizable Physically Unclonable Function (PUF) with hysteresis-enhanced stability and active compensation of temperature variations is proposed. To reduce undesired bit flips, hysteretic behavior is obtained through the insertion of a Muller C-element output stage. A feedback scheme is also introduced to compensate the effect of temperature variations at run time. Native worst-case BER of 2.8% is measured under 0.8-1.0V and 25-85°C, with instability degradation with temperature being 0.15% per 10°C. The PUF bitcell consumes 1.02fJ/b at 0.9V. This PUF can be designed with fully automated standard cell-based flows, thus enabling substantial design effort reduction compared to prior art based on custom design styles.
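
The hysteresis mechanism can be illustrated with the textbook behavior of a Muller C-element, sketched below in Python: the output changes only when both inputs agree and otherwise holds its previous value. How the element is wired inside the PUF bitcell is not described here, so the two input sequences are purely hypothetical stand-ins for a noisy pair of signals.

```python
def c_element(a: int, b: int, prev: int) -> int:
    """Muller C-element: follow the inputs when they agree, else hold state."""
    return a if a == b else prev


def filter_sequence(seq_a, seq_b, initial=0):
    """Apply the C-element cycle by cycle and return the output sequence."""
    out, state = [], initial
    for a, b in zip(seq_a, seq_b):
        state = c_element(a, b, state)
        out.append(state)
    return out


if __name__ == "__main__":
    # Hypothetical noisy inputs: brief disagreements do not flip the output.
    a = [1, 1, 0, 1, 1, 0, 0]
    b = [1, 0, 1, 1, 1, 1, 0]
    print(filter_sequence(a, b))
```

Because a flip requires both inputs to switch, a transient disturbance on only one of them leaves the stored bit unchanged, which is the sense in which the output stage adds hysteresis.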

→ INVITED PAPER ON IEEE Journal of Solid-State Circuits

2017

First reconfigurable microarchitecture down to the pipestage level for wide energy/voltage scaling (demonstration on FFT engine) (IEEE manuscript, Scholar Bank draft)

Dynamically adaptable pipelines, fully integrated with automated digital flows at design time and with dynamic voltage scaling schemes at run time, are demonstrated with a 256-point radix-4 fixed-point FFT engine on a 40-nm test chip. Measurements show energy savings up to 30% (38%) at iso-throughput (iso-voltage). Area and worst-case performance penalties are 5% and 11%, respectively.

2017

First demonstration of reconfigurable clock networks for adaptation under wide voltage scaling (IEEE manuscript, Scholar Bank draft)

A reconfigurable clock network design for operation from sub-threshold to nominal voltage is presented. The number of levels is adjusted, with more levels at nominal voltage to mitigate the impact of wire delay, and fewer in sub-threshold to mitigate the dominant random skew due to repeaters. Clock skew is reduced by up to 2.5 standard deviations, enabling 110mV Vmin reduction at 1.8% area penalty in an FFT 40nm testchip, compared to traditional clock networks.

2015

15-fJ/bit Static Physically Unclonable Functions for Secure Chip Identification with <2% Native Bit Instability and 140X Intra/Inter PUF Hamming Distance Separation in 65nm (IEEE manuscript, Scholar Bank draft)

A static class of Physically Unclonable Functions for secure key generation and chip identification is presented. Energy down to 15 fJ/bit is achieved, key reproducibility and uniqueness meet inter/intra-PUF Hamming distance separation of 140X or greater, and randomness passes all NIST tests. Native unstable bits are less than 2% at nominal conditions and less than 5% in 0.7-1 V voltage and 25-85 °C temperature range, before applying any further post-silicon technique for stability enhancement.