Research Programme in Assuring Hardware Security by Design in Systems on Chip

Message from the lead principal investigator

Project Overview
The grand goal of “CogniVision” is to enable the unprecedented capability of performing ubiquitous real-time vision through novel silicon chips that are untethered, always-on, nearly perpetual, ultra-miniaturized (<100 mm³) and inexpensive (~$1). From a broad viewpoint, CogniVision introduces a new class of cameras that are “cognitive” and “attentive”. CogniVision cameras are cognitive in that they constantly make sense of the scene through extremely energy-efficient circuits for best-in-class machine learning algorithms, i.e. deep learning based on convolutional networks. In the last few years, deep learning and convolutional networks have been extensively demonstrated to achieve outstanding accuracy, and to exhibit an uncommon degree of flexibility, as they can be restructured (e.g., by adjusting the number of layers and the weight values) to perform a very wide range of vision tasks. Indeed, deep learning has become the de facto standard framework for image and video processing, with remarkable success in content understanding, face detection, object detection and tracking, image classification and segmentation, pedestrian detection, loiterer detection, and abandoned luggage detection. Deep learning is an ideal framework for silicon accelerators because it is general and easy to upgrade. A given neural network can perform either a specific task or a range of tasks (e.g., multi-task networks), but it cannot cover the entire range of possible applications of distributed vision. To achieve broad coverage, the straightforward solution of storing a wide variety of networks on the same cognitive camera chip is not feasible, given the large amount of memory generally required by each network and the limited memory available on chip (several MB, currently).

Also, this approach would preclude important capabilities such as:
1) responding to time-varying requirements of the “cloud” server gathering the output of many cameras (e.g., a request to perform a new task or to occasionally send entire frames, as triggered by events captured by neighboring cameras, based on the global understanding of the cloud);
2) upgrading the neural network, using its innate ability to be refined via retraining with new data;
3) saving power when degraded processing quality (e.g., approximations) is tolerable for less visually demanding tasks (e.g., optical character recognition is simpler than object detection).

A suitable approach to achieve these capabilities is to allow the cloud to push neural network configurations onto individual cameras, which in turn need to be responsive and receptive to the related commands from the cloud. Accordingly, cognitive cameras also need to be attentive, i.e. listen to commands sent wirelessly by the cloud, which requires an always-on radio receiver. In general, nearly perpetual always-on operation is pursued by harvesting power from the environment, which limits the power consumption of CogniVision cameras to ~1 milliwatt in order to keep the system volume well below 100 mm³ (e.g., power provided by a 0.1-mm thick, 5-cent, 1-2 cm wide organic photovoltaic foil attached to a wall, with a stacked 0.4-mm equally sized battery and on-foil printed antenna, all commercially available). Reducing the power consumption of cognitive cameras down to the 1 mW range is the fundamental objective of this project. This entails a power reduction by at least 20-30X compared to the most power-efficient existing cameras that constantly monitor the scene with a resolution and frame rate adequate for distributed monitoring and surveillance (e.g., VGA resolution, 30 frames/s). Cognitive cameras with power down to 1 mW will be enabled by drastically limiting the amount of data transmitted wirelessly to the cloud server that makes sense of the scene, thus substantially reducing the traditionally large power due to the transmission of entire video frames (e.g., 40-50 mW with MPEG-compressed VGA frames and Bluetooth Low Energy transmission). This is accomplished by embedding substantial sensemaking capability (e.g., object detection) into the camera silicon chip, leveraging the recent rapid advances in deep learning and convolutional neural networks (widely adopted by Google, Facebook, Microsoft).
As a paradigm shift, CogniVision moves sensemaking from the cloud to cognitive cameras, keeping the power in the mW range in spite of the traditionally high computational complexity of deep learning. This will be achieved via innovation in energy-efficient circuits/architectures for sensemaking (see “Approach” section), including a novel digital energy-quality scalable architecture for general-purpose on-chip acceleration of convolutional networks with an energy efficiency of 50 TOPS/W or better, i.e. 10-20X more energy-efficient than the state of the art. Its ability to execute any convolutional network makes it applicable to the very wide (and ever-expanding) range of applications of convolutional networks, as long as the network fits the available on-chip memory and processing array size, as discussed in the “Subprojects” section.

Being “attentive”, CogniVision cameras are also able to respond to the cloud, and to be occasionally reprogrammed by the cloud in the following ways:
1) transmit a short series of frames to be processed directly by the cloud (e.g., if the visual task exceeds the cognitive capabilities of the camera); 
2) update the neural network to a different one (i.e., uploading layer structure and weights), when the cloud requests a substantial change in the visual task executed by the camera (e.g., the cloud needs to identify very specific objects in a given area being covered by some of the cameras); 
3) statically adjust on-chip energy-quality knobs that can save energy in vision tasks where lower processing accuracy or arithmetic precision is tolerable (e.g., less demanding visual tasks such as optical character recognition, as compared to more challenging tasks such as object detection).
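
The three reprogramming paths listed above can be pictured as a small command handler running on the camera. The following is only a minimal sketch under stated assumptions: the names `Command`, `CameraState` and `handle_command`, the payload keys, and the use of a bit width as the energy-quality knob are all hypothetical and are not taken from the CogniVision design.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Command(Enum):
    """Hypothetical command set mirroring the three cases in the list above."""
    SEND_FRAMES = auto()     # 1) transmit a short series of frames to the cloud
    UPDATE_NETWORK = auto()  # 2) replace layer structure and weights
    SET_EQ_KNOBS = auto()    # 3) adjust energy-quality knobs


@dataclass
class CameraState:
    """Illustrative camera-side state; field names are assumptions."""
    network_id: str = "default"
    precision_bits: int = 8
    frames_to_send: int = 0
    history: list = field(default_factory=list)


def handle_command(state: CameraState, command: Command, payload: dict) -> CameraState:
    """Apply one cloud command to the camera state and record it."""
    if command is Command.SEND_FRAMES:
        state.frames_to_send = int(payload["count"])
    elif command is Command.UPDATE_NETWORK:
        state.network_id = str(payload["network_id"])
    elif command is Command.SET_EQ_KNOBS:
        state.precision_bits = int(payload["precision_bits"])
    else:
        raise ValueError(f"unsupported command: {command}")
    state.history.append(command.name)
    return state


if __name__ == "__main__":
    camera = CameraState()
    handle_command(camera, Command.SET_EQ_KNOBS, {"precision_bits": 4})
    handle_command(camera, Command.UPDATE_NETWORK, {"network_id": "ocr-small"})
    print(camera)
```

A real implementation would also need authentication of cloud commands and a check that the pushed network fits the on-chip memory; both are omitted here.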

As a side benefit, cognitive cameras solve the traditional issue of data deluge in distributed vision systems. Indeed, frames from cameras are traditionally transmitted wirelessly to the cloud, involving large data volumes (~20 cameras exhaust the capacity of a wireless LAN, and Internet video traffic is increasing alarmingly fast). This is avoided in cognitive cameras, as the transmitted data volume is reduced by several orders of magnitude (from preliminary simulations, they transmit at a data rate of ~1-10 kbps on average, as opposed to several MBs in traditional cameras).

Regarding the timeliness of the CogniVision project, embedding vision in energy-autonomous nodes has been pursued for a decade with very limited success, due to the excessive power consumption required by on-chip processing. We are now witnessing the convergence of three technology trends, which are reshaping the areas of machine learning for computer vision and ultra-low power chips. First, deep convolutional neural networks have made tremendous advances in terms of vision capability, although at a substantial power and memory cost that is beyond the capabilities of energy-autonomous systems. Their power is now reaching the tens of mW range after two very intense years of research in deep learning accelerators. Second, fundamental advances have recently been made in the area of energy-quality scalable integrated circuits and systems (including deep learning accelerators and vision processors), where a substantial reduction in the intensity of computation and energy is achieved when a moderate reduction in the quality of processing/sensing (e.g., arithmetic precision) is tolerable for the vision task at hand. Third, fundamental advances have recently been made in image sensor design, introducing the ability to embed simple in-sensor processing at low energy cost, which limits the expensive centralized processing that requires full frame readout.
At the convergence of the above trends, CogniVision leverages the well-known exceptional robustness of deep learning/vision against inaccuracies to exploit energy-quality scaling and simple in-sensor processing, which justifies the timeliness of the project. Recent market trends confirm the timeliness of CogniVision, and the importance that smart untethered cameras are expected to have in the years to come. For example, in December 2017 Amazon acquired the wireless camera company Blink, and in October 2017 Google released the CLIPS wireless camera. Although the capabilities of such cameras are currently limited (e.g., actual lifetime ranges from 3-5 hours with continuous shooting to 2-5 weeks, and they simply record clips when events occur), this clearly shows a technological and market interest in ubiquitous vision. In 2017 Qualcomm announced the intention to pursue a research project on low-resolution (320x240) cameras for smart toys/appliances with low recognition capabilities (e.g., single object detection, ambient light sensing). None of the available cameras can interact with the cloud in real time (i.e., they are not “attentive”). As another example, in March 2018 Sony and other companies formed the NICE alliance to support the creation of a prospective generation of cameras with on-board analytics. Ubiquitous cognitive cameras can provide novel technological capabilities and societal benefits, enabling for the first time situational awareness with fine spatial granularity across wide areas (from building to city scale). Examples of targeted applications are ubiquitous/augmented surveillance, vehicle/pedestrian detection, intelligent transportation, crowd monitoring, industrial plant monitoring, warehouse management, detection of dangerous objects, and disaster management, among others. In short, CogniVision empowers the Internet of Things (IoT) (i.e., ubiquitous sensor augmentation of the Internet) with the sense of vision, for the first time.
As IoT is the next “big wave” of technology (45% annual growth, global value of $11T by 2025), CogniVision will leverage its capabilities and potential growth to create economic value in Singapore, accelerating the Smart Nation vision.

The success of CogniVision will provide a unique technological competitive advantage, in view of the demonstration of the first camera chip that is nearly perpetual in operation, fully untethered, energy-harvested, millimeter-sized, capable of on-chip real-time sensemaking, and low cost ($ range). The on-chip sensemaking also fundamentally solves the challenges of data deluge and privacy, which are currently faced with distributed (tethered) cameras. Accordingly, CogniVision accelerates the Smart Nation vision, and contributes to making Singapore a global hub for IoT sensing technologies, in particular high added-value technologies such as visual sensing. To reach the intended impact, local enterprises working on or using distributed sensors (e.g., belonging to the recently formed IoT Consortium of the Singapore Semiconductor Industry Association (SSIA)) will be engaged during the project via demonstrations in our labs. On a global scale, the Embedded Vision Alliance will be engaged to reach out to leading companies in image sensing applications. These companies can indeed be technological or venture partners in the subsequent translation of CogniVision into a commercial technology. The support of agencies is key to the success of the project, as Singapore is a natural testbed for CogniVision, and will benefit from the introduction of ubiquitous vision capability in the Smart Nation vision. Their expertise will facilitate alignment with compelling applications and use cases. At the end of the project, a workshop will be organized to share findings and to demonstrate the outcomes of CogniVision. To make our technologies widely available, we will consider the opportunity of spinning off a company based in Singapore for the commercialization of CogniVision. The CogniVision project will leverage the synergy with local industry in the IoT space, starting from the project industrial partners, which cover the key areas related to CogniVision, i.e. system integration (Panasonic) and chips for IoT (Mediatek). A key factor that promises significant impact for CogniVision is its relevance to a very wide range of diverse applications and verticals, ranging from consumer to security, smart cities, industry, and others.

The project is structured in four sub-projects, which all converge into the final demonstration in sub-project #1 of the CogniVision system on chip. Subprojects are organized in an inter-disciplinary manner, and are centered around the interaction between sub-systems and levels of abstraction.

1) System modeling, exploration, integration, demonstration of cognitive/attentive cameras (led by M. Alioto, joined by all) 
This sub-project addresses the system-level challenges and unifies the efforts of the other sub-projects into a cohesive modelling, design and verification framework. Regarding the system modelling, a high-level simulation framework will be developed and shared among all PIs to evaluate the functionality, the performance and the energy efficiency of individual components, as well as their impact at the system level. Energy per operation will also be modelled using proprietary models, to preliminarily estimate the benefit of each innovative technique before performing time-consuming circuit and architectural design. The same environment will be used to share a common database of benchmarks for quantitative assessment, and to perform experiments in a controlled environment shared by all researchers in the team. Tentatively, the environment will be OpenCV-Python, as a compromise between Python’s code readability (as needed in collaborative efforts) and the availability of OpenCV libraries (which have also been used by the PIs to generate some preliminary results). This environment will also be used to generate test vectors for chip testing. This sub-project also covers the system design, integration and demonstration aspects of CogniVision, once the above preliminary exploration is performed, and circuit/architectural techniques are investigated and developed for silicon implementation in other sub-projects. System integration will first be performed as a System on Board (SoB), assembling the stand-alone chips that are generated in the various sub-projects for two silicon rounds. The final demonstration is instead performed in the form of a single System on Chip (SoC).
Accordingly, chip design partitioning and floorplanning will be preliminarily performed, and a mixed-signal simulation/verification environment will be developed to verify the design from behavioral down to gate level and some selected circuit simulations, as designs become available over time for the blocks in the CogniVision SoC. Also, this sub-project focuses on the silicon infrastructure for chip configuration and testing, based on the CogniVision chip architecture. Once verified and taped out, the CogniVision chip will be fabricated by a commercial silicon foundry (e.g., GlobalFoundries) and tested in a real-world environment to ensure that the ultimate quantitative targets are achieved. The targeted use cases are well within the capabilities of CogniVision, both in terms of memory (2 MB) and throughput (<20,000 MOPS). The on-chip microprocessor (tentatively PULPino by ETHZ, also a team collaborator) does not affect the performance, as it only configures the accelerators and loads the weights into the on-chip memory.
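
As a rough sanity check, the throughput bound above can be combined with the 50 TOPS/W efficiency target stated in the Project Overview to estimate the power drawn by the accelerator arithmetic alone. This is a back-of-the-envelope sketch, not a project model: the function name is hypothetical, and the estimate ignores the imager, radio, memory accesses and leakage.

```python
def accelerator_power_mw(throughput_mops: float, efficiency_tops_per_w: float) -> float:
    """Power in mW needed to sustain a throughput at a given energy efficiency.

    TOPS/W is equivalent to tera-operations per joule, so
    power [W] = (operations per second) / (operations per joule).
    """
    ops_per_second = throughput_mops * 1e6
    ops_per_joule = efficiency_tops_per_w * 1e12
    return ops_per_second / ops_per_joule * 1e3


if __name__ == "__main__":
    # Upper bound on throughput (20,000 MOPS) at the 50 TOPS/W target.
    power = accelerator_power_mw(20_000, 50)
    print(f"{power:.2f} mW")  # 0.40 mW
```

Under these assumptions the arithmetic stays below the ~1 mW system budget, which is consistent with the claim that the targeted use cases are within reach, although the remaining blocks must share what is left.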

2) Energy-centric circuit techniques and interaction at imager-sensemaking and wireless-sensemaking boundary (led by K. S. Yeo, joined by PI M. Alioto and collaborator S. Chen)
In sub-project #2, the interaction of sensemaking with the image sensor on one side, and with the wireless interface on the other side, is investigated. From the perspective of irrelevant activity skipping, imager architectures with in-sensor saliency and relevance table generation will be explored, while systematically taking their interaction with feature extraction into account. The novelty in the image sensor lies in the above in-sensor saliency detection circuitry, whereas the pixel and array architecture will be taken from prior designs from Prof. Yeo’s group to de-risk the demonstration, considering that the energy efficiency of the imager is not critical for the system. Also, the wireless communication circuits will be developed while incorporating their interaction with sensemaking, in particular with the deep network configuration, which is uploaded by the cloud into the on-chip memory for reconfiguration purposes. In this sub-project, the image sensor and wireless transceiver are first explored from an architectural point of view. This is followed by two rounds of chip demonstration and testing, first to validate the fundamental ideas and translate them into circuits, and then to refine the design in preparation for the final System on Chip (SoC) demonstration. In the latter phase, the effort is focused mostly on fine-tuning and integration with the other blocks. A characterization of the final prototype will be performed and correlated with silicon measurements on the two previous versions, evaluating the effect of process/voltage/temperature corners.
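
To make the idea of a relevance table concrete, the sketch below marks each imager tile as relevant or not, so that downstream processing can skip the irrelevant ones. It is an illustration only: using the mean absolute difference between consecutive frames as the saliency measure, the tile size and the threshold are all assumptions, and the actual in-sensor circuitry may use a different criterion.

```python
def relevance_table(prev, curr, tile, threshold):
    """Return one boolean per tile; True marks a tile worth processing.

    prev, curr: 2-D lists of pixel intensities with identical shape.
    A tile is flagged when its mean absolute frame difference exceeds
    the threshold (an assumed saliency measure).
    """
    rows, cols = len(curr), len(curr[0])
    table = []
    for r0 in range(0, rows, tile):
        flags = []
        for c0 in range(0, cols, tile):
            diffs = [
                abs(curr[r][c] - prev[r][c])
                for r in range(r0, min(r0 + tile, rows))
                for c in range(c0, min(c0 + tile, cols))
            ]
            flags.append(sum(diffs) / len(diffs) > threshold)
        table.append(flags)
    return table


if __name__ == "__main__":
    # 8x8 frame where only the bottom-right 4x4 tile changes.
    prev = [[0] * 8 for _ in range(8)]
    curr = [row[:] for row in prev]
    for r in range(4, 8):
        for c in range(4, 8):
            curr[r][c] = 200
    print(relevance_table(prev, curr, tile=4, threshold=10))
    # [[False, False], [False, True]]
```

In this toy frame only one of four tiles would be forwarded to feature extraction, which is the kind of activity reduction that irrelevant activity skipping aims for.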

3) Energy-centric machine learning-circuit co-design (led by J. Feng, joined by M. Alioto and the collaborator Prof. Luca Benini)
This sub-project focuses on the algorithm-circuit interaction, through the investigation of a novel class of deep neural networks that will be designed and trained by including power consumption as an explicit metric/cost function, as opposed to conventional machine learning methods that focus purely on accuracy. Also, a novel class of ultra-efficient deep learning accelerators based on the DDPM modulation will be investigated. In this sub-project, we investigate systematic energy-aware model design and training schemes, introducing the energy cost within the training objective of the deep learning model. Since circuit/architecture parameters are within the network optimization loop, this creates an interdependence and ultimately a synergy that is of particular interest for this sub-project. At the same time, low-activity SRAM memories will be explored and demonstrated. Machine learning circuit techniques will be explored that smartly allocate energy between training and sensemaking, adding run-time criteria for early termination of the computation, so that no further energy is spent unnecessarily once accuracy is plateauing. The developed energy-centric machine learning algorithm-circuit co-design will be validated in terms of accuracy and energy in applications processing images at resolutions from 1,000x1,000 down to 80x80, to assess the scalability of the proposed techniques. The resulting models will be validated and integrated in the final silicon prototype, first in a controlled environment and then in a real-world setting. Benchmarks provided by our project partners (see letters of support from agencies) will be used for this purpose, covering human and object recognition, in addition to the popular AlexNet benchmark.
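
One possible form of a run-time early-termination criterion is sketched below: computation proceeds stage by stage and stops as soon as the confidence gain from the latest stage falls below a threshold. The function name, the use of per-stage confidence as the plateau indicator, and the numbers in the example are assumptions made for illustration, not the criteria developed in the project.

```python
def run_with_early_exit(stage_confidences, stage_energies_uj, min_gain=0.01):
    """Run stages in order and stop when confidence stops improving.

    stage_confidences: confidence after each stage (assumed available).
    stage_energies_uj: energy spent by each stage, in microjoules.
    Returns (stages_executed, final_confidence, energy_spent_uj).
    """
    spent = 0.0
    previous = 0.0
    for index, (confidence, energy) in enumerate(
        zip(stage_confidences, stage_energies_uj)
    ):
        spent += energy
        # From the second stage on, exit once the gain is below min_gain.
        if index > 0 and confidence - previous < min_gain:
            return index + 1, confidence, spent
        previous = confidence
    return len(stage_confidences), previous, spent


if __name__ == "__main__":
    result = run_with_early_exit(
        stage_confidences=[0.60, 0.85, 0.855, 0.86],
        stage_energies_uj=[1.0, 1.0, 1.0, 1.0],
    )
    print(result)  # (3, 0.855, 3.0)
```

In the example the fourth stage is never executed, saving a quarter of the energy at a negligible loss in confidence.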

4) Irrelevant activity skipping/EQ-scalable sensemaking circuits/architectures (led by M. Alioto, joined by all, including the collaborator D. Sylvester)
This sub-project focuses on the circuit and architectural implications of the three research directions for sensemaking. Regarding irrelevant activity skipping, the processing elements will be organized both logically (architecture) and physically (floorplan) in a regular fashion that maps the imager tiles (see sub-project #2) onto the sub-systems that perform the corresponding computation. To this aim, novel chip design methodologies pursuing vertical integration from physical level to architecture will be developed in this sub-project, with the goal of assuring data locality (to limit the large energy cost of signal distribution) and maximizing the reuse of memory accesses (to limit the large energy cost of multiple accesses to the same memory address). With regard to energy-quality scalability, this novel capability will be introduced in all components of the SoC. The fundamental vision algorithm parameters will be evaluated as primary candidates for use as energy-quality knobs, and their impact on energy and quality will be preliminarily assessed through high-level simulations (e.g., OpenCV). Also, this sub-project involves the translation of the expected research results into measurable chip demonstrators of saliency pre-assessment, feature extraction, novelty assessment, and deep learning. These circuits are designed and tested in two rounds, respectively for initial validation and further refinement. The final version of their design will be integrated in the final System on Chip (SoC) demonstration, and its characterization will again be cross-correlated with the silicon measurements on the two previous versions, evaluating the effect of process/voltage/temperature corners in both a controlled and a real-world environment.
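
A high-level assessment of one candidate energy-quality knob, arithmetic precision, could start from something as simple as the sketch below, which quantizes the operands of a dot product to different bit widths and reports the resulting error. The function names, the signed fixed-point format and the sample values are assumptions for illustration; a project-level study would use real network layers and a calibrated energy model.

```python
def quantize(value, bits):
    """Round a value in [-1, 1] to a signed fixed-point grid of the given width."""
    levels = 2 ** (bits - 1)
    # Clip to the representable range [-1, 1 - 1/levels].
    clipped = max(-1.0, min(1.0 - 1.0 / levels, value))
    return round(clipped * levels) / levels


def dot(a, b):
    """Plain dot product of two equally long sequences."""
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    weights = [0.71, -0.33, 0.05, 0.48]   # illustrative values only
    pixels = [0.20, 0.90, -0.60, 0.10]
    reference = dot(weights, pixels)
    for bits in (8, 6, 4):
        approx = dot(
            [quantize(w, bits) for w in weights],
            [quantize(p, bits) for p in pixels],
        )
        print(f"{bits} bits: absolute error {abs(approx - reference):.5f}")
```

Sweeping the bit width in this way gives the quality side of the trade-off; the energy side would have to come from circuit-level estimates, which are outside the scope of this sketch.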

The Team

The team will collaborate with industrial partners and agencies supporting various aspects of the project, from an in-kind contribution of $0.7M in terms of silicon manufacturing support, to real-world datasets, domain expertise and hardware/cloud services for large-scale computation (see letters of support). Their support assures relevance to industrial interest, and alignment with the fast-changing landscape of distributed sensing. Industrial partners cover the key areas that the proposal aims to make an impact on.

Project Progress Updates

Students and research staff:
Research Publications:
Patent applications:
Public speeches:



News & press


Demos and public materials


Subscribe to mailing list


Career opportunities


Industry engagement



 PI  Massimo Alioto
 Admin  Hephzibah Solomon

 Yong Fu Sheng Melvin,
 Research-related matters in ECE Department  Tang Tin Yan, Ariel
 Account related/SAP – Finance  Patricia Ang 

 Khoo Shi Yun
 VDRO  Loo Shi Wei
 ODPRT  Soh Li Yan





