Also, this approach would preclude important capabilities, such as the ability to
1) respond to time-varying requirements of the “cloud” server gathering
the output of many cameras (e.g., a request to perform a new task, or to
occasionally send entire frames when triggered by events captured by
neighboring cameras, based on the cloud’s global understanding of the scene);
2) upgrade the neural network, exploiting its innate ability to be refined
via retraining with new data;
3) save power when degraded processing quality (e.g., approximations)
is tolerable for less visually demanding tasks (e.g., optical character
recognition, which is simpler than object detection).
A suitable approach to achieving these capabilities is to allow the cloud
to push neural network configurations onto individual cameras, which in turn
need to be responsive and receptive to the corresponding commands from the
cloud. Accordingly, cognitive cameras also need to be attentive, i.e., to
listen to commands sent wirelessly by the cloud, hence requiring an always-on
radio receiver. In general, nearly-perpetual always-on operation is pursued by
harvesting power from the environment, which limits the power consumption of
CogniVision cameras to ~1 mW to keep the system volume well below
100 mm³ (e.g., power provided by a 0.1-mm-thick, 5-cent, 1-2-cm-wide organic
photovoltaic foil attached to a wall, with a stacked, equally sized 0.4-mm
battery and an on-foil printed antenna, all commercially available). Reducing
the power consumption of cognitive cameras down to the 1-mW range is the
fundamental objective of this project. This entails a power reduction of at
least 20-30X compared to the most power-efficient existing cameras that
continuously monitor the scene at a resolution and frame rate adequate for
distributed monitoring and surveillance (e.g., VGA resolution, 30 frames/s).
Cognitive cameras with power down to 1 mW will be enabled by drastically
limiting the amount of data transmitted wirelessly to the cloud server that
makes sense of the scene, thus substantially reducing the traditionally large
power spent on the transmission of entire video frames (e.g., 40-50 mW for
MPEG-compressed VGA frames transmitted over Bluetooth Low Energy). This is
accomplished by embedding substantial sensemaking capability (e.g., object
detection) into the camera silicon chip, leveraging the recent rapid advances
in deep learning and convolutional neural networks (widely adopted by Google,
Facebook, and Microsoft).
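To make the power argument above concrete, the following back-of-envelope
sketch (in Python) compares the radio power of streaming MPEG-compressed VGA
frames over Bluetooth Low Energy with that of transmitting only sensemaking
outputs; the bitrates and the energy-per-bit figure are representative
assumptions rather than measured values.

```python
# Back-of-envelope radio-power comparison: full-frame streaming vs. on-camera
# sensemaking. All figures below are hedged, representative assumptions.

MPEG_VGA_BITRATE_BPS = 2e6     # assumed ~2 Mbps for MPEG-compressed VGA @ 30 frames/s
SENSEMAKING_BITRATE_BPS = 5e3  # assumed midpoint of the ~1-10 kbps output data rate
BLE_ENERGY_PER_BIT_J = 20e-9   # assumed ~20 nJ/bit for a BLE link, incl. protocol overhead

def radio_power_mw(bitrate_bps: float, energy_per_bit_j: float) -> float:
    """Average radio power (mW) = bitrate (bit/s) x energy per bit (J/bit)."""
    return bitrate_bps * energy_per_bit_j * 1e3

print(f"full-frame streaming: ~{radio_power_mw(MPEG_VGA_BITRATE_BPS, BLE_ENERGY_PER_BIT_J):.0f} mW")
print(f"sensemaking output:   ~{radio_power_mw(SENSEMAKING_BITRATE_BPS, BLE_ENERGY_PER_BIT_J):.1f} mW")

# How many streaming cameras saturate a shared wireless LAN?
WLAN_CAPACITY_BPS = 40e6       # assumed ~40 Mbps effective 802.11 throughput
print(f"streaming cameras per WLAN: ~{WLAN_CAPACITY_BPS / MPEG_VGA_BITRATE_BPS:.0f}")
```

Under these assumptions the streaming radio alone consumes ~40 mW, consistent
with the 40-50 mW figure above, while the sensemaking output costs ~0.1 mW,
i.e., a negligible fraction of the ~1 mW budget.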
As a paradigm shift, CogniVision moves sensemaking from the cloud to cognitive
cameras, keeping power in the mW range in spite of the traditionally high
computational complexity of deep learning. This will be achieved via innovation
in energy-efficient circuits/architectures for sensemaking (see the “Approach”
section), including a novel digital energy-quality scalable architecture for
general-purpose on-chip acceleration of convolutional networks, with an energy
efficiency of 50 TOPS/W or better, i.e., 10-20X more energy-efficient than the
state of the art. Its ability to execute any convolutional network makes it
applicable to the very wide (and ever-expanding) range of applications of
convolutional networks, as long as the network fits the available on-chip
memory and processing array size, as discussed in the “Subprojects” section.
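As a minimal illustration of the energy-quality knob exposed by such an
architecture, the sketch below (Python/NumPy) quantizes the weights and
activations of a toy convolution to progressively fewer bits; the uniform
quantizer, the toy layer, and the quadratic energy model are illustrative
assumptions, not the accelerator’s actual datapath.

```python
import numpy as np

def quantize(x: np.ndarray, bits: int) -> np.ndarray:
    """Uniform symmetric quantization to a given bit-width (illustrative)."""
    levels = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / levels
    return np.round(x / scale) * scale

def conv2d_valid(a: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Plain 'valid' 2-D convolution, used as the reference computation."""
    kh, kw = w.shape
    out = np.empty((a.shape[0] - kh + 1, a.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(a[i:i + kh, j:j + kw] * w)
    return out

rng = np.random.default_rng(0)
acts = rng.standard_normal((8, 8))     # toy activation map
weights = rng.standard_normal((3, 3))  # toy 3x3 kernel
ref = conv2d_valid(acts, weights)      # full-precision reference

for bits in (8, 6, 4):                 # statically selectable precision knob
    out = conv2d_valid(quantize(acts, bits), quantize(weights, bits))
    err = np.linalg.norm(out - ref) / np.linalg.norm(ref)
    rel_energy = (bits / 8) ** 2       # first-order model: multiply energy ~ bits^2
    print(f"{bits}-bit: relative error {err:.3f}, relative multiply energy ~{rel_energy:.2f}")
```

The quadratic dependence of multiplier energy on bit-width is a common
first-order approximation; the point is that a moderate quality loss can buy a
several-fold energy reduction, which deep networks are known to tolerate.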
Being “attentive”, CogniVision cameras also have the capability to respond to
the cloud, and occasionally to be reprogrammed by it in the following ways
(a minimal command sketch follows the list):
1) transmit a short series of frames to be processed directly by the
cloud (e.g., if the visual task exceeds the cognitive capabilities of the
camera);
2) update the neural network to a different one (i.e., uploading its layer
structure and weights), when the cloud requests a substantial change in the
visual task executed by the camera (e.g., the cloud needs to identify very
specific objects in a given area covered by some of the cameras);
3) statically adjust on-chip energy-quality knobs that can save energy
in vision tasks where lower processing accuracy or arithmetic precision is
tolerable (e.g., less demanding visual tasks such as optical character
recognition, as compared to more challenging tasks such as object
detection).
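The following hypothetical Python sketch maps items 1)-3) onto a minimal
command set; the message names, fields, and dispatch logic are our own
illustration, not a defined CogniVision protocol.

```python
from dataclasses import dataclass

# Hypothetical cloud-to-camera command set; names and fields are illustrative
# assumptions, not a specification of the actual CogniVision protocol.

@dataclass
class SendFrames:          # 1) stream a short burst of frames to the cloud
    num_frames: int
    fps: int

@dataclass
class UpdateNetwork:       # 2) upload a new network (layer structure + weights)
    layer_spec: list       # e.g., [("conv", 3, 16), ("conv", 16, 32)]
    weights_blob: bytes

@dataclass
class SetEnergyQuality:    # 3) statically set on-chip energy-quality knobs
    precision_bits: int    # arithmetic precision, e.g., 8/6/4 bits
    detection_threshold: float

def handle_command(cmd: object) -> str:
    """Dispatch a command received over the always-on radio (sketch only)."""
    if isinstance(cmd, SendFrames):
        return f"streaming {cmd.num_frames} frames at {cmd.fps} frames/s"
    if isinstance(cmd, UpdateNetwork):
        return f"reloading network: {len(cmd.layer_spec)} layers, {len(cmd.weights_blob)} weight bytes"
    if isinstance(cmd, SetEnergyQuality):
        return f"precision set to {cmd.precision_bits} bits"
    raise ValueError("unknown command")

print(handle_command(SetEnergyQuality(precision_bits=4, detection_threshold=0.6)))
```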
As a side benefit, cognitive cameras solve the traditional issue of data
deluge in distributed vision systems. Indeed, frames from cameras are
traditionally transmitted wirelessly to the cloud, involving large data volumes
(~20 cameras exhaust the capacity of a wireless LAN, and Internet video traffic
is increasing alarmingly fast). This is avoided in cognitive cameras, as the
transmitted data volume is reduced by several orders of magnitude (from
preliminary simulations, they transmit at an average data rate of ~1-10 kbps,
as opposed to several Mbps in traditional cameras). Regarding the timeliness of
the CogniVision project, embedding vision in energy-autonomous nodes has been
pursued for a decade with very limited success, due to the excessive power
consumption required by on-chip processing. We are now witnessing the
convergence of three technology trends, which are reshaping the areas of
machine learning for computer vision and ultra-low power chips. On one hand,
deep convolutional neural networks have made tremendous advances in terms of
vision capability, although at a substantial power and memory cost that is
beyond the capabilities of energy-autonomous systems. Their power is now
reaching the tens-of-mW range after two very intense years of research in deep
learning accelerators. Simultaneously, fundamental advances have been recently
made in the area of energy-quality scalable integrated circuits and systems
(including deep learning accelerators and vision processors), where substantial
reduction in the intensity of computation and energy is achieved when moderate
reduction in the quality of processing/sensing (e.g., arithmetic precision) is
tolerable by the vision task at hand. Also, fundamental advances have been
recently made in image sensor design, introducing the ability to embed simple
in-sensor processing at low energy cost, limiting the need for expensive
centralized processing that requires full-frame readout. At the convergence of
the above trends,
CogniVision leverages the well-known exceptional robustness of deep
learning/vision against inaccuracies to exploit energy-quality scaling and
simple in-sensor processing, which justifies the timeliness of the project.
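To illustrate how simple in-sensor processing can gate the expensive deep
network, the sketch below wakes the full pipeline only when the mean
inter-frame pixel difference exceeds a threshold; the gating rule and the
threshold value are assumptions for illustration, not the actual in-sensor
circuit.

```python
import numpy as np

def scene_changed(prev: np.ndarray, curr: np.ndarray, thresh: float = 0.01) -> bool:
    """In-sensor-style gate: normalized mean absolute inter-frame difference."""
    diff = np.mean(np.abs(curr.astype(float) - prev.astype(float))) / 255.0
    return bool(diff > thresh)

rng = np.random.default_rng(1)
prev = rng.integers(0, 256, (480, 640), dtype=np.uint8)  # VGA grayscale frame

# Static scene: previous frame plus mild sensor noise -> keep the CNN idle.
noisy = np.clip(prev.astype(int) + rng.integers(-2, 3, prev.shape), 0, 255).astype(np.uint8)
print("wake CNN (static scene):", scene_changed(prev, noisy))   # expected: False

# An object enters a 100x100-pixel region -> wake the CNN for detection.
moved = prev.copy()
moved[100:200, 100:200] = 255
print("wake CNN (object enters):", scene_changed(prev, moved))  # expected: True
```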
Recent market trends confirm the timeliness of CogniVision, and the expected
importance that smart untethered cameras will have in the years to come. For
example, in December 2017 Amazon acquired the wireless camera company Blink,
and in October 2017 Google released the CLIPS wireless camera. Although the
capabilities of such cameras are currently limited (e.g., actual lifetime
ranging from 3-5 hours under continuous shooting to 2-5 weeks, and they simply
record clips when events occur), this clearly shows a technological and market
interest in ubiquitous vision. In 2017 Qualcomm announced its intention to
pursue a research project on low-resolution (320x240) cameras for smart
toys/appliances with limited recognition capabilities (e.g., single-object
detection, ambient light sensing). None of the available cameras can interact
with the cloud in real time (i.e., they are not “attentive”). As another
example, in March 2018 Sony and other companies formed the NICE alliance to
support the creation of a new generation of cameras with on-board analytics.
Ubiquitous cognitive cameras can provide novel technological capabilities
and societal benefits, enabling for the first time situational awareness with
fine spatial granularity across wide areas (from building to city scale).
Examples of targeted applications are ubiquitous/augmented surveillance,
vehicle/pedestrian detection, intelligent transportation, crowd monitoring,
industrial plant monitoring, warehouse management, detection of dangerous
objects, and disaster management, among others. In short, CogniVision empowers
the Internet of Things (IoT) (i.e., ubiquitous sensor augmentation of the
Internet) with the sense of vision, for the first time. As IoT is the next “big
wave” of technology (45% annual growth, global value of $11T by 2025),
CogniVision will leverage its capabilities and potential growth to create economic
value in Singapore, accelerating the Smart Nation vision.
The success of CogniVision will provide a unique technological
competitive advantage, in view of the demonstration of the first camera chip
with nearly-perpetual operation, fully untethered, energy-harvested,
millimeter-sized, capable of on-chip real-time sensemaking, and low cost (in
the dollar range). The on-chip sensemaking also fundamentally solves the
challenges of data deluge and privacy, which are currently faced by distributed
(tethered) cameras. Accordingly, CogniVision accelerates the Smart Nation
vision, and contributes to making Singapore a global hub for IoT sensing
technologies, and in particular for high added-value technologies such as
visual sensing. To reach the
intended impact, local enterprises working on or using distributed sensors
(e.g., belonging to the recently formed IoT Consortium of the Singapore
Semiconductor Industry Association (SSIA)), will be engaged during the project
via demonstration in our labs. On a global scale, the Embedded Vision Alliance
will be engaged to reach out to leading companies in image sensing
applications. These companies can indeed be technological or venture partners
in the subsequent translation of CogniVision into a commercial technology. The
support of agencies is key to the success of the project, as Singapore is a
natural testbed for CogniVision, and will benefit from the introduction of
ubiquitous vision capability in the Smart Nation vision. Their expertise will
facilitate alignment with compelling applications and use cases. At the end of
the project, a workshop will be organized to share findings and to demonstrate
the outcomes of CogniVision. To make our technologies widely available, we will
consider the opportunity of spinning off a company based in Singapore for
commercialization of CogniVision. The CogniVision project will leverage the
synergy with local industry in the IoT space, starting from the project’s
industrial partners, which cover the key areas related to CogniVision, i.e.,
system integration (Panasonic) and chips for IoT (MediaTek). A key factor that
promises significant impact of CogniVision is its relevance to a very wide
range of diverse applications and verticals, ranging from consumer to security,
smart cities, industry, and others.