# tinyML Talks

Enabling Ultra-low Power Machine Learning at the Edge

"How to design a power frugal hardware for AI - the bioinspiration path" Alexandre Valentian - CEA

September 23, 2021



www.tinyML.org



# tinyML Talks Sponsors and Strategic Partners

Deeplite, AONdevices, Arm, Qualcomm, Syntiant, Maxim Integrated, Edge Impulse

Additional sponsorships available: contact Olga@tinyML.org for info
# Arm: The Software and Hardware Foundation for tinyML



@ArmSoftwareDev

Resources: developer.arm.com/solutions/machine-learning-on-arm

© 2020 Arm Limited (or its affiliates)



# WE USE AI TO MAKE OTHER AI FASTER, SMALLER AND MORE POWER EFFICIENT



**Automatically compress** SOTA models like MobileNet to <200KB with **little to no drop in accuracy** for inference on resource-limited MCUs



**Reduce** model optimization trial & error from weeks to days using Deeplite's **design space exploration** 



**Deploy more** models to your device without sacrificing performance or battery life with our **easy-to-use software** 

# BECOME BETA USER <a href="https://bit.ly/testdeeplite">bit.ly/testdeeplite</a>



# TinyML for all developers



www.edgeimpulse.com



# The Eye in IoT Edge AI Visual Sensors

info@emza-vs.com





Computer Vision hardware accelerators

- Machine Learning algorithm
- <1MB memory footprint
- Microcontrollers computing power
- Trained algorithm
- Processing of low-res images
- Human detection and other classifiers

## **Enabling the next generation of Sensor and Hearable products to process rich data with energy efficiency**



### Wearables / Hearables



### Battery-powered consumer electronics



### IoT Sensors





# **Distributed infrastructure for TinyML apps**





**Develop at warp speed** 

**Automate deployments** 

**Device orchestration** 

HOTG is building the distributed infrastructure to pave the way for AI enabled edge applications



# Adaptive AI for the Intelligent Edge

Latentai.com



## Maxim Integrated: Enabling Edge Intelligence

### Advanced AI Acceleration IC



The new MAX78000 implements AI inferences at low energy levels, enabling complex audio and video inferencing to run on small batteries. Now the edge can see and hear like never before.

www.maximintegrated.com/MAX78000

Low Power Cortex M4 Micros



Large (3MB flash + 1MB SRAM) and small (256KB flash + 96KB SRAM, 1.6mm x 1.6mm) Cortex M4 microcontrollers enable algorithms and neural networks to run at wearable power levels.

www.maximintegrated.com/microcontrollers

Sensors and Signal Conditioning



Health sensors measure PPG and ECG signals critical to understanding vital signs. Signal chain products enable measuring even the most sensitive signals.

www.maximintegrated.com/sensors



# Qeexo AutoML

## Automated Machine Learning Platform that builds tinyML solutions for the Edge using sensor data

### **Key Features**

- Supports 17 ML methods:
  - Multi-class algorithms: GBM, XGBoost, Random
     Forest, Logistic Regression, Gaussian Naive Bayes,
     Decision Tree, Polynomial SVM, RBF SVM, SVM, CNN,
     RNN, CRNN, ANN
  - Single-class algorithms: Local Outlier Factor, One Class SVM, One Class Random Forest, Isolation Forest
- Labels, records, validates, and visualizes time-series sensor data
- On-device inference optimized for low latency, low power consumption, and small memory footprint applications
- Supports Arm<sup>®</sup> Cortex<sup>TM</sup>- M0 to M4 class MCUs

### **End-to-End Machine Learning Platform**

[Figure: AutoML workflow. Define project, select sensors and target hardware, collect/upload data, automated machine learning (feature extraction, model selection, model validation, model conversion, e.g. to C), deploy/download ML package.]

### For more information, visit: www.qeexo.com

### **Target Markets/Applications**

- Industrial Predictive Maintenance
- Automotive
- Smart Home
- Wearables
- IoT



Mobile

Qualcomm AI Research

# Advancing AI research to make efficient AI ubiquitous

### Power efficiency

Model design, compression, quantization, algorithms, efficient hardware, software tool

### Personalization

Continuous learning, contextual, always-on, privacy-preserved, distributed learning

### Efficient learning

Robust learning through minimal data, unsupervised learning, on-device learning

# A platform to scale AI across the industry



#### Perception

Object detection, speech recognition, contextual fusion

#### Reasoning

Scene understanding, language understanding, behavior prediction

#### Action

Reinforcement learning for decision making



Cloud

Edge cloud



IoT/IIoT

Automotive

Mobile

Qualcomm AI Research is an initiative of Qualcomm Technologies, Inc.



# Add Advanced Sensing to your Product with Edge AI / TinyML

https://reality.ai

info@reality.ai

@SensorAI · Reality AI

# Pre-built Edge AI sensing modules, plus tools to build your own

## **Reality AI solutions**

Prebuilt sound recognition models for indoor and outdoor use cases

Solution for industrial anomaly detection

Pre-built automotive solution that lets cars "see with sound"

## **Reality AI Tools<sup>®</sup> software**

Build prototypes, then turn them into real products

Explain ML models and relate the function to the physics

Optimize the hardware, including sensor selection and placement



# Build Smart IoT Sensor Devices From Data

SensiML pioneered TinyML software tools that auto generate AI code for the intelligent edge.

- End-to-end AI workflow
- Multi-user auto-labeling of time-series data
- Code transparency and customization at each step in the pipeline

We enable the creation of production-grade smart sensor devices.



# sensiml.com



# **SynSense**

SynSense builds sensing and inference hardware for ultra-low-power (sub-mW) embedded, mobile and edge devices. We design systems for real-time always-on smart sensing, for audio, vision, IMUs, bio-signals and more.

https://SynSense.ai



# SYNTIANT

## **Neural Decision Processors**

- At-Memory Compute
- Sustained High MAC Utilization
- Native Neural Network Processing

## **ML Training Pipeline**

- Enables Production Quality Deep Learning Deployments

End-to-End Deep Learning Solutions for TinyML & Edge AI

## **Data Platform**

- Reduces Data Collection Time and Cost
- Increases Model Performance









# LIVE ONLINE November 2-5, 2021

(9-11:30 am China Standard time) https://www.tinyml.org/event/asia-2021/

### **Technical Program Committee**

- Wei Xiao, NVIDIA
- Evgeni GOUSEV, Qualcomm Research, USA
- Mark CHEN, Himax Technologies
- Joo-Young KIM, KAIST
- Nicholas NICOLOUDIS, SAP
- Eric PAN, Seeed Studio and Chaihuo makerspace
- Chetan SINGH THAKUR
- Shouyi YIN 尹首
- Yu WANG

**Register today!** 



Free event courtesy of our sponsors and strategic partners







### FOUNDATION

### Focus on:

(i) developing new use cases/apps for tinyML vision; and (ii) promoting tinyML tech & companies in the developer community





Submissions accepted until September 17<sup>th</sup>, 2021

Winners announced on October 5<sup>th</sup>, 2021 (\$6k value)

Sponsorships available: *sponsorships@tinyML.org*





# Next tinyML Talks

| Date                    | Presenter                  | Topic / Title                                                                                  |
|-------------------------|----------------------------|------------------------------------------------------------------------------------------------|
| Friday,<br>September 24 | Peter Ing,<br>Edge Impulse | An Introduction to TinyML for all<br>backgrounds with hands on introduction to<br>Edge Impulse |

### Webcast start time is 8 am Pacific time

Please contact <u>talks@tinyml.org</u> if you are interested in presenting



# Reminders

Slides & Videos will be posted tomorrow



tinyml.org/forums youtube.com/tinyml

Please use the Q&A window for your questions





# **Alexandre Valentian**



After an MSc and a PhD in microelectronics, Alexandre Valentian joined CEA LETI in 2005. His past research activities included design technology co-optimization, promoting the FDSOI technology (notably through his participation in the SOI Academy), 2.5D/3D integration technologies and non-volatile memory technology. He is currently pursuing the development of bio-inspired circuits for AI, combining memory technology, information encoding and dedicated learning methods. Since 2020, he has headed the Systems-on-Chip and Advanced Technologies (LSTA) laboratory. Dr Valentian has authored or coauthored 80 conference and journal papers.

# ceatech to industry

### HOW TO DESIGN A POWER FRUGAL HARDWARE FOR AI – THE BIO-INSPIRATION PATH

Alexandre VALENTIAN









## SOLVING THE ENERGY CHALLENGE: COST OF MOVING DATA



Bill Dally, "To ExaScale and Beyond", 2010

## TRENDS IN EDGE COMPUTING

Increased computing efficiency

### Weight quantization

Reduced bit accuracy

- Smaller memory footprint
- Lighter operations

### Variable bit precision

Handling higher bit accuracy when needed

- For higher inference precision

### Sparsity

Skip MAC operations

- When weight or intermediate result is 0
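The two ideas above, reduced bit accuracy and skipping MAC operations on zeros, can be sketched together. This is a minimal illustration, not the scheme used in any circuit from the talk: the symmetric uniform quantizer, the function names and the example values are all assumptions.

```python
def quantize_weights(weights, n_bits=4):
    """Symmetric uniform quantization to signed n_bits integers (assumed scheme)."""
    q_max = 2 ** (n_bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return [0 for _ in weights], 1.0
    q_weights = [round(w * q_max / max_abs) for w in weights]
    scale = max_abs / q_max  # multiply an integer weight by scale to recover a float
    return q_weights, scale


def sparse_mac(q_weights, activations):
    """Accumulate products, skipping any pair in which one operand is zero."""
    acc = 0
    performed = 0
    for w, a in zip(q_weights, activations):
        if w == 0 or a == 0:
            continue  # skipped MAC: no multiplication, no accumulation
        acc += w * a
        performed += 1
    return acc, performed


# Hypothetical example values
weights = [0.6, -0.3, 0.0, 1.0]
activations = [3, 0, 7, 2]
q_weights, scale = quantize_weights(weights, n_bits=4)
acc, performed = sparse_mac(q_weights, activations)
print(q_weights, acc, performed)
```

In this example only two of the four MAC operations are actually executed, since one weight and one activation are zero.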

Increased storage efficiency

#### **Near memory computing**

Avoid external memory accesses

#### Weights

Embedded Non-Volatile Memory

Intermediate results

SRAM or Embedded DRAM

#### **In-Memory computing**

SRAM or Embedded NVM

Digital or analog

## ACTIVATIONS - IN-MEMORY COMPUTING



## WEIGHTS - NON-VOLATILE MEMORIES

\* [*T. Wu, ISSCC 2019*]

**RRAM** technology compatible with advanced logic

Scalable to sub-20nm

Multilevel cell \*

- From 1 bit to 4 bits and beyond

Roadmap for increasing embedded cell density

- From 40F<sup>2</sup> down to 4F<sup>2</sup> thanks to new selector technology

Examples with technology available today

- ResNet50 (74 MB of weights) → 15 mm<sup>2</sup> of memory
- YoloV3 (101 MB of weights) → 20 mm<sup>2</sup> of memory
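As a quick sanity check on the two examples above, the storage density they imply can be computed directly. The dictionary and function names are illustrative; only the megabyte and area figures come from the slide.

```python
# (weights in MB, memory area in mm^2), as quoted on the slide
examples = {"ResNet50": (74, 15), "YoloV3": (101, 20)}


def implied_density_mb_per_mm2(weights_mb, area_mm2):
    """Storage density implied by one slide example."""
    return weights_mb / area_mm2


densities = {
    name: implied_density_mb_per_mm2(mb, area) for name, (mb, area) in examples.items()
}
for name, density in densities.items():
    print(f"{name}: {density:.2f} MB/mm^2")
```

Both examples land at roughly 5 MB of weights per mm<sup>2</sup>, so the two figures are consistent with each other.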



28nm RRAM integration



Selector and RRAM integration [IEDM 2019]





## CHALLENGE OF ONLINE LEARNING

**Back-propagation algorithm** 

- Requires keeping all intermediate results (activations)
- With a batch size of more than one
  - To avoid cycling the non-volatile memories too often

This requires a tremendous amount of activation memory

- Example: YoloV3
  - A batch of 20 images requires 800 MB of memory
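The YoloV3 figure above implies about 40 MB of activations per image. A rough estimator, assuming activation memory scales linearly with batch size (an assumption, not a statement from the talk), could look like this:

```python
SLIDE_BATCH_SIZE = 20   # from the slide
SLIDE_MEMORY_MB = 800   # from the slide

# Implied per-image activation footprint
per_image_mb = SLIDE_MEMORY_MB / SLIDE_BATCH_SIZE


def activation_memory_mb(batch_size):
    """Estimated activation memory, assuming linear scaling with batch size."""
    return per_image_mb * batch_size


print(per_image_mb, activation_memory_mb(1), activation_memory_mb(20))
```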



## TECHNOLOGY SOLUTION - 3D INTEGRATION

**Advantages** 

Increasing computing & memory capacity

### Trends

- "Denser integration": tight memory-logic computing paradigm
- "Chipletization": generic computing templates, heterogeneous technologies



## DISTRIBUTED MEMORY-CENTRIC EDGE AI COMPUTING ARCHITECTURE

**Memory-Centric architecture** 

- No more global buffers
- No more power-hungry caches
- Fully distributed memory and control
- Energy efficient use of memory using 3D technology

### Edge AI architecture, using

- Generic PE engines
- Vertically and horizontally connected computing clusters
- In-Memory Computing tiles (IMC)
- Dense NVM for storage
- DRAM for online learning



## ENERGY EFFICIENCY IS FAR FROM BIOLOGICAL SYSTEMS



## BRAIN-INSPIRED SOLUTIONS MIGHT BE THE KEY

Human brain

- Massively parallel
  - 10<sup>11</sup> neurons and 10<sup>15</sup> synapses
- Doing processing using memory elements
- Analog computation
  - Neuron soma = synaptic current integrator
- Digital communication
  - Spikes = unary events, very robust to noise

Brain inspired

- High density storage, close to neurons
  - Computational storage
- Analog neuron
  - Spike coding



- Spike coding
- **RRAM** synapses
  - Weighted input thanks to Ohm's law
- Analog neurons
  - Inputs summation thanks to Kirchhoff's law
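The roles of Ohm's law and Kirchhoff's law in the list above can be illustrated with a small numerical sketch. Each synapse contributes a current I = G·V, and the currents on a shared line add up at the neuron input. The conductance and voltage values below are hypothetical, chosen only to make the arithmetic visible.

```python
def synapse_current(voltage, conductance):
    """Ohm's law: current through one resistive synapse (amperes)."""
    return conductance * voltage


def neuron_input_current(voltages, conductances):
    """Kirchhoff's current law: currents on a shared line sum at the neuron input."""
    return sum(synapse_current(v, g) for v, g in zip(voltages, conductances))


# Hypothetical values: 0.1 V read pulses; 100 uS for a low-resistance state,
# 1 uS for a high-resistance state
G_LRS = 1e-4
G_HRS = 1e-6
voltages = [0.1, 0.0, 0.1]          # the middle input is not spiking
conductances = [G_LRS, G_LRS, G_HRS]

total = neuron_input_current(voltages, conductances)
print(total)  # about 1.01e-05 A
```

The weighted sum is obtained without any explicit multiplier or adder: the memory array itself performs the computation.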



Frequency coding of pixel intensity
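Frequency (rate) coding maps a brighter pixel to a higher spike frequency. The sketch below uses a deterministic, evenly spaced spike train over a fixed window; the window length, the maximum spike count and the 8-bit intensity range are assumptions, and the encoder used in the actual circuit may differ.

```python
def spike_count(intensity, max_spikes=10, max_intensity=255):
    """Number of spikes in the window, proportional to pixel intensity."""
    return (intensity * max_spikes) // max_intensity


def spike_train(intensity, n_steps=20, max_spikes=10, max_intensity=255):
    """Evenly spread the spikes over n_steps time steps (requires max_spikes <= n_steps)."""
    n = spike_count(intensity, max_spikes, max_intensity)
    train = [0] * n_steps
    for k in range(n):
        train[(k * n_steps) // n] = 1
    return train


for pixel in (0, 128, 255):
    print(pixel, spike_train(pixel))
```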



Simplified schematic view

## NEURAL NETWORK TOPOLOGY

### Fully-connected neural network topology

- 10 output neurons: 1 neuron / class
- Each neuron is connected to the entire image: 144 synapses



Fully-connected topology



- Bio-inspired unsupervised learning rules, such as Spike-Timing-Dependent Plasticity (STDP), give poorer results than the gradient descent algorithm
- The decision was made to do offline learning
  - In the classical coding domain
  - And then to transcode into spikes





## MATHEMATICAL EQUIVALENCE

- "Classical" neural network model
  - Multiply-Accumulate (MAC)
  - Non-linear operation (TANH)

Approx. tanh equivalency

- "Spiking", rate-based equivalent model
  - Integrate & Fire (IF) neuron model
  - Two thresholds, positive and negative
  - Refractory period
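A minimal sketch of such an Integrate & Fire neuron follows. It emits +1 or -1 spikes when the membrane value crosses the positive or negative threshold, and the refractory period caps the output rate, which gives a saturating, odd transfer curve qualitatively similar to tanh. The subtractive reset, the choice to ignore inputs during the refractory period, and all parameter values are assumptions; the talk does not detail the reset scheme here.

```python
def if_neuron(inputs, threshold=1.0, refractory_steps=1):
    """Integrate & Fire neuron with positive and negative thresholds.

    Returns one value per time step: +1, -1 or 0.
    Inputs arriving during the refractory period are ignored (assumption).
    """
    membrane = 0.0
    refractory = 0
    spikes = []
    for x in inputs:
        if refractory > 0:
            refractory -= 1
            spikes.append(0)
            continue
        membrane += x
        if membrane >= threshold:
            membrane -= threshold          # subtractive reset (assumption)
            refractory = refractory_steps
            spikes.append(1)
        elif membrane <= -threshold:
            membrane += threshold
            refractory = refractory_steps
            spikes.append(-1)
        else:
            spikes.append(0)
    return spikes


def net_rate(inputs, **kwargs):
    """Signed output spike rate, in spikes per time step."""
    out = if_neuron(inputs, **kwargs)
    return sum(out) / len(out)


# Small inputs give a proportional rate; large inputs saturate at +/-0.5
for drive in (-5.0, -0.25, 0.0, 0.25, 5.0):
    print(drive, net_rate([drive] * 100))
```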

## LEARNED NEURONS RECEPTIVE FIELDS

- Excitatory synapses are represented in green
  - The greener, the higher the weight
- Inhibitory synapses are represented in red
  - The redder, the higher the weight magnitude





- RRAMs are used in binary mode (LRS and HRS states)
- Four RRAMs encode a positive weight and four others a negative weight
  - Nine synaptic weights are thus available: -4, -3, -2, -1, 0, 1, 2, 3, 4
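The encoding above can be sketched as two groups of four binary cells, with the weight recovered as the number of conducting cells in the positive group minus the number in the negative group. Representing LRS as 1 and HRS as 0, and the order in which cells are programmed, are illustrative choices.

```python
CELLS_PER_SIGN = 4  # four RRAMs for the positive part, four for the negative part


def encode_weight(weight):
    """Map an integer weight in [-4, 4] to (positive_cells, negative_cells).

    1 stands for a cell in LRS (conducting), 0 for HRS (illustrative convention).
    """
    if not -CELLS_PER_SIGN <= weight <= CELLS_PER_SIGN:
        raise ValueError("weight must be between -4 and 4")
    positive = [1 if i < weight else 0 for i in range(CELLS_PER_SIGN)]
    negative = [1 if i < -weight else 0 for i in range(CELLS_PER_SIGN)]
    return positive, negative


def decode_weight(positive, negative):
    """Effective weight: conducting positive cells minus conducting negative cells."""
    return sum(positive) - sum(negative)


levels = sorted({decode_weight(*encode_weight(w)) for w in range(-4, 5)})
print(levels)
print(encode_weight(-2))
```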





- Synapses are arranged in a matrix, for sharing
  - Word Line
  - Source Line
  - and Bit Line drivers



## NEURON DESIGN

#### Goal: ensure mathematical equivalence to TANH model

- Two thresholds (positive and negative)
- Particular reset of the membrane voltage



Neuron schematic, with reset paths for ensuring model equivalence



Voltage levels in membrane



- BULK 130nm base wafers
- RRAM Post-process between M4 and M5



**Cross-section** 



Chip micrograph



#### Classification accuracy = 84%

Compared to 88% in simulation

#### Energy

- 180 pJ / synaptic event
  - 3.6 pJ at RRAM + neuron level
- 136 spikes, on average, to classify an image
  - 24.5 nJ / image
- Energy gain 5X
  - Compared to classical coding




- This SNN is used in a live demo
- High energy efficiency: 24.5 nJ / digit classification
  - Less than 1 spike / synaptic connection



A. Valentian et al., "Fully Integrated Spiking Neural Network with Analog Neurons and RRAM Synapses," *IEEE International Electron Devices Meeting (IEDM)*, San Francisco, CA, USA, December 2019

## IMPROVEMENT BY MOVING TO 28NM FDSOI

- Area of RRAM matrix
  - Divided by 36
- Area of neurons
  - Divided by 17
- Energy per event
  - Divided by 10
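These factors can be applied to the measured 130nm figures (16K synapses/mm<sup>2</sup> and 180 pJ per synaptic event, as listed in the comparison table) to see what the scaled design would reach. The dictionary layout and function name are illustrative.

```python
measured_130nm = {"synapses_per_mm2": 16_000, "energy_per_event_pj": 180.0}


def scale_to_28nm(measured, area_factor=36, energy_factor=10):
    """Apply the slide's scaling factors to the measured 130nm figures."""
    return {
        "synapses_per_mm2": measured["synapses_per_mm2"] * area_factor,
        "energy_per_event_pj": measured["energy_per_event_pj"] / energy_factor,
    }


scaled = scale_to_28nm(measured_130nm)
print(scaled)
```

The result, 576K synapses/mm<sup>2</sup> and 18 pJ per event, is close to the 575K and 17.1 pJ listed in the comparison table, which suggests the factors on this slide are rounded.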



## RRAM TECHNOLOGICAL PATH FOR IMPROVEMENT

- Multiple level cells
  - Enable increasing synapse density by 4X
- Synapse implementation
  - One Single Level Cell (SLC) for the sign
  - One MLC for the weight value


## Comparison to the state-of-the-art

|                          | Science<br>2014<br>[4] | Micro<br>2018<br>[5] | VLSI<br>2018<br>[6] | VLSI<br>2018<br>[6] | This<br>work | This work<br>scaled | This work<br>scaled +<br>multivalued |
|--------------------------|------------------------|----------------------|---------------------|---------------------|--------------|---------------------|--------------------------------------|
| Technology               | 28nm                   | 14nm                 | 40nm                | 180nm               | 130nm        | 28nm                | 28nm                                 |
| Coding                   | Spike                  | Spike                | Formal              | Formal              | Spike        | Spike               | Spike                                |
| Weight<br>storage        | SRAM                   | SRAM                 | RRAM                | RRAM                | RRAM         | RRAM                | RRAM                                 |
| Synapses                 | 256M                   | 130M                 | 4M                  | 2M                  | 13.5K        | 13.5K               | -                                    |
| Synapses/mm<sup>2</sup>  | 195K                   | 2000K                | 1480K               | 160K                | 16K          | 575K                | 2300K                                |
| Power                    | 63mW                   | -                    | 9.9mW               | 15.8mW              | 1.5mW        | -                   | -                                    |
| Energy/syn.<br>event     | 27pJ                   | 105pJ                | N/A                 | N/A                 | 180pJ        | 17.1pJ              | -                                    |

## PATH TOWARDS DEEPER NETWORKS

#### Variability issue needs to be tackled

- Costs 2% of classification accuracy
  - On a shallow analog network
- Would be way too high for a deep network

#### Two solutions arise

- Online retraining, for coping with variability
  - Not an "industrial" solution: would be too time consuming, thus expensive
- Digital implementation
  - Enables consistency with cycle-accurate simulation

## REQUIREMENTS DEFINITION

#### Need to perform

- Detection
- Classification
- Segmentation

#### Points mostly towards a convolutional network

- UNet network
  - Convolution + Deconvolution
- Conv layers represent the majority of the computation workload
  - Focus of this implementation



## IMPLEMENTED CIRCUIT

#### Key parameters of interest

- Technology: 28nm FDSOI + RRAM
- Area: 3mm<sup>2</sup>
- 8 Convolutional SNN cores
- 131k neurons, 73k weights, 75M synapses

#### Total computing power

- 25.6 GOPS (synaptic operations per second)
- 128 Processing Engines @ 200 MHz
- Energy efficiency
  - 1pJ per synaptic event
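The throughput figure matches the product of the number of Processing Engines and the clock frequency. The sketch below checks that, and derives the power drawn at peak throughput under the assumption that every Processing Engine completes one synaptic operation per cycle at 1 pJ each; the peak power value is a derived estimate, not a number from the talk.

```python
N_PROCESSING_ENGINES = 128        # from the slide
CLOCK_HZ = 200_000_000            # 200 MHz, from the slide
ENERGY_PER_EVENT_J = 1e-12        # 1 pJ per synaptic event, from the slide

# Assumes one synaptic operation per engine per clock cycle
peak_sops = N_PROCESSING_ENGINES * CLOCK_HZ

# Derived estimate: power if the circuit ran continuously at peak throughput
peak_power_w = peak_sops * ENERGY_PER_EVENT_J

print(f"{peak_sops / 1e9:.1f} G synaptic operations per second")
print(f"{peak_power_w * 1e3:.1f} mW at peak throughput")
```

Because spiking workloads are event-driven, actual power would depend on the spike activity and could be well below this peak estimate.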



Multicore architecture for spiking convolution operations



**Trends in Edge AI applications** 

- Inference first
- Then lifelong local learning

Main challenge is to reduce data movement

This can be solved thanks to a combination of architecture and technology

- In-Memory Computing
- Non-volatile memory for synaptic weights
- 3D technology for heterogeneous integration

Brain-inspired solutions might just be the key to high energy efficiency





Contact information: Dr Alexandre VALENTIAN

email: alexandre.valentian@cea.fr

### THANK YOU FOR YOUR ATTENTION





# **Copyright Notice**

This multimedia file is copyright © 2021 by tinyML Foundation. All rights reserved. It may not be duplicated or distributed in any form without prior written approval.

tinyML<sup>®</sup> is a registered trademark of the tinyML Foundation.

## www.tinyml.org



# **Copyright Notice**

The presentation in this publication was given as a tinyML<sup>®</sup> Talks webcast. The content reflects the opinion of the author(s) and their respective companies. The inclusion of presentations in this publication does not constitute an endorsement by tinyML Foundation or the sponsors.

There is no copyright protection claimed by this publication. However, each presentation is the work of the authors and their respective companies and may contain copyrighted material. As such, it is strongly encouraged that any use reflect proper acknowledgement to the appropriate source. Any questions regarding the use of any materials presented should be directed to the author(s) or their companies.
