
Architecting High Performance Silicon Systems for Accurate and Efficient On-Chip Deep Learning

Thierry Tambe (Harvard University)

Colloquium

Thursday, March 30, 2023, 10:30 am

Abstract

The unabated pursuit of omniscient and omnipotent AI is levying hefty latency,
memory, and energy taxes at all computing scales. At the same time, the end of Dennard
scaling is sunsetting the traditional performance gains attained through reductions in
transistor feature size. Faced with these challenges, my research builds a heterogeneous set of
solutions co-optimized across the algorithm, memory subsystem, hardware architecture, and
silicon stack to generate breakthrough advances in arithmetic performance, compute density
and flexibility, and energy efficiency for on-chip machine learning, and natural language
processing (NLP) in particular. I will start, on the algorithm front, by discussing award-winning
work on a novel floating-point data type, AdaptivFloat, which enables resilient
quantized AI computation and is particularly well suited to NLP networks with wide
parameter distributions. Then, I will describe a 16nm chip prototype that adopts AdaptivFloat in
the acceleration of noise-robust AI speech and machine translation tasks – and whose fidelity to
the front-end application is verified via a formal hardware/software compiler interface. Towards
the goal of lowering the prohibitive energy cost of inferencing large language models on TinyML
devices, I will describe a principled algorithm-hardware co-design solution, validated in a 12nm
chip tapeout, that accelerates Transformer workloads by tailoring the accelerator's latency and
energy expenditures according to the complexity of the input query it processes. Finally, I will
conclude with some of my current and future research efforts on further pushing the on-chip
energy-efficiency frontiers by leveraging specialized, non-conventional dynamic memory
structures for on-device training, recently prototyped in a 16nm tapeout.
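To make the AdaptivFloat idea concrete: its core mechanism is a floating-point format whose exponent bias is set per tensor from the tensor's own dynamic range, so a narrow bit width still covers each layer's value distribution. Below is a minimal Python/NumPy sketch of that style of quantization; the function name, default bit widths, and exact bias rule are illustrative assumptions, not the implementation presented in the talk.

    import numpy as np

    def adaptivfloat_quantize(w, n_bits=8, n_exp=3):
        # Sketch of AdaptivFloat-style quantization (illustrative, not the talk's code).
        # Format: 1 sign bit, n_exp exponent bits, the remaining bits for the mantissa.
        n_man = n_bits - 1 - n_exp
        # Per-tensor exponent bias: align the format's largest exponent with the
        # tensor's maximum magnitude (the adaptive part of AdaptivFloat).
        max_exp = np.floor(np.log2(np.abs(w).max()))
        bias = max_exp - (2**n_exp - 1)  # assumed bias rule, for illustration only
        sign = np.sign(w)
        a = np.abs(w)
        # Clamp magnitudes to the representable range of the adapted format.
        min_val = 2.0**bias
        max_val = 2.0**(bias + 2**n_exp - 1) * (2.0 - 2.0**(-n_man))
        a = np.clip(a, min_val, max_val)
        # Round each magnitude to the nearest mantissa step at its own exponent.
        e = np.floor(np.log2(a))
        step = 2.0**(e - n_man)
        q = np.round(a / step) * step
        return sign * np.clip(q, min_val, max_val)

Because the bias tracks each tensor's range, low-bit-width encodings lose far less information on the wide, outlier-heavy weight distributions typical of NLP models than a fixed-bias float or fixed-point format would, which is the resilience property the abstract refers to.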
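The input-adaptive Transformer accelerator tailors latency and energy to query complexity; the abstract does not spell out the mechanism, but one common realization of this idea is entropy-based early exit, in which easy inputs terminate at a shallow layer while hard inputs run the full stack. A minimal sketch under that assumption, with hypothetical layer and exit-head callables:

    import numpy as np

    def entropy(p):
        # Shannon entropy of a probability vector; low entropy means a confident prediction.
        p = np.clip(p, 1e-12, 1.0)
        return float(-(p * np.log(p)).sum())

    def early_exit_inference(layers, exit_heads, x, threshold=0.4):
        # `layers` and `exit_heads` are hypothetical callables standing in for
        # Transformer encoder layers and lightweight per-layer classifiers.
        h = x
        probs = None
        for layer, head in zip(layers, exit_heads):
            h = layer(h)            # run one more encoder layer
            probs = head(h)         # cheap intermediate prediction
            if entropy(probs) < threshold:
                return probs        # confident: skip the remaining layers
        return probs                # hard input: the full network ran

In hardware, exiting early lets the remaining layers' computation and memory traffic be skipped entirely, which is how per-query latency and energy can scale with input difficulty.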

Bio

Thierry Tambe is a final-year Electrical Engineering PhD candidate at Harvard University.
His research focuses on designing energy-efficient, high-performance algorithms,
hardware accelerators, and systems for machine learning, and for natural language
processing in particular. He also has a keen interest in agile SoC design methodologies. Prior
to beginning his doctoral studies, Thierry was an engineer at Intel in Hillsboro, Oregon, USA,
designing various mixed-signal architectures for high-bandwidth memory and peripheral
interfaces on Xeon and Xeon-Phi HPC SoCs. He received a B.S. (2010) and M.Eng. (2012) in
Electrical Engineering from Texas A&M University. Thierry Tambe is a recipient of the Best
Paper Award at the 2020 ACM/IEEE Design Automation Conference, a 2021 NVIDIA Graduate
PhD Fellowship, and a 2022 IEEE SSCS Predoctoral Achievement Award.