Description
Only recently has the design of Neural Net models begun to account for constraints such as model size or total number of operations. Even then, such considerations fall short of a holistic treatment of the trade-offs among accuracy, model size, inference speed, and energy consumption. At the same time, even widely cited Neural Net accelerators may have demonstrated their value only on older Neural Net models running on smaller benchmarks. As a result, we often observe a mismatch between contemporary Neural Net accelerator architectures and state-of-the-art Neural Net models. In this work, we demonstrate how closely tuning the Neural Net model and the Neural Net accelerator architecture to each other can lead to 4x-8x improvements in energy efficiency. Moreover, we highlight key principles of the co-design of Neural Net models and Neural Net accelerators. In particular, we examine the impact of tuning the data/memory flow of Neural Net models and of the accelerators that support them.
Tags