MLIR-based Data Tiling and Packing for Ryzen AI NPU

The Ryzen AI NPUs consist of an array of vector processors and programmable interconnect to allow granular control of compute and data movement to achieve high performance and power efficiency. This talk presents a MLIR-based data tiling and packing design for these NPUs that leads to optimized machine instructions. Specifically, it shows how we can derive and optimize how data flows through the array from high-level tiling decisions and how we can efficiently utilize a high degree of data packing by leveraging low-level DMA control and capabilities.