With the newest release of TensorFlow 2.9, the open-source, end-to-end platform for machine learning, oneDNN performance improvements are turned on by default. This applies to all Linux x86 packages and to CPUs with neural-network-focused hardware features (such as the AVX512_VNNI, AVX512_BF16, and AMX vector and matrix extensions, which maximise AI performance through efficient use of compute resources, improved cache utilisation and efficient numeric formatting) found on 2nd Gen Intel® Xeon® Scalable processors and newer CPUs.
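On Linux, you can check whether a CPU advertises these features by inspecting the flags the kernel reports. A minimal sketch in Python (the feature names follow /proc/cpuinfo conventions; a fairly recent kernel is needed for the AMX flags to appear even on supporting hardware):

```python
# Check (Linux only) whether the CPU advertises the neural-network-focused
# instruction set extensions mentioned above.
FEATURES = ("avx512_vnni", "avx512_bf16", "amx_tile", "amx_bf16", "amx_int8")

with open("/proc/cpuinfo") as f:
    cpuinfo = f.read()

for feature in FEATURES:
    print(f"{feature}: {'yes' if feature in cpuinfo else 'no'}")
```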
These optimisations, delivered by the Intel® oneAPI Deep Neural Network Library (oneDNN), accelerate key performance-intensive operations such as convolution, matrix multiplication and batch normalisation, yielding up to 3x performance improvements compared with versions without oneDNN acceleration. TensorFlow developers and practitioners working on deep learning applications and frameworks therefore get a substantial acceleration of their work without needing to change any of their code. The enhanced performance has implications for a range of use cases, from image recognition to medical diagnosis.
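Because the optimisations are enabled by default in the 2.9 Linux x86 packages, no code changes are needed to benefit from them. They can, however, be toggled through the TF_ENABLE_ONEDNN_OPTS environment variable, which is handy for A/B comparisons. A minimal sketch, assuming a TensorFlow 2.9+ Linux x86 package (the variable must be set before TensorFlow is imported):

```python
import os

# oneDNN optimisations are on by default in TensorFlow 2.9 Linux x86 packages.
# Setting TF_ENABLE_ONEDNN_OPTS to "0" *before* importing TensorFlow disables
# them, e.g. to compare performance or numerical results with and without.
os.environ["TF_ENABLE_ONEDNN_OPTS"] = "1"  # use "0" to turn the optimisations off

import tensorflow as tf

print(tf.__version__)  # expect 2.9.0 or newer
```

When the oneDNN custom operations are active, TensorFlow logs an informational message at startup noting that results may differ slightly from the non-oneDNN path, owing to floating-point round-off from different computation orders.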
“Thanks to the years of close engineering collaboration between Intel and Google, optimisations in the oneDNN library are now default for x86 CPU packages in TensorFlow…This is a critical step to deliver faster AI inference and training and will help drive AI everywhere,” said Wei Li, Intel vice president and general manager for AI and Analytics.