New 3rd Gen Intel® Xeon® Scalable Processors Demonstrate Machine Learning Performance Leadership with Intel® Extension for Scikit-learn*

MaryT_Intel · 06-22-2021

The newest 3rd Gen Intel® Xeon® Scalable processors, previously codenamed Ice Lake, deliver important capabilities to enhance artificial intelligence (AI), cloud computing, security, and other areas. Intel has also optimized a variety of software tools, libraries, and frameworks so applications can easily take full advantage of the latest platform advances.

The results are impressive. This blog focuses on one example, offering benchmark results for the popular Scikit-learn machine learning library across a range of machine learning algorithms. Together, new 3rd Gen Intel Xeon Scalable processors and Intel® Extension for Scikit-learn* deliver 1.09 to 1.63 times the performance of the previous generation of Intel processors. These technologies also demonstrate industry leadership across a range of Scikit-learn algorithms. They deliver 0.65 to 7.23 times the performance of the NVIDIA A100 GPU and 0.61 to 2.63 times the performance of AMD Milan.

Running on the new 3rd Gen Intel Xeon Scalable processors, the optimizations in Intel Extension for Scikit-learn can accelerate Scikit-learn operations by tens to hundreds of times. And they’re accessible by changing just two lines of code in your Scikit-learn applications.

Intel Extension for Scikit-learn

Intel Extension for Scikit-learn provides drop-in replacement patching for the Scikit-learn machine learning library for Python. The patches were originally available in the daal4py package. All future updates to the patching will be available only in Intel Extension for Scikit-learn. All performance claims obtained using daal4py also apply to Intel Extension for Scikit-learn.

You can take advantage of the optimizations of Intel Extension for Scikit-learn by adding just two lines of code before the usual Scikit-learn imports:

from sklearnex import patch_sklearn
patch_sklearn()

# the start of the user's code:
from sklearn.cluster import DBSCAN
...
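
As a slightly fuller illustration, the sketch below applies the same two-line patch and then clusters a small, made-up data set with DBSCAN. The data, the eps and min_samples values, and the helper name count_dbscan_clusters are assumptions chosen for illustration; they are not part of the benchmarks in this blog. The sketch falls back to stock Scikit-learn when the extension is not installed, so the same script runs either way.

```python
import importlib.util


def count_dbscan_clusters():
    """Cluster two well-separated groups of points and return the cluster count.

    Returns None when Scikit-learn itself is not installed.
    """
    if importlib.util.find_spec("sklearn") is None:
        return None

    # Apply the patch only if Intel Extension for Scikit-learn is available;
    # it must run before the Scikit-learn imports below.
    if importlib.util.find_spec("sklearnex") is not None:
        from sklearnex import patch_sklearn
        patch_sklearn()

    import numpy as np
    from sklearn.cluster import DBSCAN

    # Hypothetical toy data: five points near (0, 0) and five near (5, 5).
    group = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0], [0.1, 0.1], [0.05, 0.05]])
    points = np.vstack([group, group + 5.0])

    labels = DBSCAN(eps=0.3, min_samples=3).fit(points).labels_
    # Label -1 marks noise, so exclude it when counting clusters.
    return len(set(labels.tolist()) - {-1})


if __name__ == "__main__":
    print("clusters found:", count_dbscan_clusters())
```

Because the patch replaces the implementation behind the standard Scikit-learn classes, the rest of the script is unchanged from ordinary Scikit-learn code.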

You can install the package from PyPI:

pip install scikit-learn-intelex

You can also install it using Conda:

conda install scikit-learn-intelex -c conda-forge

Intel Extension for Scikit-learn uses Intel® oneAPI Data Analytics Library (oneDAL) to achieve its acceleration. The optimizations aim for the efficient use of CPU resources. The library uses the latest vector instructions, such as Intel® Advanced Vector Extensions 512 (Intel AVX-512). It also uses cache-friendly data blocking, fast BLAS operations with Intel® oneAPI Math Kernel Library (oneMKL), scalable multi-threading with the Intel® oneAPI Threading Building Blocks (oneTBB) library, and more.

For benchmarking, we used oneDAL on Intel and AMD architectures and RAPIDS cuML on NVIDIA. These libraries are the most optimized for CPUs and GPUs respectively.

Gen-to-Gen Improvements

Our benchmark tests ran the machine learning algorithms implemented in Intel Extension for Scikit-learn. We compared their performance on new 3rd Gen Intel Xeon Scalable processors and 2nd Gen Intel Xeon Scalable processors. Figure 1 shows the speedup for each tested case, with 2nd Gen processors pegged at 1.0. We observed speedups from 1.09 times to 1.63 times across the training and prediction stages of the set of machine learning algorithms.

Figure 1. Performance gain of 3rd Gen Intel Xeon Scalable processors over 2nd Gen. We used Intel Extension for Scikit-learn for both generations.
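
The speedups reported here are ratios of run times against a baseline pegged at 1.0. A minimal sketch of that calculation follows. The helper names median_seconds and speedup are hypothetical, and this is not the harness used for the published results. A value above 1.0 means the candidate is faster than the baseline, and a value below 1.0 means it is slower.

```python
import statistics
import time


def median_seconds(func, repeats=5):
    """Run func() several times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def speedup(baseline_seconds, candidate_seconds):
    """Return how many times faster the candidate is than the baseline."""
    if baseline_seconds <= 0 or candidate_seconds <= 0:
        raise ValueError("timings must be positive")
    return baseline_seconds / candidate_seconds


if __name__ == "__main__":
    # Placeholder workloads for illustration only; in practice each timing
    # would come from running the same algorithm on one of the
    # configurations being compared.
    baseline = median_seconds(lambda: sum(i * i for i in range(200_000)))
    candidate = median_seconds(lambda: sum(i * i for i in range(100_000)))
    print(f"speedup: {speedup(baseline, candidate):.2f}x")
```

Taking the median of several runs is one common way to reduce the effect of outliers such as a cold cache on the first run.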

Performance Leadership

To assess competitive performance, we compared new 3rd Gen Intel Xeon Scalable processors to the latest NVIDIA and AMD hardware: NVIDIA’s A100 and AMD’s Milan. The new Intel Xeon Scalable processors demonstrated performance leadership across a variety of machine learning algorithms. The Intel CPUs showed 0.65 to 7.23 times the performance of NVIDIA A100 (Figure 2) and 0.61 to 2.63 times the performance of AMD Milan (Figure 3). Values below 1.0 correspond to cases where the competing hardware was faster.

Figure 2. Speedup of new 3rd Gen Intel Xeon Scalable processors (Intel Extension for Scikit-learn) against NVIDIA A100 (RAPIDS cuML).

Figure 3. Speedup of new 3rd Gen Intel Xeon Scalable processors against AMD Milan. We used Intel Extension for Scikit-learn for both architectures.

3rd Gen Intel Xeon Scalable Processors Overview

The latest 3rd Gen Intel Xeon Scalable processors are Intel’s most advanced, most performant data center platforms. They feature a flexible architecture with built-in AI acceleration via Intel® DL Boost technology. Here are some of their other key features.

Faster memory. In the new processor generation, the number of memory channels per socket increased from 6 to 8, and the maximum frequency of memory increased from 2933MHz to 3200MHz. As a result, the DRAM memory bandwidth increased by up to 1.45 times. Data analytics workloads are often DRAM-bound because many operations must be performed in memory, so 3rd Gen Intel Xeon Scalable processors offer a significant improvement for these workloads.
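
The bandwidth figure can be reproduced with a short calculation. The sketch below estimates theoretical peak DRAM bandwidth per socket from the channel counts and memory speeds quoted above. It assumes each DDR4 channel moves 8 bytes per transfer and treats the quoted speeds as transfer rates; the function name is hypothetical.

```python
BYTES_PER_TRANSFER = 8  # assumption: each DDR4 channel has a 64-bit (8-byte) data bus


def peak_bandwidth_gb_per_s(channels, mega_transfers_per_s):
    """Theoretical peak DRAM bandwidth per socket in GB/s (1 GB = 10**9 bytes)."""
    return channels * mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER / 1e9


second_gen = peak_bandwidth_gb_per_s(channels=6, mega_transfers_per_s=2933)
third_gen = peak_bandwidth_gb_per_s(channels=8, mega_transfers_per_s=3200)
ratio = third_gen / second_gen

print(f"2nd Gen: {second_gen:.1f} GB/s per socket")
print(f"3rd Gen: {third_gen:.1f} GB/s per socket")
print(f"ratio:   {ratio:.2f}x")
```

The ratio works out to about 1.45 and does not depend on the bytes-per-transfer assumption, since that factor cancels. These are theoretical peaks, so measured bandwidth on a real system will be lower.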

More cores. Top-bin 3rd Gen Intel Xeon Scalable processors have 40 cores per socket, providing new opportunities for multithreaded data processing.

Advanced microarchitecture. The maximum number of instructions per cycle (IPC) increased from 4 to 5, and the core of the new processors has 10 execution ports instead of 8. In addition, 3rd Gen Intel Xeon Scalable processors introduce new instruction sets, such as AVX512 BITALG, AVX512 VBMI2, and others. These instructions can improve single-core performance even at the same frequency.
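
On an x86 Linux system, one way to check whether a machine exposes these instruction sets is to read the flags line of /proc/cpuinfo. The sketch below assumes the Linux kernel's flag spellings (avx512f, avx512_bitalg, avx512_vbmi2). The helper names are hypothetical, and on other operating systems the file does not exist, in which case the script reports that and stops.

```python
from pathlib import Path

# Flag names as spelled by the Linux kernel in /proc/cpuinfo (assumption: x86 Linux host).
WANTED_FLAGS = ("avx512f", "avx512_bitalg", "avx512_vbmi2")


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def report_flags(cpuinfo_text, wanted=WANTED_FLAGS):
    """Map each wanted flag to True/False depending on whether the CPU reports it."""
    flags = parse_cpu_flags(cpuinfo_text)
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        for name, present in report_flags(cpuinfo.read_text()).items():
            print(f"{name}: {'yes' if present else 'no'}")
    else:
        print("/proc/cpuinfo not found; this check only works on Linux")
```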

Larger caches. The Intel Xeon Platinum 8380 processor provides 60MB of Last Level Cache (LLC)—56 percent larger than on the Intel Xeon Platinum 8280L processor (38.5MB). L2 cache increased from 1MB to 1.25MB per core, and L1 enlarged from 32KB to 48KB per core. Since some machine learning algorithms spend most of their time processing data residing in caches, caching improvements can have a significant impact on performance and simplicity.

New level of security. Machine learning algorithms often process confidential user data. To help protect this data, new Intel Xeon Scalable processors provide Intel® Software Guard Extensions (Intel SGX), hardware-based memory encryption with granular control.

Conclusions

The software optimizations in Intel Extension for Scikit-learn, along with the advanced capabilities of new 3rd Gen Intel Xeon Scalable processors, deliver:

0.65 to 7.23 times the performance of NVIDIA A100
0.61 to 2.63 times the performance of AMD Milan
1.09 to 1.63 times the performance of 2nd Gen Intel Xeon Scalable processors

These results show that 3rd Gen Intel Xeon Scalable processors are a great choice for classical machine learning and data analytics workloads. By leveraging 3rd Gen Intel Xeon Scalable processors and Intel Extension for Scikit-learn, organizations can enjoy excellent performance for their data science workloads. They can run enterprise applications and machine learning workloads on a single architecture, optimizing total cost of ownership for mixed workloads and bringing innovative solutions to market faster.

Platform	Model	Parameters	Testing date
3rd Gen Intel Xeon Scalable processors	3rd Gen Intel Xeon Platinum 8380 processor	2 sockets; 40 cores per socket; HT: on; Turbo: on; RAM: 512GB (16 slots / 32GB / 3200MHz)	3/19/2021
2nd Gen Intel Xeon Scalable processors	2nd Gen Intel Xeon Platinum 8280L processor	2 sockets; 28 cores per socket; HT: on; Turbo: on; RAM: 384GB (12 slots / 32GB / 2933MHz)	2/5/2021
AMD Milan	AMD EPYC™ 7763	AMD EPYC™ 7763 64-Core: 2 sockets; 64 cores per socket; HT: on; Turbo: on; RAM: 512GB (16 slots / 32GB / 3200MHz)	3/8/2021
NVIDIA A100	NVIDIA A100, AMD EPYC™ 7742	NVIDIA A100 Tensor (DGX-A100); AMD EPYC™ 7742 64-Core: 2 sockets; 64 cores per socket; HT: on; Turbo: on; RAM: 512GB (16 slots / 32GB / 3200MHz)	2/4/2021

Software	CPU workloads	GPU workload
Python version	3.7.9	3.7.9
Scikit-learn	Sklearn 0.24.1	-
Intel® Extension for Scikit-learn	2021.2.2	-
NVIDIA RAPIDS	-	RAPIDS 0.17
CUDA Toolkit	-	CUDA 11.0.221

Notices & Disclaimers

Performance varies by use, configuration and other factors. Learn more at www.intel.com/performanceindex.

Performance results are based on testing as of dates shown in configurations and may not reflect all publicly available updates. See backup for configuration details. No product or component can be absolutely secure.

Your costs and results may vary.

Intel technologies may require enabled hardware, software or service activation.

© Intel Corporation. Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. Other names and brands may be claimed as the property of others.