Which FFTW3 build to use on Discoverer
FFTW3 is an important library that provides a large number of HPC applications with subroutines for computing the discrete Fourier transform (DFT) in one or more dimensions. In an HPC environment it is common to have several different builds of FFTW3 available, to support the variety of installed MPI libraries, threading libraries, and common models for building, linking, and executing application code. Regardless of how many such builds are available locally, the most compatible one should be selected carefully. Note that if you do not link your FFTW3-dependent binaries against a compatible FFTW3 build, your application might suffer execution problems (such as segmentation faults), show low performance, or produce wrong numerical results.
The Discoverer HPC software repository provides access to eight (8) different FFTW3 builds. Each of them can be accessed by loading the corresponding environment module:
- compiled with AMD AOCC and linked against MPICH:
module load fftw/3/latest-aocc-mpich
- compiled with AMD AOCC and linked against OpenMPI:
module load fftw/3/latest-aocc-openmpi
- compiled with GCC and linked against MPICH:
module load fftw/3/latest-gcc-mpich
- compiled with GCC and linked against OpenMPI:
module load fftw/3/latest-gcc-openmpi
- compiled with Intel oneAPI classic compilers and linked against Intel MPI:
module load fftw/3/latest-intelmpi
- compiled with NVIDIA HPC SDK compilers and linked against OpenMPI:
module load fftw/3/latest-nvidia-openmpi
- compiled with Intel oneAPI classic compilers and linked against OpenMPI:
module load fftw/3/3.3.10-intel-openmpi
- provided by Intel oneAPI MKL and used with Intel oneAPI MPI:
module load fftw/3/latest-intelmpi
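The list above can be wrapped in a small lookup helper, which is convenient in job scripts. The following is only a sketch: the function name fftw_module_for and the short labels (aocc, gcc, intel, nvidia, mpich, openmpi, intelmpi) are hypothetical and not part of the Discoverer environment, and the MKL-based entry is left out because it is not a compiler/MPI pair.

```shell
# Hypothetical helper: map a (compiler, MPI) pair to the module names listed above.
# Prints the module name on stdout; returns 1 if the pair is not in the list.
fftw_module_for() {
    case "$1-$2" in
        aocc-mpich)     echo "fftw/3/latest-aocc-mpich" ;;
        aocc-openmpi)   echo "fftw/3/latest-aocc-openmpi" ;;
        gcc-mpich)      echo "fftw/3/latest-gcc-mpich" ;;
        gcc-openmpi)    echo "fftw/3/latest-gcc-openmpi" ;;
        intel-intelmpi) echo "fftw/3/latest-intelmpi" ;;
        intel-openmpi)  echo "fftw/3/3.3.10-intel-openmpi" ;;
        nvidia-openmpi) echo "fftw/3/latest-nvidia-openmpi" ;;
        *)
            echo "no FFTW3 build listed for compiler '$1' with MPI '$2'" >&2
            return 1
            ;;
    esac
}

# Usage on a node where the `module` command is available:
#   module load "$(fftw_module_for gcc openmpi)"
```

Keeping the mapping in one place means a job script only has to state the compiler and MPI library it was built with.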
The FFTW3-dependent applications available in the Discoverer HPC software repository come with environment modules that take care of loading the most compatible FFTW3 build. If you need to run one of those applications, you do not need to select an FFTW3 build yourself.
If your FFTW3-dependent application was compiled elsewhere and then brought to the Discoverer HPC infrastructure as binary executable code, the easiest way to decide which FFTW3 build will contribute most to the productivity of the computations is to find out which pair of compilers and MPI library was employed during the compilation. The variety of builds provided in the Discoverer HPC software repository should match most cases.
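When the origin of a dynamically linked binary is not documented, the shared libraries it requests often hint at the MPI library and FFTW3 flavour it was built against. A minimal sketch, assuming the standard ldd tool is available; the helper name filter_hpc_libs and the binary name ./my_app are hypothetical:

```shell
# Hypothetical helper: keep only the MPI, FFTW3, and MKL entries
# from `ldd`-style output read on standard input.
# Returns non-zero (grep's exit status) when no such entry is found.
filter_hpc_libs() {
    grep -E 'lib(mpi|fftw3|mkl)[^[:space:]]*\.so'
}

# Usage (./my_app is a placeholder for your own executable):
#   ldd ./my_app | filter_hpc_libs
```

The library names only narrow down the choice; they do not replace the build information from whoever compiled the binary.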
If your FFTW3-dependent application comes as source code that you have to compile on the Discoverer HPC compute nodes, it is important to decide which pair of compilers and FFTW3 build will most benefit the productivity of the resulting binary executable code. That decision should be based on (i) the documentation provided by the code vendor/distributor, (ii) your personal experience, or (iii) the advice given by the Discoverer HPC support team. Note that different FFTW3-dependent codes show different affinity to compilers and MPI libraries. In many cases that affinity is due to the build system. For instance, some configure or CMake-based setups cannot recognize the FFTW3 library provided by Intel oneAPI MKL. That usually happens because those setups expect to find files like libfftw3f.so in the lib directory of the Intel oneAPI MKL installation, instead of checking for libmkl_rt.so. If that is the case and you cannot change the build configuration setup, you may base your compilation on the FFTW3 build compiled with Intel oneAPI classic compilers and linked against OpenMPI.
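Before running configure or CMake, it can help to check whether the library directory you are about to point the build system at actually contains the libfftw3*.so files it will look for. The following is a sketch only; the helper name list_fftw3_libs is hypothetical, and the directory you pass depends on the module you loaded:

```shell
# Hypothetical helper: print the FFTW3 shared libraries found in a directory.
# Returns 0 if at least one libfftw3*.so file exists there, 1 otherwise.
list_fftw3_libs() {
    dir="$1"
    found=1
    for f in "$dir"/libfftw3*.so; do
        # If the glob matched nothing, the pattern itself is returned; skip it.
        [ -e "$f" ] || continue
        basename "$f"
        found=0
    done
    return "$found"
}

# Usage (the path is a placeholder):
#   list_fftw3_libs /path/to/fftw/lib || echo "no libfftw3*.so files here"
```

An empty result for an MKL library directory is consistent with the behaviour described above, since MKL exposes the FFTW3 interface through its own libraries rather than through libfftw3*.so files.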
If the FFTW3 build provided by Intel oneAPI MKL is employed for compiling the source code, consult the Intel oneAPI MKL Link Line Advisor:
https://www.intel.com/content/www/us/en/developer/tools/oneapi/onemkl-link-line-advisor.html
to obtain the correct set of compiler and linker flags.
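For orientation only, the flags for linking against the single dynamic MKL library (libmkl_rt.so) typically look like the sketch below. Everything here is an assumption to be verified with the advisor for your exact MKL release and compiler: the fallback MKLROOT path is a placeholder, the library subdirectory differs between releases (lib or lib/intel64), and the variable names FFTW_CFLAGS and FFTW_LIBS are hypothetical.

```shell
# Assumption: MKLROOT is exported by the Intel oneAPI environment.
# The fallback below is only a placeholder path.
MKLROOT="${MKLROOT:-/opt/intel/oneapi/mkl/latest}"

# MKL ships FFTW3-compatible headers (fftw3.h) under include/fftw.
FFTW_CFLAGS="-I${MKLROOT}/include/fftw"

# Link against the single dynamic library; adjust the library directory
# to match your MKL release (lib or lib/intel64).
FFTW_LIBS="-L${MKLROOT}/lib/intel64 -lmkl_rt -lpthread -lm -ldl"

echo "compile flags: ${FFTW_CFLAGS}"
echo "link flags:    ${FFTW_LIBS}"
```

Passing these values explicitly to the build system is often the only way to make it accept MKL in place of a stand-alone FFTW3 installation.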
Another important question is how to achieve maximum productivity in terms of execution speed. The speed of execution (compared on the same hardware) depends mostly on the code quality and on the compilers and MPI library employed. Very often, code compiled with Intel oneAPI classic compilers and linked against the Intel oneAPI MPI library shows the highest productivity on AMD Zen2. The combination of NVIDIA HPC SDK compilers (which are LLVM-based) and OpenMPI is next in line. For certain source code projects, that combination may produce faster executable code than the one built with Intel oneAPI MKL and Intel oneAPI MPI.
Using the quad precision FFTW3 build is rather rare. Only the GCC-based builds of FFTW3 offer that precision. The AMD AOCC, Intel oneAPI, and NVIDIA HPC SDK compilers cannot produce a quad precision build of FFTW3. Note that the quad precision build of FFTW3 is not compatible with AVX2 SIMD instructions.
The recipe used for building and installing the first seven FFTW3 builds listed above is available online at:
https://gitlab.discoverer.bg/vkolev/recipes/-/tree/main/fftw/3