
Compilers

The available compilers are accessed by loading the appropriate module.

To list all available compilers, run the following module command and look for the “compilers” and “parallel” sections:

module avail
-------------- /apps/modulefiles/compilers ---------------
aocc/4.2.0   cuda/12.8.1  java/24      NVHPC_SDK/25.1   
aocc/5.0.0   gnu/13       llvm/12.0.1  NVHPC_SDK/25.3   
cuda/11.4    gnu/13.3.0   llvm/16.0.6  oneAPI/2025.0.1  
cuda/11.6    gnu/14       llvm/17.0.6  rust/1.81        
cuda/11.8    gnu/14.2.0   llvm/18.1.8  rust/1.85        
cuda/12.3.1  java/11      llvm/19.1.0  rust/1.86        
cuda/12.5.1  java/21      llvm/20.1.2  

Compilers Overview

Overview of available compilers and supported languages.

| Language | GNU | LLVM | NVHPC | File Extensions |
|----------|-----|------|-------|-----------------|
| C | gcc | clang | nvc | .c |
| C++ | g++ | clang++ | nvc++ | .cpp, .cc, .C, .cxx |
| Fortran | gfortran | flang | nvfortran | .f, .for, .ftn, .f90, .f95, .fpp |
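The same C source builds unchanged under every suite in the table; only the driver name changes. A minimal sketch (file and function names are illustrative):

```c
/* hello.c — builds with any of the suites above:
 *   gcc   -O2 -o hello hello.c      (GNU)
 *   clang -O2 -o hello hello.c      (LLVM / AOCC)
 *   nvc   -O2 -o hello hello.c      (NVHPC)
 */
#include <string.h>

/* Each suite predefines its own identification macro; clang also
 * defines __GNUC__ for compatibility, hence the test order. */
const char *compiler_name(void)
{
#if defined(__NVCOMPILER)
    return "NVHPC";
#elif defined(__clang__)
    return "LLVM/AOCC";
#elif defined(__GNUC__)
    return "GNU";
#else
    return "unknown";
#endif
}
```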

GNU Compiler Collection

The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Ada, Go, and D, as well as libraries for these languages (libstdc++, …).

GCC was originally written as the compiler for the GNU operating system. The GNU system was developed to be 100% free software, free in the sense that it respects the user’s freedom.

To use the GNU Compiler Collection, load the gnu module:

module avail gnu

----------------------- /apps/modulefiles/compilers ----------------------- 
gnu/13  gnu/13.3.0  gnu/14  gnu/14.2.0  
module load gnu

gcc --version
gcc (GCC) 13.3.0

GNU Optimization flags

| Option | Description |
|--------|-------------|
| --help=optimizers | Show options that control optimizations |
| -Q -O[number] --help=optimizers | Show which optimizations are enabled at each level (-O0 to -O3) |
| -O[0-3] | Optimization level |
| -Ofast | Enables all -O3 optimizations plus -ffast-math, -fno-protect-parens and -fstack-arrays |
| -Os | Optimize for size. Enables all -O2 optimizations that do not typically increase code size, plus further optimizations designed to reduce code size |
| -ffast-math | May produce incorrect output for programs that depend on an exact implementation of IEEE or ISO rules for math functions, but may yield faster code for programs that do not require those guarantees |
| -march=[cputype] | Generate code for a specific processor: native, znver3, … |
| -mtune=[cputype] | Tune code for a specific processor: native, znver3, … (-march=native implies -mtune=native) |
| -Q -march=native --help=target | Show target details |
| -m[target] | Enable specific instruction sets: -mavx, -mavx2, -mfma, … |
| -fomit-frame-pointer | Don't keep the frame pointer in a register for functions that don't need one |
| -fno-strict-aliasing | Disable optimizations that assume the strict (type-based) aliasing rules |
| -finline-functions | Consider all functions for inlining |
| -funroll-loops | Unroll loops whose number of iterations can be determined at compile time |

GNU Suggested optimization flags

gcc -O3 -march=znver3 -mtune=znver3 

Check the full list of optimization options.
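The -ffast-math entry above matters because IEEE 754 floating-point addition is not associative; -Ofast and -ffast-math allow the compiler to reassociate anyway. A minimal sketch of the effect (function names are illustrative):

```c
/* fp_assoc.c — what -ffast-math licenses: IEEE 754 addition is not
 * associative, so these two orderings can give different results.
 * At default settings each ordering is evaluated exactly as written;
 * under -ffast-math (or -Ofast) the compiler may pick either. */
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }
```

For example, (0.1 + 0.2) + 0.3 and 0.1 + (0.2 + 0.3) round to different doubles, which is why results can change under these flags.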

LLVM

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Despite its name, LLVM has little to do with traditional virtual machines. The name “LLVM” itself is not an acronym; it is the full name of the project.

LLVM began as a research project at the University of Illinois, with the goal of providing a modern, SSA-based compilation strategy capable of supporting both static and dynamic compilation of arbitrary programming languages. Since then, LLVM has grown to be an umbrella project consisting of a number of subprojects, many of which are being used in production by a wide variety of commercial and open source projects as well as being widely used in academic research. Code in the LLVM project is licensed under the “Apache 2.0 License with LLVM exceptions”.

module avail llvm

----------------------- /apps/modulefiles/compilers -----------------------
llvm/12.0.1  llvm/16.0.6  llvm/17.0.6  llvm/18.1.8  llvm/19.1.0  llvm/20.1.2  
module load llvm

clang --version
clang version 19.1.0
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /gpfs/users/apps/compilers/llvm/19.1.0/bin

LLVM Optimization flags

| Option | Description |
|--------|-------------|
| --help | Show compiler options |
| -O[0-3] | Optimization level |
| -ffast-math | May produce incorrect output for programs that depend on an exact implementation of IEEE or ISO rules for math functions, but may yield faster code for programs that do not require those guarantees |
| -march=[cputype] | Generate code for a specific processor: native, znver3, … |
| -mtune=[cputype] | Tune code for a specific processor: native, znver3, … (-march=native implies -mtune=native) |
| -m[target] | Enable specific instruction sets: -mavx, -mavx2, -mfma, … |
| -fomit-frame-pointer | Don't keep the frame pointer in a register for functions that don't need one |
| -fno-strict-aliasing | Disable optimizations that assume the strict (type-based) aliasing rules |
| -finline-functions | Consider all functions for inlining |
| -funroll-loops | Unroll loops whose number of iterations can be determined at compile time |

LLVM Suggested optimization flags

clang -O3 -march=znver3 -mtune=znver3 

Check the full list of optimization options.
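The -fno-strict-aliasing row exists because, at -O2 and higher, the compiler assumes pointers to different types never alias; code that reinterprets bytes should use memcpy instead, which is well defined under any aliasing setting. A minimal sketch (function name is illustrative):

```c
/* aliasing.c — safe type punning regardless of -f(no-)strict-aliasing.
 * Casting a float* to uint32_t* and dereferencing is undefined
 * behaviour under the strict aliasing rules that -O2 assumes;
 * memcpy expresses the same bit copy legally. */
#include <stdint.h>
#include <string.h>

uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* defined for any aliasing mode */
    return u;
}
```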

AOCC

The AMD Optimizing C/C++ and Fortran Compiler (AOCC) is a high-performance compiler suite tuned specifically for AMD processors. Built upon the LLVM/Clang infrastructure, AOCC provides a compilation environment for C, C++, and Fortran. It offers target-dependent and target-independent optimizations, with a particular focus on AMD “Zen” processors. AOCC enhances standard LLVM optimizations and introduces AMD-specific features to accelerate scientific and high-performance computing (HPC) applications.

module load aocc

clang --version

AMD clang version 17.0.6 (CLANG: AOCC_5.0.0-Build#1377 2024_09_24)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /apps/compilers/aocc/5.0.0/bin

AOCC Optimization flags

| Option | Description |
|--------|-------------|
| --help | Show compiler options |
| -O[0-3] | Optimization level |
| -Os or -Oz | Similar to -O2, but with extra optimizations to reduce code size; -Oz reduces size even further |
| -Ofast | Enables all the optimizations from -O3 along with other aggressive optimizations that may violate strict compliance with language standards |
| -ffast-math | May produce incorrect output for programs that depend on an exact implementation of IEEE or ISO rules for math functions, but may yield faster code for programs that do not require those guarantees |
| -fopenmp | Enables handling of OpenMP directives and generates parallel code. The OpenMP library to link can be selected with -fopenmp=library |
| -ffp-model={precise,fast} | Specifies floating-point behavior; precise restricts certain optimizations to improve the consistency of floating-point results |
| -march=[cputype] | Generate code for a specific processor: native, znver3, … |
| -mtune=[cputype] | Tune code for a specific processor: native, znver3, … (-march=native implies -mtune=native) |
| -m[target] | Enable specific instruction sets: -mavx, -mavx2, -mfma, … |
| -fomit-frame-pointer | Don't keep the frame pointer in a register for functions that don't need one |
| -fno-strict-aliasing | Disable optimizations that assume the strict (type-based) aliasing rules |
| -finline-functions | Consider all functions for inlining |
| -funroll-loops | Unroll loops whose number of iterations can be determined at compile time |

AOCC Suggested optimization flags

clang -O3 -march=znver3 -mtune=znver3 

Check the full list of optimization options.

NVHPC SDK

The NVIDIA HPC SDK C, C++, and Fortran compilers support GPU acceleration of HPC modeling and simulation applications with standard C++ and Fortran, OpenACC® directives, and CUDA®. GPU-accelerated math libraries maximize performance on common HPC algorithms, and optimized communications libraries enable standards-based multi-GPU and scalable systems programming. Performance profiling and debugging tools simplify porting and optimization of HPC applications, and containerization tools enable easy deployment on-premises or in the cloud. With support for NVIDIA GPUs and Arm or x86-64 CPUs running Linux, the HPC SDK provides the tools you need to build NVIDIA GPU-accelerated HPC applications.

module avail NVHPC_SDK
----------------------- /apps/modulefiles/compilers -----------------------
NVHPC_SDK/25.1  NVHPC_SDK/25.3
module load NVHPC_SDK/25.3
module avail 
-------------- /apps/modulefiles/StandAlone/compilers/nvhpc/25.3 ---------------
nvhpc-byo-compiler/25.3      nvhpc-hpcx-cuda12/25.3  nvhpc-nompi/25.3  
nvhpc-hpcx-2.20-cuda12/25.3  nvhpc-hpcx/25.3         nvhpc/25.3 
nvc --version
nvc 25.3-0 64-bit target on x86-64 Linux -tp znver3 
NVIDIA Compilers and Tools
Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

NVHPC Optimization flags

| Option | Description |
|--------|-------------|
| -help=opt | Show options that control optimizations |
| -O[0-4] | Optimization level |
| -Ofast | Enables -O3 -Mfprelaxed -Mstack_arrays -Mno-nan -Mno-inf -fcx-limited-range |
| -fast | Includes -O2 -Munroll=c:1 -Mlre -Mautoinline |
| -Minfo=opt | Display compile-time optimization info (inlining, vectorization, etc.) |
| -Munroll | Enable loop unrolling |
| -Minline | Enable function inlining |
| -Mvect=opt | Enable vectorization |
| -Mconcur=opt | Enable auto-parallelization |
| -Mcache_align | Align large objects on cache-line boundaries |
| -Mflushz | Flush denormalized floating-point values to zero |
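The vectorization flags above target simple, countable loops. A sketch of the kind of reduction loop -Mvect can turn into SIMD code (function name is illustrative); compiling with -Minfo=opt reports what the optimizer actually did:

```c
/* dot.c — a reduction loop of the shape nvc's vectorizer targets.
 * Compile with, e.g.:  nvc -O3 -Minfo=opt -c dot.c
 * and -Minfo=opt prints whether the loop was vectorized. */
double dot(const double *x, const double *y, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        s += x[i] * y[i];       /* sum reduction: vectorizable */
    return s;
}
```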

NVHPC Suggested optimization flags

Info

The -O3 flag performs the -O2 optimizations plus more aggressive code hoisting and scalar replacement, which may or may not be profitable. It implies -Mvect=simd, -Mflushz, and -Mcache_align.

nvc -O3 -march=znver3

Optimization Flags for x86_64 Processors

To achieve optimal performance of your application, please consider using appropriate compiler flags. Generally, the highest impact can be achieved by selecting an appropriate optimization level, by targeting the architecture of the computer (CPU, cache, memory system), and by allowing for inter-procedural analysis (inlining, etc.). There is no set of options that gives the highest speed-up for all applications. Consequently, different combinations have to be explored.

Here is an overview of the available optimization options for each compiler suite.

| Optimization Level | Description |
|--------------------|-------------|
| -O0 | No optimization (default); generates unoptimized code with the fastest compilation time. Debugging support when combined with -g |
| -O1 | Moderate optimization; optimizes for size |
| -O2 | Optimize even more; maximize speed |
| -O3 | Full optimization; more aggressive loop and memory-access optimizations |
| -O4 (NVHPC) | All -O3 optimizations plus more aggressive hoisting of guarded expressions |
| -Os (LLVM, GNU) | Optimize the space usage (code and data) of the resulting program |
| -Ofast | Maximizes speed |

Here is a list of some important compiler options that affect application performance, based on the target architecture, application behavior, loading, and debugging.

Please note that optimization flags do not always guarantee faster execution.

| GNU | LLVM/AOCC | NVHPC | Description |
|-----|-----------|-------|-------------|
| -O[0-3] | -O[0-3], -O4 (AOCC) | -O[0-4] | Optimization level |
| -Os | -Os | - | Optimize for space |
| -Ofast | -Ofast | -Ofast | Maximizes speed across the entire program |
| -mtune, -march=native | -mtune, -march=native | -march=native | Generate instructions for the highest instruction set available on the host processor (AVX2) |
| -funroll-loops | -funroll-loops | -Munroll | Unroll loops |
| -finline-functions | -finline-functions | -Minline | The compiler heuristically decides which functions are worth inlining |

Vectorization

The compiler automatically checks for vectorization opportunities when higher optimization levels are used. ARIS supports AVX2 (Advanced Vector Extensions 2), which is recommended for AMD EPYC Milan processors.

| GNU | LLVM/AOCC | NVHPC | Description |
|-----|-----------|-------|-------------|
| -O[2-3], -Ofast | -O[2-3], -Ofast | -O[2-4], -Ofast | Enable vectorization |
| -ftree-vectorize | -fvectorize | -Mvect=simd | Enable vectorization explicitly (on by default in LLVM/AOCC at -O2 and higher) |
| -fno-tree-vectorize | -fno-vectorize | -Mnovect | Disable vectorization |
| -march=native | -march=native | -march=native | Support AVX2 |
| -mavx2 | -mavx2 | -mavx2 | Select the SIMD instruction set |
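A loop of the shape these auto-vectorizers target; with -O3 and -march=native (or -mavx2) any of the compilers above can emit 256-bit AVX2 instructions for it. A minimal sketch (file and function names are illustrative):

```c
/* saxpy.c — y = a*x + y; auto-vectorized at -O2/-O3 by LLVM/AOCC
 * and at -O3 (or with -ftree-vectorize) by GNU. Adding -mavx2 or
 * -march=native lets the compiler use 256-bit AVX2 registers, and
 * -mfma lets it fuse the multiply and add. */
void saxpy(float a, const float *x, float *y, int n)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i]; /* independent iterations: vectorizable */
}
```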

Full optimization lists for each compiler.