
Compilers

The available compilers are accessed by loading the appropriate module.

To list all available compilers, run the following module command and look for the “compilers” and “parallel” sections:

module avail
-------------- /apps/modulefiles/compilers ---------------
aocc/4.2.0   cuda/12.8.1  java/24      NVHPC_SDK/25.1   
aocc/5.0.0   gnu/13       llvm/12.0.1  NVHPC_SDK/25.3   
cuda/11.4    gnu/13.3.0   llvm/16.0.6  oneAPI/2025.0.1  
cuda/11.6    gnu/14       llvm/17.0.6  rust/1.81        
cuda/11.8    gnu/14.2.0   llvm/18.1.8  rust/1.85        
cuda/12.3.1  java/11      llvm/19.1.0  rust/1.86        
cuda/12.5.1  java/21      llvm/20.1.2  

Compilers Overview

Overview of available compilers and supported languages.

| Language | GNU | LLVM | NVHPC | File Extensions |
|----------|-----|------|-------|-----------------|
| C | gcc | clang | nvc | .c |
| C++ | g++ | clang++ | nvc++ | .cpp, .cc, .C, .cxx |
| Fortran | gfortran | flang | nvfortran | .f, .for, .ftn, .f90, .f95, .fpp |
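The same C source builds unchanged under every suite in the table; only the driver name changes. A minimal sketch (file and function names are illustrative):

```c
/* hello.c — builds with any of the suites above:
 *   gcc   -O2 -o hello hello.c      (GNU)
 *   clang -O2 -o hello hello.c      (LLVM / AOCC)
 *   nvc   -O2 -o hello hello.c      (NVHPC)
 */
#include <string.h>

/* Each suite predefines its own identification macro; clang also
 * defines __GNUC__ for compatibility, hence the test order. */
const char *compiler_name(void)
{
#if defined(__NVCOMPILER)
    return "NVHPC";
#elif defined(__clang__)
    return "LLVM/AOCC";
#elif defined(__GNUC__)
    return "GNU";
#else
    return "unknown";
#endif
}
```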

GNU Compiler Collection

The GNU Compiler Collection includes front ends for C, C++, Objective-C, Fortran, Ada, Go, and D, as well as libraries for these languages (libstdc++, …).

GCC was originally written as the compiler for the GNU operating system. The GNU system was developed to be 100% free software, free in the sense that it respects the user’s freedom.

To use the GNU Compiler Collection, load the gnu module:

module avail gnu

----------------------- /apps/modulefiles/compilers ----------------------- 
gnu/13  gnu/13.3.0  gnu/14  gnu/14.2.0  
module load gnu

gcc --version
gcc (GCC) 13.3.0

GNU Optimization flags

| Option | Description |
|--------|-------------|
| --help=optimizers | Show options that control optimizations |
| -Q -O[number] --help=optimizers | Show which optimizations are enabled at each level (-O0 to -O3) |
| -O[0-3] | Optimization level |
| -Ofast | Enables all -O3 optimizations plus -ffast-math, -fno-protect-parens and -fstack-arrays |
| -Os | Optimize for size. Enables all -O2 optimizations that do not typically increase code size, plus further optimizations designed to reduce code size |
| -ffast-math | May produce incorrect output for programs that depend on an exact implementation of IEEE or ISO rules for math functions, but may yield faster code for programs that do not require those guarantees |
| -march=[cputype] | Generate code for a specific processor: native, znver3, … |
| -mtune=[cputype] | Tune code for a specific processor: native, znver3, … (-march=native implies -mtune=native) |
| -Q -march=native --help=target | Show target details |
| -m[target] | Enable specific instruction sets: -mavx, -mavx2, -mfma, … |
| -fomit-frame-pointer | Don't keep the frame pointer in a register for functions that don't need one |
| -fno-strict-aliasing | Disable optimizations that assume the strict (type-based) aliasing rules |
| -finline-functions | Consider all functions for inlining |
| -funroll-loops | Unroll loops whose number of iterations can be determined at compile time |

GNU Suggested optimization flags

gcc -O3 -march=znver3 -mtune=znver3 

Check the full list of optimization options.
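The -ffast-math entry above matters because IEEE 754 floating-point addition is not associative; -Ofast and -ffast-math allow the compiler to reassociate anyway. A minimal sketch of the effect (function names are illustrative):

```c
/* fp_assoc.c — what -ffast-math licenses: IEEE 754 addition is not
 * associative, so these two orderings can give different results.
 * At default settings each ordering is evaluated exactly as written;
 * under -ffast-math (or -Ofast) the compiler may pick either. */
double sum_left(double a, double b, double c)  { return (a + b) + c; }
double sum_right(double a, double b, double c) { return a + (b + c); }
```

For example, (0.1 + 0.2) + 0.3 and 0.1 + (0.2 + 0.3) round to different doubles, which is why results can change under these flags.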

LLVM

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Despite its name, LLVM has little to do with traditional virtual machines. The name “LLVM” itself is not an acronym; it is the full name of the project.

LLVM began as a research project at the University of Illinois, with the goal of providing a modern, SSA-based compilation strategy capable of supporting both static and dynamic compilation of arbitrary programming languages. Since then, LLVM has grown to be an umbrella project consisting of a number of subprojects, many of which are being used in production by a wide variety of commercial and open source projects as well as being widely used in academic research. Code in the LLVM project is licensed under the “Apache 2.0 License with LLVM exceptions”.

module avail llvm

----------------------- /apps/modulefiles/compilers -----------------------
llvm/12.0.1  llvm/16.0.6  llvm/17.0.6  llvm/18.1.8  llvm/19.1.0  llvm/20.1.2  
module load llvm

clang --version
clang version 19.1.0
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /gpfs/users/apps/compilers/llvm/19.1.0/bin

LLVM Optimization flags

| Option | Description |
|--------|-------------|
| --help | Show compiler options |
| -O[0-3] | Optimization level |
| -ffast-math | May produce incorrect output for programs that depend on an exact implementation of IEEE or ISO rules for math functions, but may yield faster code for programs that do not require those guarantees |
| -march=[cputype] | Generate code for a specific processor: native, znver3, … |
| -mtune=[cputype] | Tune code for a specific processor: native, znver3, … (-march=native implies -mtune=native) |
| -m[target] | Enable specific instruction sets: -mavx, -mavx2, -mfma, … |
| -fomit-frame-pointer | Don't keep the frame pointer in a register for functions that don't need one |
| -fno-strict-aliasing | Disable optimizations that assume the strict (type-based) aliasing rules |
| -finline-functions | Consider all functions for inlining |
| -funroll-loops | Unroll loops whose number of iterations can be determined at compile time |

LLVM Suggested optimization flags

clang -O3 -march=znver3 -mtune=znver3 

Check the full list of optimization options.
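The -fno-strict-aliasing row exists because, at -O2 and higher, the compiler assumes pointers to different types never alias; code that reinterprets bytes should use memcpy instead, which is well defined under any aliasing setting. A minimal sketch (function name is illustrative):

```c
/* aliasing.c — safe type punning regardless of -f(no-)strict-aliasing.
 * Casting a float* to uint32_t* and dereferencing is undefined
 * behaviour under the strict aliasing rules that -O2 assumes;
 * memcpy expresses the same bit copy legally. */
#include <stdint.h>
#include <string.h>

uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);   /* defined for any aliasing mode */
    return u;
}
```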

AOCC

The AMD Optimizing C/C++ and Fortran Compiler (AOCC) is a high-performance compiler suite tuned specifically for AMD processors. Built upon the LLVM/Clang infrastructure, AOCC provides a compilation environment for C, C++, and Fortran. It offers target-dependent and target-independent optimizations, with a particular focus on AMD “Zen” processors. AOCC enhances standard LLVM optimizations and introduces AMD-specific features to accelerate scientific and high-performance computing (HPC) applications.

module load aocc

clang --version

AMD clang version 17.0.6 (CLANG: AOCC_5.0.0-Build#1377 2024_09_24)
Target: x86_64-unknown-linux-gnu
Thread model: posix
InstalledDir: /apps/compilers/aocc/5.0.0/bin

AOCC Optimization flags

| Option | Description |
|--------|-------------|
| --help | Show compiler options |
| -O[0-3] | Optimization level |
| -Os or -Oz | Similar to -O2, but with extra optimizations to reduce code size; -Oz reduces size even further |
| -Ofast | Enables all the optimizations from -O3 along with other aggressive optimizations that may violate strict compliance with language standards |
| -ffast-math | May produce incorrect output for programs that depend on an exact implementation of IEEE or ISO rules for math functions, but may yield faster code for programs that do not require those guarantees |
| -fopenmp | Enables handling of OpenMP directives and generates parallel code. The OpenMP library to link can be selected with -fopenmp=library |
| -ffp-model={precise,fast} | Specifies floating-point behavior; precise restricts certain optimizations to improve the consistency of floating-point results |
| -march=[cputype] | Generate code for a specific processor: native, znver3, … |
| -mtune=[cputype] | Tune code for a specific processor: native, znver3, … (-march=native implies -mtune=native) |
| -m[target] | Enable specific instruction sets: -mavx, -mavx2, -mfma, … |
| -fomit-frame-pointer | Don't keep the frame pointer in a register for functions that don't need one |
| -fno-strict-aliasing | Disable optimizations that assume the strict (type-based) aliasing rules |
| -finline-functions | Consider all functions for inlining |
| -funroll-loops | Unroll loops whose number of iterations can be determined at compile time |

AOCC Suggested optimization flags

clang -O3 -march=znver3 -mtune=znver3 

Check the full list of optimization options.

NVHPC SDK

The NVIDIA HPC SDK C, C++, and Fortran compilers support GPU acceleration of HPC modeling and simulation applications with standard C++ and Fortran, OpenACC® directives, and CUDA®. GPU-accelerated math libraries maximize performance on common HPC algorithms, and optimized communications libraries enable standards-based multi-GPU and scalable systems programming. Performance profiling and debugging tools simplify porting and optimization of HPC applications, and containerization tools enable easy deployment on-premises or in the cloud. With support for NVIDIA GPUs and Arm or x86-64 CPUs running Linux, the HPC SDK provides the tools you need to build NVIDIA GPU-accelerated HPC applications.

module avail NVHPC_SDK
----------------------- /apps/modulefiles/compilers -----------------------
NVHPC_SDK/25.1  NVHPC_SDK/25.3
module load NVHPC_SDK/25.3
module avail 
-------------- /apps/modulefiles/StandAlone/compilers/nvhpc/25.3 ---------------
nvhpc-byo-compiler/25.3      nvhpc-hpcx-cuda12/25.3  nvhpc-nompi/25.3  
nvhpc-hpcx-2.20-cuda12/25.3  nvhpc-hpcx/25.3         nvhpc/25.3 
nvc --version
nvc 25.3-0 64-bit target on x86-64 Linux -tp znver3 
NVIDIA Compilers and Tools
Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES.  All rights reserved.

NVHPC Optimization flags

| Option | Description |
|--------|-------------|
| -help=opt | Show options that control optimizations |
| -O[0-4] | Optimization level |
| -Ofast | Enables -O3 -Mfprelaxed -Mstack_arrays -Mno-nan -Mno-inf -fcx-limited-range |
| -fast | Includes -O2 -Munroll=c:1 -Mlre -Mautoinline |
| -Minfo=opt | Display compile-time optimization info (inlining, vectorization, etc.) |
| -Munroll | Enable loop unrolling |
| -Minline | Enable function inlining |
| -Mvect=opt | Enable vectorization |
| -Mconcur=opt | Enable auto-parallelization |
| -Mcache_align | Align large objects on cache-line boundaries |
| -Mflushz | Flush denormalized floating-point values to zero |
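The vectorization flags above target simple, countable loops. A sketch of the kind of reduction loop -Mvect can turn into SIMD code (function name is illustrative); compiling with -Minfo=opt reports what the optimizer actually did:

```c
/* dot.c — a reduction loop of the shape nvc's vectorizer targets.
 * Compile with, e.g.:  nvc -O3 -Minfo=opt -c dot.c
 * and -Minfo=opt prints whether the loop was vectorized. */
double dot(const double *x, const double *y, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; ++i)
        s += x[i] * y[i];       /* sum reduction: vectorizable */
    return s;
}
```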

NVHPC Suggested optimization flags

Info

The -O3 flag performs the -O2 optimizations plus more aggressive code hoisting and scalar replacement, which may or may not be profitable. It implies -Mvect=simd, -Mflushz, and -Mcache_align.

nvc -O3 -march=znver3

Optimization Flags for x86_64 Processors

To achieve optimal performance of your application, please consider using appropriate compiler flags. Generally, the highest impact can be achieved by selecting an appropriate optimization level, by targeting the architecture of the computer (CPU, cache, memory system), and by allowing for inter-procedural analysis (inlining, etc.). There is no set of options that gives the highest speed-up for all applications. Consequently, different combinations have to be explored.

Here is an overview of the available optimization options for each compiler suite.

| Optimization Level | Description |
|--------------------|-------------|
| -O0 | No optimization (default); generates unoptimized code with the fastest compilation time. Debugging support when combined with -g |
| -O1 | Moderate optimization; optimizes for size |
| -O2 | Optimize even more; maximize speed |
| -O3 | Full optimization; more aggressive loop and memory-access optimizations |
| -O4 (NVHPC) | All -O3 optimizations plus more aggressive hoisting of guarded expressions |
| -Os (LLVM, GNU) | Optimize the space usage (code and data) of the resulting program |
| -Ofast | Maximizes speed |

Here is a list of some important compiler options that affect application performance, based on the target architecture, application behavior, loading, and debugging.

Please note that optimization flags do not always guarantee faster execution.

| GNU | LLVM/AOCC | NVHPC | Description |
|-----|-----------|-------|-------------|
| -O[0-3] | -O[0-3], -O4 (AOCC) | -O[0-4] | Optimization level |
| -Os | -Os | - | Optimize for space |
| -Ofast | -Ofast | -Ofast | Maximizes speed across the entire program |
| -mtune, -march=native | -mtune, -march=native | -march=native | Generate instructions for the highest instruction set available on the host processor (AVX2) |
| -funroll-loops | -funroll-loops | -Munroll | Unroll loops |
| -finline-functions | -finline-functions | -Minline | The compiler heuristically decides which functions are worth inlining |

Vectorization

The compiler automatically checks for vectorization opportunities when higher optimization levels are used. ARIS supports AVX2 (Advanced Vector Extensions 2), which is recommended for AMD EPYC Milan processors.

| GNU | LLVM/AOCC | NVHPC | Description |
|-----|-----------|-------|-------------|
| -O[2-3], -Ofast | -O[2-3], -Ofast | -O[2-4], -Ofast | Enable vectorization |
| -ftree-vectorize | -fvectorize | -Mvect=simd | Enable vectorization explicitly (on by default in LLVM/AOCC at -O2 and higher) |
| -fno-tree-vectorize | -fno-vectorize | -Mnovect | Disable vectorization |
| -march=native | -march=native | -march=native | Support AVX2 |
| -mavx2 | -mavx2 | -mavx2 | Select the SIMD instruction set |
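A loop of the shape these auto-vectorizers target; with -O3 and -march=native (or -mavx2) any of the compilers above can emit 256-bit AVX2 instructions for it. A minimal sketch (file and function names are illustrative):

```c
/* saxpy.c — y = a*x + y; auto-vectorized at -O2/-O3 by LLVM/AOCC
 * and at -O3 (or with -ftree-vectorize) by GNU. Adding -mavx2 or
 * -march=native lets the compiler use 256-bit AVX2 registers, and
 * -mfma lets it fuse the multiply and add. */
void saxpy(float a, const float *x, float *y, int n)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i]; /* independent iterations: vectorizable */
}
```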

Full optimization lists for each compiler.