Clang Compiler User’s Manual — Clang 20.0.0git documentation (2024)

Introduction
- Terminology
- Basic Usage
Command Line Options
- Options to Control Error and Warning Messages
  - Formatting of Diagnostics
  - Individual Warning Groups
- Options to Control Clang Crash Diagnostics
- Options to Emit Optimization Reports
  - Current limitations
- Options to Emit Resource Consumption Reports
- Other Options
- Configuration files
Language and Target-Independent Features
- Controlling Errors and Warnings
  - Controlling How Clang Displays Diagnostics
  - Diagnostic Mappings
  - Diagnostic Categories
  - Controlling Diagnostics via Command Line Flags
  - Controlling Diagnostics via Pragmas
  - Controlling Diagnostics in System Headers
  - Controlling Deprecation Diagnostics in Clang-Provided C Runtime Headers
  - Enabling All Diagnostics
  - Controlling Static Analyzer Diagnostics
- Precompiled Headers
  - Generating a PCH File
  - Using a PCH File
  - Relocatable PCH Files
- Controlling Floating Point Behavior
  - Accessing the floating point environment
  - A note about crtfastmath.o
  - A note about __FLT_EVAL_METHOD__
  - A note about Floating Point Constant Evaluation
- Controlling Code Generation
- Profile Guided Optimization
  - Differences Between Sampling and Instrumentation
  - Using Sampling Profilers
    - Sample Profile Formats
    - Sample Profile Text Format
  - Profiling with Instrumentation
  - Fine Tuning Profile Collection
  - Disabling Instrumentation
  - Instrumenting only selected files or functions
    - Older Prefixes
  - Instrument function groups
  - Profile remapping
- GCOV-based Profiling
- Controlling Debug Information
  - Controlling Size of Debug Information
  - Controlling Macro Debug Info Generation
  - Controlling Debugger “Tuning”
- Controlling LLVM IR Output
  - Controlling Value Names in LLVM IR
- Comment Parsing Options
C Language Features
- Extensions supported by clang
- Differences between various standard modes
- GCC extensions not implemented yet
- Intentionally unsupported GCC extensions
- Microsoft extensions
C++ Language Features
- Controlling implementation limits
Objective-C Language Features
Objective-C++ Language Features
OpenMP Features
- Controlling implementation limits
OpenCL Features
- OpenCL Specific Options
- OpenCL Targets
  - Specific Targets
  - Generic Targets
- OpenCL Header
- OpenCL Extensions
- OpenCL-Specific Attributes
  - nosvm
  - opencl_unroll_hint
  - convergent
  - noduplicate
- C++ for OpenCL
  - Constructing and destroying global objects
  - Libraries
Target-Specific Features and Limitations
- CPU Architectures Features and Limitations
  - X86
  - ARM
  - PowerPC
  - Other platforms
- Operating System Features and Limitations
  - Windows
    - Cygwin
    - MinGW32
    - MinGW-w64
  - AIX
    - TOC Data Transformation
    - Default Visibility Export Mapping
- SPIR-V support
clang-cl
- Command-Line Options
  - The /clang: Option
  - The /Zc:dllexportInlines- Option
  - Finding Clang runtime libraries
  - Windows System Headers and Library Lookup
- Restrictions and Limitations compared to Clang
  - Strict Aliasing

Introduction¶

The Clang Compiler is an open-source compiler for the C family of programming languages, aiming to be the best in class implementation of these languages. Clang builds on the LLVM optimizer and code generator, allowing it to provide high-quality optimization and code generation support for many targets. For more general information, please see the Clang Web Site or the LLVM Web Site.

This document describes important notes about using Clang as a compiler for an end-user, documenting the supported features, command line options, etc. If you are interested in using Clang to build a tool that processes code, please see “Clang” CFE Internals Manual. If you are interested in the Clang Static Analyzer, please see its webpage.

Clang is one component in a complete toolchain for C family languages. A separate document describes the other pieces necessary to assemble a complete toolchain.

Clang is designed to support the C family of programming languages, which includes C, Objective-C, C++, and Objective-C++ as well as many dialects of those. For language-specific information, please see the corresponding language-specific section:

C Language: K&R C, ANSI C89, ISO C90, ISO C94 (C89+AMD1), ISO C99 (+TC1, TC2, TC3).
Objective-C Language: ObjC 1, ObjC 2, ObjC 2.1, plus variants depending on base language.
C++ Language
Objective C++ Language
OpenCL Kernel Language: OpenCL C 1.0, 1.1, 1.2, 2.0, 3.0, and C++ for OpenCL 1.0 and 2021.

In addition to these base languages and their dialects, Clang supports a broad variety of language extensions, which are documented in the corresponding language section. These extensions are provided to be compatible with the GCC, Microsoft, and other popular compilers as well as to improve functionality through Clang-specific features. The Clang driver and language features are intentionally designed to be as compatible with the GNU GCC compiler as reasonably possible, easing migration from GCC to Clang. In most cases, code “just works”. Clang also provides an alternative driver, clang-cl, that is designed to be compatible with the Visual C++ compiler, cl.exe.

In addition to language specific features, Clang has a variety of features that depend on what CPU architecture or operating system is being compiled for. Please see the Target-Specific Features and Limitations section for more details.

The rest of the introduction introduces some basic compiler terminology that is used throughout this manual and contains a basic introduction to using Clang as a command line compiler.

Terminology¶

Front end, parser, backend, preprocessor, undefined behavior, diagnostic, optimizer

Basic Usage¶

Intro to how to use a C compiler for newbies.

compile + link, compile then link, debug info, enabling optimizations, picking a language to use (defaults to C17; autosensed based on extension), using a makefile

Command Line Options¶

This section is generally an index into other sections. It does not go into depth on the ones that are covered by other sections. However, the first part introduces the language selection and other high level options like -c, -g, etc.

Options to Control Error and Warning Messages¶

-Werror¶: Turn warnings into errors.

-Werror=foo: Turn warning “foo” into an error.

-Wno-error=foo¶: Turn warning “foo” into a warning even if -Werror is specified.

-Wfoo¶: Enable warning “foo”. See the diagnostics reference for a complete list of the warning flags that can be specified in this way.

-Wno-foo¶: Disable warning “foo”.

-w¶: Disable all diagnostics.

-Weverything¶: Enable all diagnostics.

-pedantic¶: Warn on language extensions.

-pedantic-errors¶: Error on language extensions.

-Wsystem-headers¶: Enable warnings from system headers.

-ferror-limit=123¶: Stop emitting diagnostics after 123 errors have been produced. The default is 20, and the error limit can be disabled with -ferror-limit=0.

-ftemplate-backtrace-limit=123¶: Only emit up to 123 template instantiation notes within the template instantiation backtrace for a single warning or error. The default is 10, and the limit can be disabled with -ftemplate-backtrace-limit=0.

Formatting of Diagnostics¶

Clang aims to produce beautiful diagnostics by default, particularly for new users that first come to Clang. However, different people have different preferences, and sometimes Clang is driven not by a human, but by a program that wants consistent and easily parsable output. For these cases, Clang provides a wide range of options to control the exact output format of the diagnostics that it generates.

-f[no-]show-column¶

Print column number in diagnostic.

This option, which defaults to on, controls whether or not Clang prints the column number of a diagnostic. For example, when this is enabled, Clang will print something like:

test.c:28:8: warning: extra tokens at end of #endif directive [-Wextra-tokens]
#endif bad
       ^
       //

When this is disabled, Clang will print “test.c:28: warning…” with no column number.

The printed column numbers count bytes from the beginning of the line; take care if your source contains multibyte characters.

-f[no-]show-source-location¶

Print source file/line/column information in diagnostic.

This option, which defaults to on, controls whether or not Clang prints the filename, line number and column number of a diagnostic. For example, when this is enabled, Clang will print something like:

test.c:28:8: warning: extra tokens at end of #endif directive [-Wextra-tokens]
#endif bad
       ^
       //

When this is disabled, Clang will not print the “test.c:28:8: ” part.

-f[no-]caret-diagnostics¶

Print source line and ranges from source code in diagnostic.

This option, which defaults to on, controls whether or not Clang prints the source line, source ranges, and caret when emitting a diagnostic. For example, when this is enabled, Clang will print something like:

test.c:28:8: warning: extra tokens at end of #endif directive [-Wextra-tokens]
#endif bad
       ^
       //

-f[no-]color-diagnostics¶

This option, which defaults to on when a color-capable terminal is detected, controls whether or not Clang prints diagnostics in color.

When this option is enabled, Clang will use colors to highlight specific parts of the diagnostic, e.g.,

test.c:28:8: warning: extra tokens at end of #endif directive [-Wextra-tokens]
#endif bad
       ^
       //

When this is disabled, Clang will just print:

test.c:2:8: warning: extra tokens at end of #endif directive [-Wextra-tokens]
#endif bad
       ^
       //

If the NO_COLOR environment variable is defined and not empty (regardless of value), color diagnostics are disabled. If NO_COLOR is defined and -fcolor-diagnostics is passed on the command line, Clang will honor the command line argument.

-fansi-escape-codes¶: Controls whether ANSI escape codes are used instead of the Windows Console API to output colored diagnostics. This option is only used on Windows and defaults to off.

-fdiagnostics-format=clang/msvc/vi¶

Changes diagnostic output format to better match IDEs and command line tools.

This option controls the output format of the filename, line number, and column printed in diagnostic messages. The options, and their effect on formatting a simple conversion diagnostic, follow:

clang (default)

t.c:3:11: warning: conversion specifies type 'char *' but the argument has type 'int'

msvc

t.c(3,11) : warning: conversion specifies type 'char *' but the argument has type 'int'

vi

t.c +3:11: warning: conversion specifies type 'char *' but the argument has type 'int'
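A tool that consumes these lines needs one pattern per format. The following Python sketch shows one way to do that; the regular expressions are inferred only from the three sample lines above, and the names `LOCATION_PATTERNS` and `parse_location` are made up for this illustration, not part of Clang.

```python
import re

# One pattern per -fdiagnostics-format value. These are assumptions derived
# from the three sample lines above; real output may contain other cases.
LOCATION_PATTERNS = {
    "clang": re.compile(r"^(?P<file>.+?):(?P<line>\d+):(?P<col>\d+): (?P<rest>.*)$"),
    "msvc": re.compile(r"^(?P<file>.+?)\((?P<line>\d+),(?P<col>\d+)\) : (?P<rest>.*)$"),
    "vi": re.compile(r"^(?P<file>.+?) \+(?P<line>\d+):(?P<col>\d+): (?P<rest>.*)$"),
}


def parse_location(text, fmt="clang"):
    """Return (file, line, column, rest), or None if the line does not match."""
    match = LOCATION_PATTERNS[fmt].match(text)
    if match is None:
        return None
    return match["file"], int(match["line"]), int(match["col"]), match["rest"]
```

Calling `parse_location` on each of the three sample lines with the matching `fmt` should yield the same file, line and column, which is the point of the option: only the location prefix changes, not the message.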

-f[no-]diagnostics-show-option¶

Enable [-Woption] information in diagnostic line.

This option, which defaults to on, controls whether or not Clang prints the associated warning group option name when outputting a warning diagnostic. For example, in this output:

test.c:28:8: warning: extra tokens at end of #endif directive [-Wextra-tokens]
#endif bad
       ^
       //

Passing -fno-diagnostics-show-option will prevent Clang from printing the [-Wextra-tokens] information in the diagnostic. This information tells you the flag needed to enable or disable the diagnostic, either from the command line or through #pragma GCC diagnostic.

-fdiagnostics-show-category=none/id/name¶

Enable printing category information in diagnostic line.

This option, which defaults to “none”, controls whether or not Clang prints the category associated with a diagnostic when emitting it. Each diagnostic may or may not have an associated category; if it has one, it is listed in the diagnostic categorization field of the diagnostic line (in the []’s).

For example, a format string warning will produce these three renditions based on the setting of this option:

t.c:3:11: warning: conversion specifies type 'char *' but the argument has type 'int' [-Wformat]
t.c:3:11: warning: conversion specifies type 'char *' but the argument has type 'int' [-Wformat,1]
t.c:3:11: warning: conversion specifies type 'char *' but the argument has type 'int' [-Wformat,Format String]

This category can be used by clients that want to group diagnostics by category, so it should be a high level category. We want dozens of these, not hundreds or thousands of them.
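As a rough illustration of how a client might read the bracketed field, the Python sketch below splits it into the warning option and the optional category. The regular expression and the name `split_option_field` are assumptions made for this example, based only on the three renditions shown above.

```python
import re

# Assumed shape of the trailing field:
#   [-Wname]   [-Wname,id]   [-Wname,Category Name]
OPTION_FIELD = re.compile(
    r"\[(?P<option>-W[^\],]+)(?:,(?P<category>[^\]]+))?\]\s*$"
)


def split_option_field(diagnostic_line):
    """Return (option, category) or None if the line has no trailing field.

    category is None when no category was printed, a numeric string for
    the "id" setting, and the category name for the "name" setting.
    """
    match = OPTION_FIELD.search(diagnostic_line)
    if match is None:
        return None
    return match["option"], match["category"]
```

Grouping diagnostics by the second element of the returned pair gives the kind of per-category triage the option is meant to support.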

-f[no-]save-optimization-record[=<format>]¶

Enable optimization remarks during compilation and write them to a separate file.

This option, which defaults to off, controls whether Clang writes optimization reports to a separate file. By recording diagnostics in a file, users can parse or sort the remarks in a convenient way.

By default, the serialization format is YAML.

The supported serialization formats are:

-fsave-optimization-record=yaml: A structured YAML format.
-fsave-optimization-record=bitstream: A binary format based on LLVM Bitstream.

The output file is controlled by -foptimization-record-file.

In the absence of an explicit output file, the file is chosen using the following scheme:

<base>.opt.<format>

where <base> is based on the output file of the compilation (whether it’s explicitly specified through -o or not) when used with -c or -S. For example:

clang -fsave-optimization-record -c in.c -o out.o will generate out.opt.yaml
clang -fsave-optimization-record -c in.c will generate in.opt.yaml
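A build script that needs to locate the record file afterwards can mirror this naming rule. The Python sketch below covers only the -c / -S case shown in the two examples; the function name `default_record_file` is hypothetical, and the (Thin)LTO and Darwin multi-architecture schemes are deliberately not handled.

```python
import os


def default_record_file(output_file, fmt="yaml"):
    """Guess <base>.opt.<format> for a -c or -S compilation.

    Assumption: <base> is the output file name with its last extension
    removed, as in the two examples above (out.o -> out.opt.yaml).
    """
    base, _extension = os.path.splitext(output_file)
    return f"{base}.opt.{fmt}"
```

For the second example above, the implicit output of `-c in.c` is `in.o`, so the function is called with `"in.o"`.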

When targeting (Thin)LTO, the base is derived from the output filename, and the extension is not dropped.

When targeting ThinLTO, the following scheme is used:

<base>.opt.<format>.thin.<num>.<format>

Darwin-only: when used for generating a linked binary from a source file (through an intermediate object file), the driver will invoke cc1 to generate a temporary object file. The temporary remark file will be emitted next to the object file, which will then be picked up by dsymutil and emitted in the .dSYM bundle. This is available for all formats except YAML.

For example:

clang -fsave-optimization-record=bitstream in.c -o out will generate

/var/folders/43/9y164hh52tv_2nrdxrj31nyw0000gn/T/a-9be59b.o
/var/folders/43/9y164hh52tv_2nrdxrj31nyw0000gn/T/a-9be59b.opt.bitstream
out
out.dSYM/Contents/Resources/Remarks/out

Darwin-only: compiling for multiple architectures will use the following scheme:

<base>-<arch>.opt.<format>

Note that this is incompatible with passing the -foptimization-record-file option.

-foptimization-record-file¶: Control the file to which optimization reports are written. This implies -fsave-optimization-record.
On Darwin platforms, this is incompatible with passing multiple -arch <arch> options.

-foptimization-record-passes¶

Only include passes which match a specified regular expression.

When optimization reports are being output (see -fsave-optimization-record), this option controls the passes that will be included in the final report.

If this option is not used, all the passes are included in the optimization record.

-f[no-]diagnostics-show-hotness¶

Enable profile hotness information in diagnostic line.

This option controls whether Clang prints the profile hotness associated with diagnostics in the presence of profile-guided optimization information. This is currently supported with optimization remarks (see Options to Emit Optimization Reports). The hotness information allows users to focus on the hot optimization remarks that are likely to be more relevant for run-time performance.

For example, in this output, the block containing the callsite of foo was executed 3000 times according to the profile data:

s.c:7:10: remark: foo inlined into bar (hotness: 3000) [-Rpass-analysis=inline]
  sum += foo(x, x - 2);
         ^

This option is implied when -fsave-optimization-record is used. Otherwise, it defaults to off.

-fdiagnostics-hotness-threshold¶

Prevent optimization remarks from being output if they do not have at least this hotness value.

This option, which defaults to zero, controls the minimum hotness an optimization remark would need in order to be output by Clang. This is currently supported with optimization remarks (see Options to Emit Optimization Reports) when profile hotness information in diagnostics is enabled (see -fdiagnostics-show-hotness).
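The same kind of filtering can be approximated after the fact on saved remark text. The Python sketch below is a post-processing illustration only, not how Clang applies the threshold; it assumes the "(hotness: N)" spelling seen in the sample remark for -fdiagnostics-show-hotness, and the function names are invented for this example.

```python
import re

# Matches the "(hotness: N)" annotation as it appears in the sample remark.
HOTNESS = re.compile(r"\(hotness: (?P<value>\d+)\)")


def hotness_of(remark_line):
    """Return the integer hotness in a remark line, or None if absent."""
    match = HOTNESS.search(remark_line)
    return int(match["value"]) if match else None


def hot_remarks(lines, threshold):
    """Keep the lines whose hotness is at least `threshold`.

    Lines without a hotness annotation are treated as hotness 0, which is
    a choice made for this sketch.
    """
    return [line for line in lines if (hotness_of(line) or 0) >= threshold]
```

With a threshold of zero every line is kept, which matches the documented default of the option.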

-f[no-]diagnostics-fixit-info¶

Enable “FixIt” information in the diagnostics output.

This option, which defaults to on, controls whether or not Clang prints the information on how to fix a specific diagnostic underneath it when it knows. For example, in this output:

test.c:28:8: warning: extra tokens at end of #endif directive [-Wextra-tokens]
#endif bad
       ^
       //

Passing -fno-diagnostics-fixit-info will prevent Clang from printing the “//” line at the end of the message. This information is useful for users who may not understand what is wrong, but can be confusing for machine parsing.

-fdiagnostics-print-source-range-info¶

Print machine parsable information about source ranges.

This option makes Clang print information about source ranges in a machine parsable format after the file/line/column number information. The information is a simple sequence of brace enclosed ranges, where each range lists the start and end line/column locations. For example, in this output:

exprs.c:47:15:{47:8-47:14}{47:17-47:24}: error: invalid operands to binary expression ('int *' and '_Complex float')
   P = (P-42) + Gamma*4;
       ~~~~~~ ^ ~~~~~~~

The {}’s are generated by -fdiagnostics-print-source-range-info.

The printed column numbers count bytes from the beginning of the line; take care if your source contains multibyte characters.
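To show how the brace-enclosed ranges might be consumed, here is a small Python sketch that extracts them from the first line of a diagnostic. The patterns are inferred from the single example above, and the name `parse_ranges` is an assumption for illustration; how Clang prints a diagnostic that has no ranges is guessed here as the ordinary "file:line:col: " prefix.

```python
import re

RANGE = re.compile(r"\{(\d+):(\d+)-(\d+):(\d+)\}")
HEADER = re.compile(
    r"^(?P<file>.+?):(?P<line>\d+):(?P<col>\d+):"
    r"(?:(?P<ranges>(?:\{[^}]*\})+):)? "
)


def parse_ranges(diagnostic_line):
    """Return (file, line, col, ranges) or None if the line does not match.

    ranges is a list of ((start_line, start_col), (end_line, end_col))
    tuples; columns are byte offsets, as noted above.
    """
    match = HEADER.match(diagnostic_line)
    if match is None:
        return None
    ranges = [
        ((int(a), int(b)), (int(c), int(d)))
        for a, b, c, d in RANGE.findall(match["ranges"] or "")
    ]
    return match["file"], int(match["line"]), int(match["col"]), ranges
```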

-fdiagnostics-parseable-fixits¶

Print Fix-Its in a machine parseable form.

This option makes Clang print available Fix-Its in a machine parseable format at the end of diagnostics. The following example illustrates the format:

fix-it:"t.cpp":{7:25-7:29}:"Gamma"

The range printed is a half-open range, so in this example the characters at column 25 up to but not including column 29 on line 7 in t.cpp should be replaced with the string “Gamma”. Either the range or the replacement string may be empty (representing strict insertions and strict erasures, respectively). Both the file name and the insertion string escape backslash (as “\\”), tabs (as “\t”), newlines (as “\n”), double quotes (as “\””) and non-printable characters (as octal “\xxx”).

The printed column numbers count bytes from the beginning of the line; take care if your source contains multibyte characters.
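The half-open, byte-based range is easy to get wrong by one, so a short Python sketch may help. It parses a fix-it line and applies a single-line edit to a source line held as bytes. The names `parse_fixit` and `apply_to_line` are hypothetical, and the sketch leaves the escape sequences described above undecoded; a real client would need to unescape the file name and replacement text.

```python
import re

FIXIT = re.compile(
    r'^fix-it:"(?P<file>(?:[^"\\]|\\.)*)":'
    r'\{(?P<l1>\d+):(?P<c1>\d+)-(?P<l2>\d+):(?P<c2>\d+)\}:'
    r'"(?P<text>(?:[^"\\]|\\.)*)"$'
)


def parse_fixit(line):
    """Return (file, (line, col), (line, col), replacement) or None."""
    match = FIXIT.match(line)
    if match is None:
        return None
    start = (int(match["l1"]), int(match["c1"]))
    end = (int(match["l2"]), int(match["c2"]))
    return match["file"], start, end, match["text"]


def apply_to_line(source_line, start_col, end_col, replacement):
    """Apply a fix-it whose range lies on one line.

    source_line and replacement are bytes because columns count bytes.
    Columns are 1-based and the range is half-open: [start_col, end_col).
    """
    return source_line[: start_col - 1] + replacement + source_line[end_col - 1 :]
```

An empty range (start equal to end) makes `apply_to_line` a pure insertion, and an empty replacement makes it a pure erasure, matching the two special cases described above.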

-fno-elide-type¶

Turns off elision in template type printing.

The default for template type printing is to elide as many template arguments as possible, removing those which are the same in both template types, leaving only the differences. Adding this flag will print all the template arguments. If supported by the terminal, highlighting will still appear on differing arguments.

Default:

t.cc:4:5: note: candidate function not viable: no known conversion from 'vector<map<[...], map<float, [...]>>>' to 'vector<map<[...], map<double, [...]>>>' for 1st argument;

-fno-elide-type:

t.cc:4:5: note: candidate function not viable: no known conversion from 'vector<map<int, map<float, int>>>' to 'vector<map<int, map<double, int>>>' for 1st argument;

-fdiagnostics-show-template-tree¶

Template type diffing prints a text tree.

For diffing large templated types, this option will cause Clang to display the templates as an indented text tree, one argument per line, with differences marked inline. This is compatible with -fno-elide-type.

Default:

t.cc:4:5: note: candidate function not viable: no known conversion from 'vector<map<[...], map<float, [...]>>>' to 'vector<map<[...], map<double, [...]>>>' for 1st argument;

With -fdiagnostics-show-template-tree:

t.cc:4:5: note: candidate function not viable: no known conversion for 1st argument;
  vector<
    map<
      [...],
      map<
        [float != double],
        [...]>>>

-fcaret-diagnostics-max-lines:¶: Controls how many lines of code clang prints for diagnostics. By default, clang prints a maximum of 16 lines of code.

-fdiagnostics-show-line-numbers:¶

Controls whether clang will print a margin containing the line number on the left of each line of code it prints for diagnostics.

Default:

test.cpp:5:1: error: 'main' must return 'int'
    5 | void main() {}
      | ^~~~
      | int

With -fno-diagnostics-show-line-numbers:

test.cpp:5:1: error: 'main' must return 'int'
void main() {}
^~~~
int

Individual Warning Groups¶

TODO: Generate this from tblgen. Define one anchor per warning group.

-Wextra-tokens¶

Warn about excess tokens at the end of a preprocessor directive.

This option, which defaults to on, enables warnings about extra tokens at the end of preprocessor directives. For example:

test.c:28:8: warning: extra tokens at end of #endif directive [-Wextra-tokens]
#endif bad
       ^

These extra tokens are not strictly conforming, and are usually best handled by commenting them out.

-Wambiguous-member-template¶

Warn about unqualified uses of a member template whose name resolves to another template at the location of the use.

This option, which defaults to on, enables a warning in the following code:

template<typename T> struct set{};
template<typename T> struct trait { typedef const T& type; };
struct Value {
  template<typename T> void set(typename trait<T>::type value) {}
};
void foo() {
  Value v;
  v.set<double>(3.2);
}

C++ [basic.lookup.classref] requires this to be an error, but, because it’s hard to work around, Clang downgrades it to a warning as an extension.

-Wbind-to-temporary-copy¶

Warn about an unusable copy constructor when binding a reference to a temporary.

This option enables warnings about binding a reference to a temporary when the temporary doesn’t have a usable copy constructor. For example:

struct NonCopyable {
  NonCopyable();
private:
  NonCopyable(const NonCopyable&);
};
void foo(const NonCopyable&);
void bar() {
  foo(NonCopyable());  // Disallowed in C++98; allowed in C++11.
}

struct NonCopyable2 {
  NonCopyable2();
  NonCopyable2(NonCopyable2&);
};
void foo(const NonCopyable2&);
void bar() {
  foo(NonCopyable2());  // Disallowed in C++98; allowed in C++11.
}

Note that if NonCopyable2::NonCopyable2() has a default argument whose instantiation produces a compile error, that error will still be a hard error in C++98 mode even if this warning is turned off.

Options to Control Clang Crash Diagnostics¶

As unbelievable as it may sound, Clang does crash from time to time. Generally, this only occurs to those living on the bleeding edge. Clang goes to great lengths to assist you in filing a bug report. Specifically, Clang generates preprocessed source file(s) and associated run script(s) upon a crash. These files should be attached to a bug report to ease reproducibility of the failure. Below are the command line options to control the crash diagnostics.

-fcrash-diagnostics=<val>¶

Valid values are:

off (Disable auto-generation of preprocessed source files during a clang crash.)
compiler (Generate diagnostics for compiler crashes (default))
all (Generate diagnostics for all tools which support it)

-fno-crash-diagnostics¶

Disable auto-generation of preprocessed source files during a clang crash.

The -fno-crash-diagnostics flag can be helpful for speeding the process of generating a delta reduced test case.

-fcrash-diagnostics-dir=<dir>¶: Specify where to write the crash diagnostics files; defaults to the usual location for temporary files.

CLANG_CRASH_DIAGNOSTICS_DIR=<dir>¶: Like -fcrash-diagnostics-dir=<dir>, specifies where to write the crash diagnostics files, but with lower precedence than the option.

Clang is also capable of generating preprocessed source file(s) and associated run script(s) even without a crash. This is especially useful when trying to generate a reproducer for warnings or errors while using modules.

-gen-reproducer¶: Generates preprocessed source files, a reproducer script and, if relevant, a cache containing built module pcm’s and all headers needed to rebuild the same modules.

Options to Emit Optimization Reports¶

Optimization reports trace, at a high level, all the major decisions made by compiler transformations. For instance, when the inliner decides to inline function foo() into bar(), or the loop unroller decides to unroll a loop N times, or the vectorizer decides to vectorize a loop body.

Clang offers a family of flags which the optimizers can use to emit a diagnostic in three cases:

When the pass makes a transformation (-Rpass).
When the pass fails to make a transformation (-Rpass-missed).
When the pass determines whether or not to make a transformation (-Rpass-analysis).

NOTE: Although the discussion below focuses on -Rpass, the exact same options apply to -Rpass-missed and -Rpass-analysis.

Since there are dozens of passes inside the compiler, each of these flags takes a regular expression that identifies the name of the pass which should emit the associated diagnostic. For example, to get a report from the inliner, compile the code with:

$ clang -O2 -Rpass=inline code.cc -o code
code.cc:4:25: remark: foo inlined into bar [-Rpass=inline]
int bar(int j) { return foo(j, j - 2); }
                        ^

Note that remarks from the inliner are identified with [-Rpass=inline]. To request a report from every optimization pass, you should use -Rpass=.* (in fact, you can use any valid POSIX regular expression). However, do not expect a report from every transformation made by the compiler. Optimization remarks do not really make sense outside of the major transformations (e.g., inlining, vectorization, loop optimizations) and not every optimization pass supports this feature.

Note that when using profile-guided optimization information, profile hotness information can be included in the remarks (see -fdiagnostics-show-hotness).

Current limitations¶

Optimization remarks that refer to function names will display the mangled name of the function. Since these remarks are emitted by the back end of the compiler, it does not know anything about the input language, nor its mangling rules.
Some source locations are not displayed correctly. The front end has a more detailed source location tracking than the locations included in the debug info (e.g., the front end can locate code inside macro expansions). However, the locations used by -Rpass are translated from debug annotations. That translation can be lossy, which results in some remarks having no location information.

Options to Emit Resource Consumption Reports¶

These are options that report execution time and consumed memory of different compilation steps.

-fproc-stat-report=¶

This option requests the driver to print used memory and execution time of each compilation step. The clang driver during execution calls different tools, like compiler, assembler, linker etc. With this option the driver reports total execution time, the execution time spent in user mode and peak memory usage of each called tool. The value of the option specifies where the report is sent to. If it specifies a regular file, the data are saved to this file in CSV format:

$ clang -fproc-stat-report=abc foo.c
$ cat abc
clang-11,"/tmp/foo-123456.o",92000,84000,87536
ld,"a.out",900,8000,53568

The data on each row represent:

file name of the tool executable,
output file name in quotes,
total execution time in microseconds,
execution time in user mode in microseconds,
peak memory usage in Kb.
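Because the report is plain CSV with the five columns listed above, it can be read with a standard CSV parser. The following Python sketch is one possible reader; the field names in `FIELDS` and the function name `parse_proc_stat` are labels chosen for this example, not names defined by Clang.

```python
import csv
import io

# Column labels are assumptions for this sketch; the order follows the
# list above: tool, output file, total time, user time, peak memory.
FIELDS = ("tool", "output", "total_us", "user_us", "peak_kb")


def parse_proc_stat(text):
    """Parse the CSV report text into a list of dicts, one per tool run."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if len(record) != len(FIELDS):
            continue  # skip blank or unexpected lines
        row = dict(zip(FIELDS, record))
        for key in FIELDS[2:]:
            row[key] = int(row[key])
        rows.append(row)
    return rows
```

Summing `total_us` over the returned rows gives the total tool time recorded in the file, which is convenient when the same report file accumulates data from many compilations.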

It is possible to specify this option without any value. In this case statistics are printed on standard output in human readable format:

$ clang -fproc-stat-report foo.c
clang-11: output=/tmp/foo-855a8e.o, total=68.000 ms, user=60.000 ms, mem=86920 Kb
ld: output=a.out, total=8.000 ms, user=4.000 ms, mem=52320 Kb

The report file specified in the option is locked for write, so this option can be used to collect statistics in parallel builds. The report file is not cleared; new data is appended to it, making it possible to accumulate build statistics.

You can also use environment variables to control the process statistics reporting. Setting CC_PRINT_PROC_STAT to 1 enables the feature; the report goes to stdout in human readable format. Setting CC_PRINT_PROC_STAT_FILE to a fully qualified file path makes it report process statistics to the given file in the CSV format. Specifying a relative path will likely lead to multiple files with the same name created in different directories, since the path is relative to a changing working directory.

These environment variables are handy when you need to request the statistics report without changing your build scripts or altering the existing set of compiler options. Note that -fproc-stat-report takes precedence over CC_PRINT_PROC_STAT and CC_PRINT_PROC_STAT_FILE.

$ export CC_PRINT_PROC_STAT=1
$ export CC_PRINT_PROC_STAT_FILE=~/project-build-proc-stat.csv
$ make

Other Options¶

Clang options that don’t fit neatly into other categories.

-fgnuc-version=¶: This flag controls the value of __GNUC__ and related macros. This flag does not enable or disable any GCC extensions implemented in Clang. Setting the version to zero causes Clang to leave __GNUC__ and other GNU-namespaced macros, such as __GXX_WEAK__, undefined.

-MV¶

When emitting a dependency file, use formatting conventions appropriate for NMake or Jom. Ignored unless another option causes Clang to emit a dependency file.

When Clang emits a dependency file (e.g., you supplied the -M option) most filenames can be written to the file without any special formatting. Different Make tools will treat different sets of characters as “special” and use different conventions for telling the Make tool that the character is actually part of the filename. Normally Clang uses backslash to “escape” a special character, which is the convention used by GNU Make. The -MV option tells Clang to put double-quotes around the entire filename, which is the convention used by NMake and Jom.

-femit-dwarf-unwind=<value>¶

When to emit DWARF unwind (EH frame) info. This is a Mach-O-specific option.

Valid values are:

no-compact-unwind - Only emit DWARF unwind when compact unwind encodings aren’t available. This is the default for arm64.
always - Always emit DWARF unwind regardless.
default - Use the platform-specific default (always for all non-arm64 platforms).

no-compact-unwind is a performance optimization – Clang will emit smaller object files that are more quickly processed by the linker. This may cause binary compatibility issues on older x86_64 targets, however, so use it with caution.

-fdisable-block-signature-string¶: Instruct clang not to emit the signature string for blocks. Disabling the string can potentially break existing code that relies on it. Users should carefully consider this possibility when using the flag.

Configuration files¶

Configuration files group command-line options and allow all of them to be specified just by referencing the configuration file. They may be used, for example, to collect options required to tune compilation for a particular target, such as -L, -I, -l, --sysroot, codegen options, etc.

Configuration files can be either specified on the command line or loaded from default locations. If both variants are present, the default configuration files are loaded first.

The command line option --config= can be used to specify explicit configuration files in a Clang invocation. If the option is used multiple times, all specified files are loaded, in order. For example:

clang --config=/home/user/cfgs/testing.txt
clang --config=debug.cfg --config=runtimes.cfg

Language and Target-Independent Features¶

Controlling Errors and Warnings¶

Clang provides a number of ways to control which code constructs cause it to emit errors and warning messages, and how they are displayed to the console.

Controlling How Clang Displays Diagnostics¶

When Clang emits a diagnostic, it includes rich information in the output, and gives you fine-grain control over which information is printed. Clang has the ability to print this information, and these are the options that control it:

A file/line/column indicator that shows exactly where the diagnostic occurs in your code [-fshow-column, -fshow-source-location].
A categorization of the diagnostic as a note, warning, error, or fatal error.
A text string that describes what the problem is.
An option that indicates how to control the diagnostic (for diagnostics that support it) [-fdiagnostics-show-option].
A high-level category for the diagnostic for clients that want to group diagnostics by class (for diagnostics that support it) [-fdiagnostics-show-category].
The line of source code that the issue occurs on, along with a caret and ranges that indicate the important locations [-fcaret-diagnostics].
“FixIt” information, which is a concise explanation of how to fix the problem (when Clang is certain it knows) [-fdiagnostics-fixit-info].
A machine-parsable representation of the ranges involved (off by default) [-fdiagnostics-print-source-range-info].

For more information please see Formatting of Diagnostics.

Diagnostic Mappings¶

All diagnostics are mapped into one of these 6 classes:

Ignored
Note
Remark
Warning
Error
Fatal

Diagnostic Categories¶

Though not shown by default, diagnostics may each be associated with a high-level category. This category is intended to make it possible to triage builds that produce a large number of errors or warnings in a grouped way.

Categories can be turned on with the -fdiagnostics-show-category option. When it is set to “name”, the category is printed textually in the diagnostic output. When it is set to “id”, a category number is printed. The mapping of category names to category IDs can be obtained by running ‘clang --print-diagnostic-categories’.

Controlling Diagnostics via Command Line Flags¶

TODO: -W flags, -pedantic, etc

Controlling Diagnostics via Pragmas¶

Clang can also control what diagnostics are enabled through the use of pragmas in the source code. This is useful for turning off specific warnings in a section of source code. Clang supports GCC’s pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang. GCC will ignore #pragma clang diagnostic, though.

The pragma may control any warning that can be used from the command line. Warnings may be set to ignored, warning, error, or fatal. The following example code will tell Clang or GCC to ignore the -Wall warnings:

#pragma GCC diagnostic ignored "-Wall"

Clang also allows you to push and pop the current warning state. This is particularly useful when writing a header file that will be compiled by other people, because you don’t know what warning flags they build with.

In the below example -Wextra-tokens is ignored for only a single line of code, after which the diagnostics return to whatever state had previously existed.

#if foo
#endif foo // warning: extra tokens at end of #endif directive

#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wextra-tokens"

#if foo
#endif foo // no warning

#pragma GCC diagnostic pop

The push and pop pragmas will save and restore the full diagnostic state of the compiler, regardless of how it was set. Note that while Clang supports the GCC pragma, Clang and GCC do not support the exact same set of warnings, so even when using GCC-compatible #pragmas there is no guarantee that they will have identical behaviour on both compilers.

Clang also doesn’t yet support GCC behavior for a #pragma diagnostic pop that doesn’t have a corresponding #pragma diagnostic push. In this case GCC pretends that there is a #pragma diagnostic push at the very beginning of the source file, so an “unpaired” #pragma diagnostic pop matches that implicit push. This makes a difference for #pragma GCC diagnostic ignored directives which are not guarded by push and pop. Refer to the GCC documentation for details.

Like GCC, Clang accepts ignored, warning, error, and fatal severity levels. They can be used to change the severity of a particular diagnostic for a region of a source file. A notable difference from GCC is that a diagnostic not enabled via command line arguments can’t be enabled this way yet.

Some diagnostics associated with a -W flag have the error severity by default. They can be ignored or downgraded to warnings:

// C only
#pragma GCC diagnostic warning "-Wimplicit-function-declaration"
int main(void) { puts(""); }

In addition to controlling warnings and errors generated by the compiler, it is possible to generate custom warning and error messages through the following pragmas:

// The following will produce warning messages
#pragma message "some diagnostic message"
#pragma GCC warning "TODO: replace deprecated feature"

// The following will produce an error message
#pragma GCC error "Not supported"

These pragmas operate similarly to the #warning and #error preprocessor directives, except that they may also be embedded into preprocessor macros via the C99 _Pragma operator, for example:

#define STR(X) #X
#define DEFER(M,...) M(__VA_ARGS__)
#define CUSTOM_ERROR(X) _Pragma(STR(GCC error(X " at line " DEFER(STR,__LINE__))))

CUSTOM_ERROR("Feature not available");
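The same _Pragma technique can wrap the push, ignored, and pop pragmas described earlier, which is convenient when a suppression has to be produced by a macro. The following is a minimal sketch; the macro names DIAG_PRAGMA, DIAG_PUSH_IGNORED and DIAG_POP and the function sum_values are hypothetical and not provided by Clang:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper macros: stringify the argument and hand it to _Pragma. */
#define DIAG_PRAGMA(x) _Pragma(#x)
#define DIAG_PUSH_IGNORED(flag) \
    _Pragma("GCC diagnostic push") DIAG_PRAGMA(GCC diagnostic ignored flag)
#define DIAG_POP _Pragma("GCC diagnostic pop")

/* The unused parameter would trigger -Wunused-parameter when that warning is
 * enabled; it is ignored only between the push and the pop. */
DIAG_PUSH_IGNORED("-Wunused-parameter")
static int sum_values(const int *values, int n, int reserved) {
    assert(values != NULL && n > 0);
    int total = 0;
    for (int i = 0; i < n; ++i)
        total += values[i];
    return total;
}
DIAG_POP
```

After DIAG_POP the diagnostic state is whatever it was before the push, so code that follows is checked with the flags chosen by the user of the header.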

Controlling Diagnostics in System Headers¶

Warnings are suppressed when they occur in system headers. By default, an included file is treated as a system header if it is found in an include path specified by -isystem, but this can be overridden in several ways.

The system_header pragma can be used to mark the current file as being a system header. No warnings will be produced from the location of the pragma onwards within the same file.

#if foo
#endif foo // warning: extra tokens at end of #endif directive

#pragma clang system_header

#if foo
#endif foo // no warning

The --system-header-prefix= and --no-system-header-prefix= command-line arguments can be used to override whether subsets of an include path are treated as system headers. When the name in a #include directive is found within a header search path and starts with a system prefix, the header is treated as a system header. The last prefix on the command line which matches the specified header name takes precedence. For instance:

$ clang -Ifoo -isystem bar --system-header-prefix=x/ \
    --no-system-header-prefix=x/y/

Here, #include "x/a.h" is treated as including a system header, even if the header is found in foo, and #include "x/y/b.h" is treated as not including a system header, even if the header is found in bar.

A #include directive which finds a file relative to the current directory is treated as including a system header if the including file is treated as a system header.

Controlling Deprecation Diagnostics in Clang-Provided C Runtime Headers¶

Clang is responsible for providing some of the C runtime headers that cannot be provided by a platform CRT, such as implementation limits or when compiling in freestanding mode. Define the _CLANG_DISABLE_CRT_DEPRECATION_WARNINGS macro prior to including such a C runtime header to disable the deprecation warnings. Note that the C Standard Library headers are allowed to transitively include other standard library headers (see 7.1.2p5), and so the most appropriate use of this macro is to set it within the build system using -D or before any include directives in the translation unit.

#define _CLANG_DISABLE_CRT_DEPRECATION_WARNINGS
#include <stdint.h>    // Clang CRT deprecation warnings are disabled.
#include <stdatomic.h> // Clang CRT deprecation warnings are disabled.

Enabling All Diagnostics¶

In addition to the traditional -W flags, one can enable all diagnostics by passing -Weverything. This works as expected with -Werror, and also includes the warnings from -pedantic. Some diagnostics contradict each other; therefore, users of -Weverything often disable many diagnostics, for example with -Wno-c++98-compat and -Wno-c++-compat, because they contradict recent C++ standards.

Since -Weverything enables every diagnostic, we generally don’t recommend using it. -Wall -Wextra are a better choice for most projects. Using -Weverything means that updating your compiler is more difficult because you’re exposed to experimental diagnostics which might be of lower quality than the default ones. If you do use -Weverything then we advise that you address all new compiler diagnostics as they get added to Clang, either by fixing everything they find or explicitly disabling that diagnostic with its corresponding -Wno- option.

Note that when combined with -w (which disables all warnings), disabling all warnings wins.

Controlling Static Analyzer Diagnostics¶

While not strictly part of the compiler, the diagnostics from Clang’s static analyzer can also be influenced by the user via changes to the source code. See the available annotations and the analyzer’s FAQ page for more information.

Precompiled Headers¶

Precompiled headers are a general approach employed by many compilers to reduce compilation time. The underlying motivation of the approach is that it is common for the same (and often large) header files to be included by multiple source files. Consequently, compile times can often be greatly improved by caching some of the (redundant) work done by a compiler to process headers. Precompiled header files, which represent one of many ways to implement this optimization, are literally files that represent an on-disk cache that contains the vital information necessary to reduce some of the work needed to process a corresponding header file. While details of precompiled headers vary between compilers, precompiled headers have been shown to be highly effective at speeding up program compilation on systems with very large system headers (e.g., macOS).

Generating a PCH File¶

To generate a PCH file using Clang, one invokes Clang with the -x <language>-header option. This mirrors the interface in GCC for generating PCH files:

$ gcc -x c-header test.h -o test.h.gch
$ clang -x c-header test.h -o test.h.pch

Using a PCH File¶

A PCH file can then be used as a prefix header when a -include-pch option is passed to clang:

$ clang -include-pch test.h.pch test.c -o test

The clang driver will check if the PCH file test.h.pch is available; if so, the contents of test.h (and the files it includes) will be processed from the PCH file. Otherwise, Clang will report an error.

Note

Clang does not automatically use PCH files for headers that are directly included within a source file or indirectly via -include. For example:

$ clang -x c-header test.h -o test.h.pch
$ cat test.c
#include "test.h"
$ clang test.c -o test

In this example, clang will not automatically use the PCH file for test.h since test.h was included directly in the source file and not specified on the command line using -include-pch.

Relocatable PCH Files¶

It is sometimes necessary to build a precompiled header from headers that are not yet in their final, installed locations. For example, one might build a precompiled header within the build tree that is then meant to be installed alongside the headers. Clang permits the creation of “relocatable” precompiled headers, which are built with a given path (into the build directory) and can later be used from an installed location.

To build a relocatable precompiled header, place your headers into a subdirectory whose structure mimics the installed location. For example, if you want to build a precompiled header for the header mylib.h that will be installed into /usr/include, create a subdirectory build/usr/include and place the header mylib.h into that subdirectory. If mylib.h depends on other headers, then they can be stored within build/usr/include in a way that mimics the installed location.

Building a relocatable precompiled header requires two additional arguments. First, pass the --relocatable-pch flag to indicate that the resulting PCH file should be relocatable. Second, pass -isysroot /path/to/build, which makes all includes for your library relative to the build directory. For example:

$ clang -x c-header --relocatable-pch -isysroot /path/to/build /path/to/build/usr/include/mylib.h -o mylib.h.pch

When loading the relocatable PCH file, the various headers used in the PCH file are found from the system header root. For example, mylib.h can be found in /usr/include/mylib.h. If the headers are installed in some other system root, the -isysroot option can be used to provide a different system root from which the headers will be based. For example, -isysroot /Developer/SDKs/MacOSX10.4u.sdk will look for mylib.h in /Developer/SDKs/MacOSX10.4u.sdk/usr/include/mylib.h.

Relocatable precompiled headers are intended to be used in a limited number of cases where the compilation environment is tightly controlled and the precompiled header cannot be generated after headers have been installed.

Controlling Floating Point Behavior¶

Clang provides a number of ways to control floating point behavior, including with command line options and source pragmas. This section describes the various floating point semantic modes and the corresponding options.

Floating Point Semantic Modes¶
Mode	Values
ffp-exception-behavior	{ignore, strict, maytrap}
fenv_access	{off, on}	(none)
frounding-math	{dynamic, tonearest, downward, upward, towardzero}
ffp-contract	{on, off, fast, fast-honor-pragmas}
fdenormal-fp-math	{IEEE, PreserveSign, PositiveZero}
fdenormal-fp-math-fp32	{IEEE, PreserveSign, PositiveZero}
fmath-errno	{on, off}
fhonor-nans	{on, off}
fhonor-infinities	{on, off}
fsigned-zeros	{on, off}
freciprocal-math	{on, off}
fallow-approximate-fns	{on, off}
fassociative-math	{on, off}
fcomplex-arithmetic	{basic, improved, full, promoted}

This table describes the option settings that correspond to the four floating point semantic models: precise (the default), strict, fast, and aggressive.

Floating Point Models¶
Mode	Precise	Strict	Fast	Aggressive
except_behavior	ignore	strict	ignore	ignore
fenv_access	off	on	off	off
rounding_mode	tonearest	dynamic	tonearest	tonearest
contract	on	off	fast	fast
support_math_errno	on	on	off	off
no_honor_nans	off	off	off	on
no_honor_infinities	off	off	off	on
no_signed_zeros	off	off	on	on
allow_reciprocal	off	off	on	on
allow_approximate_fns	off	off	on	on
allow_reassociation	off	off	on	on
complex_arithmetic	full	full	promoted	basic

The -ffp-model option does not modify the fdenormal-fp-math setting, but it does have an impact on whether crtfastmath.o is linked. Because linking crtfastmath.o has a global effect on the program, and because the global denormal handling can be changed in other ways, the state of fdenormal-fp-math handling cannot be assumed in any function based on fp-model. See A note about crtfastmath.o for more details.

-ffast-math¶

Enable fast-math mode. This option lets the compiler make aggressive, potentially-lossy assumptions about floating-point math. These include:

Floating-point math obeys regular algebraic rules for real numbers (e.g. + and * are associative, x/y == x * (1/y), and (a + b) * c == a * c + b * c),
No NaN or infinite values will be operands or results of floating-point operations,
+0 and -0 may be treated as interchangeable.

-ffast-math also defines the __FAST_MATH__ preprocessor macro. Some math libraries recognize this macro and change their behavior. With the exception of -ffp-contract=fast, using any of the options below to disable any of the individual optimizations in -ffast-math will cause __FAST_MATH__ to no longer be set. -ffast-math enables -fcx-limited-range.

This option implies:

-fno-honor-infinities
-fno-honor-nans
-fapprox-func
-fno-math-errno
-ffinite-math-only
-fassociative-math
-freciprocal-math
-fno-signed-zeros
-fno-trapping-math
-fno-rounding-math
-ffp-contract=fast

Note: -ffast-math causes crtfastmath.o to be linked with code unless -shared or -mno-daz-ftz is present. See A note about crtfastmath.o for more details.
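To see why the associativity assumption is lossy, the following minimal sketch adds the same three values in two different orders. The function names are hypothetical, and the stated results assume IEEE 754 double arithmetic with round-to-nearest and no fast-math options:

```c
#include <assert.h>
#include <stddef.h>

/* Adds the terms from first to last: ((0 + t0) + t1) + t2 ... */
static double sum_forward(const double *terms, size_t n) {
    assert(terms != NULL);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += terms[i];
    return sum;
}

/* Adds the same terms from last to first. */
static double sum_backward(const double *terms, size_t n) {
    assert(terms != NULL);
    double sum = 0.0;
    for (size_t i = n; i > 0; --i)
        sum += terms[i - 1];
    return sum;
}
```

For the terms 1e16, -1e16 and 1.0, sum_forward yields 1.0, while sum_backward yields 0.0 because 1.0 is lost when it is added to -1e16. A compiler that is allowed to reassociate may produce either answer.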

-fno-fast-math¶

Disable fast-math mode. This option disables unsafe floating-point optimizations by preventing the compiler from making any transformations that could affect the results.

This option implies:

-fhonor-infinities
-fhonor-nans
-fno-approx-func
-fno-finite-math-only
-fno-associative-math
-fno-reciprocal-math
-fsigned-zeros
-ffp-contract=on

Also, this option resets the following options to their target-dependent defaults:

-f[no-]math-errno

There is ambiguity about how -ffp-contract, -ffast-math, and -fno-fast-math behave when combined. To keep the value of -ffp-contract consistent, we define this set of rules:

-ffast-math sets ffp-contract to fast.
-fno-fast-math sets -ffp-contract to on (fast for CUDA and HIP).
If -ffast-math and -ffp-contract are both seen, but -ffast-math is not followed by -fno-fast-math, ffp-contract will be given the value of whichever option was last seen.
If -fno-fast-math is seen and -ffp-contract has been seen at least once, the ffp-contract will get the value of the last seen value of -ffp-contract.
If -fno-fast-math is seen and -ffp-contract has not been seen, the -ffp-contract setting is determined by the default value of -ffp-contract.

Note: -fno-fast-math causes crtfastmath.o to not be linked with code unless -mdaz-ftz is present.

-fdenormal-fp-math=<value>¶

Select which denormal numbers the code is permitted to require.

Valid values are:

ieee - IEEE 754 denormal numbers
preserve-sign - the sign of a flushed-to-zero number is preserved in the sign of 0
positive-zero - denormals are flushed to positive zero

The default value depends on the target. For most targets, it defaults to ieee.

-f[no-]strict-float-cast-overflow¶: When a floating-point value is not representable in a destination integer type, the code has undefined behavior according to the language standard. By default, Clang will not guarantee any particular result in that case. With the ‘no-strict’ option, Clang will saturate towards the smallest and largest representable integer values instead. NaNs will be converted to zero. Defaults to -fstrict-float-cast-overflow.
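Code that must behave the same regardless of this flag can spell out the saturating conversion itself. The sketch below mirrors the semantics described above (saturate at the integer limits, map NaN to zero); the function name is hypothetical, and it assumes that every int value is exactly representable as a double, which holds for a 32-bit int:

```c
#include <limits.h>

/* Converts x to int without relying on out-of-range cast behavior. */
static int saturating_double_to_int(double x) {
    if (x != x)                  /* NaN compares unequal to itself */
        return 0;
    if (x <= (double)INT_MIN)    /* too small: clamp to the minimum */
        return INT_MIN;
    if (x >= (double)INT_MAX)    /* too large: clamp to the maximum */
        return INT_MAX;
    return (int)x;               /* in range: truncates toward zero */
}
```

Because every out-of-range input is handled before the cast, the cast itself never has undefined behavior.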

-f[no-]math-errno¶

Require math functions to indicate errors by setting errno. The default varies by ToolChain. -fno-math-errno allows optimizations that might cause standard C math functions to not set errno. For example, on some systems, the math function sqrt is specified as setting errno to EDOM when the input is negative. On these systems, the compiler cannot normally optimize a call to sqrt to use inline code (e.g. the x86 sqrtsd instruction) without additional checking to ensure that errno is set appropriately. -fno-math-errno permits these transformations.

On some targets, math library functions never set errno, and so -fno-math-errno is the default. This includes most BSD-derived systems, including Darwin.

-f[no-]trapping-math¶

Control floating point exception behavior. -fno-trapping-math allows optimizations that assume that floating point operations cannot generate traps such as divide-by-zero, overflow and underflow.

The option -ftrapping-math behaves identically to -ffp-exception-behavior=strict.
The option -fno-trapping-math behaves identically to -ffp-exception-behavior=ignore. This is the default.

-ffp-contract=<value>¶

Specify when the compiler is permitted to form fused floating-point operations, such as fused multiply-add (FMA). Fused operations are permitted to produce more precise results than performing the same operations separately.

The C standard permits intermediate floating-point results within an expression to be computed with more precision than their type would normally allow. This permits operation fusing, and Clang takes advantage of this by default. This behavior can be controlled with the FP_CONTRACT and clang fp contract pragmas. Please refer to the pragma documentation for a description of how the pragmas interact with this option.

Valid values are:

fast (fuse across statements disregarding pragmas, default for CUDA)
on (fuse in the same statement unless dictated by pragmas, default for languages other than CUDA/HIP)
off (never fuse)
fast-honor-pragmas (fuse across statements unless dictated by pragmas, default for HIP)
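As a minimal sketch of the pragma route, the function below asks for contraction to be disabled inside a single block using the standard FP_CONTRACT pragma. The function name is hypothetical. Per the value descriptions above, the pragma is honored under on and fast-honor-pragmas but disregarded under fast, and a compiler that does not recognize the pragma will simply ignore it:

```c
/* Computes a * b + c and requests that the multiply and the add are not
 * fused into a single FMA instruction within this block. */
static double mul_add_unfused(double a, double b, double c) {
#pragma STDC FP_CONTRACT OFF
    return a * b + c;
}
```

For inputs such as 2.0, 3.0 and 1.0 the fused and unfused results are identical (7.0); the two only differ when the product a * b is not exactly representable.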

-f[no-]honor-infinities¶

Allow floating-point optimizations that assume arguments and results are not +-Inf. Defaults to -fhonor-infinities.

If both -fno-honor-infinities and -fno-honor-nans are used, this has the same effect as specifying -ffinite-math-only.

-f[no-]honor-nans¶

Allow floating-point optimizations that assume arguments and results are not NaNs. Defaults to -fhonor-nans.

If both -fno-honor-infinities and -fno-honor-nans are used, this has the same effect as specifying -ffinite-math-only.

-f[no-]approx-func¶: Allow certain math function calls (such as log, sqrt, pow, etc.) to be replaced with an approximately equivalent set of instructions or alternative math function calls. For example, a pow(x, 0.25) may be replaced with sqrt(sqrt(x)), despite being an inexact result in cases where x is -0.0 or -inf. Defaults to -fno-approx-func.

-f[no-]signed-zeros¶: Allow optimizations that ignore the sign of floating point zeros. Defaults to -fsigned-zeros.

-f[no-]associative-math¶: Allow floating point operations to be reassociated. Defaults to -fno-associative-math.

-f[no-]reciprocal-math¶: Allow division operations to be transformed into multiplication by a reciprocal. This can be significantly faster than an ordinary division but can also have significantly less precision. Defaults to -fno-reciprocal-math.
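The precision loss from the reciprocal transformation can be reproduced by hand. The following sketch performs the rewrite explicitly; the function names are hypothetical, and the volatile temporary is only there to make sure the reciprocal is rounded to double before it is used:

```c
#include <assert.h>

/* Ordinary division: the quotient is rounded once. */
static double divide_directly(double x, double y) {
    assert(y != 0.0);
    return x / y;
}

/* Division rewritten as multiplication by a reciprocal: the reciprocal is
 * rounded first, and the product is rounded again. */
static double divide_via_reciprocal(double x, double y) {
    assert(y != 0.0);
    volatile double reciprocal = 1.0 / y;
    return x * reciprocal;
}
```

With IEEE 754 doubles, divide_directly(49.0, 49.0) is exactly 1.0, whereas divide_via_reciprocal(49.0, 49.0) is slightly less than 1.0 because 1.0/49.0 is rounded down before the multiplication.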

-f[no-]unsafe-math-optimizations¶

Allow unsafe floating-point optimizations. -funsafe-math-optimizations also implies:

-fapprox-func
-fassociative-math
-freciprocal-math
-fno-signed-zeros
-fno-trapping-math
-ffp-contract=fast

-fno-unsafe-math-optimizations implies:

-fno-approx-func
-fno-associative-math
-fno-reciprocal-math
-fsigned-zeros
-ffp-contract=on

There is ambiguity about how -ffp-contract, -funsafe-math-optimizations, and -fno-unsafe-math-optimizations behave when combined. The explanation given for -fno-fast-math also applies to these options.

Defaults to -fno-unsafe-math-optimizations.

-f[no-]finite-math-only¶

Allow floating-point optimizations that assume arguments and results are not NaNs or +-Inf. -ffinite-math-only defines the __FINITE_MATH_ONLY__ preprocessor macro. -ffinite-math-only implies:

-fno-honor-infinities
-fno-honor-nans

-fno-finite-math-only implies:

-fhonor-infinities
-fhonor-nans

Defaults to -fno-finite-math-only.
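Code that relies on NaN checks can inspect the macro to detect this mode at compile time. The sketch below is illustrative only: the function names are hypothetical, and it assumes that __FINITE_MATH_ONLY__ expands to a nonzero value when the option is active, which is worth confirming for the compiler version in use:

```c
/* Returns 1 if this translation unit was built in finite-math-only mode. */
static int finite_math_only_enabled(void) {
#if defined(__FINITE_MATH_ONLY__) && __FINITE_MATH_ONLY__
    return 1;
#else
    return 0;
#endif
}

/* NaN is the only value that compares unequal to itself. Under
 * -ffinite-math-only the compiler may assume this is never true. */
static int is_nan(double x) {
    volatile double v = x;
    return v != v;
}

/* Returns num / den, or fallback when the quotient is NaN (e.g. 0/0). */
static double safe_ratio(double num, double den, double fallback) {
    double ratio = num / den;
    return is_nan(ratio) ? fallback : ratio;
}
```

A guard like safe_ratio is only meaningful when finite_math_only_enabled() returns 0; with the option enabled the check may be optimized away.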

-f[no-]rounding-math¶

Force floating-point operations to honor the dynamically-set rounding mode by default.

The result of a floating-point operation often cannot be exactly represented in the result type and therefore must be rounded. IEEE 754 describes different rounding modes that control how to perform this rounding, not all of which are supported by all implementations. C provides interfaces (fesetround and fesetenv) for dynamically controlling the rounding mode, and while it also recommends certain conventions for changing the rounding mode, these conventions are not typically enforced in the ABI. Since the rounding mode changes the numerical result of operations, the compiler must understand something about it in order to optimize floating point operations.

Note that floating-point operations performed as part of constant initialization are formally performed prior to the start of the program and are therefore not subject to the current rounding mode. This includes the initialization of global variables and local static variables. Floating-point operations in these contexts will be rounded using FE_TONEAREST.

The option -fno-rounding-math allows the compiler to assume that the rounding mode is set to FE_TONEAREST. This is the default.
The option -frounding-math forces the compiler to honor the dynamically-set rounding mode. This prevents optimizations which might affect results if the rounding mode changes or is different from the default; for example, it prevents floating-point operations from being reordered across most calls and prevents constant-folding when the result is not exactly representable.

-ffp-model=<value>¶

Specify floating point behavior. -ffp-model is an umbrella option that encompasses functionality provided by other, single-purpose, floating point options. Valid values are: precise, strict, fast, and aggressive. Details:

precise Disables optimizations that are not value-safe on floating-point data, although FP contraction (FMA) is enabled (-ffp-contract=on). This is the default behavior. This value resets -fmath-errno to its target-dependent default.
strict Enables -frounding-math and -ffp-exception-behavior=strict, and disables contractions (FMA). All of the -ffast-math enablements are disabled. Enables STDC FENV_ACCESS: by default FENV_ACCESS is disabled. This option setting behaves as though #pragma STDC FENV_ACCESS ON appeared at the top of the source file.
fast Behaves identically to specifying -funsafe-math-optimizations, -fno-math-errno, -fcomplex-arithmetic=promoted and -ffp-contract=fast.
aggressive Behaves identically to specifying both -ffast-math and -ffp-contract=fast.

Note: If your command line specifies multiple instances of the -ffp-model option, or if your command line specifies -ffp-model and later on the command line selects a floating point option that has the effect of negating part of the ffp-model that has been selected, then the compiler will issue a diagnostic warning that the override has occurred.

-ffp-exception-behavior=<value>¶

Specify the floating-point exception behavior.

Valid values are: ignore, maytrap, and strict. The default value is ignore. Details:

ignore The compiler assumes that the exception status flags will not be read and that floating point exceptions will be masked.
maytrap The compiler avoids transformations that may raise exceptions that would not have been raised by the original code. Constant folding performed by the compiler is exempt from this option.
strict The compiler ensures that all transformations strictly preserve the floating point exception semantics of the original code.

-ffp-eval-method=<value>¶

Specify the floating-point evaluation method for intermediate results within a single expression of the code.

Valid values are: source, double, and extended. For 64-bit targets, the default value is source. For 32-bit x86 targets, however, the default depends on the platform: on NetBSD 6.99.26 and earlier, the default value is double; on NetBSD later than 6.99.26, the default value is extended without SSE and source with SSE. Details:

source The compiler uses the floating-point type declared in the source program as the evaluation method.
double The compiler uses double as the floating-point evaluation method for all floating-point expressions whose type is narrower than double.
extended The compiler uses long double as the floating-point evaluation method for all floating-point expressions whose type is narrower than long double.
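A program can check which evaluation method is in effect through the standard FLT_EVAL_METHOD macro from <float.h>. The sketch below maps the standard values 0, 1 and 2 onto the option names used here; that mapping and the function names are this example's own assumptions rather than something Clang provides:

```c
#include <float.h>

/* Maps a C FLT_EVAL_METHOD value to the matching -ffp-eval-method name. */
static const char *eval_method_name(int method) {
    switch (method) {
    case 0:  return "source";    /* evaluate in the declared type */
    case 1:  return "double";    /* float is evaluated as double */
    case 2:  return "extended";  /* float and double as long double */
    default: return "indeterminable or implementation-defined";
    }
}

/* Name of the evaluation method this translation unit was compiled with. */
static const char *current_eval_method(void) {
    return eval_method_name(FLT_EVAL_METHOD);
}
```

On a typical 64-bit target with default options, current_eval_method() is expected to report "source".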

-f[no-]protect-parens¶

This option pertains to floating-point types, complex types with floating-point components, and vectors of these types. Some arithmetic expression transformations that are mathematically correct and permissible according to the C and C++ language standards may be incorrect when dealing with floating-point types, such as reassociation and distribution. Further, the optimizer may ignore parentheses when computing arithmetic expressions in circumstances where the parenthesized and unparenthesized expressions express the same mathematical value. For example (a+b)+c is the same mathematical value as a+(b+c), but the optimizer is free to evaluate the additions in any order regardless of the parentheses. When enabled, this option forces the optimizer to honor the order of operations with respect to parentheses in all circumstances. Defaults to -fno-protect-parens.

Note that floating-point contraction (option -ffp-contract=) is disabled when -fprotect-parens is enabled. Also note that in safe floating-point modes, such as -ffp-model=precise or -ffp-model=strict, this option has no effect because the optimizer is prohibited from making unsafe transformations.
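Compensated (Kahan) summation is a well-known case where the parentheses carry meaning: the correction term (t - sum) - y is zero in real arithmetic, so an optimizer that is free to reassociate can remove it. The sketch below uses hypothetical function names, and the stated results assume IEEE 754 double arithmetic in a value-safe mode such as -ffp-model=precise:

```c
#include <assert.h>
#include <stddef.h>

/* Plain left-to-right summation. */
static double naive_sum(const double *values, size_t n) {
    assert(values != NULL || n == 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += values[i];
    return sum;
}

/* Kahan summation: tracks the rounding error of each addition and feeds it
 * back into the next one. Correctness depends on the parenthesized order. */
static double kahan_sum(const double *values, size_t n) {
    assert(values != NULL || n == 0);
    double sum = 0.0;
    double compensation = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double y = values[i] - compensation;
        double t = sum + y;
        compensation = (t - sum) - y; /* rounding error of sum + y */
        sum = t;
    }
    return sum;
}
```

For the inputs 1e16, 1.0, 1.0, 1.0, 1.0, naive_sum returns 1e16 because each 1.0 is rounded away, while kahan_sum returns 10000000000000004, the exact total.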

-fexcess-precision:¶

The C and C++ standards allow floating-point expressions to be computed as if intermediate results had more precision (and/or a wider range) than the type of the expression strictly allows. This is called excess precision arithmetic. Excess precision arithmetic can improve the accuracy of results (although not always), and it can make computation significantly faster if the target lacks direct hardware support for arithmetic in a particular type. However, it can also undermine strict floating-point reproducibility.

Under the standards, assignments and explicit casts force the operand to be converted to its formal type, discarding any excess precision. Because data can only flow between statements via an assignment, this means that the use of excess precision arithmetic is a reliable local property of a single statement, and results do not change based on optimization. However, when excess precision arithmetic is in use, Clang does not guarantee strict reproducibility, and future compiler releases may recognize more opportunities to use excess precision arithmetic, e.g. with floating-point builtins.

Clang does not use excess precision arithmetic for most types or on most targets. For example, even on pre-SSE X86 targets where float and double computations must be performed in the 80-bit X87 format, Clang rounds all intermediate results correctly for their type. Clang currently uses excess precision arithmetic by default only for the following types and targets:

_Float16 on X86 targets without AVX512-FP16.

The -fexcess-precision=<value> option can be used to control the use of excess precision arithmetic. Valid values are:

standard - The default. Allow the use of excess precision arithmetic under the constraints of the C and C++ standards. Has no effect except on the types and targets listed above.
fast - Accepted for GCC compatibility, but currently treated as an alias for standard.
16 - Forces _Float16 operations to be emitted without using excess precision arithmetic.

-fcomplex-arithmetic=<value>:¶

This option specifies the implementation for complex multiplication and division.

Valid values are: basic, improved, full and promoted.

basic Implementation of complex division and multiplication using algebraic formulas at source precision. No special handling to avoid overflow. NaN and infinite values are not handled.
improved Implementation of complex division using the Smith algorithm at source precision. See SMITH, R. L. Algorithm 116: Complex division. Commun. ACM 5, 8 (1962). This value offers improved handling for overflow in intermediate calculations, but overflow may occur. NaN and infinite values are not handled in some cases.
full Implementation of complex division and multiplication using a call to runtime library functions (generally the case, but the BE might sometimes replace the library call if it knows enough about the potential range of the inputs). Overflow and non-finite values are handled by the library implementation. For the case of multiplication overflow will occur in accordance with normal floating-point rules. This is the default value.
promoted Implementation of complex division using algebraic formulas at higher precision. Overflow is handled. Non-finite values are handled in some cases. If the target does not have native support for a higher precision data type, the implementation for the complex operation using the Smith algorithm will be used. Overflow may still occur in some cases. NaN and infinite values are not handled.

-fcx-limited-range:¶: This option is aliased to -fcomplex-arithmetic=basic. It enables the naive mathematical formulas for complex division and multiplication with no NaN checking of results. The default is -fno-cx-limited-range aliased to -fcomplex-arithmetic=full. This option is enabled by the -ffast-math option.

-fcx-fortran-rules:¶: This option is aliased to -fcomplex-arithmetic=improved. It enables the naive mathematical formulas for complex multiplication and enables application of Smith’s algorithm for complex division. See SMITH, R. L. Algorithm 116: Complex division. Commun. ACM 5, 8 (1962). The default is -fno-cx-fortran-rules aliased to -fcomplex-arithmetic=full.
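The difference between the algebraic formula and Smith's algorithm can be illustrated with a small hand-written sketch. This is a textbook rendering of the two division formulas, not the code Clang generates, and the type and function names are hypothetical:

```c
#include <assert.h>

typedef struct { double re, im; } cplx;

static double magnitude(double x) { return x < 0.0 ? -x : x; }

/* Algebraic formula: (a+bi)/(c+di) = ((ac+bd) + (bc-ad)i) / (c*c + d*d).
 * The squared terms can overflow even when the quotient is small. */
static cplx divide_basic(cplx n, cplx d) {
    assert(d.re != 0.0 || d.im != 0.0);
    double denom = d.re * d.re + d.im * d.im;
    cplx q = { (n.re * d.re + n.im * d.im) / denom,
               (n.im * d.re - n.re * d.im) / denom };
    return q;
}

/* Smith's algorithm: scale by the ratio of the divisor's smaller component
 * to its larger one, which avoids forming c*c + d*d. */
static cplx divide_smith(cplx n, cplx d) {
    assert(d.re != 0.0 || d.im != 0.0);
    cplx q;
    if (magnitude(d.re) >= magnitude(d.im)) {
        double r = d.im / d.re;
        double denom = d.re + d.im * r;
        q.re = (n.re + n.im * r) / denom;
        q.im = (n.im - n.re * r) / denom;
    } else {
        double r = d.re / d.im;
        double denom = d.re * r + d.im;
        q.re = (n.re * r + n.im) / denom;
        q.im = (n.im * r - n.re) / denom;
    }
    return q;
}
```

Dividing 1e200 + 1e200i by itself shows the effect: divide_basic overflows in the denominator and produces NaN, while divide_smith returns 1 + 0i.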

Accessing the floating point environment¶

Many targets allow floating point operations to be configured to control things such as how inexact results should be rounded and how exceptional conditions should be handled. This configuration is called the floating point environment. C and C++ restrict access to the floating point environment by default, and the compiler is allowed to assume that all operations are performed in the default environment. When code is compiled in this default mode, operations that depend on the environment (such as floating-point arithmetic and FLT_ROUNDS) may have undefined behavior if the dynamic environment is not the default environment; for example, FLT_ROUNDS may or may not simply return its default value for the target instead of reading the dynamic environment, and floating-point operations may be optimized as if the dynamic environment were the default. Similarly, it is undefined behavior to change the floating point environment in this default mode, for example by calling the fesetround function.

C provides two pragmas to allow code to dynamically modify the floating point environment:

#pragma STDC FENV_ACCESS ON allows dynamic changes to the entire floating point environment.
#pragma STDC FENV_ROUND FE_DYNAMIC allows dynamic changes to just the floating point rounding mode. This may be more optimizable than FENV_ACCESS ON because the compiler can still ignore the possibility of floating-point exceptions by default.

Both of these can be used either at the start of a block scope, in which case they cover all code in that scope (unless they’re turned off in a child scope), or at the top level in a file, in which case they cover all subsequent function bodies until they’re turned off. Note that it is undefined behavior to enter code that is not covered by one of these pragmas from code that is covered by one of these pragmas unless the floating point environment has been restored to its default state. See the C standard for more information about these pragmas.

The command line option -frounding-math behaves as if the translation unit began with #pragma STDC FENV_ROUND FE_DYNAMIC. The command line option -ffp-model=strict behaves as if the translation unit began with #pragma STDC FENV_ACCESS ON.

Code that just wants to use a specific rounding mode for specific floating point operations can avoid most of the hazards of the dynamic floating point environment by using #pragma STDC FENV_ROUND with a value other than FE_DYNAMIC.
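The pattern these pragmas enable can be sketched as follows. This is a minimal illustration, not part of Clang: the helper name divide_with_rounding is hypothetical, and the volatile qualifiers are an assumption used to keep the division from being folded at translation time. The previous rounding mode is restored before the function returns, since the caller may not be covered by either pragma.

```cpp
#include <cassert>
#include <cfenv>

// Hypothetical helper: divide a by b under the given rounding mode
// (for example FE_UPWARD or FE_DOWNWARD), then restore the previous mode.
double divide_with_rounding(double a, double b, int mode) {
#pragma STDC FENV_ACCESS ON
  const int saved = std::fegetround();
  const int rc = std::fesetround(mode);
  assert(rc == 0 && "rounding mode not supported on this target");
  (void)rc;
  // volatile keeps the operands and the result out of constant folding,
  // so the division is performed while the selected mode is active.
  volatile double x = a;
  volatile double y = b;
  volatile double quotient = x / y;
  std::fesetround(saved);  // restore before returning to uncovered code
  return quotient;
}
```

Under these assumptions, dividing 1.0 by 3.0 with FE_UPWARD should give a slightly larger result than with FE_DOWNWARD, because the exact quotient is not representable.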

A note about crtfastmath.o¶

-ffast-math and -funsafe-math-optimizations without the -shared option cause crtfastmath.o to be automatically linked, which adds a static constructor that sets the FTZ/DAZ bits in MXCSR, affecting not only the current compilation unit but all static and shared libraries included in the program. This decision can be overridden by using either the flag -mdaz-ftz or -mno-daz-ftz to respectively link or not link crtfastmath.o.

A note about __FLT_EVAL_METHOD__¶

The __FLT_EVAL_METHOD__ is not defined as a traditional macro, and so it will not appear when dumping preprocessor macros. Instead, the value __FLT_EVAL_METHOD__ expands to is determined at the point of expansion either from the value set by the -ffp-eval-method command line option or from the target. This is because the __FLT_EVAL_METHOD__ macro cannot expand to the correct evaluation method in the presence of a #pragma which alters the evaluation method. An error is issued if __FLT_EVAL_METHOD__ is expanded inside a scope modified by #pragma clang fp eval_method.

A note about Floating Point Constant Evaluation¶

In C, the only place floating point operations are guaranteed to be evaluated during translation is in the initializers of variables of static storage duration, which are all notionally initialized before the program begins executing (and thus before a non-default floating point environment can be entered). But C++ has many more contexts where floating point constant evaluation occurs. Specifically, for static and thread-local variables, the initializer is first evaluated in a constant context, including in the constant floating point environment (just like in C), and then, if that fails, the implementation falls back to emitting runtime code to perform the initialization (which might in general be in a different floating point environment).

Consider this example when compiled with -frounding-math:

```
constexpr float func_01(float x, float y) {
  return x + y;
}
float V1 = func_01(1.0F, 0x0.000001p0F);
```

The C++ rule is that initializers for static storage duration variables are first evaluated during translation (therefore, in the default rounding mode), and only evaluated at runtime (and therefore in the runtime rounding mode) if the compile-time evaluation fails. This is in line with the C rules; C11 F.8.5 says: All computation for automatic initialization is done (as if) at execution time; thus, it is affected by any operative modes and raises floating-point exceptions as required by IEC 60559 (provided the state for the FENV_ACCESS pragma is ‘‘on’’). All computation for initialization of objects that have static or thread storage duration is done (as if) at translation time. C++ generalizes this by adding another phase of initialization (at runtime) if the translation-time initialization fails, but if the translation-time evaluation of the initializer succeeds, it will be treated as a constant initializer.
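To make the difference concrete, the following sketch contrasts the translation-time value of V1 with the same addition performed at run time under FE_UPWARD. It is an illustration under stated assumptions: the helper runtime_sum_upward is hypothetical, the decimal literal is 2^-24 written out (the same value as 0x0.000001p0F), and volatile is used only to force the addition to happen at run time.

```cpp
#include <cassert>
#include <cfenv>

constexpr float func_01(float x, float y) { return x + y; }

// Constant-initialized during translation, hence in the default rounding
// mode. 5.9604644775390625e-8F is 2^-24, i.e. 0x0.000001p0F.
float V1 = func_01(1.0F, 5.9604644775390625e-8F);

// Hypothetical helper: the same addition at run time, with the dynamic
// rounding mode set to FE_UPWARD for the duration of the computation.
float runtime_sum_upward() {
#pragma STDC FENV_ACCESS ON
  const int saved = std::fegetround();
  const int rc = std::fesetround(FE_UPWARD);
  assert(rc == 0 && "FE_UPWARD not supported on this target");
  (void)rc;
  volatile float x = 1.0F;
  volatile float y = 5.9604644775390625e-8F;
  volatile float sum = x + y;  // performed in the covered scope itself
  std::fesetround(saved);      // restore the previous mode
  return sum;
}
```

The exact sum 1 + 2^-24 lies halfway between two adjacent float values, so round-to-nearest gives 1.0F for V1, while rounding upward gives the next float above 1.0F.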

Controlling Code Generation¶

Clang provides a number of ways to control code generation. The options are listed below.

-f[no-]sanitize=check1,check2,...¶

Turn on runtime checks for various forms of undefined or suspicious behavior.

This option controls whether Clang adds runtime checks for various forms of undefined or suspicious behavior, and is disabled by default. If a check fails, a diagnostic message is produced at runtime explaining the problem. The main checks are:

-fsanitize=address: AddressSanitizer, a memory error detector.
-fsanitize=thread: ThreadSanitizer, a data race detector.
-fsanitize=memory: MemorySanitizer, a detector of uninitialized reads. Requires instrumentation of all program code.
-fsanitize=undefined: UndefinedBehaviorSanitizer, a fast and compatible undefined behavior checker.
-fsanitize=dataflow: DataFlowSanitizer, a general data flow analysis.
-fsanitize=cfi: control flow integrity checks. Requires -flto.
-fsanitize=kcfi: kernel indirect call forward-edge control flow integrity.
-fsanitize=safe-stack: safe stack protection against stack-based memory corruption errors.
-fsanitize=realtime: RealtimeSanitizer, a real-time safety checker.

There are more fine-grained checks available: see the list of specific kinds of undefined behavior that can be detected and the list of control flow integrity schemes.

The -fsanitize= argument must also be provided when linking, in order to link to the appropriate runtime library.

It is not possible to combine more than one of the -fsanitize=address, -fsanitize=thread, and -fsanitize=memory checkers in the same program.

-f[no-]sanitize-recover=check1,check2,...¶

-f[no-]sanitize-recover[=all]¶

Controls which checks enabled by the -fsanitize= flag are non-fatal. If a check is fatal, the program will halt after the first error of this kind is detected and an error report is printed.

By default, non-fatal checks are those enabled by UndefinedBehaviorSanitizer, except for -fsanitize=return and -fsanitize=unreachable. Some sanitizers may not support recovery (or may not support it by default, e.g. AddressSanitizer), and always crash the program after the issue is detected.

Note that the -fsanitize-trap flag has precedence over this flag. This means that if a check has been configured to trap elsewhere on the command line, or if the check traps by default, this flag will not have any effect unless that sanitizer’s trapping behavior is disabled with -fno-sanitize-trap.

For example, if a command line contains the flags -fsanitize=undefined -fsanitize-trap=undefined, the flag -fsanitize-recover=alignment will have no effect on its own; it will need to be accompanied by -fno-sanitize-trap=alignment.

-f[no-]sanitize-trap=check1,check2,...¶

-f[no-]sanitize-trap[=all]¶

Controls which checks enabled by the -fsanitize= flag trap. This option is intended for use in cases where the sanitizer runtime cannot be used (for instance, when building libc or a kernel module), or where the binary size increase caused by the sanitizer runtime is a concern.

This flag is only compatible with control flow integrity schemes and UndefinedBehaviorSanitizer checks other than vptr.

This flag is enabled by default for sanitizers in the cfi group.

-fsanitize-ignorelist=/path/to/ignorelist/file¶: Disable or modify sanitizer checks for objects (source files, functions, variables, types) listed in the file. See Sanitizer special case list for file format description.

-fno-sanitize-ignorelist¶: Don’t use an ignorelist file, even if one was specified earlier on the command line.

-f[no-]sanitize-coverage=[type,features,...]¶: Enable simple code coverage in addition to certain sanitizers. See SanitizerCoverage for more details.

-f[no-]sanitize-address-outline-instrumentation¶

Controls how AddressSanitizer code is generated. If enabled, Clang always uses a function call instead of inlining the instrumentation code. Turning this option on could reduce the binary size, but might result in worse run-time performance.

See AddressSanitizer for more details.

-f[no-]sanitize-stats¶: Enable simple statistics gathering for the enabled sanitizers. See SanitizerStats for more details.

-fsanitize-undefined-trap-on-error¶: Deprecated alias for -fsanitize-trap=undefined.

-fsanitize-cfi-cross-dso¶: Enable cross-DSO control flow integrity checks. This flag modifies the behavior of sanitizers in the cfi group to allow checking of cross-DSO virtual and indirect calls.

-fsanitize-cfi-icall-generalize-pointers¶: Generalize pointers in return and argument types in function type signatures checked by Control Flow Integrity indirect call checking. See Control Flow Integrity for more details.

-fsanitize-cfi-icall-experimental-normalize-integers¶

Normalize integers in return and argument types in function type signatures checked by Control Flow Integrity indirect call checking. See Control Flow Integrity for more details.

This option is currently experimental.

-fstrict-vtable-pointers¶: Enable optimizations based on the strict rules for overwriting polymorphic C++ objects, i.e. the vptr is invariant during an object’s lifetime. This enables better devirtualization. Turned off by default, because it is still experimental.

-fwhole-program-vtables¶: Enable whole-program vtable optimizations, such as single-implementation devirtualization and virtual constant propagation, for classes with hidden LTO visibility. Requires -flto.

-f[no-]split-lto-unit¶

Controls splitting the LTO unit into regular LTO and ThinLTO portions, when compiling with -flto=thin. Defaults to false unless -fsanitize=cfi or -fwhole-program-vtables is specified, in which case it defaults to true. Splitting is required with -fsanitize=cfi, and it is an error to disable it via -fno-split-lto-unit. Splitting is optional with -fwhole-program-vtables; however, it enables more aggressive whole program vtable optimizations (specifically virtual constant propagation).

When enabled, vtable definitions and select virtual functions are placed in the split regular LTO module, enabling more aggressive whole program vtable optimizations required for CFI and virtual constant propagation. However, this can increase the LTO link time and memory requirements over pure ThinLTO, as all split regular LTO modules are merged and LTO linked with regular LTO.

-fforce-emit-vtables¶: In order to improve devirtualization, forces emitting of vtables even in modules where it isn’t necessary. It causes more inline virtual functions to be emitted.

-fno-assume-sane-operator-new¶

Don’t assume that C++’s new operator is sane.

This option tells the compiler not to assume that C++’s global new operator will always return a pointer that does not alias any other pointer when the function returns.

-fassume-nothrow-exception-dtor¶

Assume that an exception object’s destructor will not throw, and generate less code for catch handlers. A throw expression of a type with a potentially-throwing destructor will lead to an error.

By default, Clang assumes that the exception object may have a throwing destructor. For the Itanium C++ ABI, Clang generates a landing pad to destroy local variables and call _Unwind_Resume for the code catch (...) { ... }. This option tells Clang that an exception object’s destructor will not throw and code simplification is possible.
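As a rough illustration of which exception types the option concerns, the sketch below distinguishes a type whose destructor is implicitly noexcept from one whose destructor is declared noexcept(false). All names here (PlainError, NoisyError, has_nothrow_destructor, caught_by_catch_all) are hypothetical, and the trait check is only an approximation of the property the option relies on, not how Clang implements the diagnostic.

```cpp
#include <cassert>
#include <type_traits>

// Destructors are implicitly noexcept, so throwing this type is compatible
// with the assumption made by -fassume-nothrow-exception-dtor.
struct PlainError {
  int code;
};

// A potentially-throwing destructor. Per the description above, a throw
// expression of this type would lead to an error under the option.
struct NoisyError {
  ~NoisyError() noexcept(false) {}
};

// Hypothetical helper: reports whether E has a non-throwing destructor.
template <typename E>
constexpr bool has_nothrow_destructor() {
  return std::is_nothrow_destructible<E>::value;
}

// Throw a PlainError and report whether a catch-all handler observed it.
bool caught_by_catch_all() {
  assert(has_nothrow_destructor<PlainError>());
  bool caught = false;
  try {
    throw PlainError{1};
  } catch (...) {  // the handler form discussed above
    caught = true;
  }
  return caught;
}
```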

-ftrap-function=[name]¶

Instruct the code generator to emit a function call to the specified function name for __builtin_trap().

LLVM code generator translates __builtin_trap() to a trap instruction if it is supported by the target ISA. Otherwise, the builtin is translated into a call to abort. If this option is set, then the code generator will always lower the builtin to a call to the specified function regardless of whether the target ISA has a trap instruction. This option is useful for environments (e.g. deeply embedded) where a trap cannot be properly handled, or when some custom behavior is desired.

-ftls-model=[model]¶

Select which TLS model to use.

Valid values are: global-dynamic, local-dynamic, initial-exec and local-exec. The default value is global-dynamic. The compiler may use a different model if the selected model is not supported by the target, or if a more efficient model can be used. The TLS model can be overridden per variable using the tls_model attribute.
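A per-variable override with the tls_model attribute might look like the sketch below. The variable and function names are hypothetical, and the choice of "initial-exec" is only an example; as noted above, the compiler may still pick a different model.

```cpp
#include <cassert>

// Thread-local counter with an explicit TLS model, overriding whatever
// -ftls-model selects for the rest of the translation unit.
thread_local int tls_counter __attribute__((tls_model("initial-exec"))) = 0;

// Increment the calling thread's copy n times and return its new value.
int bump_counter(int n) {
  assert(n >= 0);
  for (int i = 0; i < n; ++i)
    ++tls_counter;
  return tls_counter;
}
```

Each thread sees its own copy of tls_counter; the attribute changes only how the address of that copy is computed, not the semantics of the variable.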

-femulated-tls¶

Select emulated TLS model, which overrides all -ftls-model choices.

In emulated TLS mode, all accesses to TLS variables are converted to calls to __emutls_get_address in the runtime library.

-mhwdiv=[values]¶

Select the ARM modes (arm or thumb) that support hardware division instructions.

Valid values are: arm, thumb and arm,thumb. This option is used to indicate which mode (arm or thumb) supports hardware division instructions. This only applies to the ARM architecture.

-m[no-]crc¶

Enable or disable CRC instructions.

This option is used to indicate whether CRC instructions are to be generated. This only applies to the ARM architecture.

CRC instructions are enabled by default on ARMv8.

-mgeneral-regs-only¶

Generate code which only uses the general purpose registers.

This option restricts the generated code to use general registers only. This only applies to the AArch64 architecture.

-mcompact-branches=[values]¶

Control the usage of compact branches for MIPSR6.

Valid values are: never, optimal and always. The default value is optimal, which generates compact branches when a delay slot cannot be filled. never disables the usage of compact branches and always generates compact branches whenever possible.

-f[no-]max-type-align=[number]¶

Instruct the code generator to not enforce a higher alignment than the given number (of bytes) when accessing memory via an opaque pointer or reference. This cap is ignored when directly accessing a variable or when the pointee type has an explicit “aligned” attribute.

The value should usually be determined by the properties of the system allocator. Some builtin types, especially vector types, have very high natural alignments; when working with values of those types, Clang usually wants to use instructions that take advantage of that alignment. However, many system allocators do not promise to return memory that is more than 8-byte or 16-byte-aligned. Use this option to limit the alignment that the compiler can assume for an arbitrary pointer, which may point onto the heap.

This option does not affect the ABI alignment of types; the layout of structs and unions and the value returned by the alignof operator remain the same.

This option can be overridden on a case-by-case basis by putting an explicit “aligned” alignment on a struct, union, or typedef. For example:

```
#include <immintrin.h>

// Make an aligned typedef of the AVX-512 16-int vector type.
typedef __v16si __aligned_v16si __attribute__((aligned(64)));

void initialize_vector(__aligned_v16si *v) {
  // The compiler may assume that ‘v’ is 64-byte aligned, regardless of the
  // value of -fmax-type-align.
}
```

-faddrsig, -fno-addrsig¶: Controls whether Clang emits an address-significance table into the object file. Address-significance tables allow linkers to implement safe ICF without the false positives that can result from other implementation techniques such as relocation scanning. Address-significance tables are enabled by default on ELF targets when using the integrated assembler. This flag currently only has an effect on ELF targets.

-f[no-]unique-internal-linkage-names¶

Controls whether Clang emits a unique (best-effort) symbol name for internal linkage symbols. When this option is set, the compiler hashes the main source file path from the command line and appends it to all internal symbols. If a program contains multiple objects compiled with the same command-line source file path, the symbols are not guaranteed to be unique. This option is particularly useful in attributing profile information to the correct function when multiple functions with the same private linkage name exist in the binary.

Note that this option cannot guarantee uniqueness. The following is an example where the names are not unique when two modules contain symbols with the same private linkage name:

```
$ cd $P/foo && clang -c -funique-internal-linkage-names name_conflict.c
$ cd $P/bar && clang -c -funique-internal-linkage-names name_conflict.c
$ cd $P && clang foo/name_conflict.o bar/name_conflict.o
```

-fbasic-block-sections=[labels, all, list=<arg>, none]¶

Controls how Clang emits text sections for basic blocks. With values all and list=<arg>, each basic block or a subset of basic blocks can be placed in its own unique section. With the “labels” value, normal text sections are emitted, but a .bb_addr_map section is emitted which includes address offsets for each basic block in the program, relative to the parent function address.

With the list=<arg> option, a file containing the subset of basic blocks that need to be placed in unique sections can be specified. The format of the file is as follows. For example, list=spec.txt where spec.txt is the following:

Profile Guided Optimization¶

Profile information enables better optimization. For example, knowing that a branch is taken very frequently helps the compiler make better decisions when ordering basic blocks. Knowing that a function foo is called more frequently than another function bar helps the inliner. Optimization levels -O2 and above are recommended for use of profile guided optimization.

Clang supports profile guided optimization with two different kinds of profiling. A sampling profiler can generate a profile with very low runtime overhead, or you can build an instrumented version of the code that collects more detailed profile information. Both kinds of profiles can provide execution counts for instructions in the code and information on branches taken and function invocation.

Regardless of which kind of profiling you use, be careful to collect profiles by running your code with inputs that are representative of the typical behavior. Code that is not exercised in the profile will be optimized as if it is unimportant, and the compiler may make poor optimization choices for code that is disproportionately used while profiling.

Differences Between Sampling and Instrumentation¶

Although both techniques are used for similar purposes, there are important differences between the two:

Profile data generated with one cannot be used by the other, and there is no conversion tool that can convert one to the other. So, a profile generated via -fprofile-generate or -fprofile-instr-generate must be used with -fprofile-use or -fprofile-instr-use. Similarly, sampling profiles generated by external profilers must be converted and used with -fprofile-sample-use or -fauto-profile.
Instrumentation profile data can be used for code coverage analysis and optimization.
Sampling profiles can only be used for optimization. They cannot be used for code coverage analysis. Although it would be technically possible to use sampling profiles for code coverage, sample-based profiles are too coarse-grained for code coverage purposes; it would yield poor results.
Sampling profiles must be generated by an external tool. The profile generated by that tool must then be converted into a format that can be read by LLVM. The section on sampling profilers describes one of the supported sampling profile formats.

Using Sampling Profilers¶

Sampling profilers are used to collect runtime information, such as hardware counters, while your application executes. They are typically very efficient and do not incur a large runtime overhead. The sample data collected by the profiler can be used during compilation to determine what the most executed areas of the code are.

Using the data from a sample profiler requires some changes in the way a program is built. Before the compiler can use profiling information, the code needs to execute under the profiler. The following is the usual build cycle when using sample profilers for optimization:

Build the code with source line table information. You can use all the usual build flags that you always build your application with. The only requirement is that DWARF debug info including source line information is generated. This DWARF information is important for the profiler to be able to map instructions back to source line locations. The usefulness of this DWARF information can be improved with the -fdebug-info-for-profiling and -funique-internal-linkage-names options.
On Linux:
```
$ clang++ -O2 -gline-tables-only \
    -fdebug-info-for-profiling -funique-internal-linkage-names \
    code.cc -o code
```
While MSVC-style targets default to CodeView debug information, DWARF debug information is required to generate source-level LLVM profiles. Use -gdwarf to include DWARF debug information:
```
> clang-cl /O2 -gdwarf -gline-tables-only ^
    /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
    code.cc /Fe:code /fuse-ld=lld /link /debug:dwarf
```

Note

-funique-internal-linkage-names generates unique names based on given command-line source file paths. If your build system uses absolute source paths and these paths may change between steps 1 and 4, then the uniqued function names may change and result in unused profile data. Consider omitting this option in such cases.

Run the executable under a sampling profiler. The specific profiler you use does not really matter, as long as its output can be converted into the format that the LLVM optimizer understands.
Two such profilers are the Linux Perf profiler (https://perf.wiki.kernel.org/) and Intel’s Sampling Enabling Product (SEP), available as part of Intel VTune. While Perf is Linux-specific, SEP can be used on Linux, Windows, and FreeBSD.
The LLVM tool llvm-profgen can convert output of either Perf or SEP. An external project, AutoFDO, also provides a create_llvm_prof tool which supports Linux Perf output.
When using Perf:
```
$ perf record -b -e BR_INST_RETIRED.NEAR_TAKEN:uppp ./code
```
If the event above is unavailable, branches:u is probably next-best.
Note the use of the -b flag. This tells Perf to use the Last Branch Record (LBR) to record call chains. While this is not strictly required, it provides better call information, which improves the accuracy of the profile data.
When using SEP:
```
$ sep -start -out code.tb7 -ec BR_INST_RETIRED.NEAR_TAKEN:precise=yes:pdir -lbr no_filter:usr -perf-script brstack -app ./code
```
This produces a code.perf.data.script output which can be used with llvm-profgen’s --perfscript input option.
Convert the collected profile data to LLVM’s sample profile format. This is currently supported via the AutoFDO converter create_llvm_prof. Once built and installed, you can convert the perf.data file to LLVM using the command:
```
$ create_llvm_prof --binary=./code --out=code.prof
```
This will read perf.data and the binary file ./code and emit the profile data in code.prof. Note that if you ran perf without the -b flag, you need to use --use_lbr=false when calling create_llvm_prof.
Alternatively, the LLVM tool llvm-profgen can also be used to generate the LLVM sample profile:
```
$ llvm-profgen --binary=./code --output=code.prof --perfdata=perf.data
```
When using SEP, the output is in the textual format corresponding to llvm-profgen --perfscript. For example:
```
$ llvm-profgen --binary=./code --output=code.prof --perfscript=code.perf.data.script
```

Build the code again using the collected profile. This step feeds the profile back to the optimizers. This should result in a binary that executes faster than the original one. Note that you are not required to build the code with the exact same arguments that you used in the first step. The only requirement is that you build the code with the same debug info options and -fprofile-sample-use.

On Linux:

```
$ clang++ -O2 -gline-tables-only \
    -fdebug-info-for-profiling -funique-internal-linkage-names \
    -fprofile-sample-use=code.prof code.cc -o code
```

On Windows:

```
> clang-cl /O2 -gdwarf -gline-tables-only ^
    /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
    /fprofile-sample-use=code.prof code.cc /Fe:code /fuse-ld=lld /link /debug:dwarf
```

[OPTIONAL] Sampling-based profiles can have inaccuracies or missing block/edge counters. The profile inference algorithm (profi) can be used to infer missing blocks and edge counts, and improve the quality of profile data. Enable it with -fsample-profile-use-profi. For example, on Linux:

```
$ clang++ -fsample-profile-use-profi -O2 -gline-tables-only \
    -fdebug-info-for-profiling -funique-internal-linkage-names \
    -fprofile-sample-use=code.prof code.cc -o code
```

On Windows:

```
> clang-cl /clang:-fsample-profile-use-profi /O2 -gdwarf -gline-tables-only ^
    /clang:-fdebug-info-for-profiling /clang:-funique-internal-linkage-names ^
    /fprofile-sample-use=code.prof code.cc /Fe:code /fuse-ld=lld /link /debug:dwarf
```

Sample Profile Formats¶

Since external profilers generate profile data in a variety of custom formats, the data generated by the profiler must be converted into a format that can be read by the backend. LLVM supports three different sample profile formats:

ASCII text. This is the easiest one to generate. The file is divided into sections, which correspond to each of the functions with profile information. The format is described below. It can also be generated from the binary or gcov formats using the llvm-profdata tool.
Binary encoding. This uses a more efficient encoding that yields smaller profile files. This is the format generated by the create_llvm_prof tool in https://github.com/google/autofdo.
GCC encoding. This is based on the gcov format, which is accepted by GCC. It is only interesting in environments where GCC and Clang co-exist. This encoding is only generated by the create_gcov tool in https://github.com/google/autofdo. It can be read by LLVM and llvm-profdata, but it cannot be generated by either.

If you are using Linux Perf to generate sampling profiles, you can use the conversion tool create_llvm_prof described in the previous section. Otherwise, you will need to write a conversion tool that converts your profiler’s native format into one of these three.

Sample Profile Text Format¶

This section describes the ASCII text format for sampling profiles. It is, arguably, the easiest one to generate. If you are interested in generating any of the other two, consult the ProfileData library in LLVM’s source tree (specifically, include/llvm/ProfileData/SampleProfReader.h).

```
function1:total_samples:total_head_samples
 offset1[.discriminator]: number_of_samples [fn1:num fn2:num ... ]
 offset2[.discriminator]: number_of_samples [fn3:num fn4:num ... ]
 ...
 offsetN[.discriminator]: number_of_samples [fn5:num fn6:num ... ]
 offsetA[.discriminator]: fnA:num_of_total_samples
  offsetA1[.discriminator]: number_of_samples [fn7:num fn8:num ... ]
  offsetA1[.discriminator]: number_of_samples [fn9:num fn10:num ... ]
  offsetB[.discriminator]: fnB:num_of_total_samples
   offsetB1[.discriminator]: number_of_samples [fn11:num fn12:num ... ]
```

This is a nested tree in which the indentation represents the nesting level of the inline stack. There are no blank lines in the file, and the spacing within a single line is fixed. Additional spaces will result in an error while reading the file.

Any line starting with the ‘#’ character is completely ignored.

Inlined calls are represented with indentation. The inline stack is a stack of source locations in which the top of the stack represents the leaf function, and the bottom of the stack represents the actual symbol to which the instruction belongs.

Function names must be mangled in order for the profile loader to match them in the current translation unit. The two numbers in the function header specify how many total samples were accumulated in the function (first number), and the total number of samples accumulated in the prologue of the function (second number). This head sample count provides an indicator of how frequently the function is invoked.

There are two types of lines in the function body.

A sampled line represents the profile information of a source location: offsetN[.discriminator]: number_of_samples [fn5:num fn6:num ... ]
A callsite line represents the profile information of an inlined callsite: offsetA[.discriminator]: fnA:num_of_total_samples

Each sampled line may contain several items. Some are optional (marked below):

Source line offset. This number represents the line number in the function where the sample was collected. The line number is always relative to the line where the symbol of the function is defined. So, if the function has its header at line 280, the offset 13 is at line 293 in the file.
Note that this offset should never be a negative number. This could happen in cases like macros. The debug machinery will register the line number at the point of macro expansion. So, if the macro was expanded in a line before the start of the function, the profile converter should emit a 0 as the offset (this means that the optimizers will not be able to associate a meaningful weight to the instructions in the macro).
[OPTIONAL] Discriminator. This is used if the sampled program was compiled with DWARF discriminator support (http://wiki.dwarfstd.org/index.php?title=Path_Discriminators). DWARF discriminators are unsigned integer values that allow the compiler to distinguish between multiple execution paths on the same source line location.
For example, consider the line of code if (cond) foo(); else bar();. If the predicate cond is true 80% of the time, then the edge into function foo should be considered to be taken most of the time. But both calls to foo and bar are at the same source line, so a sample count at that line is not sufficient. The compiler needs to know which part of that line is taken more frequently.
This is what discriminators provide. In this case, the calls to foo and bar will be at the same line, but will have different discriminator values. This allows the compiler to correctly set edge weights into foo and bar.
Number of samples. This is an integer quantity representing the number of samples collected by the profiler at this source location.
[OPTIONAL] Potential call targets and samples. If present, this line contains a call instruction. This models both direct and indirect calls. Each called target is listed together with the number of samples. For example,
```
130: 7 foo:3 bar:2 baz:7
```
The above means that at relative line offset 130 there is a call instruction that calls one of foo(), bar() and baz(), with baz() being the relatively more frequently called target.

As an example, consider a program with the call chain main -> foo -> bar. When built with optimizations enabled, the compiler may inline the calls to bar and foo inside main. The generated profile could then be something like this:

```
main:35504:0
 1: _Z3foov:35504
  2: _Z32bari:31977
   1.1: 31977
 2: 0
```

This profile indicates that there were a total of 35,504 samples collected in main. All of those were at line 1 (the call to foo). Of those, 31,977 were spent inside the body of bar. The last line of the profile (2: 0) corresponds to line 2 inside main. No samples were collected there.
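To make the layout of a sampled line concrete, here is a small parsing sketch. It is not LLVM’s reader (see SampleProfReader.h for that): the SampledLine struct and the functions parse_sampled_line and absolute_line are hypothetical names, the parser handles only sampled lines (not function headers or callsite lines), and unlike the real reader it tolerates extra whitespace.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// One parsed "offset[.discriminator]: number_of_samples [fn:num ...]" line.
struct SampledLine {
  std::uint64_t offset = 0;
  std::uint64_t discriminator = 0;  // 0 when the optional part is absent
  std::uint64_t samples = 0;
  std::vector<std::pair<std::string, std::uint64_t>> targets;
};

// Hypothetical, lenient parser for a single sampled line.
SampledLine parse_sampled_line(const std::string &line) {
  SampledLine out;
  const std::size_t colon = line.find(':');
  assert(colon != std::string::npos && "missing ':' after the offset");

  // Location part: "offset" or "offset.discriminator".
  const std::string loc = line.substr(0, colon);
  const std::size_t dot = loc.find('.');
  out.offset = std::stoull(loc.substr(0, dot));
  if (dot != std::string::npos)
    out.discriminator = std::stoull(loc.substr(dot + 1));

  // Remainder: the sample count, then optional "target:count" items.
  std::istringstream rest(line.substr(colon + 1));
  rest >> out.samples;
  std::string item;
  while (rest >> item) {
    const std::size_t sep = item.rfind(':');
    assert(sep != std::string::npos && "call target must be name:count");
    out.targets.emplace_back(item.substr(0, sep),
                             std::stoull(item.substr(sep + 1)));
  }
  return out;
}

// Offsets are relative to the line holding the function header.
std::uint64_t absolute_line(std::uint64_t header_line, std::uint64_t offset) {
  return header_line + offset;
}
```

Applied to the line 130: 7 foo:3 bar:2 baz:7, this yields offset 130, 7 samples, and three call targets; absolute_line(280, 13) gives 293, matching the offset example above.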

Profiling with Instrumentation

Clang also supports profiling via instrumentation. This requires building a special instrumented version of the code and has some runtime overhead during the profiling, but it provides more detailed results than a sampling profiler. It also provides reproducible results, at least to the extent that the code behaves consistently across runs.

Clang supports two types of instrumentation: frontend-based and IR-based. Frontend-based instrumentation can be enabled with the option -fprofile-instr-generate, and IR-based instrumentation can be enabled with the option -fprofile-generate. For best performance with PGO, IR-based instrumentation should be used. It has the benefits of lower instrumentation overhead, smaller raw profile size, and better runtime performance. Frontend-based instrumentation, on the other hand, has better source correlation, so it should be used with source line-based coverage testing.

The flag -fcs-profile-generate also instruments programs using the same instrumentation method as -fprofile-generate. However, it performs a post-inline late instrumentation and can produce context-sensitive profiles.

Here are the steps for using profile guided optimization with instrumentation:

1. Build an instrumented version of the code by compiling and linking with the -fprofile-generate or -fprofile-instr-generate option.
```
$ clang++ -O2 -fprofile-instr-generate code.cc -o code
```
2. Run the instrumented executable with inputs that reflect the typical usage. By default, the profile data will be written to a default.profraw file in the current directory. You can override that default by using the option -fprofile-instr-generate= or by setting the LLVM_PROFILE_FILE environment variable to specify an alternate file. If a non-default file name is specified by both the environment variable and the command line option, the environment variable takes precedence. The file name pattern specified can include different modifiers: %p, %h, %m, %t, and %c.
Any instance of %p in that file name will be replaced by the process ID, so that you can easily distinguish the profile output from multiple runs.
```
$ LLVM_PROFILE_FILE="code-%p.profraw" ./code
```
The modifier %h can be used in scenarios where the same instrumented binary is run on multiple different host machines dumping profile data to shared network-based storage. The %h specifier will be substituted with the hostname so that profiles collected from different hosts do not clobber each other.
While the use of the %p specifier can reduce the likelihood of profiles dumped from different processes clobbering each other, such clobbering can still happen because of pid re-use by the OS. Another side effect of using %p is that the storage requirement for raw profile data files is greatly increased. To avoid issues like this, the %m specifier can be used in the profile name. When this specifier is used, the profiler runtime will substitute %m with a unique integer identifier associated with the instrumented binary. Additionally, multiple raw profiles dumped from different processes that share a file system (can be on different hosts) will be automatically merged by the profiler runtime during the dumping. If the program links in multiple instrumented shared libraries, each library will dump the profile data into its own profile data file (with its unique integer id embedded in the profile name). Note that the merging enabled by %m is for raw profile data generated by the profiler runtime. The resulting merged “raw” profile data file still needs to be converted to a different format expected by the compiler (see step 3 below).
```
$ LLVM_PROFILE_FILE="code-%m.profraw" ./code
```
See this section about the %t and %c modifiers.
3. Combine profiles from multiple runs and convert the “raw” profile format to the input expected by clang. Use the merge command of the llvm-profdata tool to do this.
```
$ llvm-profdata merge -output=code.profdata code-*.profraw
```
Note that this step is necessary even when there is only one “raw” profile, since the merge operation also changes the file format.
4. Build the code again using the -fprofile-use or -fprofile-instr-use option to specify the collected profile data.
```
$ clang++ -O2 -fprofile-instr-use=code.profdata code.cc -o code
```
You can repeat step 4 as often as you like without regenerating the profile. As you make changes to your code, clang may no longer be able to use the profile data. It will warn you when this happens.

Note that although the -fprofile-use option is semantically equivalent to its GCC counterpart, it does not handle profile formats produced by GCC. Both -fprofile-use and -fprofile-instr-use accept profiles in the indexed format, regardless of whether they were produced by the frontend or the IR pass.

-fprofile-generate[=<dirname>]

The -fprofile-generate and -fprofile-generate= flags will use an alternative instrumentation method for profile generation. When given a directory name, they generate the profile file default_%m.profraw in the directory named dirname. If dirname does not exist, it will be created at runtime. The %m specifier will be substituted with a unique id, as documented in step 2 above. In other words, with the -fprofile-generate[=<dirname>] option, automatic merging of the “raw” profile data is turned on by default, so there will no longer be any risk of profile clobbering from different running processes. For example,

$ clang++ -O2 -fprofile-generate=yyy/zzz code.cc -o code

When code is executed, the profile will be written to the file yyy/zzz/default_xxxx.profraw.

To generate the profile data file with the compiler readable format, the llvm-profdata tool can be used with the profile directory as the input:

$ llvm-profdata merge -output=code.profdata yyy/zzz/

If the user wants to turn off the auto-merging feature, or simply override the profile dumping path specified at the command line, the environment variable LLVM_PROFILE_FILE can still be used to override the directory and filename for the profile file at runtime. To override the path and filename at compile time, use -Xclang -fprofile-instrument-path=/path/to/file_pattern.profraw.

-fcs-profile-generate[=<dirname>]

The -fcs-profile-generate and -fcs-profile-generate= flags will use the same instrumentation method, and generate the same profile as in the -fprofile-generate and -fprofile-generate= flags. The difference is that the instrumentation is performed after inlining so that the resulting profile has better context-sensitive information. They cannot be used together with the -fprofile-generate and -fprofile-generate= flags. They are typically used in conjunction with the -fprofile-use flag. The profiles generated by -fcs-profile-generate and -fprofile-generate can be merged by llvm-profdata. A use example:

$ clang++ -O2 -fprofile-generate=yyy/zzz code.cc -o code
$ ./code
$ llvm-profdata merge -output=code.profdata yyy/zzz/

The first few steps are the same as in a -fprofile-generate compilation. Then perform a second round of instrumentation.

$ clang++ -O2 -fprofile-use=code.profdata -fcs-profile-generate=sss/ttt \
  -o cs_code
$ ./cs_code
$ llvm-profdata merge -output=cs_code.profdata sss/ttt code.profdata

The resulting cs_code.profdata combines code.profdata and the profile generated from binary cs_code. Profile cs_code.profdata can be used by -fprofile-use compilation.

$ clang++ -O2 -fprofile-use=cs_code.profdata

The above command makes the compiler read both profiles, each at the same point at which the corresponding instrumentation was performed.

-fprofile-use[=<pathname>]: Without any other arguments, -fprofile-use behaves identically to -fprofile-instr-use. Otherwise, if pathname is the full path to a profile file, it reads from that file. If pathname is a directory name, it reads from pathname/default.profdata.

-fprofile-update[=<method>]: Unless -fsanitize=thread is specified, the default is single, which uses non-atomic increments. The counters can be inaccurate under thread contention. atomic uses atomic increments, which are accurate but have overhead. prefer-atomic will be transformed to atomic when supported by the target, or single otherwise.

Fine Tuning Profile Collection

The PGO infrastructure provides user program knobs to fine tune profile collection. Specifically, the PGO runtime provides the following functions that can be used to control the regions in the program where profiles should be collected.

void __llvm_profile_set_filename(const char *Name): changes the name of the profile file to Name.
void __llvm_profile_reset_counters(void): resets all counters to zero.
int __llvm_profile_dump(void): write the profile data to disk.
int __llvm_orderfile_dump(void): write the order file to disk.

For example, the following pattern can be used to skip profiling program initialization, profile two specific hot regions, and skip profiling program cleanup:

int main() {
  initialize();
  // Reset all profile counters to 0 to omit profile collected during
  // initialize()'s execution.
  __llvm_profile_reset_counters();
  ... hot region 1
  // Dump the profile for hot region 1.
  __llvm_profile_set_filename("region1.profraw");
  __llvm_profile_dump();
  // Reset counters before proceeding to hot region 2.
  __llvm_profile_reset_counters();
  ... hot region 2
  // Dump the profile for hot region 2.
  __llvm_profile_set_filename("region2.profraw");
  __llvm_profile_dump();
  // Since the profile has been dumped, no further profile data
  // will be collected beyond the above __llvm_profile_dump().
  cleanup();
  return 0;
}

These APIs’ names can be introduced to user programs in two ways. They can be declared as weak symbols on platforms which support treating weak symbols as null during linking. For example, the user can have

__attribute__((weak)) int __llvm_profile_dump(void);
// Then later in the same source file
if (__llvm_profile_dump)
  if (__llvm_profile_dump() != 0) { ... }
// The first if condition tests if the symbol is actually defined.
// Profile dumping only happens if the symbol is defined. Hence,
// the user program works correctly during normal (not profile-generate)
// executions.

Alternatively, the user program can include the header profile/instr_prof_interface.h, which contains the API names. For example,

#include "profile/instr_prof_interface.h"
// Then later in the same source file
if (__llvm_profile_dump() != 0) { ... }

The user code does not need to check if the API names are defined, because these names are automatically replaced by (0) or the equivalent of a no-op if clang is not compiling for profile generation.

Such replacement can happen because clang adds one of two macros depending on the -fprofile-generate and the -fprofile-use flags.

__LLVM_INSTR_PROFILE_GENERATE: defined when one of -fprofile[-instr]-generate/-fcs-profile-generate is in effect.
__LLVM_INSTR_PROFILE_USE: defined when one of -fprofile-use/-fprofile-instr-use is in effect.

The two macros can be used to provide more flexibility so a user program can execute code specifically intended for profile generate or profile use. For example, a user program can have special logging during profile generate:

#if __LLVM_INSTR_PROFILE_GENERATE
expensive_logging_of_full_program_state();
#endif

The logging is automatically excluded during a normal build of the program, hence it does not impact performance during a normal execution.

It is advised to use such fine tuning only in a program’s cold regions. The weak symbols can introduce extra control flow (the if checks), while the macros (hence declarations they guard in profile/instr_prof_interface.h) can change the control flow of the functions that use them between profile generation and profile use (which can lead to discarded counters in such functions). Using these APIs in the program’s cold regions introduces less overhead and leads to more optimized code.
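As a rough illustration of how the macro-based replacement can work, the following C sketch guards the runtime calls with __LLVM_INSTR_PROFILE_GENERATE by hand. It is a stand-in written for this example, not the contents of profile/instr_prof_interface.h. The PROFILE_RESET and PROFILE_DUMP macros and the profiled_sum function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __LLVM_INSTR_PROFILE_GENERATE
/* Instrumented build: these are provided by the profile runtime. */
void __llvm_profile_reset_counters(void);
int __llvm_profile_dump(void);
#define PROFILE_RESET() __llvm_profile_reset_counters()
#define PROFILE_DUMP() __llvm_profile_dump()
#else
/* Normal build: both calls collapse to no-ops, so nothing has to link. */
#define PROFILE_RESET() ((void)0)
#define PROFILE_DUMP() (0)
#endif

/* Hypothetical region of interest: sums the values and reports the
   status of the profile dump through dump_status. */
long profiled_sum(const int *values, size_t count, int *dump_status) {
  long total = 0;
  assert(values != NULL && dump_status != NULL);
  PROFILE_RESET(); /* drop counts gathered before this region */
  for (size_t i = 0; i < count; ++i)
    total += values[i];
  *dump_status = PROFILE_DUMP(); /* always 0 in a normal build */
  return total;
}
```

The reset and dump calls sit outside the loop, in line with the advice above to keep these APIs out of the hottest code.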

Disabling Instrumentation

In certain situations, it may be useful to disable profile generation or use for specific files in a build, without affecting the main compilation flags used for the other files in the project.

In these cases, you can use the flag -fno-profile-instr-generate (or -fno-profile-generate) to disable profile generation, and -fno-profile-instr-use (or -fno-profile-use) to disable profile use.

Note that these flags should appear after the corresponding profile flags to have an effect.

Note

When none of the translation units inside a binary is instrumented, in the case of Fuchsia the profile runtime will not be linked into the binary and no profile will be produced, while on other platforms the profile runtime will be linked and a profile will be produced but there will not be any counters.

Instrumenting only selected files or functions

Sometimes it’s useful to only instrument certain files or functions. For example, in automated testing infrastructure, it may be desirable to only instrument files or functions that were modified by a patch to reduce the overhead of instrumenting a full system.

This can be done using the -fprofile-list option.

-fprofile-list=<pathname>

This option can be used to apply profile instrumentation only to selected files or functions. pathname should point to a file in the Sanitizer special case list format which selects which files and functions to instrument.

$ clang++ -O2 -fprofile-instr-generate -fprofile-list=fun.list code.cc -o code

The option can be specified multiple times to pass multiple files.

$ clang++ -O2 -fprofile-instr-generate -fcoverage-mapping -fprofile-list=fun.list -fprofile-list=code.list code.cc -o code

Supported sections are [clang], [llvm], and [csllvm] representing clang PGO, IRPGO, and CSIRPGO, respectively. Supported prefixes are function and source. Supported categories are allow, skip, and forbid. skip adds the skipprofile attribute while forbid adds the noprofile attribute to the appropriate function. Use default:<allow|skip|forbid> to specify the default category.

$ cat fun.list
# The following cases are for clang instrumentation.
[clang]
# We might not want to profile functions that are inlined in many places.
function:inlinedLots=skip
# We want to forbid profiling where it might be dangerous.
source:lib/unsafe/*.cc=forbid
# Otherwise we allow profiling.
default:allow

Older Prefixes

An older format is also supported, but it is only able to add the noprofile attribute. To filter individual functions or entire source files use fun:<name> or src:<file> respectively. To exclude a function or a source file, use !fun:<name> or !src:<file> respectively. The format also supports wildcard expansion. The compiler generated functions are assumed to be located in the main source file. It is also possible to restrict the filter to a particular instrumentation type by using a named section.

# all functions whose name starts with foo will be instrumented.
fun:foo*
# except for foo1 which will be excluded from instrumentation.
!fun:foo1
# every function in path/to/foo.cc will be instrumented.
src:path/to/foo.cc
# bar will be instrumented only when using backend instrumentation.
# Recognized section names are clang, llvm and csllvm.
[llvm]
fun:bar

When the file contains only excludes, all files and functions except for the excluded ones will be instrumented. Otherwise, only the files and functions specified will be instrumented.

Instrument function groups

Sometimes it is desirable to minimize the size overhead of instrumented binaries. One way to do this is to partition functions into groups and only instrument functions in a specified group. This can be done using the -fprofile-function-groups and -fprofile-selected-function-group options.

-fprofile-function-groups=<N>, -fprofile-selected-function-group=<i>

The following uses 3 groups:

$ clang++ -Oz -fprofile-generate=group_0/ -fprofile-function-groups=3 -fprofile-selected-function-group=0 code.cc -o code.0
$ clang++ -Oz -fprofile-generate=group_1/ -fprofile-function-groups=3 -fprofile-selected-function-group=1 code.cc -o code.1
$ clang++ -Oz -fprofile-generate=group_2/ -fprofile-function-groups=3 -fprofile-selected-function-group=2 code.cc -o code.2

After collecting raw profiles from the three binaries, they can be merged into a single profile like normal.

$ llvm-profdata merge -output=code.profdata group_*/*.profraw

Profile remapping

When the program is compiled after a change that affects many symbol names, pre-existing profile data may no longer match the program. For example:

switching from libstdc++ to libc++ will result in the mangled names of all functions taking standard library types to change
renaming a widely-used type in C++ will result in the mangled names of all functions that have parameters involving that type to change
moving from a 32-bit compilation to a 64-bit compilation may change the underlying type of size_t and similar types, resulting in changes to manglings

Clang allows use of a profile remapping file to specify that such differences in mangled names should be ignored when matching the profile data against the program.

-fprofile-remapping-file=<file>: Specifies a file containing profile remapping information, that will be used to match mangled names in the profile data to mangled names in the program.

The profile remapping file is a text file containing lines of the form

fragmentkind fragment1 fragment2

where fragmentkind is one of name, type, or encoding, indicating whether the following mangled name fragments are <name>s, <type>s, or <encoding>s, respectively. Blank lines and lines starting with # are ignored.

For convenience, built-in <substitution>s such as St and Ss are accepted as <name>s (even though they technically are not <name>s).

For example, to specify that absl::string_view and std::string_view should be treated as equivalent when matching profile data, the following remapping file could be used:

# absl::string_view is considered equivalent to std::string_view
type N4absl11string_viewE St17basic_string_viewIcSt11char_traitsIcEE
# std:: might be std::__1:: in libc++ or std::__cxx11:: in libstdc++
name 3std St3__1
name 3std St7__cxx11

Matching profile data using a profile remapping file is supported on a best-effort basis. For example, information regarding indirect call targets is currently not remapped. For best results, you are encouraged to generate new profile data matching the updated program, or to remap the profile data using the llvm-cxxmap and llvm-profdata merge tools.
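To make the line format of the remapping file concrete, here is a small C sketch that classifies one line as a rule, an ignorable line, or malformed input. The remap_kind enum, the remap_rule struct, parse_remap_line, and the 127-character fragment limit are hypothetical and exist only for this example; the sketch does not check that the fragments are valid mangled names, and it is not how Clang reads the file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum remap_kind { REMAP_NAME, REMAP_TYPE, REMAP_ENCODING };

/* Hypothetical in-memory form of one "fragmentkind fragment1 fragment2". */
struct remap_rule {
  enum remap_kind kind;
  char first[128];
  char second[128];
};

/* Returns 1 for a rule, 0 for a blank or comment line, -1 if malformed.
   Leading whitespace before '#' is tolerated, which is an assumption. */
int parse_remap_line(const char *line, struct remap_rule *rule) {
  char kind[16];
  char extra[2];
  assert(line != NULL && rule != NULL);

  line += strspn(line, " \t\r\n");
  if (*line == '\0' || *line == '#')
    return 0;

  /* Exactly three whitespace-separated fields are expected. */
  if (sscanf(line, "%15s %127s %127s %1s", kind, rule->first, rule->second,
             extra) != 3)
    return -1;

  if (strcmp(kind, "name") == 0)
    rule->kind = REMAP_NAME;
  else if (strcmp(kind, "type") == 0)
    rule->kind = REMAP_TYPE;
  else if (strcmp(kind, "encoding") == 0)
    rule->kind = REMAP_ENCODING;
  else
    return -1;
  return 1;
}
```

Feeding it the lines of the example file above yields one type rule and two name rules, with the comment lines skipped.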

Note

Profile data remapping is currently only supported for C++ mangled names following the Itanium C++ ABI mangling scheme. This covers all C++ targets supported by Clang other than Windows.

GCOV-based Profiling

GCOV is a test coverage program that helps determine how often a line of code is executed. When instrumenting the code with the --coverage option, some counters are added for each edge linking basic blocks.

At compile time, gcno files are generated containing information about blocks and edges between them. At runtime the counters are incremented and at exit the counters are dumped in gcda files.

The tool llvm-cov gcov will parse gcno, gcda and source files to generate a report .c.gcov.

-fprofile-filter-files=[regexes]

Define a list of regexes separated by a semi-colon. If a file name matches any of the regexes then the file is instrumented.

$ clang --coverage -fprofile-filter-files=".*\.c$" foo.c

For example, this will only instrument files ending in .c, skipping .h files.

-fprofile-exclude-files=[regexes]

Define a list of regexes separated by a semi-colon. If a file name doesn’t match any of the regexes then the file is instrumented.

$ clang --coverage -fprofile-exclude-files="^/usr/include/.*$" foo.c

For example, this will instrument all the files except the ones in /usr/include.

If both options are used then a file is instrumented if its name matches any of the regexes from -fprofile-filter-files and doesn’t match any of the regexes from -fprofile-exclude-files.

$ clang --coverage -fprofile-exclude-files="^/usr/include/.*$" \
        -fprofile-filter-files="^/usr/.*$"

In that case /usr/foo/oof.h is instrumented since it matches the filter regex and doesn’t match the exclude regex, but /usr/include/foo.h isn’t since it matches the exclude regex.
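The combined filter and exclude rule can be sketched as a small decision function. The C code below uses POSIX extended regular expressions from <regex.h> as a stand-in; which regex flavor Clang applies is not stated here, so treat that choice as an assumption. The matches_any and should_instrument helpers are hypothetical, and the patterns are passed as arrays rather than as one semicolon-separated string.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Returns 1 if path matches at least one pattern in the list.
   Patterns that fail to compile are treated as non-matching. */
static int matches_any(const char *path, const char *const *patterns,
                       size_t count) {
  for (size_t i = 0; i < count; ++i) {
    regex_t re;
    if (regcomp(&re, patterns[i], REG_EXTENDED | REG_NOSUB) != 0)
      continue;
    int hit = regexec(&re, path, 0, NULL, 0) == 0;
    regfree(&re);
    if (hit)
      return 1;
  }
  return 0;
}

/* A file is instrumented when it matches some filter regex (or no filter
   was given) and matches none of the exclude regexes. */
int should_instrument(const char *path, const char *const *filters,
                      size_t num_filters, const char *const *excludes,
                      size_t num_excludes) {
  assert(path != NULL);
  if (num_filters > 0 && !matches_any(path, filters, num_filters))
    return 0;
  return !matches_any(path, excludes, num_excludes);
}
```

With the filter ^/usr/.*$ and the exclude ^/usr/include/.*$, the function accepts /usr/foo/oof.h and rejects /usr/include/foo.h, matching the outcome described above.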

Controlling Debug Information

Controlling Size of Debug Information

Debug info kind generated by Clang can be set by one of the flags listed below. If multiple flags are present, the last one is used.

-g0: Don’t generate any debug info (default).

-gline-tables-only

Generate line number tables only.

This kind of debug info makes it possible to obtain stack traces with function names, file names and line numbers (by such tools as gdb or addr2line). It doesn’t contain any other data (e.g. description of local variables or function parameters).

-fstandalone-debug

Clang supports a number of optimizations to reduce the size of debug information in the binary. They work based on the assumption that the debug type information can be spread out over multiple compilation units. Specifically, the optimizations are:

will not emit type definitions for types that are not needed by a module and could be replaced with a forward declaration.
will only emit type info for a dynamic C++ class in the module that contains the vtable for the class.
will only emit type info for a C++ class (non-trivial, non-aggregate) in the modules that contain a definition for one of its constructors.
will only emit type definitions for types that are the subject of explicit template instantiation declarations in the presence of an explicit instantiation definition for the type.

The -fstandalone-debug option turns off these optimizations. This is useful when working with 3rd-party libraries that don’t come with debug information. Note that Clang will never emit type information for types that are not referenced at all by the program.

-fno-standalone-debug: On Darwin -fstandalone-debug is enabled by default. The -fno-standalone-debug option can be used to turn on the vtable-based optimization described above.

-g: Generate complete debug info.

-feliminate-unused-debug-types: By default, Clang does not emit type information for types that are defined but not used in a program. To retain the debug info for these unused types, the negation -fno-eliminate-unused-debug-types can be used. This can be particularly useful on Windows, when using NATVIS files that can reference const symbols that would otherwise be stripped, even in full debug or standalone debug modes.

Controlling Macro Debug Info Generation

Debug info for C preprocessor macros increases the size of debug information in the binary. Macro debug info generated by Clang can be controlled by the flags listed below.

-fdebug-macro: Generate debug info for preprocessor macros. This flag is discarded when -g0 is enabled.

-fno-debug-macro: Do not generate debug info for preprocessor macros (default).

Controlling Debugger “Tuning”

While Clang generally emits standard DWARF debug info (http://dwarfstd.org), different debuggers may know how to take advantage of different specific DWARF features. You can “tune” the debug info for one of several different debuggers.

-ggdb, -glldb, -gsce, -gdbx: Tune the debug info for the gdb, lldb, Sony PlayStation® debugger, or dbx, respectively. Each of these options implies -g. (Therefore, if you want both -gline-tables-only and debugger tuning, the tuning option must come first.)

Controlling LLVM IR Output

Controlling Value Names in LLVM IR

Emitting value names in LLVM IR increases the size and verbosity of the IR. By default, value names are only emitted in assertion-enabled builds of Clang. However, when reading IR it can be useful to re-enable the emission of value names to improve readability.

-fdiscard-value-names: Discard value names when generating LLVM IR.

-fno-discard-value-names: Do not discard value names when generating LLVM IR. This option can be used to re-enable names for release builds of Clang.

Comment Parsing Options

Clang parses Doxygen and non-Doxygen style documentation comments and attaches them to the appropriate declaration nodes. By default, it only parses Doxygen-style comments and ignores ordinary comments starting with // and /*.

-Wdocumentation

Emit warnings about use of documentation comments. This warning group is off by default.

This includes checking that \param commands name parameters that are actually present in the function signature, checking that \returns is used only on functions that actually return a value, etc.
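As a small illustration of a comment that satisfies the two checks just mentioned, the following C function documents every parameter by the name used in its signature and uses \returns on a function that returns a value. The clamp_int function is a hypothetical example; if a \param name were misspelled, or \returns appeared on a void function, those are the situations the text above says -Wdocumentation checks for.

```c
#include <assert.h>

/**
 * \brief Clamps a value to a closed range.
 *
 * \param value the number to clamp
 * \param low   the smallest result allowed
 * \param high  the largest result allowed; must not be less than \p low
 * \returns \p value limited to the range from \p low to \p high
 */
int clamp_int(int value, int low, int high) {
  assert(low <= high);
  if (value < low)
    return low;
  if (value > high)
    return high;
  return value;
}
```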

-Wno-documentation-unknown-command: Don’t warn when encountering an unknown Doxygen command.

-fparse-all-comments: Parse all comments as documentation comments (including ordinary comments starting with // and /*).

-fcomment-block-commands=[commands]

Define custom documentation commands as block commands. This allows Clang to construct the correct AST for these custom commands, and silences warnings about unknown commands. Several commands must be separated by a comma without trailing space; e.g. -fcomment-block-commands=foo,bar defines custom commands \foo and \bar.

It is also possible to use -fcomment-block-commands several times; e.g. -fcomment-block-commands=foo -fcomment-block-commands=bar does the same as above.

C Language Features

The support for standard C in clang is feature-complete except for the C99 floating-point pragmas.

Extensions supported by clang

See Clang Language Extensions.

Differences between various standard modes

clang supports the -std option, which changes what language mode clang uses. The supported modes for C are c89, gnu89, c94, c99, gnu99, c11, gnu11, c17, gnu17, c23, gnu23, c2y, gnu2y, and various aliases for those modes. If no -std option is specified, clang defaults to gnu17 mode. Many C99 and C11 features are supported in earlier modes as a conforming extension, with a warning. Use -pedantic-errors to request an error if a feature from a later standard revision is used in an earlier mode.

Differences between all c* and gnu* modes:

c* modes define “__STRICT_ANSI__”.
Target-specific defines not prefixed by underscores, like linux, are defined in gnu* modes.
Trigraphs default to being off in gnu* modes; they can be enabled by the -trigraphs option.
The parser recognizes asm and typeof as keywords in gnu* modes; the variants __asm__ and __typeof__ are recognized in all modes.
The parser recognizes inline as a keyword in gnu* mode, in addition to recognizing it in the *99 and later modes for which it is part of the ISO C standard. The variant __inline__ is recognized in all modes.
The Apple “blocks” extension is recognized by default in gnu* modes on some platforms; it can be enabled in any mode with the -fblocks option.

Differences between *89 and *94 modes:

Digraphs are not recognized in c89 mode.

Differences between *94 and *99 modes:

The *99 modes default to implementing inline / __inline__ as specified in C99, while the *89 modes implement the GNU version. This can be overridden for individual functions with the __gnu_inline__ attribute.
The scope of names defined inside a for, if, switch, while, or do statement is different. (example: if ((struct x {int x;}*)0) {}.)
__STDC_VERSION__ is not defined in *89 modes.
inline is not recognized as a keyword in c89 mode.
restrict is not recognized as a keyword in *89 modes.
Commas are allowed in integer constant expressions in *99 modes.
Arrays which are not lvalues are not implicitly promoted to pointers in *89 modes.
Some warnings are different.

Differences between *99 and *11 modes:

Warnings for use of C11 features are disabled.
__STDC_VERSION__ is defined to 201112L rather than 199901L.

Differences between *11 and *17 modes:

__STDC_VERSION__ is defined to 201710L rather than 201112L.

Differences between *17 and *23 modes:

__STDC_VERSION__ is defined to 202311L rather than 201710L.
nullptr and nullptr_t are supported, only in *23 mode.
ATOMIC_VAR_INIT is removed from *23 mode.
bool, true, false, alignas, alignof, static_assert, and thread_local are now first-class keywords, only in *23 mode.
typeof and typeof_unqual are supported, only in *23 mode.
Bit-precise integers (_BitInt(N)) are supported by default in *23 mode, and as an extension in *17 and earlier modes.
[[]] attributes are supported by default in *23 mode, and as an extension in *17 and earlier modes.

Differences between *23 and *2y modes:

__STDC_VERSION__ is defined to 202400L rather than 202311L.
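The __STDC_VERSION__ values listed in this section can be used to tell at compile time which standard mode is in effect. The C sketch below maps a version value to a mode name; c_mode_for_version, current_c_mode, and compiled_as are hypothetical helpers. The value 199409L for the *94 modes is not stated above; it is the value defined by Amendment 1 to C90 and is included here as outside knowledge. The sketch cannot distinguish c* from gnu* modes.

```c
#include <assert.h>
#include <string.h>

/* Maps a __STDC_VERSION__ value to the matching c* mode name.
   Pass 0 to model the *89 modes, where the macro is not defined. */
const char *c_mode_for_version(long version) {
  assert(version >= 0L);
  if (version >= 202400L) return "c2y";
  if (version >= 202311L) return "c23";
  if (version >= 201710L) return "c17";
  if (version >= 201112L) return "c11";
  if (version >= 199901L) return "c99";
  if (version >= 199409L) return "c94";
  return "c89";
}

/* Reports the mode this translation unit was compiled in. */
const char *current_c_mode(void) {
#ifdef __STDC_VERSION__
  return c_mode_for_version(__STDC_VERSION__);
#else
  return c_mode_for_version(0L);
#endif
}

/* Returns 1 if the translation unit was compiled in the named mode. */
int compiled_as(const char *mode) {
  return strcmp(current_c_mode(), mode) == 0;
}
```

Compiling the same file with different -std values and calling current_c_mode is one way to observe the differences listed above.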

GCC extensions not implemented yet

clang tries to be compatible with gcc as much as possible, but some gcc extensions are not implemented yet:

clang does not support decimal floating point types (_Decimal32 and friends) yet.
clang does not support nested functions; this is a complex feature which is infrequently used, so it is unlikely to be implemented anytime soon. In C++11 it can be emulated by assigning lambda functions to local variables, e.g:
```
auto const local_function = [&](int parameter) {
  // Do something
};
...
local_function(1);
```
clang only supports global register variables when the register specified is non-allocatable (e.g. the stack pointer). Support for general global register variables is unlikely to be implemented soon because it requires additional LLVM backend support.
clang does not support static initialization of flexible array members. This appears to be a rarely used extension, but could be implemented pending user demand.
clang does not support __builtin_va_arg_pack/__builtin_va_arg_pack_len. This is used rarely, but in some potentially interesting places, like the glibc headers, so it may be implemented pending user demand. Note that because clang pretends to be like GCC 4.2, and this extension was introduced in 4.3, the glibc headers will not try to use this extension with clang at the moment.
clang does not support the gcc extension for forward-declaring function parameters; this has not shown up in any real-world code yet, though, so it might never be implemented.

This is not a complete list; if you find an unsupported extension missing from this list, please send an e-mail to cfe-dev. This list currently excludes C++; see C++ Language Features. Also, this list does not include bugs in mostly-implemented features; please see the bug tracker for known existing bugs (FIXME: Is there a section for bug-reporting guidelines somewhere?).

Intentionally unsupported GCC extensions

clang does not support the gcc extension that allows variable-length arrays in structures. This is for a few reasons: one, it is tricky to implement, two, the extension is completely undocumented, and three, the extension appears to be rarely used. Note that clang does support flexible array members (arrays with a zero or unspecified size at the end of a structure).
GCC accepts many expression forms that are not valid integer constant expressions in bit-field widths, enumerator constants, case labels, and in array bounds at global scope. Clang also accepts additional expression forms in these contexts, but constructs that GCC accepts due to simplifications GCC performs while parsing, such as x - x (where x is a variable) will likely never be accepted by Clang.
clang does not support __builtin_apply and friends; this extension is extremely obscure and difficult to implement reliably.
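For contrast with the unsupported variable-length arrays in structures, the following C sketch shows the supported alternative mentioned above: a flexible array member whose storage is sized at allocation time. The struct packet type and the packet_create function are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* A structure ending in a flexible array member (unspecified size). */
struct packet {
  size_t length;
  unsigned char payload[]; /* must be the last member */
};

/* Allocates a packet with room for length payload bytes and copies the
   data in. Returns NULL if the allocation fails; the caller frees it. */
struct packet *packet_create(const void *data, size_t length) {
  assert(data != NULL || length == 0);
  struct packet *p = malloc(sizeof *p + length);
  if (p == NULL)
    return NULL;
  p->length = length;
  if (length > 0)
    memcpy(p->payload, data, length);
  return p;
}
```

The payload size is chosen by the caller at run time, so no array bound has to appear inside the structure definition.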

Microsoft extensions

clang has support for many extensions from Microsoft Visual C++. To enable these extensions, use the -fms-extensions command-line option. This is the default for Windows targets. Clang does not implement every pragma or declspec provided by MSVC, but the popular ones, such as __declspec(dllexport) and #pragma comment(lib) are well supported.

clang has a -fms-compatibility flag that makes clang accept enough invalid C++ to be able to parse most Microsoft headers. For example, it allows unqualified lookup of dependent base class members, which is a common compatibility issue with clang. This flag is enabled by default for Windows targets.

-fdelayed-template-parsing lets clang delay parsing of function template definitions until the end of a translation unit. This flag is enabled by default for Windows targets.

For compatibility with existing code that compiles with MSVC, clang defines the _MSC_VER and _MSC_FULL_VER macros. When on Windows, these default to either the same value as the currently installed version of cl.exe, or 1933 and 193300000 (respectively). The -fms-compatibility-version= flag overrides these values. It accepts a dotted version tuple, such as 19.00.23506. Changing the MSVC compatibility version makes clang behave more like that version of MSVC. For example, -fms-compatibility-version=19 will enable C++14 features and define char16_t and char32_t as builtin types.

C++ Language Features¶

clang fully implements all of standard C++98 except for exportedtemplates (which were removed in C++11), all of standard C++11,C++14, and C++17, and most of C++20.

See the C++ support in Clang pagefor detailed information on C++ feature support across Clang versions.

Controlling implementation limits¶

-fbracket-depth=N¶: Sets the limit for nested parentheses, brackets, and braces to N. The default is 256.

-fconstexpr-depth=N¶: Sets the limit for constexpr function invocations to N. The default is 512.

-fconstexpr-steps=N¶: Sets the limit for the number of full-expressions evaluated in a single constant expression evaluation. This also controls the maximum size of array and dynamic array allocation that can be constant evaluated. The default is 1048576.

-ftemplate-depth=N¶: Sets the limit for recursively nested template instantiations to N. The default is 1024.

-foperator-arrow-depth=N¶: Sets the limit for iterative calls to ‘operator->’ functions to N. The default is 256.
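To make the limits above more concrete, here is a small sketch of the two kinds of nesting that -fconstexpr-depth and -ftemplate-depth bound. The names sum_to and Countdown are hypothetical, and the depth of 100 is chosen to stay well inside the defaults quoted above.

```cpp
#include <cassert>

// Recursive constexpr function: evaluating sum_to(n) at compile time nests
// roughly n constexpr calls, which counts against -fconstexpr-depth.
constexpr long sum_to(int n) { return n == 0 ? 0 : n + sum_to(n - 1); }

// Recursive template: Countdown<N> instantiates Countdown<N - 1> and so on,
// which counts against -ftemplate-depth.
template <int N>
struct Countdown {
  static constexpr int levels = 1 + Countdown<N - 1>::levels;
};
template <>
struct Countdown<0> {
  static constexpr int levels = 0;
};

// Both checks are evaluated by the compiler, not at run time.
static_assert(sum_to(100) == 5050, "constexpr recursion, 100 levels deep");
static_assert(Countdown<100>::levels == 100, "100 nested instantiations");
```

Code that needs deeper recursion than the defaults allow would be expected to require a larger value for the corresponding option.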

Objective-C Language Features¶

Objective-C++ Language Features¶

OpenMP Features¶

Clang supports all OpenMP 4.5 directives and clauses. See OpenMP Support for additional details.

Use -fopenmp to enable OpenMP. Support for OpenMP can be disabled with -fno-openmp.

Use -fopenmp-simd to enable OpenMP simd features only, without linking the runtime library; for combined constructs (e.g. #pragma omp parallel for simd) the non-simd directives and clauses will be ignored. This can be disabled with -fno-openmp-simd.
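The following sketch shows a loop that uses the combined construct mentioned above. The function name saxpy is hypothetical. The result is the same whichever of the options is used; only the way the loop is executed changes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper computing y[i] += a * x[i] for every element.
inline void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
  assert(x.size() == y.size());
  const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
  // With -fopenmp the whole combined construct applies.
  // With -fopenmp-simd only the simd part is honoured and the
  // "parallel for" part is ignored, as described above.
  // Without either option the pragma has no effect.
#pragma omp parallel for simd
  for (std::ptrdiff_t i = 0; i < n; ++i) {
    y[i] += a * x[i];
  }
}
```

Because each iteration touches a distinct element, the loop is safe to run in parallel and to vectorize.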

Controlling implementation limits¶

-fopenmp-use-tls¶: Controls code generation for OpenMP threadprivate variables. In presence of this option all threadprivate variables are generated the same way as thread local variables, using TLS support. If -fno-openmp-use-tls is provided or the target does not support TLS, code generation for threadprivate variables relies on the OpenMP runtime library.

OpenCL Features¶

Clang can be used to compile OpenCL kernels for execution on a device (e.g. GPU). It is possible to compile the kernel into a binary (e.g. for AMDGPU) that can be uploaded to run directly on a device (e.g. using clCreateProgramWithBinary) or into generic bitcode files loadable into other toolchains.

Compiling to a binary using the default target from the installation can be done as follows:

$ echo "kernel void k(){}" > test.cl
$ clang test.cl

Compiling for a specific target can be done by specifying the triple corresponding to the target, for example:

$ clang --target=nvptx64-unknown-unknown test.cl
$ clang --target=amdgcn-amd-amdhsa -mcpu=gfx900 test.cl

Compiling to bitcode can be done as follows:

$ clang -c -emit-llvm test.cl

This will produce a file test.bc that can be used in vendor toolchains to perform machine code generation.

Note that if compiled to bitcode for generic targets such as SPIR/SPIR-V, portable IR is produced that can be used with various vendor tools as well as open source tools such as SPIRV-LLVM Translator to produce SPIR-V binary. More details are provided in the offline compilation from OpenCL kernel sources into SPIR-V using open source tools. From clang 14 onwards SPIR-V can be generated directly as detailed in the SPIR-V support section.

Clang currently supports OpenCL C language standards up to v2.0. Clang mainly supports full profile. There is only very limited support of the embedded profile. From clang 9 a C++ mode is available for OpenCL (see C++ for OpenCL).

OpenCL v3.0 support is complete but remains experimental; see the OpenCL Support page for more details about the experimental features and limitations.

OpenCL Specific Options¶

Most of the OpenCL build options from the specification v2.0 section 5.8.4 are available.

Examples:

$ clang -cl-std=CL2.0 -cl-single-precision-constant test.cl

Many flags used for the compilation for C sources can also be passed while compiling for OpenCL, examples: -c, -O<1-4|s>, -o, -emit-llvm, etc.

Some extra options are available to support special OpenCL features.

-cl-no-stdinc¶

Disables all extra types and functions that are not native to the compiler. This might reduce compilation time marginally, but many declarations from the OpenCL standard will not be accessible. For example, the following will fail to compile.

$ echo "bool is_wg_uniform(int i){return get_enqueued_local_size(i)==get_local_size(i);}" > test.cl
$ clang -cl-std=CL2.0 -cl-no-stdinc test.cl
error: use of undeclared identifier 'get_enqueued_local_size'
error: use of undeclared identifier 'get_local_size'

More information about the standard types and functions is provided in the section on the OpenCL Header.

-cl-ext¶

Enables/Disables support of OpenCL extensions and optional features. All OpenCL targets set a list of extensions that they support. Clang allows this list to be amended using the -cl-ext flag with a comma-separated list of extensions prefixed with '+' or '-'. The syntax: -cl-ext=<(['-'|'+']<extension>[,])+>, where extensions can be either one of the OpenCL published extensions or any vendor extension. Alternatively, 'all' can be used to enable or disable all known extensions.

Example disabling double support for the 64-bit SPIR-V target:

$ clang -c --target=spirv64 -cl-ext=-cl_khr_fp64 test.cl

Disabling all extensions except half-precision support (cl_khr_fp16) for the R600 AMD GPU target can be done using:

$ clang --target=r600 -cl-ext=-all,+cl_khr_fp16 test.cl

Note that some generic targets e.g. SPIR/SPIR-V enable all extensions/features in clang by default.
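The effect of a -cl-ext list can be modelled as a sequence of set operations. The sketch below is not Clang's implementation; it is a simplified model under stated assumptions: entries are applied left to right, an entry without a prefix is treated as '+', and "all" expands to a caller-supplied set of known extensions. The names ExtSet and apply_cl_ext are hypothetical.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

using ExtSet = std::set<std::string>;

// Applies a -cl-ext style list such as "-all,+cl_khr_fp16" to `enabled`.
// `known` is the set that the special name "all" expands to.
// Assumption: entries are processed left to right, so later entries
// override earlier ones, and an unprefixed name means "enable".
inline ExtSet apply_cl_ext(ExtSet enabled, const ExtSet& known,
                           const std::string& spec) {
  std::istringstream in(spec);
  std::string item;
  while (std::getline(in, item, ',')) {
    if (item.empty()) continue;  // tolerate a trailing comma
    bool enable = true;
    std::string name = item;
    if (item[0] == '+' || item[0] == '-') {
      enable = (item[0] == '+');
      name = item.substr(1);
    }
    if (name.empty()) continue;  // a bare '+' or '-' carries no name
    if (name == "all") {
      if (enable) {
        enabled.insert(known.begin(), known.end());
      } else {
        enabled.clear();
      }
    } else if (enable) {
      enabled.insert(name);
    } else {
      enabled.erase(name);
    }
  }
  return enabled;
}
```

Under this model, the list -all,+cl_khr_fp16 leaves only cl_khr_fp16 enabled, and -cl_khr_fp64 removes just that one extension from the target's default set.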

OpenCL Targets¶

OpenCL targets are derived from the regular Clang target classes. The OpenCL specific parts of the target representation provide address space mapping as well as a set of supported extensions.

Specific Targets¶

There is a set of concrete HW architectures that OpenCL can be compiled for.

For AMD target:

$ clang --target=amdgcn-amd-amdhsa -mcpu=gfx900 test.cl

For Nvidia architectures:

$ clang --target=nvptx64-unknown-unknown test.cl

Generic Targets¶

A SPIR-V binary can be produced for 32 or 64 bit targets.

$ clang --target=spirv32 -c test.cl
$ clang --target=spirv64 -c test.cl

More details can be found in the SPIR-V support section.

SPIR is available as a generic target to allow portable bitcode to be produced that can be used across GPU toolchains. The implementation follows the SPIR specification. There are two flavors available for 32 and 64 bits.

$ clang --target=spir test.cl -emit-llvm -c
$ clang --target=spir64 test.cl -emit-llvm -c

Clang will generate SPIR v1.2 compatible IR for OpenCL versions up to 2.0 and SPIR v2.0 for OpenCL v2.0 or C++ for OpenCL.

x86 is used by some implementations that are x86 compatible and currently remains for backwards compatibility (with older implementations prior to SPIR target support). For “non-SPMD” targets which cannot spawn multiple work-items on the fly using hardware, which covers practically all non-GPU devices such as CPUs and DSPs, additional processing is needed for the kernels to support multiple work-item execution. For this, a 3rd party toolchain, such as for example POCL, can be used.

This target does not support multiple memory segments and, therefore, the fake address space map can be added using the -ffake-address-space-map flag.

All known OpenCL extensions and features are set to supported in the generic targets; however, the -cl-ext flag can be used to toggle individual extensions and features.

OpenCL Header¶

By default Clang will include standard headers and therefore most of OpenCL builtin functions and types are available during compilation. The default declarations of non-native compiler types and functions can be disabled by using flag -cl-no-stdinc.

The following example demonstrates that OpenCL kernel sources with various standard builtin functions can be compiled without the need for explicit includes or compiler flags.

$ echo "bool is_wg_uniform(int i){return get_enqueued_local_size(i)==get_local_size(i);}" > test.cl
$ clang -cl-std=CL2.0 test.cl

More information about the default headers is provided in OpenCL Support.

OpenCL Extensions¶

Most of the cl_khr_* extensions to OpenCL C from the official OpenCL registry are available and configured per target depending on the support available in the specific architecture.

It is possible to alter the default extensions setting per target using the -cl-ext flag. (See flags description for more details).

Vendor extensions can be added flexibly by declaring the list of types and functions associated with each extension enclosed within the following compiler pragma directives:

#pragma OPENCL EXTENSION the_new_extension_name : begin
// declare types and functions associated with the extension here
#pragma OPENCL EXTENSION the_new_extension_name : end

For example, parsing the following code adds my_t type and my_func function to the custom my_ext extension.

#pragma OPENCL EXTENSION my_ext : begin
typedef struct{
  int a;
}my_t;
void my_func(my_t);
#pragma OPENCL EXTENSION my_ext : end

There is no conflict resolution for identifier clashes among extensions. It is therefore recommended that the identifiers are prefixed with a double underscore to avoid clashing with user space identifiers. Vendor extensions should use a reserved identifier prefix, e.g. amd, arm, intel.

Clang also supports language extensions documented in The OpenCL C Language Extensions Documentation.

OpenCL-Specific Attributes¶

OpenCL support in Clang contains a set of attributes taken directly from the specification as well as additional attributes.

nosvm¶

Clang supports this attribute to comply with OpenCL v2.0 conformance, but it does not have any effect on the IR. For more details refer to the specification section 6.7.2.

opencl_unroll_hint¶

The implementation of this feature mirrors the unroll hint for C. More details on the syntax can be found in the specification section 6.11.5.

convergent¶

To make sure no invalid optimizations occur for single program multiple data (SPMD) / single instruction multiple thread (SIMT) Clang provides attributes that can be used for special functions that have cross work item semantics. An example is the subgroup operations such as intel_sub_group_shuffle.

// Define custom my_sub_group_shuffle(data, c)
// that makes use of intel_sub_group_shuffle
r1 = ...
if (r0) r1 = computeA();
// Shuffle data from r1 into r3
// of threads id r2.
r3 = my_sub_group_shuffle(r1, r2);
if (r0) r3 = computeB();

with non-SPMD semantics this is optimized to the following equivalent code:

r1 = ...
if (!r0)
  // Incorrect functionality! The data in r1
  // have not been computed by all threads yet.
  r3 = my_sub_group_shuffle(r1, r2);
else {
  r1 = computeA();
  r3 = my_sub_group_shuffle(r1, r2);
  r3 = computeB();
}

Declaring the function my_sub_group_shuffle with the convergent attribute would prevent this:

my_sub_group_shuffle() __attribute__((convergent));

Using convergent guarantees correct execution by keeping CFG equivalence with respect to operations marked as convergent. CFG G´ is equivalent to G with respect to node Ni iff ∀ Nj (i≠j) domination and post-domination relations with respect to Ni remain the same in both G and G´.
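For completeness, the sketch below shows one way to spell a full declaration carrying the attribute so that it also compiles with compilers that do not know it. It is host-side C++ rather than OpenCL C, the macro name MY_CONVERGENT is hypothetical, and the function body is a stand-in that performs no cross work-item communication.

```cpp
#include <cassert>

// Use the attribute only when the compiler reports support for it.
#if defined(__has_attribute)
#  if __has_attribute(convergent)
#    define MY_CONVERGENT __attribute__((convergent))
#  endif
#endif
#ifndef MY_CONVERGENT
#  define MY_CONVERGENT  // expands to nothing on other compilers
#endif

// The attribute is placed on the declaration, after the parameter list.
int my_sub_group_shuffle(int data, int lane) MY_CONVERGENT;

// Stand-in definition for this sketch only: it simply returns its input.
int my_sub_group_shuffle(int data, int lane) {
  (void)lane;  // unused in the stand-in
  return data;
}
```

In real kernel code the body would call a subgroup builtin; the point here is only the placement of the attribute on a complete declaration.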

noduplicate¶

noduplicate is more restrictive with respect to optimizations than convergent because a convergent function only preserves CFG equivalence. This allows some optimizations to happen as long as the control flow remains unmodified.

for (int i=0; i<4; i++)
  my_sub_group_shuffle();

can be modified to:

my_sub_group_shuffle();
my_sub_group_shuffle();
my_sub_group_shuffle();
my_sub_group_shuffle();

while using noduplicate would disallow this. Also noduplicate doesn’t have the same safe semantics of CFG as convergent and can cause changes in CFG that modify semantics of the original program.

noduplicate is kept for backwards compatibility only and it is considered to be deprecated for future uses.

C++ for OpenCL¶

Starting from clang 9, kernel code can contain C++17 features: classes, templates, function overloading, type deduction, etc. Please note that this is not an implementation of OpenCL C++ and there is no plan to support it in clang in any new releases in the near future.

Clang currently supports C++ for OpenCL 1.0 and 2021. For detailed information about this language refer to the C++ for OpenCL Programming Language Documentation available in the latest build or in the official release.

To enable the C++ for OpenCL mode, pass one of the following command line options when compiling a .clcpp file:

C++ for OpenCL 1.0: -cl-std=clc++, -cl-std=CLC++, -cl-std=clc++1.0, -cl-std=CLC++1.0, -std=clc++, -std=CLC++, -std=clc++1.0 or -std=CLC++1.0.
C++ for OpenCL 2021: -cl-std=clc++2021, -cl-std=CLC++2021, -std=clc++2021, -std=CLC++2021.

Example of use:

template<class T> T add( T x, T y )
{
  return x + y;
}

__kernel void test( __global float* a, __global float* b)
{
  auto index = get_global_id(0);
  a[index] = add(b[index], b[index+1]);
}

clang -cl-std=clc++1.0 test.clcpp
clang -cl-std=clc++ -c --target=spirv64 test.cl

By default, files with .clcpp extension are compiled with the C++ for OpenCL 1.0 mode.

clang test.clcpp

For backward compatibility files with .cl extensions can also be compiled in C++ for OpenCL mode but the desirable language mode must be activated with a flag.

clang -cl-std=clc++ test.cl

Support of C++ for OpenCL 2021 is currently in experimental phase, refer to OpenCL Support for more details.

C++ for OpenCL kernel sources can also be compiled online in drivers supporting cl_ext_cxx_for_opencl extension.

Constructing and destroying global objects¶

Global objects with non-trivial constructors require the constructors to be run before the first kernel using the global objects is executed. Similarly global objects with non-trivial destructors require destructor invocation just after the last kernel using the program objects is executed.

In OpenCL versions earlier than v2.2 there is no support for invoking global constructors. However, an easy workaround is to manually enqueue the constructor initialization kernel that has the following name scheme _GLOBAL__sub_I_<compiled file name>. This kernel is only present if there are global objects with non-trivial constructors present in the compiled binary. One way to check this is by passing CL_PROGRAM_KERNEL_NAMES to clGetProgramInfo (OpenCL v2.0 s5.8.7) and then checking whether any kernel name matches the naming scheme of global constructor initialization kernel above.

Note that if multiple files are compiled and linked into libraries, multiple kernels that initialize global objects for multiple modules would have to be invoked.

Applications are currently required to run initialization of global objects manually before running any kernels in which the objects are used.

clang -cl-std=clc++ test.cl

If there are any global objects to be initialized, the final binary will contain the _GLOBAL__sub_I_test.cl kernel to be enqueued.
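The name check described above can be sketched on the host side as follows. This is an illustration only: the function name find_global_ctor_kernels is hypothetical, and the input is assumed to be the semicolon-separated string that clGetProgramInfo returns for CL_PROGRAM_KERNEL_NAMES. No OpenCL calls are made here.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Returns every kernel name that starts with the "_GLOBAL__sub_I_" prefix.
// `kernel_names` is expected to be a semicolon-separated list of names.
inline std::vector<std::string> find_global_ctor_kernels(
    const std::string& kernel_names) {
  static const std::string prefix = "_GLOBAL__sub_I_";
  std::vector<std::string> result;
  std::istringstream in(kernel_names);
  std::string name;
  while (std::getline(in, name, ';')) {
    // compare() against the first prefix.size() characters of the name.
    if (name.compare(0, prefix.size(), prefix) == 0) {
      result.push_back(name);
    }
  }
  return result;
}
```

An application would then enqueue each returned kernel once, before any kernel that uses the corresponding global objects; when several modules are linked together, more than one name may be returned.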

Note that the manual workaround only applies to objects declared at the program scope. There is no manual workaround for the construction of static objects with non-trivial constructors inside functions.

Global destructors can not be invoked manually in the OpenCL v2.0 drivers. However, all memory used for program scope objects should be released on clReleaseProgram.

Libraries¶

Limited experimental support of C++ standard libraries for OpenCL is described in OpenCL Support page.

Target-Specific Features and Limitations¶

CPU Architectures Features and Limitations¶

X86¶

The support for X86 (both 32-bit and 64-bit) is considered stable on Darwin (macOS), Linux, FreeBSD, and Dragonfly BSD: it has been tested to correctly compile many large C, C++, Objective-C, and Objective-C++ codebases.

On x86_64-mingw32, passing i128 (by value) is incompatible with the Microsoft x64 calling convention. You might need to tweak WinX86_64ABIInfo::classify() in lib/CodeGen/Targets/X86.cpp.

For the X86 target, clang supports the -m16 command line argument which enables 16-bit code output. This is broadly similar to using asm(".code16gcc") with the GNU toolchain. The generated code and the ABI remain 32-bit but the assembler emits instructions appropriate for a CPU running in 16-bit mode, with address-size and operand-size prefixes to enable 32-bit addressing and operations.

Several micro-architecture levels as specified by the x86-64 psABI are defined. They are cumulative in the sense that features from previous levels are implicitly included in later levels.

-march=x86-64: CMOV, CMPXCHG8B, FPU, FXSR, MMX, SCE, SSE, SSE2
-march=x86-64-v2: (close to Nehalem) CMPXCHG16B, LAHF-SAHF, POPCNT, SSE3, SSE4.1, SSE4.2, SSSE3
-march=x86-64-v3: (close to Haswell) AVX, AVX2, BMI1, BMI2, F16C, FMA, LZCNT, MOVBE, XSAVE
-march=x86-64-v4: AVX512F, AVX512BW, AVX512CD, AVX512DQ, AVX512VL
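Code can get a rough idea of the level it is being compiled for from the compiler's predefined feature macros. The sketch below is a best-effort guess, not an official mapping: the function name is hypothetical, and only a subset of each level's features from the list above is tested.

```cpp
#include <cassert>

// Best-effort guess of the x86-64 micro-architecture level selected at
// compile time. Returns 0 when not compiling for x86-64.
// Only some of the features of each level are checked.
constexpr int x86_64_level_guess() {
#if !defined(__x86_64__)
  return 0;
#elif defined(__AVX512F__) && defined(__AVX512BW__) && \
    defined(__AVX512CD__) && defined(__AVX512DQ__) && defined(__AVX512VL__)
  return 4;
#elif defined(__AVX2__) && defined(__BMI2__) && defined(__FMA__) && \
    defined(__F16C__) && defined(__LZCNT__)
  return 3;
#elif defined(__SSE4_2__) && defined(__POPCNT__) && defined(__SSSE3__)
  return 2;
#else
  return 1;
#endif
}
```

The value reflects the flags used for the translation unit, not the capabilities of the machine that eventually runs the binary.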

Intel AVX10 ISA is a major new vector ISA incorporating the modern vectorization aspects of Intel AVX-512. This ISA will be supported on all future Intel processors. Users are supposed to use the new options -mavx10.N and -mavx10.N-512 on these processors and should not use traditional AVX512 options anymore.

The N in -mavx10.N is a consecutive integer starting from 1. -mavx10.N is an alias of -mavx10.N-256, which enables all instructions within AVX10 version N at a maximum vector length of 256 bits. -mavx10.N-512 enables all instructions at a maximum vector length of 512 bits, which is a superset of the instructions enabled by -mavx10.N.

Current binaries built with AVX512 features can run on Intel AVX10/512 capable processors without recompilation, but cannot run on AVX10/256 capable processors. To run on AVX10/256 capable processors, users need to recompile their code with -mavx10.N, and may need to update code that calls 512-bit X86 specific intrinsics or passes or returns 512-bit vector types in function calls. Binaries built with -mavx10.N can run on both AVX10/256 and AVX10/512 capable processors.

Users can add -mno-evex512 to the command line together with AVX512 options if they want to run the binary on both legacy AVX512 and new AVX10/256 capable processors. The option has the same constraints as -mavx10.N, i.e., the code cannot call 512-bit X86 specific intrinsics or pass or return 512-bit vector types in function calls.

Users should avoid using AVX512 features in function target attributes when developing code for AVX10. If they have to do so, they need to add an explicit evex512 or no-evex512 together with AVX512 features for 512-bit or non-512-bit functions respectively to avoid unexpected code generation. Both the command line option and the target attribute of the EVEX512 feature can only be used with AVX512. They don’t affect the vector size of AVX10.

Users should not mix AVX10 and AVX512 options, because some option combinations conflict. For example, the combination -mavx512f -mavx10.1-256 doesn’t show a clear intention to the compiler, since the instructions in AVX512F and AVX10.1/256 overlap but neither set contains the other. In this case, the compiler will emit a warning, but the behavior is well defined: it will generate the same code as the option -mavx10.1-512. A similar case is -mavx512f -mavx10.2-256, which is equivalent to -mavx10.1-512 -mavx10.2-256, because avx10.2-256 implies avx10.1-256 and -mavx512f -mavx10.1-256 is equivalent to -mavx10.1-512.

There are some new macros introduced with AVX10 support. -mavx10.1-256 will enable __AVX10_1__ and __EVEX256__, while -mavx10.1-512 enables __AVX10_1__, __EVEX256__, __EVEX512__ and __AVX10_1_512__. Besides, both -mavx10.1-256 and -mavx10.1-512 will enable all AVX512 feature specific macros. An AVX512 feature will enable __EVEX256__, __EVEX512__ and its own macro. So __EVEX512__ can be used to guard code that can run on both legacy AVX512 and AVX10/512 capable processors but cannot run on AVX10/256, while an AVX512 macro like __AVX512F__ cannot tell the difference among the three options. Users need to check the additional macros __AVX10_1__ and __EVEX512__ if they want to make the distinction.
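The macro combinations described above can be turned into a compile-time classification. The sketch below follows the text of this section only; the function name and the returned strings are hypothetical.

```cpp
#include <cassert>
#include <string>

// Classifies the vector ISA configuration of the current translation unit
// from the macros described above.
inline std::string vector_isa_kind() {
#if defined(__AVX10_1__) && defined(__EVEX512__)
  return "AVX10/512";                    // e.g. -mavx10.1-512
#elif defined(__AVX10_1__)
  return "AVX10/256";                    // e.g. -mavx10.1-256
#elif defined(__AVX512F__) && defined(__EVEX512__)
  return "AVX512 with 512-bit vectors";  // e.g. -mavx512f
#elif defined(__AVX512F__)
  return "AVX512 without 512-bit vectors";  // e.g. -mavx512f -mno-evex512
#else
  return "no AVX512 or AVX10";
#endif
}
```

Checking __AVX512F__ alone would not separate the first three cases, which is why __AVX10_1__ and __EVEX512__ are tested first.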

ARM¶

The support for ARM (specifically ARMv6 and ARMv7) is considered stable on Darwin (iOS): it has been tested to correctly compile many large C, C++, Objective-C, and Objective-C++ codebases. Clang only supports a limited number of ARM architectures. It does not yet fully support ARMv5, for example.

PowerPC¶

The support for PowerPC (especially PowerPC64) is considered stable on Linux and FreeBSD: it has been tested to correctly compile many large C and C++ codebases. PowerPC (32bit) is still missing certain features (e.g. PIC code on ELF platforms).

Other platforms¶

clang currently contains some support for other architectures (e.g. Sparc); however, significant pieces of code generation are still missing, and they haven’t undergone significant testing.

clang contains limited support for the MSP430 embedded processor, but both the clang support and the LLVM backend support are highly experimental.

Other platforms are completely unsupported at the moment. Adding the minimal support needed for parsing and semantic analysis on a new platform is quite easy; see lib/Basic/Targets.cpp in the clang source tree. This level of support is also sufficient for conversion to LLVM IR for simple programs. Proper support for conversion to LLVM IR requires adding code to lib/CodeGen/CGCall.cpp at the moment; this is likely to change soon, though. Generating assembly requires a suitable LLVM backend.

Operating System Features and Limitations¶

Windows¶

Clang has experimental support for targeting “Cygming” (Cygwin / MinGW) platforms.

Cygwin¶

Clang works on Cygwin-1.7.

MinGW32¶

Clang works on some mingw32 distributions. Clang assumes the following directories:

C:/mingw/include
C:/mingw/lib
C:/mingw/lib/gcc/mingw32/4.[3-5].0/include/c++

On MSYS, a few tests might fail.

MinGW-w64¶

For 32-bit (i686-w64-mingw32) and 64-bit (x86_64-w64-mingw32), Clang assumes the following:

GCC versions 4.5.0 to 4.5.3, 4.6.0 to 4.6.2, or 4.7.0 (for the C++ header search path)
some_directory/bin/gcc.exe
some_directory/bin/clang.exe
some_directory/bin/clang++.exe
some_directory/bin/../include/c++/GCC_version
some_directory/bin/../include/c++/GCC_version/x86_64-w64-mingw32
some_directory/bin/../include/c++/GCC_version/i686-w64-mingw32
some_directory/bin/../include/c++/GCC_version/backward
some_directory/bin/../x86_64-w64-mingw32/include
some_directory/bin/../i686-w64-mingw32/include
some_directory/bin/../include

This directory layout is standard for any toolchain you will find on the official MinGW-w64 website.

Clang expects the GCC executable “gcc.exe” compiled for i686-w64-mingw32 (or x86_64-w64-mingw32) to be present on PATH.

Some tests might fail on x86_64-w64-mingw32.

AIX¶

TOC Data Transformation¶

TOC data transformation is off by default (-mno-tocdata). When -mtocdata is specified, the TOC data transformation will be applied to all suitable variables with static storage duration, including static data members of classes and block-scope static variables (if not marked as exceptions, see further below).

Suitable variables must:

have complete types
be independently generated (i.e., not placed in a pool)
be at most as large as a pointer
not be aligned more strictly than a pointer
not be structs containing flexible array members
not have internal linkage
not have aliases
not have section attributes
not be thread local storage

The TOC data transformation results in the variable, not its address, being placed in the TOC. This eliminates the need to load the address of the variable from the TOC.

Note: If the TOC data transformation is applied to a variable whose definition is imported, the linker will generate fixup code for reading or writing to the variable.

When multiple toc-data options are used, the last option used takes effect. For example: -mno-tocdata=g5,g1 -mtocdata=g1,g2 -mno-tocdata=g2 -mtocdata=g3,g4 results in -mtocdata=g1,g3,g4

Names of variables not having external linkage will be ignored.
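The "last option wins" rule for the list forms can be modelled with a small set-based sketch. This is not the driver's implementation; the function name resolve_tocdata is hypothetical, and only the -mtocdata=<list> and -mno-tocdata=<list> forms are modelled (the bare -mtocdata and -mno-tocdata forms and the suitability checks are left out).

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Processes the options in command-line order. For each variable name, the
// last option that mentions it decides whether it ends up selected.
inline std::set<std::string> resolve_tocdata(
    const std::vector<std::string>& options) {
  static const std::string on = "-mtocdata=";
  static const std::string off = "-mno-tocdata=";
  std::set<std::string> selected;
  for (const std::string& opt : options) {
    bool enable = false;
    std::string list;
    if (opt.compare(0, on.size(), on) == 0) {
      enable = true;
      list = opt.substr(on.size());
    } else if (opt.compare(0, off.size(), off) == 0) {
      list = opt.substr(off.size());
    } else {
      continue;  // not a list-form toc-data option
    }
    std::istringstream in(list);
    std::string name;
    while (std::getline(in, name, ',')) {
      if (name.empty()) continue;
      if (enable) {
        selected.insert(name);
      } else {
        selected.erase(name);
      }
    }
  }
  return selected;
}
```

Applied to the example above, the four options leave g1, g3 and g4 selected, which matches the stated result -mtocdata=g1,g3,g4.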

Options:

-mno-tocdata¶: This is the default behaviour. Only variables explicitly specified with -mtocdata= will have the TOC data transformation applied.

-mtocdata¶: Apply the TOC data transformation to all suitable variables with static storage duration (including static data members of classes and block-scope static variables) that are not explicitly specified with -mno-tocdata=.

-mno-tocdata=¶: Can be used in conjunction with -mtocdata to mark the comma-separated list of external linkage variables, specified using their mangled names, as exceptions to -mtocdata.

-mtocdata=¶: Apply the TOC data transformation to the comma-separated list of external linkage variables, specified using their mangled names, if they are suitable. Emit diagnostics for all unsuitable variables specified.

Default Visibility Export Mapping¶

The -mdefault-visibility-export-mapping= option can be used to control mapping of default visibility to an explicit shared object export (i.e. XCOFF exported visibility). Three values are provided for the option:

-mdefault-visibility-export-mapping=none: no additional export information is created for entities with default visibility.
-mdefault-visibility-export-mapping=explicit: mark entities for export if they have explicit (e.g. via an attribute) default visibility from the source, including RTTI.
-mdefault-visibility-export-mapping=all: set XCOFF exported visibility for all entities with default visibility from any source. This gives an export behavior similar to ELF platforms where all entities with default visibility are exported.

SPIR-V support¶

Clang supports generation of SPIR-V conformant to the OpenCL Environment Specification.

To generate SPIR-V binaries, Clang uses the external llvm-spirv tool from the SPIRV-LLVM-Translator repo.

Prior to the generation of SPIR-V binary with Clang, llvm-spirv should be built or installed. Please refer to the following instructions for more details. Clang will look for llvm-spirv-<LLVM-major-version> and llvm-spirv executables, in this order, in the PATH environment variable. Clang uses llvm-spirv with the widely adopted assembly syntax package.

The versioning of llvm-spirv is aligned with Clang major releases. The same applies to the main development branch. It is therefore important to ensure the llvm-spirv version is in alignment with the Clang version. For troubleshooting purposes llvm-spirv can be tested in isolation.

Example usage for OpenCL kernel compilation:

$ clang --target=spirv32 -c test.cl
$ clang --target=spirv64 -c test.cl

Both invocations of Clang will result in the generation of a SPIR-V binary file test.o for 32 bit and 64 bit respectively. This file can be imported by an OpenCL driver that supports SPIR-V consumption or it can be compiled further by offline SPIR-V consumer tools.

Converting to SPIR-V produced with the optimization levels other than -O0 is currently available as an experimental feature and it is not guaranteed to work in all cases.

Clang also supports integrated generation of SPIR-V without use of the llvm-spirv tool as an experimental feature when the -fintegrated-objemitter flag is passed in the command line.

$ clang --target=spirv32 -fintegrated-objemitter -c test.cl

Note that only very basic functionality is supported at this point and therefore it is not suitable for arbitrary use cases. This feature is only enabled when clang build is configured with -DLLVM_EXPERIMENTAL_TARGETS_TO_BUILD=SPIRV option.

Linking is done using spirv-link from the SPIRV-Tools project. Similar to other external linkers, Clang will expect spirv-link to be installed separately and to be present in the PATH environment variable. Please refer to the build and installation instructions.

$ clang --target=spirv64 test1.cl test2.cl

More information about the SPIR-V target settings and supported versions of SPIR-V format can be found in the SPIR-V target guide.

clang-cl¶

clang-cl is an alternative command-line interface to Clang, designed for compatibility with the Visual C++ compiler, cl.exe.

To enable clang-cl to find system headers, libraries, and the linker when run from the command-line, it should be executed inside a Visual Studio Native Tools Command Prompt or a regular Command Prompt where the environment has been set up using e.g. vcvarsall.bat.

clang-cl can also be used from inside Visual Studio by selecting the LLVM Platform Toolset. The toolset is not part of the installer, but may be installed separately from the Visual Studio Marketplace. To use the toolset, select a project in Solution Explorer, open its Property Page (Alt+F7), and in the “General” section of “Configuration Properties” change “Platform Toolset” to LLVM. Doing so enables an additional Property Page for selecting the clang-cl executable to use for builds.

To use the toolset with MSBuild directly, invoke it with e.g. /p:PlatformToolset=LLVM. This allows trying out the clang-cl toolchain without modifying your project files.

It’s also possible to point MSBuild at clang-cl without changing toolset by passing /p:CLToolPath=c:\llvm\bin /p:CLToolExe=clang-cl.exe.

When using CMake and the Visual Studio generators, the toolset can be set with the -T flag:

cmake -G"Visual Studio 16 2019" -T LLVM ..

When using CMake with the Ninja generator, set the CMAKE_C_COMPILER and CMAKE_CXX_COMPILER variables to clang-cl:

cmake -GNinja -DCMAKE_C_COMPILER="c:/Program Files (x86)/LLVM/bin/clang-cl.exe" -DCMAKE_CXX_COMPILER="c:/Program Files (x86)/LLVM/bin/clang-cl.exe" ..

Command-Line Options¶

To be compatible with cl.exe, clang-cl supports most of the same command-line options. Those options can start with either / or -. It also supports some of Clang’s core options, such as the -W options.

Options that are known to clang-cl, but not currently supported, are ignored with a warning. For example:

clang-cl.exe: warning: argument unused during compilation: '/AI'

To suppress warnings about unused arguments, use the -Qunused-arguments option.

Options that are not known to clang-cl will be ignored by default. Use the -Werror=unknown-argument option in order to treat them as errors. If these options are spelled with a leading /, they will be mistaken for a filename:

clang-cl.exe: error: no such file or directory: '/foobar'

Please file a bug for any valid cl.exe flags that clang-cl does not understand.

Execute clang-cl /? to see a list of supported options:

CL.EXE COMPATIBILITY OPTIONS:
  /?                          Display available options
  /arch:<value>               Set architecture for code generation
  /Brepro-                    Emit an object file which cannot be reproduced over time
  /Brepro                     Emit an object file which can be reproduced over time
  /clang:<arg>                Pass <arg> to the clang driver
  /C                          Don't discard comments when preprocessing
  /c                          Compile only
  /d1PP                       Retain macro definitions in /E mode
  /d1reportAllClassLayout     Dump record layout information
  /diagnostics:caret          Enable caret and column diagnostics (on by default)
  /diagnostics:classic        Disable column and caret diagnostics
  /diagnostics:column         Disable caret diagnostics but keep column info
  /D <macro[=value]>          Define macro
  /EH<value>                  Exception handling model
  /EP                         Disable linemarker output and preprocess to stdout
  /execution-charset:<value>  Runtime encoding, supports only UTF-8
  /E                          Preprocess to stdout
  /FA                         Output assembly code file during compilation
  /Fa<file or directory>      Output assembly code to this file during compilation (with /FA)
  /Fe<file or directory>      Set output executable file or directory (ends in / or \)
  /FI <value>                 Include file before parsing
  /Fi<file>                   Set preprocess output file name (with /P)
  /Fo<file or directory>      Set output object file, or directory (ends in / or \) (with /c)
  /fp:except-
  /fp:except
  /fp:fast
  /fp:precise
  /fp:strict
  /Fp<filename>               Set pch filename (with /Yc and /Yu)
  /GA                         Assume thread-local variables are defined in the executable
  /Gd                         Set __cdecl as a default calling convention
  /GF-                        Disable string pooling
  /GF                         Enable string pooling (default)
  /GR-                        Disable emission of RTTI data
  /Gregcall                   Set __regcall as a default calling convention
  /GR                         Enable emission of RTTI data
  /Gr                         Set __fastcall as a default calling convention
  /GS-                        Disable buffer security check
  /GS                         Enable buffer security check (default)
  /Gs                         Use stack probes (default)
  /Gs<value>                  Set stack probe size (default 4096)
  /guard:<value>              Enable Control Flow Guard with /guard:cf, or only the table with /guard:cf,nochecks.
                              Enable EH Continuation Guard with /guard:ehcont
  /Gv                         Set __vectorcall as a default calling convention
  /Gw-                        Don't put each data item in its own section
  /Gw                         Put each data item in its own section
  /GX-                        Disable exception handling
  /GX                         Enable exception handling
  /Gy-                        Don't put each function in its own section (default)
  /Gy                         Put each function in its own section
  /Gz                         Set __stdcall as a default calling convention
  /help                       Display available options
  /imsvc <dir>                Add directory to system include search path, as if part of %INCLUDE%
  /I <dir>                    Add directory to include search path
  /J                          Make char type unsigned
  /LDd                        Create debug DLL
  /LD                         Create DLL
  /link <options>             Forward options to the linker
  /MDd                        Use DLL debug run-time
  /MD                         Use DLL run-time
  /MTd                        Use static debug run-time
  /MT                         Use static run-time
  /O0                         Disable optimization
  /O1                         Optimize for size (same as /Og /Os /Oy /Ob2 /GF /Gy)
  /O2                         Optimize for speed (same as /Og /Oi /Ot /Oy /Ob2 /GF /Gy)
  /Ob0                        Disable function inlining
  /Ob1                        Only inline functions which are (explicitly or implicitly) marked inline
  /Ob2                        Inline functions as deemed beneficial by the compiler
  /Ob3                        Same as /Ob2
  /Od                         Disable optimization
  /Og                         No effect
  /Oi-                        Disable use of builtin functions
  /Oi                         Enable use of builtin functions
  /Os                         Optimize for size (like clang -Os)
  /Ot                         Optimize for speed (like clang -O3)
  /Ox                         Deprecated (same as /Og /Oi /Ot /Oy /Ob2); use /O2 instead
  /Oy-                        Disable frame pointer omission (x86 only, default)
  /Oy                         Enable frame pointer omission (x86 only)
  /O<flags>                   Set multiple /O flags at once; e.g. '/O2y-' for '/O2 /Oy-'
  /o <file or directory>      Set output file or directory (ends in / or \)
  /P                          Preprocess to file
  /Qvec-                      Disable the loop vectorization passes
  /Qvec                       Enable the loop vectorization passes
  /showFilenames-             Don't print the name of each compiled file (default)
  /showFilenames              Print the name of each compiled file
  /showIncludes               Print info about included files to stderr
  /source-charset:<value>     Source encoding, supports only UTF-8
  /std:<value>                Language standard to compile for
  /TC                         Treat all source files as C
  /Tc <filename>              Specify a C source file
  /TP                         Treat all source files as C++
  /Tp <filename>              Specify a C++ source file
  /utf-8                      Set source and runtime encoding to UTF-8 (default)
  /U <macro>                  Undefine macro
  /vd<value>                  Control vtordisp placement
  /vmb                        Use a best-case representation method for member pointers
  /vmg                        Use a most-general representation for member pointers
  /vmm                        Set the default most-general representation to multiple inheritance
  /vms                        Set the default most-general representation to single inheritance
  /vmv                        Set the default most-general representation to virtual inheritance
  /volatile:iso               Volatile loads and stores have standard semantics
  /volatile:ms                Volatile loads and stores have acquire and release semantics
  /W0                         Disable all warnings
  /W1                         Enable -Wall
  /W2                         Enable -Wall
  /W3                         Enable -Wall
  /W4                         Enable -Wall and -Wextra
  /Wall                       Enable -Weverything
  /WX-                        Do not treat warnings as errors
  /WX                         Treat warnings as errors
  /w                          Disable all warnings
  /X                          Don't add %INCLUDE% to the include search path
  /Y-                         Disable precompiled headers, overrides /Yc and /Yu
  /Yc<filename>               Generate a pch file for all code up to and including <filename>
  /Yu<filename>               Load a pch file and use it instead of all code up to and including <filename>
  /Z7                         Enable CodeView debug information in object files
  /Zc:char8_t                 Enable C++20 char8_t type
  /Zc:char8_t-                Disable C++20 char8_t type
  /Zc:dllexportInlines-       Don't dllexport/dllimport inline member functions of dllexport/import classes
  /Zc:dllexportInlines        dllexport/dllimport 
inline member functions of dllexport/import classes (default) /Zc:sizedDealloc- Disable C++14 sized global deallocation functions /Zc:sizedDealloc Enable C++14 sized global deallocation functions /Zc:strictStrings Treat string literals as const /Zc:threadSafeInit- Disable thread-safe initialization of static variables /Zc:threadSafeInit Enable thread-safe initialization of static variables /Zc:trigraphs- Disable trigraphs (default) /Zc:trigraphs Enable trigraphs /Zc:twoPhase- Disable two-phase name lookup in templates /Zc:twoPhase Enable two-phase name lookup in templates /Zi Alias for /Z7. Does not produce PDBs. /Zl Don't mention any default libraries in the object file /Zp Set the default maximum struct packing alignment to 1 /Zp<value> Specify the default maximum struct packing alignment /Zs Run the preprocessor, parser and semantic analysis stagesOPTIONS: -### Print (but do not run) the commands to run for this compilation --analyze Run the static analyzer -faddrsig Emit an address-significance table -fansi-escape-codes Use ANSI escape codes for diagnostics -fblocks Enable the 'blocks' language feature -fcf-protection=<value> Instrument control-flow architecture protection. Options: return, branch, full, none. 
-fcf-protection Enable cf-protection in 'full' mode -fcolor-diagnostics Use colors in diagnostics -fcomplete-member-pointers Require member pointer base types to be complete if they would be significant under the Microsoft ABI -fcoverage-mapping Generate coverage mapping to enable code coverage analysis -fcrash-diagnostics-dir=<dir> Put crash-report files in <dir> -fdebug-macro Emit macro debug information -fdelayed-template-parsing Parse templated function definitions at the end of the translation unit -fdiagnostics-absolute-paths Print absolute paths in diagnostics -fdiagnostics-parseable-fixits Print fix-its in machine parseable form -flto=<value> Set LTO mode to either 'full' or 'thin' -flto Enable LTO in 'full' mode -fmerge-all-constants Allow merging of constants -fmodule-file=<module_name>=<module-file> Use the specified module file that provides the module <module_name> -fmodule-header=<header> Build <header> as a C++20 header unit -fmodule-output=<path> Save intermediate module file results when compiling a standard C++ module unit. 
-fms-compatibility-version=<value> Dot-separated value representing the Microsoft compiler version number to report in _MSC_VER (0 = don't define it; default is same value as installed cl.exe, or 1933) -fms-compatibility Enable full Microsoft Visual C++ compatibility -fms-extensions Accept some non-standard constructs supported by the Microsoft compiler -fmsc-version=<value> Microsoft compiler version number to report in _MSC_VER (0 = don't define it; default is same value as installed cl.exe, or 1933) -fno-addrsig Don't emit an address-significance table -fno-builtin-<value> Disable implicit builtin knowledge of a specific function -fno-builtin Disable implicit builtin knowledge of functions -fno-complete-member-pointers Do not require member pointer base types to be complete if they would be significant under the Microsoft ABI -fno-coverage-mapping Disable code coverage analysis -fno-crash-diagnostics Disable auto-generation of preprocessed source files and a script for reproduction during a clang crash -fno-debug-macro Do not emit macro debug information -fno-delayed-template-parsing Disable delayed template parsing -fno-sanitize-address-poison-custom-array-cookie Disable poisoning array cookies when using custom operator new[] in AddressSanitizer -fno-sanitize-address-use-after-scope Disable use-after-scope detection in AddressSanitizer -fno-sanitize-address-use-odr-indicator Disable ODR indicator globals -fno-sanitize-ignorelist Don't use ignorelist file for sanitizers -fno-sanitize-cfi-cross-dso Disable control flow integrity (CFI) checks for cross-DSO calls. -fno-sanitize-coverage=<value> Disable specified features of coverage instrumentation for Sanitizers -fno-sanitize-memory-track-origins Disable origins tracking in MemorySanitizer -fno-sanitize-memory-use-after-dtor Disable use-after-destroy detection in MemorySanitizer -fno-sanitize-recover=<value> Disable recovery for specified sanitizers -fno-sanitize-stats Disable sanitizer statistics gathering. 
-fno-sanitize-thread-atomics Disable atomic operations instrumentation in ThreadSanitizer -fno-sanitize-thread-func-entry-exit Disable function entry/exit instrumentation in ThreadSanitizer -fno-sanitize-thread-memory-access Disable memory access instrumentation in ThreadSanitizer -fno-sanitize-trap=<value> Disable trapping for specified sanitizers -fno-standalone-debug Limit debug information produced to reduce size of debug binary -fno-strict-aliasing Disable optimizations based on strict aliasing rules (default) -fobjc-runtime=<value> Specify the target Objective-C runtime kind and version -fprofile-exclude-files=<value> Instrument only functions from files where names don't match all the regexes separated by a semi-colon -fprofile-filter-files=<value> Instrument only functions from files where names match any regex separated by a semi-colon -fprofile-generate=<dirname> Generate instrumented code to collect execution counts into a raw profile file in the directory specified by the argument. 
The filename uses default_%m.profraw pattern (overridden by LLVM_PROFILE_FILE env var) -fprofile-generate Generate instrumented code to collect execution counts into default_%m.profraw file (overridden by '=' form of option or LLVM_PROFILE_FILE env var) -fprofile-instr-generate=<file_name_pattern> Generate instrumented code to collect execution counts into the file whose name pattern is specified as the argument (overridden by LLVM_PROFILE_FILE env var) -fprofile-instr-generate Generate instrumented code to collect execution counts into default.profraw file (overridden by '=' form of option or LLVM_PROFILE_FILE env var) -fprofile-instr-use=<value> Use instrumentation data for coverage testing or profile-guided optimization -fprofile-use=<value> Use instrumentation data for profile-guided optimization -fprofile-remapping-file=<file> Use the remappings described in <file> to match the profile data against names in the program -fprofile-list=<file> Filename defining the list of functions/files to instrument -fsanitize-address-field-padding=<value> Level of field padding for AddressSanitizer -fsanitize-address-globals-dead-stripping Enable linker dead stripping of globals in AddressSanitizer -fsanitize-address-poison-custom-array-cookie Enable poisoning array cookies when using custom operator new[] in AddressSanitizer -fsanitize-address-use-after-return=<mode> Select the mode of detecting stack use-after-return in AddressSanitizer: never | runtime (default) | always -fsanitize-address-use-after-scope Enable use-after-scope detection in AddressSanitizer -fsanitize-address-use-odr-indicator Enable ODR indicator globals to avoid false ODR violation reports in partially sanitized programs at the cost of an increase in binary size -fsanitize-ignorelist=<value> Path to ignorelist file for sanitizers -fsanitize-cfi-cross-dso Enable control flow integrity (CFI) checks for cross-DSO calls. 
-fsanitize-cfi-icall-generalize-pointers Generalize pointers in CFI indirect call type signature checks -fsanitize-coverage=<value> Specify the type of coverage instrumentation for Sanitizers -fsanitize-hwaddress-abi=<value> Select the HWAddressSanitizer ABI to target (interceptor or platform, default interceptor) -fsanitize-memory-track-origins=<value> Enable origins tracking in MemorySanitizer -fsanitize-memory-track-origins Enable origins tracking in MemorySanitizer -fsanitize-memory-use-after-dtor Enable use-after-destroy detection in MemorySanitizer -fsanitize-recover=<value> Enable recovery for specified sanitizers -fsanitize-stats Enable sanitizer statistics gathering. -fsanitize-thread-atomics Enable atomic operations instrumentation in ThreadSanitizer (default) -fsanitize-thread-func-entry-exit Enable function entry/exit instrumentation in ThreadSanitizer (default) -fsanitize-thread-memory-access Enable memory access instrumentation in ThreadSanitizer (default) -fsanitize-trap=<value> Enable trapping for specified sanitizers -fsanitize-undefined-strip-path-components=<number> Strip (or keep only, if negative) a given number of path components when emitting check metadata. -fsanitize=<check> Turn on runtime checks for various forms of undefined or suspicious behavior. See user manual for available checks -fsplit-lto-unit Enables splitting of the LTO unit. -fstandalone-debug Emit full debug info for all types used by the program -fstrict-aliasing Enable optimizations based on strict aliasing rules -fsyntax-only Run the preprocessor, parser and semantic analysis stages -fwhole-program-vtables Enables whole-program vtable optimization. 
Requires -flto -gcodeview-ghash Emit type record hashes in a .debug$H section -gcodeview Generate CodeView debug information -gline-directives-only Emit debug line info directives only -gline-tables-only Emit debug line number tables only -miamcu Use Intel MCU ABI -mllvm <value> Additional arguments to forward to LLVM's option processing -nobuiltininc Disable builtin #include directories -Qunused-arguments Don't emit warning for unused driver arguments -R<remark> Enable the specified remark --target=<value> Generate code for the given target --version Print version information -v Show commands to run and use verbose output -W<warning> Enable the specified warning -Xclang <arg> Pass <arg> to the clang compiler

The /clang: Option

When clang-cl is run with a set of /clang:<arg> options, it will gather all of the <arg> arguments and process them as if they were passed to the clang driver. This mechanism allows you to pass flags that are not exposed in the clang-cl options or flags that have a different meaning when passed to the clang driver. Regardless of where they appear in the command line, the /clang: arguments are treated as if they were passed at the end of the clang-cl command line.
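The ordering rule can be pictured with a small model. The sketch below is not the driver's implementation: gatherClangArgs is a hypothetical helper that strips the /clang: prefix and moves those arguments to the end of the list, leaving every other argument in its original order.

```cpp
#include <string>
#include <vector>

// Illustrative model only: gatherClangArgs is a hypothetical helper, not a
// function from the Clang driver. Every /clang:<arg> is moved to the end of
// the argument list (with the prefix removed); all other arguments keep
// their relative order.
std::vector<std::string> gatherClangArgs(const std::vector<std::string> &args) {
  const std::string prefix = "/clang:";
  std::vector<std::string> regular;
  std::vector<std::string> forwarded;
  for (const std::string &arg : args) {
    if (arg.compare(0, prefix.size(), prefix) == 0)
      forwarded.push_back(arg.substr(prefix.size()));
    else
      regular.push_back(arg);
  }
  regular.insert(regular.end(), forwarded.begin(), forwarded.end());
  return regular;
}
```

For the argument list /clang:-MD /c t.cpp, the model yields /c t.cpp -MD. This pair is a typical case of a flag with a different meaning in each driver: /MD selects the DLL run-time in clang-cl, while -MD asks the clang driver to write a dependency file.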

The /Zc:dllexportInlines- Option

This option causes the class-level dllexport and dllimport attributes to not apply to inline member functions, as they otherwise would. For example, in the code below, S::foo() would normally be defined and exported by the DLL, but when the /Zc:dllexportInlines- flag is used it is not:

struct __declspec(dllexport) S {
  void foo() {}
};

This has the benefit that the compiler doesn’t need to emit a definition of S::foo() in every translation unit where the declaration is included, as it would otherwise do to ensure there’s a definition in the DLL even if it’s not used there. If the declaration occurs in a header file that’s widely used, this can save significant compilation time and output size. It also reduces the number of functions exported by the DLL, similarly to what -fvisibility-inlines-hidden does for shared objects on ELF and Mach-O. Since the function declaration comes with an inline definition, users of the library can use that definition directly instead of importing it from the DLL.

Note that the Microsoft Visual C++ compiler does not support this option, and if code in a DLL is compiled with /Zc:dllexportInlines-, the code using the DLL must be compiled in the same way so that it doesn’t attempt to dllimport the inline member functions. The reverse scenario should generally work though: a DLL compiled without this flag (such as a system library compiled with Visual C++) can be referenced from code compiled using the flag, meaning that the referencing code will use the inline definitions instead of importing them from the DLL.

Also note that, as with -fvisibility-inlines-hidden, the address of S::foo() will be different inside and outside the DLL, breaking the C/C++ standard requirement that functions have a unique address.

The flag does not apply to explicit class template instantiation definitions or declarations, as those are typically used to explicitly provide a single definition in a DLL (dllexported instantiation definition) or to signal that the definition is available elsewhere (dllimport instantiation declaration). It also doesn’t apply to inline members with static local variables, to ensure that the same instance of the variable is used inside and outside the DLL.
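A minimal sketch may make these two exceptions concrete. EXPORT, Box and Counter are hypothetical names chosen for illustration; EXPORT expands to nothing outside Windows so that the snippet compiles on any target. The comments restate what the text above says about each member under /Zc:dllexportInlines-.

```cpp
// EXPORT is a hypothetical macro used only in this sketch. Outside Windows
// it expands to nothing so the code still compiles.
#if defined(_WIN32)
#define EXPORT __declspec(dllexport)
#else
#define EXPORT
#endif

template <typename T> struct Box {
  T value;
  T get() const { return value; } // inline member of a class template
};

// Explicit instantiation definition: per the text, the flag does not apply
// here, so the members of Box<int> are still exported.
template struct EXPORT Box<int>;

struct EXPORT Counter {
  // Inline member with a static local variable: per the text, still
  // exported, so that a single instance of n is shared.
  int next() {
    static int n = 0;
    return n++;
  }
  // Plain inline member: this is the kind of function the flag stops
  // exporting.
  int zero() const { return 0; }
};
```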

Using this flag can cause problems when inline functions that would otherwise be dllexported refer to internal symbols of a DLL. For example:

void internal();
struct __declspec(dllimport) S {
  void foo() { internal(); }
};

Normally, references to S::foo() would use the definition in the DLL from which it was exported, and which presumably also has the definition of internal(). However, when using /Zc:dllexportInlines-, the inline definition of S::foo() is used directly, resulting in a link error since internal() is not available. Even worse, if there is an inline definition of internal() containing a static local variable, we will now refer to a different instance of that variable than in the DLL:

inline int internal() { static int x; return x++; }
struct __declspec(dllimport) S {
  int foo() { return internal(); }
};

This could lead to very subtle bugs. Using -fvisibility-inlines-hidden can lead to the same issue. To avoid it in this case, make S::foo() or internal() non-inline, or mark them dllimport/dllexport explicitly.
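One way to apply the first suggestion is sketched below, with S::foo() declared in the header and defined out of line in a source file that is compiled into the DLL. MYLIB_API is a hypothetical macro. In this sketch it plays the DLL's side and expands to dllexport on Windows (code using the DLL would see dllimport instead) and to nothing elsewhere.

```cpp
// --- header shared by the DLL and its users (sketch) ---
// MYLIB_API is a hypothetical macro. This snippet stands in for the DLL
// itself, so on Windows it expands to dllexport; users of the DLL would see
// dllimport. On other targets it expands to nothing.
#if defined(_WIN32)
#define MYLIB_API __declspec(dllexport)
#else
#define MYLIB_API
#endif

inline int internal() { static int x; return x++; }

struct MYLIB_API S {
  int foo(); // declared only: callers have no inline body to use directly
};

// --- source file compiled into the DLL (sketch) ---
// The single definition lives here, so every caller reaches the same
// instance of the static local variable inside internal().
int S::foo() { return internal(); }
```

Because S::foo() is no longer inline, /Zc:dllexportInlines- does not affect it and callers import it from the DLL as usual.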

Finding Clang runtime libraries

clang-cl supports several features that require runtime library support:

- Address Sanitizer (ASan): -fsanitize=address
- Undefined Behavior Sanitizer (UBSan): -fsanitize=undefined
- Code coverage: -fprofile-instr-generate -fcoverage-mapping
- Profile Guided Optimization (PGO): -fprofile-generate
- Certain math operations (int128 division) require the builtins library
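As an illustration of the last item, the following sketch shows the kind of source code involved. The function name divide128 is hypothetical, and the remark about how the division is lowered is an assumption about typical code generation rather than something stated here.

```cpp
// Sketch: a 128-bit integer division. On many targets the compiler does not
// expand this inline but emits a call to a helper in the builtins runtime
// (commonly named __divti3). That lowering is an assumption about typical
// code generation; the exact helper depends on the target.
__int128 divide128(__int128 a, __int128 b) { return a / b; }
```

If such a helper is needed and the builtins library is not on the link line, the link step reports it as an unresolved symbol.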

In order to use these features, the user must link the right runtime libraries into their program. These libraries are distributed alongside Clang in the library resource directory. Clang locates the resource directory relative to the Clang executable. For example, if LLVM is installed in C:\Program Files\LLVM, then the profile runtime library will be located at the path C:\Program Files\LLVM\lib\clang\11.0.0\lib\windows\clang_rt.profile-x86_64.lib.

For UBSan, PGO, and coverage, Clang will emit object files that auto-link the appropriate runtime library, but the user generally needs to help the linker (whether it is lld-link.exe or MSVC link.exe) find the library resource directory. Using the example installation above, this would mean passing /LIBPATH:C:\Program Files\LLVM\lib\clang\11.0.0\lib\windows to the linker. If the user links the program with the clang or clang-cl drivers, the driver will pass this flag for them.
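A build script that invokes the linker directly has to assemble this flag itself. The sketch below shows one way to do so. runtimeLibDir and libPathFlag are hypothetical helpers, and the directory layout is copied from the example path above, so it may differ for other Clang versions or installations.

```cpp
#include <string>

// Hypothetical helpers that mirror the example paths in the text. The layout
// <install>\lib\clang\<version>\lib\windows is taken from that example and
// is an assumption for any other version or installation.
std::string runtimeLibDir(const std::string &installDir,
                          const std::string &clangVersion) {
  return installDir + "\\lib\\clang\\" + clangVersion + "\\lib\\windows";
}

// Builds the linker flag that points at the runtime library directory.
std::string libPathFlag(const std::string &installDir,
                        const std::string &clangVersion) {
  return "/LIBPATH:" + runtimeLibDir(installDir, clangVersion);
}
```

With the install directory C:\Program Files\LLVM and version 11.0.0, libPathFlag reproduces the flag shown above. Because that path contains a space, the flag has to be quoted when it is placed on an actual command line.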

The auto-linking can be disabled with -fno-rtlib-defaultlib. If that flag is used, pass the required libraries to the linker explicitly, as described for ASan below.

If the linker cannot find the appropriate library, it will emit an error like this:

$ clang-cl -c -fsanitize=undefined t.cpp
$ lld-link t.obj -dll
lld-link: error: could not open 'clang_rt.ubsan_standalone-x86_64.lib': no such file or directory
lld-link: error: could not open 'clang_rt.ubsan_standalone_cxx-x86_64.lib': no such file or directory
$ link t.obj -dll -nologo
LINK : fatal error LNK1104: cannot open file 'clang_rt.ubsan_standalone-x86_64.lib'

To fix the error, add the appropriate /libpath: flag to the link line.

For ASan, as of this writing, the user is also responsible for linking against the correct ASan libraries.

If the user is using the dynamic CRT (/MD), then they should add clang_rt.asan_dynamic-x86_64.lib to the link line as a regular input. For other architectures, replace x86_64 with the appropriate name here and below.

If the user is using the static CRT (/MT), then different runtimes are used to produce DLLs and EXEs. To link a DLL, pass clang_rt.asan_dll_thunk-x86_64.lib. To link an EXE, pass -wholearchive:clang_rt.asan-x86_64.lib.
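The ASan rules above reduce to a small decision table, sketched here. Crt, Output and asanLinkInput are hypothetical names used only for illustration; the library names are the ones given in the text, with the architecture passed in as a parameter.

```cpp
#include <string>

// Hypothetical names; this only restates the rules from the text.
enum class Crt { Dynamic /* /MD */, Static /* /MT */ };
enum class Output { Exe, Dll };

// Returns the extra link-line input for ASan given the CRT flavor, the kind
// of binary being linked, and the architecture name (for example "x86_64").
std::string asanLinkInput(Crt crt, Output output, const std::string &arch) {
  if (crt == Crt::Dynamic)
    return "clang_rt.asan_dynamic-" + arch + ".lib";
  if (output == Output::Dll)
    return "clang_rt.asan_dll_thunk-" + arch + ".lib";
  return "-wholearchive:clang_rt.asan-" + arch + ".lib";
}
```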

Windows System Headers and Library Lookup

clang-cl uses a set of different approaches to locate the right system libraries to link against when building code. The Windows environment uses libraries from three distinct sources:

- Windows SDK
- UCRT (Universal C Runtime)
- Visual C++ Tools (VCRuntime)

The Windows SDK provides the import libraries and headers required to build programs against the Windows system packages. Underlying the Windows SDK is the UCRT, the universal C runtime.

This difference is best illustrated by the various headers that one would find in the different categories. The WinSDK contains headers such as WinSock2.h, which is part of the Windows API surface and provides the Windows sockets interfaces for networking. UCRT provides the C library headers, including e.g. stdio.h. Finally, the Visual C++ tools provide the underlying Visual C++ Runtime headers such as stdint.h or crtdefs.h.

There are various controls that let the user decide where clang-cl will locate these headers. The default behaviour for the Windows SDK and UCRT is as follows:

1. Consult the command line.
   Anything the user specifies is always given precedence. The following extensions are part of the clang-cl toolset:
   - /winsysroot:
     The /winsysroot: option is used as an equivalent to -sysroot on Unix environments. It allows an alternate location to be treated as a system root. When specified, it will be used as the root where the Windows Kits directory is located.
   - /winsdkversion:
   - /winsdkdir:
     If /winsysroot: is not specified, the /winsdkdir: argument is consulted as a location to identify where the Windows SDK is located. Contrary to /winsysroot:, /winsdkdir: is expected to be the complete path rather than a root to locate Windows Kits.
     The /winsdkversion: flag allows the user to specify a version identifier for the SDK to prefer. When this is specified, no additional validation is performed and this version is preferred. If the version is not specified, the highest detected version number will be used.
2. Consult the environment.
   TODO: This is not yet implemented.
   This will consult the environment variables:
   - WindowsSdkDir
   - UCRTVersion
3. Fall back to the registry.
   If no arguments are used to indicate where the SDK is present, and the compiler is running on Windows, the registry is consulted to locate the installation.
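The precedence described in the steps above can be modelled in a few lines. This is a sketch under stated assumptions, not the driver's code: SdkOptions, SdkSource, chooseSdkSource, parseVersion and chooseSdkVersion are hypothetical names, an empty string stands for "option not given", and comparing versions numerically component by component is an assumption about what "highest" means.

```cpp
#include <string>
#include <vector>

// Hypothetical model of the lookup order described above; none of these
// names come from the Clang sources. An empty string means "not given".
struct SdkOptions {
  std::string winSysRoot;    // value of /winsysroot:
  std::string winSdkDir;     // value of /winsdkdir:
  std::string winSdkVersion; // value of /winsdkversion:
};

enum class SdkSource { WinSysRoot, WinSdkDir, Registry, NotFound };

// Command-line options win; the registry is a fallback on Windows only.
SdkSource chooseSdkSource(const SdkOptions &opts, bool runningOnWindows) {
  if (!opts.winSysRoot.empty())
    return SdkSource::WinSysRoot;
  if (!opts.winSdkDir.empty())
    return SdkSource::WinSdkDir;
  // The environment step is documented as not yet implemented: skipped.
  return runningOnWindows ? SdkSource::Registry : SdkSource::NotFound;
}

// Splits "10.0.19041.0" into {10, 0, 19041, 0}. Assumes digits and dots only.
std::vector<unsigned long> parseVersion(const std::string &text) {
  std::vector<unsigned long> parts;
  unsigned long current = 0;
  for (char c : text) {
    if (c == '.') {
      parts.push_back(current);
      current = 0;
    } else {
      current = current * 10 + static_cast<unsigned long>(c - '0');
    }
  }
  parts.push_back(current);
  return parts;
}

// A requested version is used without validation; otherwise the highest
// detected version is chosen (numeric comparison is an assumption).
std::string chooseSdkVersion(const std::string &requested,
                             const std::vector<std::string> &detected) {
  if (!requested.empty())
    return requested;
  std::string best;
  for (const std::string &candidate : detected)
    if (best.empty() || parseVersion(best) < parseVersion(candidate))
      best = candidate;
  return best;
}
```

Comparing parsed components rather than raw strings matters in this model: as text, 10.0.9999.0 sorts after 10.0.19041.0, although the latter is the higher version.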

The Visual C++ Toolset has a slightly more elaborate mechanism for detection.

1. Consult the command line.
   - /winsysroot:
     The /winsysroot: option is used as an equivalent to -sysroot on Unix environments. It allows an alternate location to be treated as a system root. When specified, it will be used as the root where the VC directory is located.
   - /vctoolsdir:
   - /vctoolsversion:
     If /winsysroot: is not specified, the /vctoolsdir: argument is consulted as a location to identify where the Visual C++ Tools are located. If /vctoolsversion: is specified, that version is preferred; otherwise, the highest version detected is used.
2. Consult the environment.
   - /external:[VARIABLE]
     This specifies a user-identified environment variable, which is treated as a path delimiter (;) separated list of paths to map into -imsvc arguments, which are treated as -isystem.
   - INCLUDE and EXTERNAL_INCLUDE
     The path delimiter (;) separated list of paths will be mapped to -imsvc arguments, which are treated as -isystem.
   - LIB (indirectly)
     The linker link.exe or lld-link.exe will honour the environment variable LIB, which is a path delimiter (;) separated set of paths to consult for the import libraries to use when linking the final target.
   The following environment variables will be consulted and used to form paths to validate and load content from as appropriate:
   - VCToolsInstallDir
   - VCINSTALLDIR
   - Path
3. Consult ISetupConfiguration [Windows Only]
   Assuming that the toolchain is built with USE_MSVC_SETUP_API defined and is running on Windows, the Visual Studio COM interface ISetupConfiguration will be used to locate the installation of the MSVC toolset.
4. Fall back to the registry [DEPRECATED]
   The registry information is used to help locate the installation as a final fallback. This is only possible for pre-VS2017 installations and is considered deprecated.
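The mapping from an environment variable such as INCLUDE to -imsvc arguments, described in the environment step above, can be sketched as follows. includeVarToArgs is a hypothetical helper, and skipping empty entries is an assumption made for this sketch rather than documented behaviour.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits a ';'-separated path list (the value of an
// environment variable such as INCLUDE) and emits one "-imsvc" <dir> pair
// per entry. Skipping empty entries is an assumption made for this sketch.
std::vector<std::string> includeVarToArgs(const std::string &value) {
  std::vector<std::string> args;
  std::string::size_type start = 0;
  while (start <= value.size()) {
    std::string::size_type end = value.find(';', start);
    if (end == std::string::npos)
      end = value.size();
    if (end > start) {
      args.push_back("-imsvc");
      args.push_back(value.substr(start, end - start));
    }
    start = end + 1;
  }
  return args;
}
```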

Restrictions and Limitations compared to Clang

Strict Aliasing

Strict aliasing (TBAA) is always off by default in clang-cl, whereas in clang, strict aliasing is turned on by default for all optimization levels.

To enable LLVM optimizations based on strict aliasing rules (e.g., optimizations based on the type of expressions in C/C++), the user will need to explicitly pass -fstrict-aliasing to clang-cl.
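The following sketch shows the kind of code whose optimization depends on this setting, together with a form of type punning that is well defined either way. storeBoth and bitsOf are hypothetical names, and the comments describe what the compiler is permitted to assume, not what it is guaranteed to do.

```cpp
#include <cstring>

// With strict aliasing enabled, the compiler may assume that an int* and a
// float* never refer to the same object, so it is allowed to return the
// constant 1 without reloading *i. With strict aliasing off (the clang-cl
// default), it has to allow for the store through f changing *i.
int storeBoth(int *i, float *f) {
  *i = 1;
  *f = 0.0f;
  return *i;
}

// Reading the bytes of a float through memcpy is well defined regardless of
// the aliasing setting, unlike casting a float* to an unsigned int*.
unsigned int bitsOf(float value) {
  static_assert(sizeof(unsigned int) == sizeof(float), "size mismatch");
  unsigned int bits;
  std::memcpy(&bits, &value, sizeof bits);
  return bits;
}
```

Code that relies on pointer casts between unrelated types should be reviewed, or rewritten along the lines of bitsOf, before -fstrict-aliasing is enabled.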