What are the differences and tradeoffs between -march=haswell, -march=core-avx2, and -mavx2 for compiling AVX2 intrinsics? I know that -mavx2 is a single feature flag, while -march=haswell and -march=core-avx2 name architectures that simply expand to a set of flags. So -mavx2 is a subset of the other two.
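One way to see the "subset" relationship for yourself is to look at the feature macros the compiler predefines under each flag. The following is a minimal sketch, assuming GCC or Clang on x86; the function names are made up for illustration, while __AVX__, __AVX2__, __FMA__ and __BMI2__ are the compilers' own predefined macros.

```c
#include <stdio.h>

/* Each function returns 1 if the compiler predefined the corresponding
   feature macro for this translation unit, and 0 otherwise. */
int built_with_avx(void) {
#ifdef __AVX__
    return 1;
#else
    return 0;
#endif
}

int built_with_avx2(void) {
#ifdef __AVX2__
    return 1;
#else
    return 0;
#endif
}

int built_with_fma(void) {
#ifdef __FMA__
    return 1;
#else
    return 0;
#endif
}

int built_with_bmi2(void) {
#ifdef __BMI2__
    return 1;
#else
    return 0;
#endif
}

/* Prints one line summarising what the chosen flags enabled. */
void print_build_features(void) {
    printf("AVX=%d AVX2=%d FMA=%d BMI2=%d\n",
           built_with_avx(), built_with_avx2(),
           built_with_fma(), built_with_bmi2());
}
```

Calling print_build_features() from main and building once with -mavx2 and once with -march=haswell should show the difference: on a default x86-64 baseline I would expect -mavx2 to turn on AVX and AVX2 but leave FMA and BMI2 off, while -march=haswell turns on all four. The same information is available without compiling anything via gcc -march=haswell -dM -E - < /dev/null.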

Understanding the Context

But beyond that, how do I choose the right one for my application? Using -march also gives you more options for linking against third-party closed-source code. On ARM, for example, you should be able to link code built with -mcpu=cortex-r5 against code built with -march=armv7-r; that is only fine in one direction, though, so the tools may complain. At -O0, it does not matter whether the default is -march=native or -march=<generic>: both specify the same family, so both are perfectly compatible with -O0. Whenever another optimization level is specified, -march=native is beneficial to performance.

Key Insights

So, for me, the fact that -O0 is the default doesn't matter for -march's default. As I understand it, -march=native detects the ISA and extensions to use from cpuid (which includes model, family, and stepping information). -march=xxx uses a baseline ISA and a baseline set of extensions. There are a lot of possible combinations of extensions, so named targets were added only for the most relevant ones (e.g. skylake-avx512 was added to reflect an important extension of some Skylake CPUs).
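The detection that -march=native does at compile time on the build machine can also be done at run time on the machine that actually executes the program. Here is a rough sketch, assuming GCC or Clang on x86, where the __builtin_cpu_init and __builtin_cpu_supports builtins are available; the wrapper names cpu_has_avx2 and cpu_has_fma are hypothetical.

```c
/* True when the x86 CPU-detection builtins of GCC/Clang can be used. */
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define HAVE_X86_CPU_BUILTINS 1
#else
#define HAVE_X86_CPU_BUILTINS 0
#endif

/* Returns 1 if the running CPU reports AVX2, 0 if it does not, and -1
   when this compiler/target offers no way to ask. */
int cpu_has_avx2(void) {
#if HAVE_X86_CPU_BUILTINS
    __builtin_cpu_init();  /* safe to call more than once */
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return -1;
#endif
}

/* Same idea for FMA, which -march=haswell enables but -mavx2 does not. */
int cpu_has_fma(void) {
#if HAVE_X86_CPU_BUILTINS
    __builtin_cpu_init();
    return __builtin_cpu_supports("fma") ? 1 : 0;
#else
    return -1;
#endif
}
```

A binary built with a conservative -march can use checks like these to decide at startup whether to call an AVX2 code path, which is the usual alternative to shipping a -march=native build that only runs on matching CPUs.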

Final Thoughts

-march ... An internet search for "-march=armv8.2-a+i8mm" turns up nearly nothing helpful. Either build_aar.sh is asking for an arch that doesn't make sense, or I need to plug in a version of clang that supports that arch.

-march=foo implies -mtune=foo unless you also specify a different -mtune. This is one reason why using -march is better than just enabling options like -mavx without doing anything about tuning.

Caveat: -march=native on a CPU that GCC doesn't specifically recognize will still enable new instruction sets that GCC can detect, but will leave -mtune=generic.
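Tuning aside, the choice between -mavx2 and -march=haswell also changes which intrinsics are usable, because -mavx2 on its own does not enable FMA. The sketch below assumes GCC or Clang on x86; saxpy is just an illustrative name. The preprocessor picks the fused multiply-add when the build flags allow it and falls back otherwise.

```c
#include <stddef.h>
#if defined(__AVX2__)
#include <immintrin.h>
#endif

/* Computes y[i] = a * x[i] + y[i] for i in [0, n). */
void saxpy(float a, const float *x, float *y, size_t n) {
    size_t i = 0;
#if defined(__AVX2__)
    /* The float intrinsics here only need AVX, which -mavx2 implies. */
    const __m256 va = _mm256_set1_ps(a);
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);
        __m256 vy = _mm256_loadu_ps(y + i);
#if defined(__FMA__)
        /* Available with -march=haswell, or with -mavx2 -mfma. */
        vy = _mm256_fmadd_ps(va, vx, vy);
#else
        /* With -mavx2 alone: separate multiply and add. */
        vy = _mm256_add_ps(_mm256_mul_ps(va, vx), vy);
#endif
        _mm256_storeu_ps(y + i, vy);
    }
#endif
    /* Scalar tail, and the whole loop when AVX2 is not enabled. */
    for (; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

The same source builds with no flags, with -mavx2, and with -march=haswell; only the last two use 256-bit vectors, and only -march=haswell (or an explicit -mfma) reaches the fused path.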

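Use a new enough GCC that knows about ...

-march: generate instructions for a specific machine type. Defaults to x86-64-v3 on AMD64 and armv8-a on AArch64. Use -march=compatibility for best compatibility, or -march=native for best performance if the native executable is deployed on the same machine or on a machine with the same CPU features. Note that this description does not match GCC or Clang, which have no -march=compatibility value; the wording appears to describe the -march option of a different tool (GraalVM Native Image).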