What are the differences and tradeoffs between -march=haswell, -march=core-avx2, and -mavx2 for compiling avx2 intrinsics? I know that -mavx2 is a flag and -march=haswell/core-avx2 are architectures which just translate to a bunch of flags. So -mavx2 is a subset of the other two. But beyond that, how do I choose the right one for my application?
-march=foo implies -mtune=foo unless you also specify a different -mtune. This is one reason why using -march is better than just enabling options like -mavx without doing anything about tuning. Caveat: -march=native on a CPU that GCC doesn't specifically recognize will still enable new instruction sets that GCC can detect, but will leave -mtune=generic. Use a new enough GCC that knows about ...
For -O0, whether -march=native or -march=As I understand it, -march=native will detect the ISA and extensions to use from cpuid (which include model, family and stepping information). -march=xxx will use a baseline set of extensions and a baseline ISA. There are a lot of possible combinations of extensions, so only the most relevant were chosen (e.g. skylake-avx512 was added to reflect an important extension of some skylakes). -march ...
Using Clang 16.0 or later, I would like to know what values could be used for the -march argument. The command clang --print-supported-cpus shows for -mcpu=, but I see no alternative for -march.