Hi everyone
I am happy to announce the full implementation of Function Multiversioning (FMV) in Clear
Linux* OS for Intel® Architecture
(https://clearlinux.org/features/function-multiversioning-fmv).
But what is FMV?
Imagine that you are developing software that could run on multiple platforms. At the end
of the day, it could be running anywhere, maybe on a server or a home computer. While
Intel architecture provides many powerful instruction set extensions, it is challenging
for developers to generate code that takes advantage of these capabilities.
Currently we as developers have these choices:
-> Write multiple versions of our code, each targeting a different instruction set
extension, and manually handle runtime dispatching of these versions
-> Generate multiple versions of our binary, each targeting a different platform
-> Choose a minimum hardware requirement, which means the binary will not take advantage
of newer platforms
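To make the first choice concrete, here is a minimal sketch of manual runtime dispatching.
It assumes GCC or Clang on x86 and uses the compiler builtins __builtin_cpu_init() and
__builtin_cpu_supports(). The names add_default, add_avx2, select_add and add_arrays are
made up for this illustration and are not part of any Clear Linux package.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout: this is only an illustration. */
typedef void (*add_fn)(int *, const int *, const int *, size_t);

/* Baseline version: compiled with the default instruction set. */
static void add_default(int *a, const int *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
#define HAVE_X86_DISPATCH 1
/* Same loop, but the compiler may use AVX2 instructions in this function. */
__attribute__((target("avx2")))
static void add_avx2(int *a, const int *b, const int *c, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}
#endif

/* Pick an implementation by asking the CPU what it supports. */
static add_fn select_add(void)
{
#ifdef HAVE_X86_DISPATCH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return add_avx2;
#endif
    return add_default;
}

/* Public entry point: resolves the implementation once, then reuses it. */
void add_arrays(int *a, const int *b, const int *c, size_t n)
{
    static add_fn impl;

    assert(a != NULL && b != NULL && c != NULL);
    if (impl == NULL)
        impl = select_add();
    impl(a, b, c, n);
}
```

Callers only ever see add_arrays(a, b, c, n). Every function that needs this treatment
requires its own set of versions plus a selector, which is the bookkeeping that FMV asks
the compiler to do for us.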
This seems like a lot of work. Wouldn’t it be better to optimize the same functions for
multiple architectures and execute them when the binary detects the architecture at
runtime? This feature exists and is known as Function Multiversioning (FMV). FMV is a
compiler feature that is capable of optimizing the same code for multiple architectures,
automatically selecting the correct architecture-specific version of the code at runtime.
The Clear Linux* Project for Intel® Architecture is currently the only Linux distribution
to support Function Multiversioning in C code, making it easier to develop applications
that take advantage of the enhanced instructions of the Intel architecture.
For example, consider the AVX2 instruction set extension, introduced in the 4th Generation
Intel® Core™ processor family (formerly known as Haswell). Normally, telling the compiler
to use AVX2 instructions would limit our binary to Haswell and newer processors. With FMV,
the compiler can generate AVX2-optimized versions of the code and will automatically, at
runtime, ensure that only the appropriate versions are used. In other words, when the
binary is run on Haswell or later generation CPUs, it will use Haswell-specific
optimizations, and when that same binary is run on a pre-Haswell generation processor, it
will fall back to using the standard instructions supported by the older processor.
You may be wondering whether this is really possible, so let’s take a simple array
addition and add one modification, the FMV attribute:
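#define MAX 1000000   /* number of times the addition loop is repeated */

int a[256], b[256], c[256];

/* Ask the compiler for one clone of foo() per listed target, plus a default. */
__attribute__((target_clones("arch=core-avx2","arch=atom","arch=slm","default"),noinline))
void foo(void)
{
    int i, x;

    for (x = 0; x < MAX; x++) {
        for (i = 0; i < 256; i++) {
            a[i] = b[i] + c[i];
        }
    }
}

int main(void)
{
    foo();
    return 0;
}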
As you can see, the __attribute__ line lists the architectures you want your binary to be
optimized for. On an AVX2-capable system, this binary has the same execution time as one
built with the -mavx2 flag, but it can also run on an Atom system.
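If the same source also has to build with compilers or platforms that lack FMV, one
possible approach is to hide the attribute behind a macro. The sketch below is an
assumption on my side, not something the Clear Linux packages necessarily do: the macro
FMV_CLONES and the function sum_of_sums are made-up names, and the guard simply checks
for Linux on x86 and a compiler that reports support for target_clones.

```c
#include <stddef.h>

/* FMV_CLONES is a hypothetical helper macro for this illustration. */
#if defined(__linux__) && (defined(__x86_64__) || defined(__i386__)) \
    && defined(__has_attribute)
#  if __has_attribute(target_clones)
#    define FMV_CLONES __attribute__((target_clones("avx2", "default")))
#  endif
#endif
#ifndef FMV_CLONES
#  define FMV_CLONES /* no FMV available: build a single plain version */
#endif

/* With FMV, the compiler emits an AVX2 clone and a default clone of this
 * function; the result is the same whichever clone runs. */
FMV_CLONES
long sum_of_sums(const int *b, const int *c, size_t n)
{
    long total = 0;

    for (size_t i = 0; i < n; i++)
        total += (long)b[i] + c[i];
    return total;
}
```

The caller does not change at all: sum_of_sums(b, c, n) returns the same value whether
the AVX2 clone, the default clone, or the plain fallback ends up being executed.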
In the Clear Linux Project for Intel Architecture, our focus is on applying this technology
to packages where we detect that AVX2 instructions can provide an improvement. The
experiments we have done show that some packages are already optimized internally for
multiple instruction sets, so FMV is not needed there. Other compiler optimization
techniques, such as AutoFDO, can take advantage of profile data to perform additional
optimizations based on how the code behaves. We use these optimizations together with FMV
to improve performance as much as possible. We invite the community to use this new
feature and unleash the power of your code across multiple architectures with little
effort. Write once and deploy everywhere!
Regards
Victor Rodriguez
Intel Open Source Technology Center