Enoki: structured vectorization and differentiation on modern processors
Abstract
Enoki is a C++ template library that enables automatic transformations of numerical code, for instance to create a “wide” vectorized variant of an algorithm that runs on the CPU or GPU, or to compute gradients via transparent forward/reverse-mode automatic differentiation.
The core parts of the library are implemented as a set of header files with no dependencies other than a sufficiently C++17-capable compiler (GCC >= 8.2, Clang >= 7.0, Visual Studio >= 2017). Enoki code reduces to efficient SIMD instructions available on modern CPUs and GPUs. In particular, Enoki supports:
- Intel: AVX512, AVX2, AVX, and SSE4.2,
- ARM: NEON/VFPV4 on armv7-a, Advanced SIMD on 64-bit armv8-a,
- NVIDIA: CUDA via a Parallel Thread Execution (PTX) just-in-time compiler,
- Fallback: a scalar mode ensures that programs still run even if none of the above are available.
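One way a header-only library can map onto the CPU instruction sets listed above is to choose a vector width at compile time from the compiler's predefined target macros (`__AVX512F__`, `__AVX__`, `__SSE4_2__`, `__ARM_NEON`), dropping to a scalar path when none is defined. The sketch below illustrates only that selection step; the function name `float_lanes` is hypothetical, this is not a description of how Enoki is organized internally, and it does not cover the CUDA/PTX backend, which is generated at run time rather than selected by macros.

```cpp
#include <cstddef>

// Hypothetical helper (not Enoki's API): pick how many 32-bit floats fit in
// one vector register, based on compiler-predefined target macros.
constexpr std::size_t float_lanes() {
#if defined(__AVX512F__)
    return 16;  // 512-bit registers hold 16 floats
#elif defined(__AVX__)
    return 8;   // 256-bit registers (AVX and AVX2)
#elif defined(__SSE4_2__) || defined(__ARM_NEON)
    return 4;   // 128-bit registers (SSE4.2, NEON / Advanced SIMD)
#else
    return 1;   // scalar fallback: one value at a time
#endif
}
```

Because the choice is made by the preprocessor, compiling the same file with different target flags (for example `-mavx2` versus no flags) changes the result without touching the source.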
Deploying a program on top of Enoki usually serves three goals:
- Enoki ships with a convenient library of special functions and data structures that facilitate implementation of numerical code (vectors, matrices, complex numbers, quaternions, etc.).
- Programs built using these can be instantiated as wide versions that process many arguments at once (either on the CPU or the GPU). Enoki is also structured in the sense that it handles complex programs with custom data structures, lambda functions, loadable modules, virtual method calls, and many other modern C++ features.
- If derivatives are desired (e.g. for stochastic gradient descent), Enoki performs transparent forward or reverse-mode automatic differentiation of the entire program.
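To make "transparent" forward-mode differentiation concrete, here is a minimal sketch based on dual numbers: each value carries its derivative alongside it, and overloaded operators apply the sum and product rules, so a generic function needs no changes to be differentiated. The `Dual` type, `f`, and `eval_with_grad` are hypothetical illustrations, not Enoki's API; Enoki's own differentiable types, and its reverse mode in particular, work differently.

```cpp
// Hypothetical dual number for forward-mode AD (not Enoki's API).
struct Dual {
    double value;  // primal value
    double grad;   // derivative with respect to the seeded input
};

constexpr Dual operator+(Dual a, Dual b) {
    // Sum rule: (a + b)' = a' + b'
    return {a.value + b.value, a.grad + b.grad};
}

constexpr Dual operator*(Dual a, Dual b) {
    // Product rule: (ab)' = a'b + ab'
    return {a.value * b.value, a.grad * b.value + a.value * b.grad};
}

// User code is written once as a template; it works for double and Dual alike.
template <typename T>
constexpr T f(T x) { return x * x * x + x; }  // f'(x) = 3x^2 + 1

// Seed the input's derivative with 1 to obtain f(x) and df/dx together.
constexpr Dual eval_with_grad(double x) { return f(Dual{x, 1.0}); }
```

For example, `eval_with_grad(2.0)` yields a value of 10 and a derivative of 13, matching 3·2² + 1. Forward mode costs one pass per input direction; reverse mode, which records operations and propagates derivatives backward, is preferable when a scalar loss depends on many parameters.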
Finally, Enoki can do all of the above simultaneously: if desired, it can compile the same source code to multiple different implementations (e.g. scalar, AVX512, and CUDA+autodiff).
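The "one source, many implementations" idea can be sketched with an ordinary C++ template: the same function body is instantiated once for a scalar `float` and once for a fixed-width array type whose operators act lane by lane. `Wide`, `ramp`, and `f` are hypothetical names chosen for illustration, not Enoki's API; a real library would map the lane loops onto SIMD intrinsics or GPU kernels instead of plain loops.

```cpp
#include <array>
#include <cstddef>

// Hypothetical fixed-width array: one element per SIMD lane.
template <typename T, std::size_t N>
struct Wide {
    std::array<T, N> lanes{};
};

// Lane-wise addition.
template <typename T, std::size_t N>
constexpr Wide<T, N> operator+(const Wide<T, N> &a, const Wide<T, N> &b) {
    Wide<T, N> r{};
    for (std::size_t i = 0; i < N; ++i) r.lanes[i] = a.lanes[i] + b.lanes[i];
    return r;
}

// Lane-wise multiplication.
template <typename T, std::size_t N>
constexpr Wide<T, N> operator*(const Wide<T, N> &a, const Wide<T, N> &b) {
    Wide<T, N> r{};
    for (std::size_t i = 0; i < N; ++i) r.lanes[i] = a.lanes[i] * b.lanes[i];
    return r;
}

// Hypothetical helper: lanes initialized to 0, 1, ..., N-1.
template <typename T, std::size_t N>
constexpr Wide<T, N> ramp() {
    Wide<T, N> r{};
    for (std::size_t i = 0; i < N; ++i) r.lanes[i] = static_cast<T>(i);
    return r;
}

// The algorithm is written once and never mentions the lane count.
template <typename T>
constexpr T f(T x) { return x * x * x + x; }
```

Here `f(2.0f)` evaluates a single input, while `f(ramp<float, 4>())` evaluates four inputs (0, 1, 2, 3) in one call; the body of `f` is identical in both cases, which is the property that lets one source file target scalar, SIMD, and GPU backends.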