BAT: A Benchmark suite for AutoTuners
the code by finding the best possible values for a given architecture. To our knowledge, there are currently no standardized benchmark suites for comparing and testing autotuners. Developers of autotuners therefore create their own benchmarks when presenting and comparing autotuners. We thus present BAT, a Benchmark suite for AutoTuners with HPC-based parameterized GPU programs. BAT parameterizes CUDA programs and kernels from "The Scalable Heterogeneous Computing (SHOC) Benchmark". It contains a varied selection of benchmarks of different complexity that can utilize multiple GPUs on one system, either by running the same program and computations on multiple nodes, or by splitting the work between nodes. BAT contains 9 different HPC benchmarks that provide a large search space of autotuning parameters and are modified to suit many different autotuners. BAT also includes a CLI that facilitates autotuning with the benchmarks. Our benchmark suite is tested with four different autotuners: OpenTuner, Kernel Tuner, CLTune and KTT, which differ in their setup and in how they tune. We analyze the impact of the different benchmark parameters on the running time across architectures. The test systems include a DGX-2, an IBM Power System AC922 with Tesla V100-SXM2 32 GB GPUs, an RTX Titan, a GeForce GTX 980 and a server with 20 Tesla T4 GPUs.