Kernel Class Files¶
Each kernel in the Suite is implemented in a class whose header and
implementation files reside in the src subdirectory named for the group in
which the kernel lives. A kernel class is responsible for implementing all
operations that manage data, execute, and record execution timing, checksums,
and other information for each variant and tuning of a kernel. To properly
integrate into the RAJA Performance Suite framework, the kernel class must be
a subclass of the KernelBase base class that defines the interface for
kernels in the Suite. The KernelBase.hpp header file resides in the
src/common directory.
Continuing with the example we started discussing above, we add the
ADD.hpp header file for the ADD class to the stream directory
along with multiple implementation files. We describe the contents of these
files in the following sections:
ADD.cppcontains methods to set up and tear down the memory for the ADD kernel, and compute and record a checksum on the result after it executes. It also specifies ADD kernel information in theADDclass constructor.
ADD-Seq.cppcontains sequential CPU variants and tunings of the kernel.
ADD-OMP.cppcontains OpenMP CPU multithreading variants and tunings of the kernel.
ADD-OMPTarget.cppcontains OpenMP target offload variants and tunings of the kernel.
ADD-Cuda.cppcontains CUDA GPU variants and tunings of the kernel.
FOO-Hip.cppcontains HIP GPU variants and tunings of the kernel.
FOO-Sycl.cppcontains SYCL variants and tunings of the kernel.
Note
All kernels in the Suite follow the same file organization and implementation pattern. Inspection of the files for any individual kernel will help to understand how the kernel class implementations are organized.
Important
If a new execution back-end variant is added that is not listed
here, that variant should be placed in a file named to clearly
distinguish the back-end implementation, such as
ADD-<back-end>.cpp. Keeping the variants for each back-end
in a separate file helps to understand compiler optimization
when looking at generated assembly code, for example, and also
to work with vendors on issues.
Kernel class header file¶
In its entirety, the ADD kernel class header file ADD.hpp is:
//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~//
// Copyright (c) Lawrence Livermore National Security, LLC and other
// RAJA Project Developers. See top-level LICENSE and COPYRIGHT
// files for dates and other details. No copyright assignment is required
// to contribute to RAJA Performance Suite.
//
// SPDX-License-Identifier: (BSD-3-Clause)
//~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~//
///
/// ADD kernel reference implementation:
///
/// for (Index_type i = ibegin; i < iend; ++i ) {
/// c[i] = a[i] + b[i];
/// }
///
#ifndef RAJAPerf_Stream_ADD_HPP
#define RAJAPerf_Stream_ADD_HPP
#define ADD_DATA_SETUP \
Real_ptr a = m_a; \
Real_ptr b = m_b; \
Real_ptr c = m_c;
#define ADD_BODY \
c[i] = a[i] + b[i];
#include "common/KernelBase.hpp"
namespace rajaperf
{
class RunParams;
namespace stream
{
class ADD : public KernelBase
{
public:
ADD(const RunParams& params);
~ADD();
void setSize(Index_type target_size, Index_type target_reps);
void setUp(VariantID vid, size_t tune_idx);
void updateChecksum(VariantID vid, size_t tune_idx);
void tearDown(VariantID vid, size_t tune_idx);
void defineSeqVariantTunings();
void defineOpenMPVariantTunings();
void defineOpenMPTargetVariantTunings();
void defineKokkosVariantTunings();
void defineCudaVariantTunings();
void defineHipVariantTunings();
void defineSyclVariantTunings();
void runSeqVariant(VariantID vid);
void runOpenMPVariant(VariantID vid);
void runOpenMPTargetVariant(VariantID vid);
void runKokkosVariant(VariantID vid);
template < size_t block_size >
void runCudaVariantImpl(VariantID vid);
template < size_t block_size >
void runHipVariantImpl(VariantID vid);
template < size_t work_group_size >
void runSyclVariantImpl(VariantID vid);
private:
static const size_t default_gpu_block_size = 256;
using gpu_block_sizes_type = integer::make_gpu_block_size_list_type<default_gpu_block_size>;
Real_ptr m_a;
Real_ptr m_b;
Real_ptr m_c;
};
} // end namespace stream
} // end namespace rajaperf
#endif // closing endif for header file include guard
The key parts of a kernel class header file are:
Copyright statement at the top of the file.
Note
Each file in the RAJA Performance Suite must start with a boilerplate comment for the project copyright information.
Reference implementation, which is a comment section that shows a C-style implementation of the kernel. This is basically what the
Base_Seqvariant of the kernel looks like. All kernel variants should produce results close to the reference version.Uniquely-named include guard that guards the contents of the header file. All such guards have the form
RAJAPerf_<group name>_<kernel name>_HPP.Macro definitions that contain source lines of code that appear in multiple places in the kernel class implementation, such as setting data pointers and operations in the kernel body. While macros obfuscate the code somewhat, we use them to reduce the amount of code we maintain and ensure implementations are consistent across variants.
Class definition derived from the
KernelBaseclass. We describe this in more detail below.
Note
All types, methods, etc. in the RAJA Performance Suite reside in the
rajaperfnamespace.In addition, each kernel class lives in the namespace of the kernel group of which the kernel is a member. For example, here, the
ADDclass is in thestreamnamespace.Each kernel class must be publicly derived from the
KernelBaseclass so that the kernel integrates properly into the Suite.
The class must provide a constructor that takes a reference to a RunParams
object, which contains input parameters for running the Suite – we’ll say more
about this later. The class constructor may or may not allocate storage for
a class object. If it does, the storage should be deallocated in the class
destructor.
Several methods in the KernelBase class are pure virtual and the derived
kernel class must provide implementations of those methods. These methods
take a VariantID argument and a tuning index of type size_t. They
include: setUp, updateChecksum, and tearDown.
Other methods in the code above, such as defineCudaVariantTunings are
virtual in the KernelBase class and so they may be provided optionally by
the kernel class for kernel specific operations. The define*VariantTunings
methods specify which variants are implemented and define the tunings that are
available for the kernel.
Some other methods in the code above, such as runSeqVariant and
runCudaVariantImpl are unique to each kernel but the names are expected
by the boilerplate macros used in the kernel source files.
While all method names should be reasonably descriptive of what they do, we’ll provide more details about them when we describe the kernel class implementation in the next section.
Lastly, any data members used in the class implementation are defined,
typically in a private member section so they don’t bleed out of the
kernel class. For example, in the ADD class, we see data members for
default GPU block sizes and a list for holding a set of block sizes for
exploring kernel performance with respect to changes in GPU block size. Also,
there are pointer member variables to hold data arrays for the kernel. Here we
have m_a, m_b``, and m_c for the three arrays used in the ADD kernel.
Note that we use the convention to prefix class data members with m_ so it
is clear in the source code which data are class members and which are local
variables.