Generating Gamma Random Variable in CUDA in Parallel
Recently, I had to write a random variable generator for Gamma Distributions. I couldn’t find one for Gamma distribution in cuRAND API of CUDA version 8.0. After googling about how to generate a gamma random variable I found out two rejection methods for doing so in its Wikipedia page under the section “Generating gamma-distributed random variables”. Out of these two methods “Marsaglia and Tsang’s Method” which is a transformation-rejection based method is the one used in many other scientific softwares like Matlab and GSL. The I also found the original paper here. If you want a detailed description of the transformation-rejection based method go through it. Here is a nice blog post about the same.
Let’s first setup Visual Studio. I am using VS2013 Ultimate on my machine which has a i7 processor and a gtx960m graphics card. In Visual Studio create a new “Cuda Project”. Remove the example code and keep only the main function.
Now let’s add the code for host and device memory required. We will also have to create the cudaState variables to be able to generate random numbers in parallel. We will assign a unique cudaState variable to each thread. curand_init function is used to create these state variables. Quoting from the cuRAND API Documentation,
The curand_init() function sets up an initial state allocated by the caller using the given seed, sequence number, and offset within the sequence. Different seeds are guaranteed to produce different starting states and different sequences. The same seed always produces the same state and the same sequence. The state set up will be the state after (267 ⋅ sequence + offset) calls to curand() from the seed state.
For the highest quality parallel pseudorandom number generation, each experiment should be assigned a unique seed. Within an experiment, each thread of computation should be assigned a unique sequence number. If an experiment spans multiple kernel launches, it is recommended that threads between kernel launches be given the same seed, and sequence numbers be assigned in a monotonically increasing way. If the same configuration of threads is launched, random state can be preserved in global memory between launches to avoid state setup time.
As described earlier we would need two samples, from a uniform and a normal distributions respectively. We would use CUDA api functions curand_normal and curand_uniform for this purpose. We now need to pass the curandState variables, we created earlier, to these functions. The function prototypes are:
Now let’s write the function that would generate a gamma random variable. Note that I have used the double2 (a 2-by-1 double vector) data structure for passing the alpha and beta parameters.
Here’s the final code for generating gamma random variables on gpu and printing them.
Now we will verify the implementation by plotting a histogram of the samples generated in MATLAB and comparing it against histogram of samples generated inside MATLAB. The code for verifying and the histogram plots are given below. Check in the main function of the code above, we dumped the samples created to a csv file, gamout.csv; we would need that file here. Copy that file to the current working directory of MATLAB so that it can load the samples from the file to an array.