Due to restrictions in Thrust, notably the lack of a vector-valued reduction operation, some code here takes a parameter numberofchunks. Its value should be about the same as the anticipated number of threads. That number may not be easy to determine, but here are some guidelines:

1. On a multicore system, take numberofchunks to be the number of cores, e.g. 4 on a quad-core machine. With hyperthreading, you might try double the number of cores.
2. On a GPU, start with numberofchunks set to 100, which is probably a conservative value.
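
As a rough illustration of the guidelines above, here is a minimal C++ sketch of how a caller might pick a starting value for numberofchunks and see how that value would divide the data. The helper names default_cpu_chunks, default_gpu_chunks and chunk_bounds are hypothetical and are not part of Thrust or of the code described here. The sketch also assumes that each chunk is a contiguous slice of the input, which the text above does not spell out.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: guess numberofchunks for a multicore CPU.
// std::thread::hardware_concurrency() is only a hint. It usually reports
// logical processors (so hyperthreads tend to be counted already) and
// may return 0 when the value cannot be determined.
inline int default_cpu_chunks(int fallback = 4) {
    unsigned int hw = std::thread::hardware_concurrency();
    return hw > 0 ? static_cast<int>(hw) : fallback;
}

// Hypothetical helper: the GPU starting value suggested in the text.
inline int default_gpu_chunks() { return 100; }

// Assumption: chunks are contiguous, half-open index ranges [first, last)
// over n items, with sizes that differ by at most one.
inline std::vector<std::pair<std::size_t, std::size_t>>
chunk_bounds(std::size_t n, int numberofchunks) {
    assert(numberofchunks > 0);
    const std::size_t k = static_cast<std::size_t>(numberofchunks);
    const std::size_t base = n / k;   // minimum chunk length
    const std::size_t extra = n % k;  // the first `extra` chunks get one more item
    std::vector<std::pair<std::size_t, std::size_t>> bounds;
    bounds.reserve(k);
    std::size_t start = 0;
    for (std::size_t i = 0; i < k; ++i) {
        const std::size_t len = base + (i < extra ? 1 : 0);
        bounds.emplace_back(start, start + len);
        start += len;
    }
    return bounds;
}
```

For example, chunk_bounds(10, 4) yields the ranges [0,3), [3,6), [6,8) and [8,10). Treat the values returned by the two default helpers as starting points only, and adjust them after timing runs on the actual machine.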