
Usage:

   rthpearson(x,y)

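where x and y are the two input vectors (numeric, of equal length).
The result is their Pearson correlation coefficient, the same value
that cor.test() reports as its estimate: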

> cor.test(x,y,method="pearson")$estimate
     cor
0.504088
> rthpearson(x,y)
[1] 0.504088

# timing comparison on a dual-core, hyperthreaded machine
> n <- 50000000
> x <- runif(n)
> y <- x + runif(n)
> system.time(cor.test(x,y,method="pearson"))
   user  system elapsed
  3.603   0.569   4.204
> source("rthpearson.R")
> system.time(rthpearson(x,y))
   user  system elapsed
  3.025   0.578   2.007


Implementation:

The code is a straightforward application of thrust::inner_product()
(for the sums of products) and thrust::reduce() (for the plain sums).
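
As a rough illustration of that structure (not the actual Rth source),
the sketch below computes the correlation from five sums. It assumes
the usual sums-based formula for Pearson's r, and it uses
std::accumulate() and std::inner_product() as serial stand-ins for
thrust::reduce() and thrust::inner_product(); the name pearson_sketch
is made up for this example.

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

// Hypothetical sketch: Pearson correlation via five separate passes.
// std::accumulate / std::inner_product stand in for the Thrust calls.
double pearson_sketch(const std::vector<double>& x,
                      const std::vector<double>& y) {
    assert(x.size() == y.size() && x.size() > 1);
    const double n = static_cast<double>(x.size());

    // Plain sums (the role of thrust::reduce()).
    const double sx = std::accumulate(x.begin(), x.end(), 0.0);
    const double sy = std::accumulate(y.begin(), y.end(), 0.0);

    // Sums of products (the role of thrust::inner_product()).
    const double sxx = std::inner_product(x.begin(), x.end(), x.begin(), 0.0);
    const double syy = std::inner_product(y.begin(), y.end(), y.begin(), 0.0);
    const double sxy = std::inner_product(x.begin(), x.end(), y.begin(), 0.0);

    // r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))
    const double num = n * sxy - sx * sy;
    const double den = std::sqrt(n * sxx - sx * sx) *
                       std::sqrt(n * syy - sy * sy);
    return num / den;
}
```

Each of the five sums is a separate pass over the data, which is the
source of the overhead discussed next. Note also that this formula can
lose precision when the means are large relative to the spread.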

However, this is wasteful: it makes multiple Thrust calls, each of
which incurs startup overhead, especially in the GPU case.

A potentially faster version would use the chunking approach used in
rthhist().
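
One way such a chunked version might look is sketched below. This is
an assumption about what "chunking" would mean here, not a copy of
rthhist(): the data are split into contiguous chunks, each chunk
accumulates all five sums in a single pass, and the per-chunk results
are then added together. The chunks are independent, so each could go
to its own thread; for simplicity the sketch loops over them serially.
The names Sums, chunk_sums and pearson_chunked are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-chunk accumulator for the five sums.
struct Sums {
    double x = 0, y = 0, xx = 0, yy = 0, xy = 0;
};

// One pass over elements [lo, hi), accumulating all five sums at once.
Sums chunk_sums(const std::vector<double>& x, const std::vector<double>& y,
                std::size_t lo, std::size_t hi) {
    Sums s;
    for (std::size_t i = lo; i < hi; ++i) {
        s.x += x[i];
        s.y += y[i];
        s.xx += x[i] * x[i];
        s.yy += y[i] * y[i];
        s.xy += x[i] * y[i];
    }
    return s;
}

// Split the input into nchunks contiguous pieces, sum each, then combine.
double pearson_chunked(const std::vector<double>& x,
                       const std::vector<double>& y, std::size_t nchunks) {
    assert(x.size() == y.size() && x.size() > 1 && nchunks > 0);
    const std::size_t n = x.size();
    const std::size_t len = (n + nchunks - 1) / nchunks;  // ceiling division

    Sums t;  // combined totals
    for (std::size_t lo = 0; lo < n; lo += len) {
        const std::size_t hi = std::min(lo + len, n);
        const Sums s = chunk_sums(x, y, lo, hi);  // independent of other chunks
        t.x += s.x;  t.y += s.y;
        t.xx += s.xx;  t.yy += s.yy;  t.xy += s.xy;
    }

    const double dn = static_cast<double>(n);
    const double num = dn * t.xy - t.x * t.y;
    const double den = std::sqrt(dn * t.xx - t.x * t.x) *
                       std::sqrt(dn * t.yy - t.y * t.y);
    return num / den;
}
```

The point of the design is that the data are read once instead of five
times, and only one parallel launch would be needed instead of several.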

