This set will serve as a sort of clearinghouse for the increasing number of platforms for parallel computation in R, explaining the pros and cons of each. It will also list resources for learning more about parallel computation, both in R and otherwise.
Here we cover mainly these types of hardware:
(In alphabetical order.)
Offers (mostly) transparent distributed objects for some R functions.
Every version of R relies on some version of the BLAS, the Basic Linear Algebra Subprograms. But the one that comes with stock R does not take full advantage of the multicore machines that almost any R user has these days. Here is my brief introduction.
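To see the difference in practice, a minimal benchmark: dense matrix multiplication is almost entirely BLAS time, so the same code runs several times faster under an optimized, multithreaded BLAS (OpenBLAS, MKL, Apple Accelerate) than under the reference BLAS shipped with stock R. The matrix size here is arbitrary, chosen just to make the timing visible.

```r
# Dense matrix multiply is BLAS-bound, so this timing reflects
# mainly the BLAS implementation R is linked against.
n <- 2000
a <- matrix(rnorm(n * n), n, n)
system.time(b <- a %*% a)   # compare this across BLAS builds

# On R >= 3.4, sessionInfo() reports which BLAS/LAPACK
# libraries are actually linked in.
sessionInfo()
```

Swapping the BLAS requires no change to R code at all; it is purely a matter of which shared library R is linked against.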
This is stock R's vehicle for parallel computation, arising from the old multicore and snow packages. The multicore part can be used only on multicore machines, and even then only on Unix-family systems (Macs, Linux), not Windows. The snow part can be used on anything. These are not always the absolute fastest packages, but they are very easy to use, and do well for lots of applications.
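A minimal sketch of both halves of the package, squaring a few numbers in parallel; the worker counts here are arbitrary:

```r
library(parallel)

# multicore half: fork-based, Unix-family only.
# (On Windows, mc.cores is limited to 1.)
res1 <- mclapply(1:4, function(i) i^2, mc.cores = 2)

# snow half: socket-based cluster, works on any OS.
cl <- makeCluster(2)
res2 <- parLapply(cl, 1:4, function(i) i^2)
stopCluster(cl)
```

Both calls are drop-in parallel analogs of `lapply()`, which is what makes the package so easy to adopt.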
The "un-MapReduce," in the sense of avoiding the confining MapReduce paradigm of Hadoop and Spark while retaining use of distributed files/memory objects as the basis for computation.
Distributed computing, typically as a higher-level interface to MPI, mainly for linear algebra applications. Usually needs very large systems and very large problems to be effective.
This is an R interface to MPI, a very widely used library for exchanging data between computers in a cluster. If you have an application in which individual pairs of cluster nodes need to exchange messages with each other, as opposed to the manager/worker paradigm of R's parallel package, Rmpi is the way to go. Installation can be tricky, though.
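A sketch of the point-to-point style that distinguishes Rmpi from the manager/worker paradigm, assuming a working MPI installation (e.g. Open MPI) alongside the package; this is illustrative only, and the tag value is arbitrary:

```r
library(Rmpi)  # requires an MPI library installed on the system

mpi.spawn.Rslaves(nslaves = 2)   # start 2 worker processes

# Run code on every worker; each reports its own MPI rank.
mpi.remote.exec(mpi.comm.rank())

# Point-to-point messaging: tell worker 1 to wait for an object,
# then send one to it from the manager (rank 0).
mpi.bcast.cmd(if (mpi.comm.rank() == 1)
  print(mpi.recv.Robj(source = 0, tag = 7)))
mpi.send.Robj(obj = "hello, worker 1", dest = 1, tag = 7)

mpi.close.Rslaves()
mpi.quit()
```

The send/receive pair is the key primitive: any two ranks can exchange arbitrary R objects directly, with no central manager mediating the transfer.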
An irreverent but insightful and useful introduction to parallel computation in R.