I've long been a fanatic about debugging, so much so that I wrote a book about it. Since I now do a lot of R programming, I am especially interested in methods and tools for debugging R code. This Web page is devoted to such matters.
I must emphasize that the above phrase, "methods and tools," means what it says. There actually are general methods for debugging, beginning with what I call The Fundamental Principle of Debugging, and they will be presented here along with the various tools.
The crude way to confirm what you believe about your code is to temporarily add print statements to it. But all that adding and removing of print lines, rerunning of the program and so on (a) wastes valuable time, (b) distracts from your focus, and (c) is just plain tiring.
To make your debugging faster and more enjoyable, use debugging tools. Fortunately, many tools are available for R.
Another big advantage of using debugging tools is that they will tell you the location of an execution error.
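As a small illustration of what "location of an execution error" means, here is a sketch using only base R. The functions inner_fn() and outer_fn() are hypothetical, made up for this example. In an interactive session one would simply call traceback() right after the error; the version below captures the same kind of information programmatically so that it also runs as a script.

```r
# Hypothetical functions: outer_fn() calls inner_fn(), which fails.
inner_fn <- function(v) v + "a"      # error: cannot add a number and a string
outer_fn <- function(v) inner_fn(v)

frames <- NULL
msg <- tryCatch(
  withCallingHandlers(
    outer_fn(1),
    # The calling handler runs while the failing call is still on the stack,
    # so sys.calls() shows the chain of calls that led to the error.
    error = function(e) {
      frames <<- vapply(as.list(sys.calls()),
                        function(cl) deparse(cl)[1], character(1))
    }
  ),
  error = function(e) conditionMessage(e)
)

print(msg)      # the error message
print(frames)   # active calls, earliest first; inner_fn(v) is among them
```

The point is not this particular recipe, but that the tool, not you, does the work of finding where the code failed.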
The most important point is The Fundamental Principle of Debugging:
As you run through the code step by step, confirm that what you think is true, is true. Is the value of x really 3, as you expect? Does the code take the "else" branch of an if-then-else as expected? Eventually you will discover a point in the execution of the code that does NOT confirm. You will then have pinpointed the location of the bug, and can then focus on its location.
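Here is a minimal sketch of the principle in action. The function count_above() is a hypothetical example, written with a deliberate bug; it is not from any package. The interactive part uses R's built-in debug() and undebug().

```r
# Hypothetical function, meant to count the values strictly above a threshold.
count_above <- function(x, thresh) {
  count <- 0
  for (xi in x) {
    if (xi >= thresh) count <- count + 1   # deliberate bug: should be >
  }
  count
}

# What we think is true: only 5 exceeds 3, so the result should be 1.
result <- count_above(c(1, 3, 5), 3)
print(result)   # 2, so something does NOT confirm

if (interactive()) {
  debug(count_above)            # enter the browser on the next call
  count_above(c(1, 3, 5), 3)    # press n to step; type xi or count to inspect
  undebug(count_above)
}
```

While stepping, we expect the "if" body to be skipped when xi is 3. It is not, and that failure to confirm pinpoints the bug at the comparison.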
Another one, The Principle of Starting Simple:
In debugging, start out with very simple cases.
The R parallel package, which is included in base R, was written as an amalgamation, with some modification, of two popular user-contributed R packages, snow and multicore. Here we are interested in the former, and all references below to "parallel" will mean that portion of the package.
The parallel package is a simple and powerful package for efficiently parallelizing many applications. However, it presents major challenges for debugging.
Source of the problem:
The problem is that one has R running without a terminal. Say one runs something like makeCluster(2). That launches two new R processes on the machine from which the call is made. Let's call the original R process the manager, and the two launched ones workers. The workers take their input from the manager, rather than from a keyboard. In fact, for them there is no keyboard. That becomes a major issue, since for instance R's basic debugging functions, e.g. debug() and browser(), rely on keyboard input.
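To see the manager/worker arrangement concretely, here is a small sketch using only the parallel package. It asks each worker for its process ID and whether it considers itself interactive. The variable names are my own choices for illustration.

```r
library(parallel)

cl <- makeCluster(2)                 # launches two worker R processes

manager_pid <- Sys.getpid()
# clusterEvalQ() runs the expression at each worker, returning a list.
worker_pids <- unlist(clusterEvalQ(cl, Sys.getpid()))
worker_interactive <- unlist(clusterEvalQ(cl, interactive()))

stopCluster(cl)

print(manager_pid)          # the manager's process ID
print(worker_pids)          # two other process IDs, one per worker
print(worker_interactive)   # FALSE FALSE: no keyboard at the workers
```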
Accordingly, we must prepare ourselves for a somewhat lower level of convenience in debugging tools.
There are some helpful tools in our partools package. (Note: partools consists of a lot more than simply debugging tools. If interested, read the package's vignette.)
Solution I:
As mentioned, debugging via print statements is actually counterproductive. But some may feel it is a good approach for simple code, so how might we do this in the case of parallel?
Once again, an obstacle is that the worker R processes do not have a terminal. Not only do they not have a keyboard, they also don't have a screen. So even print statements, e.g. print(x), won't work.
One solution is to write to a file instead of to the screen. We might for instance insert code like
cat(x, '\n', file = 'debugoutput', append=TRUE)
which would write the value of x to the file debugoutput.
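A minimal runnable sketch of this idea follows, again using only base R and parallel. The file location (under tempdir()) and the worker function are assumptions made for the example, and it presumes the workers run on the same machine as the manager so that they can see the file. I also tag each line with the worker's process ID, simply to make the lines easier to tell apart.

```r
library(parallel)

cl <- makeCluster(2)

# Assumed location for the debugging output; any path the workers can write to will do.
logfile <- file.path(tempdir(), "debugoutput")
unlink(logfile)                       # start with a clean file

res <- clusterApply(cl, 1:4, function(i, logfile) {
  x <- i^2
  # No screen at the worker, so write to the file instead.
  cat("pid", Sys.getpid(), "x =", x, "\n", file = logfile, append = TRUE)
  x
}, logfile = logfile)

stopCluster(cl)

loglines <- readLines(logfile)
print(loglines)                       # one line per task, in no guaranteed order
```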
Note, however, that we would also need to know which worker wrote that message! The partools function dbsmsg() writes a separate file for each worker, with names dbs.01, dbs.02 and so on. For example, if worker 3 calls dbsmsg(x), the value of x at worker 3 will be written to the file dbs.03.
Solution II:
The built-in function dump.frames() saves the call stack at the point of an execution error, so that it can be examined afterward, e.g. with debugger(). The partools function dbsdump() will have each worker call dump.frames().
Solution III:
Ideally, we would like to use R's debug() function at each worker, but again, the lack of a keyboard and screen makes direct usage impossible. Fortunately, the designers of the parallel package gave us a workaround.
In calling makeCluster() to set up a parallel cluster, we can use the argument manual = TRUE, which lets us start each worker by hand in a real terminal of its own. Then we can use debug() and so on. Doing this by hand can be tedious, so the partools function dbs() automates most of the process. (Available only on Unix-family systems, i.e. Macs and Linux.)