ECS 145 Term Project

Due Date

March 17, 11:59 pm. Tip: Act as if the due date is one day before the real one.


You are all familiar with the idea of a histogram. It's designed to show which values of a variable are more frequent and which occur less often.

For instance, R has a number of built-in datasets, one of which is Nile, the annual flow of the Nile River at Aswan from 1871 to 1970. Typing

> hist(Nile)

produces this picture:

We see that the most common values seem to be in the 800s, with some rare values in the 400s and 1400s, and so on.

A major issue, though, is the number of bins, controlled by the argument breaks in hist(). If we have too few bins, the bins become very wide and we lose details. In the extreme, we have just one bin, which is totally noninformative. But if we have too many bins, too few data points fall in each bin, and those small sample sizes give us unreliable bin counts; the histogram then looks very choppy.
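To see the tradeoff concretely, here is a small sketch that draws Nile with three different requests for the number of bins. The values 2, 10 and 50 are arbitrary choices for illustration, not part of the assignment. Note that a single number given as breaks is only a suggestion; hist() rounds to "pretty" breakpoints, so the actual number of bins may differ from the request.

```r
# Illustrative bin requests (assumed values, chosen only for this sketch)
bin_requests <- c(2, 10, 50)

# Compute the histograms without drawing them yet
hists <- lapply(bin_requests, function(b) hist(Nile, breaks = b, plot = FALSE))

# Report how many bins hist() actually used for each request
for (i in seq_along(hists)) {
  cat("requested", bin_requests[i],
      "-> actual bins:", length(hists[[i]]$counts), "\n")
}

# Draw the three histograms side by side
old_par <- par(mfrow = c(1, 3))
for (i in seq_along(hists)) {
  plot(hists[[i]], main = paste("breaks =", bin_requests[i]), xlab = "Nile")
}
par(old_par)
```

The leftmost plot should show almost no detail, and the rightmost one should look choppy, with many bins holding only a handful of points.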

A more advanced alternative to a histogram is a kernel density estimator. Instead of counting only the data points in a bin, it also counts points outside the bin, just with smaller weights. Actually, we don't really have bins, but the details won't concern us here. The key points are that (a) this method yields a smooth curve, which many consider more appealing, and (b) there is still an argument, bw, that controls the "wiggliness" of the curve.
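The effect of bw can be seen with R's density() function. The sketch below overlays three estimates for Nile; the values 25 and 150 are arbitrary choices for illustration, and the middle curve uses whatever bandwidth density() picks by default.

```r
# Kernel density estimates with a small, default, and large bandwidth
d_narrow  <- density(Nile, bw = 25)    # assumed value: small bw, wiggly curve
d_default <- density(Nile)             # bw chosen automatically by density()
d_wide    <- density(Nile, bw = 150)   # assumed value: large bw, very smooth curve

# Common vertical range so that all three curves fit in the plot
y_range <- range(0, d_narrow$y, d_default$y, d_wide$y)

plot(d_narrow, ylim = y_range, lty = 1,
     main = "Nile: effect of bw", xlab = "Nile")
lines(d_default, lty = 2)
lines(d_wide, lty = 3)
legend("topright",
       legend = c("bw = 25",
                  paste("default bw =", round(d_default$bw, 1)),
                  "bw = 150"),
       lty = 1:3)
```

A small bw plays the same role as many narrow bins, and a large bw the same role as a few wide ones.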

Arguments such as breaks and bw are called tuning parameters or hyperparameters. They give the user control over some aspect of an algorithm.

The goal of this project is to produce an R package that helps a user explore the effects of setting different values of breaks or bw. The nature of the exploration will be that the user will be repeatedly asked whether she wishes to change the current value of the tuning parameter, zoom in/out, etc. After exploration, the code will save some of the graphs that the user has found useful.
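To make the idea of repeated prompting concrete, here is a minimal sketch of such a loop for the histogram case only. It is not the required design: the function name explore_breaks, the one-letter commands, and the ask argument are all assumptions made up for this illustration, and zooming is left out.

```r
# Hypothetical helper, for illustration only (not a name the assignment requires).
# x:      numeric data to plot
# breaks: starting value of the tuning parameter
# ask:    function that takes a prompt and returns the user's reply as a string;
#         defaults to readline(), but can be replaced to script the session
explore_breaks <- function(x, breaks = 10,
                           ask = function(prompt) readline(prompt)) {
  saved <- list()   # histograms the user chose to keep
  repeat {
    h <- hist(x, breaks = breaks, main = paste("breaks =", breaks))
    cmd <- ask("new breaks value, 's' to save, 'q' to quit: ")
    # Empty input also quits, so a non-interactive run cannot loop forever
    if (cmd == "q" || cmd == "") break
    if (cmd == "s") {
      saved[[length(saved) + 1]] <- h
    } else {
      val <- suppressWarnings(as.numeric(cmd))
      # Ignore anything that is not a usable number of bins
      if (!is.na(val) && val >= 1) breaks <- val
    }
  }
  invisible(saved)
}
```

In an interactive session one would simply call explore_breaks(Nile). Passing the prompt function as an argument is a design choice that lets the loop be driven by a fixed list of replies when testing, without anyone typing at the keyboard.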


Note carefully!

Important Rules



General comments: