\name{freqparcoord}
\alias{freqparcoord}

\title{
Parallel coordinates.
}

\description{

(a) Addresses the screen-clutter problem in parallel coordinates by
plotting only the "most typical" cases, meaning those with the highest
estimated multivariate density values.  This makes it easier to discern
relations between variables, especially those whose axes are "distant"
from each other.  

(b) One can plot the "least typical" cases, i.e. those with the lowest
density values, in order to find outliers.  

(c) One can plot only cases that are "local maxima" in terms of density,
as a means of performing clustering.  

}

\usage{
freqparcoord(x,m,dispcols=1:ncol(x),grpvar=NULL,
      method="maxdens",faceting="vert",k=NULL,klm=10*k,
            keepidxs=F,plotidxs=F,cls=NULL)
}

\arguments{
   \item{x}{The data, in data frame or matrix form.}
   \item{m}{Number of lines to plot for each group.  A negative value in
      conjunction with \code{method} being "maxdens" indicates that the
      lowest-density lines are to be plotted.}
   \item{dispcols}{Columns of \code{x} to be displayed.}
   \item{grpvar}{Column for the grouping variable, if any (if none, all
      the data is treated as a single group); vector or factor.}
   \item{method}{What to display: "maxdens" for plotting the most
      (or least) typical lines, "locmax" for cluster hunting, or 
      "randsamp" for plotting a random sample of lines.}
   \item{faceting}{How to display groups, if present.  Use "vert" for
      vertical stacking of group plots, "horiz" for horizontal ones, or
      "none" to draw all lines in one plot, color-coding the groups.}
   \item{k}{Number of nearest neighbors to use for density estimation.}
   \item{klm}{If method is "locmax", number of nearest neighbors to 
      use for finding local maxima in cluster hunting.  Generally needs
      to be much larger than \code{k}, to avoid getting a lot of points.}
   \item{keepidxs}{If TRUE, the indices of the rows of \code{x} that 
      are plotted will be stored in a component \code{idxs} of the
      return value.}
   \item{plotidxs}{If TRUE, lines will be annotated with their case
      numbers, i.e. their row numbers within \code{x}.  Use only with
      small values of \code{m}, as overplotting may occur.}
   \item{cls}{Cluster to use (see the \code{parallel} package) for
      parallel computation.}
}


\details{ If \code{method = "maxdens"}, the \code{m} most frequent
(\code{m} positive) or least frequent (\code{m} negative) rows of
\code{x} will be plotted from each group (the nongroup case being
considered one group).   If  \code{method = "locmax"}, the rows having
the property that their density value is highest in their neighborhood
will be plotted.  Otherwise, \code{m} random rows will be displayed.  In
both cases, the lines will be color-coded according to density value.
If \code{cls} is non-null, the computation will be done in parallel.

The data is centered and scaled before plotting.  Note that the selected
rows are still plotted on the scale of the entire data set.

If some variable is constant within such data, scaling is impossible,
and the error message "arguments imply differing number of rows: 0, 1"
will appear.  In that case, try a larger value of \code{m}.

}

\author{
Norm Matloff <matloff@cs.ucdavis.edu> and Yingkang Xie
<ykxie@ucdavis.edu>
}

% \keyword{
% }

% \seealso{
% }

\examples{
# baseball player data courtesy of UCLA Stat. Dept., www.socr.ucla.edu
data(baseball)

# form a 2-process cluster (optional), e.g. for a dual core machine
library(parallel)
c2 <- makeCluster(2)

# plot baseball data, broken down by position category (infield,
# outfield, etc.); plot the 5 highest-density lines in each group
freqparcoord(baseball,5,4:6,7,method="maxdens")
# we see that the most typical pitchers are tall and young, while the
# catchers are short and heavy

# find the outliers, 1 for each position 
freqparcoord(baseball,-1,4:6,7)
# for instance we see an infielder of average height and weight, but
# extremely high age, worth looking into

# do the same, but also plot and retain the indices of the rows being
# plotted
p <- freqparcoord(baseball,-1,4:6,7,keepidxs=T,plotidxs=T)
p
p$idxs
# ah, that outlier infielder was case number 674; let's take a look
baseball[674,]
# Julio Franco, 48 years old!

# olive oil data courtesy of Dr. Martin Theus
data(olives)
olv <- olives

# there are 9 olive-oil producing areas of Italy, named Area here
# check whether the area groups have distinct patterns (yes)
freqparcoord(olv,1,3:10,1)

# same check but looking at within-group variation (turns out that some
# variables are more diverse in some areas than others)
freqparcoord(olv,25,3:10,1)

# look at it without stacking the groups
freqparcoord(olv,25,3:10,1,faceting="none")

# pretend we don't know about area; see if we pick them up from
# clustering (yes, somewhat:  about 6 groups suggested; printout 
# shows 6 different areas)
p <- freqparcoord(olv,1,3:10,method="locmax",keepidxs=T)
olv[p$idxs,]

# generate some simulated data with clusters at (0,0), (1,2) and (3,3),
# and see whether "locmax" picks them up
cv <- 0.5*diag(2)
x <- rmixmvnorm(5000,2,3,list(c(0,0),c(1,2),c(3,3)),list(cv,cv,cv))
p <- freqparcoord(x,m=1,method="locmax",keepidxs=T,cls=c2,k=50,klm=800)
xdisp <- x[p$idxs,]
tmp <- order(xdisp[,1])
tmp
xdisp[tmp,]  # this worked very well in this case

# shut down the 2-process cluster created above
stopCluster(c2)
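
# Illustration only, in base R: a rough sketch of the idea behind
# method="maxdens", not the internal code of this package.  The helper
# name kthnndist is hypothetical.  Under a k-nearest-neighbor rule, a
# small distance to the k-th nearest neighbor indicates a high estimated
# density, so ranking rows by that distance ranks them by typicality.
kthnndist <- function(x,k) {
   d <- as.matrix(dist(scale(x)))  # pairwise distances on scaled data
   # position 1 of each sorted row is the point itself (distance 0)
   apply(d,1,function(r) sort(r)[k+1])
}
set.seed(1)
# 100 tightly clustered points, plus 1 point far away in row 101
z <- rbind(matrix(rnorm(200,sd=0.3),ncol=2),c(5,5))
kd <- kthnndist(z,k=10)
order(kd)[1:5]  # rows in the densest region, the "most typical" cases
which.max(kd)  # row 101, the far-away point, the "least typical" case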


}


