Yihui Xie
(xie@yihui.name / GitHub / Twitter), Department of Statistics, Iowa State University. Interested in statistical computing and graphics; author of knitr, animation, and a few other R packages.
This article is adapted from my blog post and shows how to visualize a large amount of data in scatter plots. Here is how the original data was generated:
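The generating code is not reproduced here, so the following is only an illustrative stand-in and not necessarily the original data. It assumes a data set of 20,000 points in which a small ring is hidden inside a bivariate normal cloud; the seed, the sample size `n`, and the column names `V1` and `V2` are all assumptions made for the sake of the sketch.

```r
# Illustrative stand-in data (an assumption, not the article's original code):
# a standard bivariate normal cloud plus points on a circle of radius 0.5.
set.seed(1)
n <- 10000
theta <- runif(n, 0, 2 * pi)
x <- data.frame(
  V1 = c(rnorm(n), 0.5 * sin(theta)),
  V2 = c(rnorm(n), 0.5 * cos(theta))
)
x <- x[sample(nrow(x)), ]  # shuffle rows so the two groups are interleaved
str(x)
```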
Original scatter plot
It is not useful: the points overlap so heavily that no structure can be seen.
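A minimal sketch of such a plain scatter plot, using stand-in data (a normal cloud with a hidden ring, which is an assumption rather than the original data set):

```r
# Stand-in data (assumed): normal cloud plus a ring of radius 0.5
set.seed(1)
n <- 10000
theta <- runif(n, 0, 2 * pi)
x <- data.frame(
  V1 = c(rnorm(n), 0.5 * sin(theta)),
  V2 = c(rnorm(n), 0.5 * cos(theta))
)

# Default symbols and opaque color: the points pile up on top of each other
plot(x$V1, x$V2, xlab = "V1", ylab = "V2")
```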
Transparent colors
We set alpha = 0.1 to make the colors semi-transparent, so that regions where many points overlap appear darker.
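One way to do this in base R is to build the color with `rgb()` and its `alpha` argument. This is a sketch on stand-in data (assumed, not the original data set); the choice of black as the base color is also an assumption.

```r
# Stand-in data (assumed): normal cloud plus a ring of radius 0.5
set.seed(1)
n <- 10000
theta <- runif(n, 0, 2 * pi)
x <- data.frame(
  V1 = c(rnorm(n), 0.5 * sin(theta)),
  V2 = c(rnorm(n), 0.5 * cos(theta))
)

# Black with 10% opacity; overlapping points add up to darker areas
col_alpha <- rgb(0, 0, 0, alpha = 0.1)
plot(x$V1, x$V2, col = col_alpha, pch = 19, xlab = "V1", ylab = "V2")
```

Note that not every graphics device supports transparency; the PDF and PNG devices do.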
Set axis limits
Zoom into the point cloud:
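Zooming can be sketched with the `xlim` and `ylim` arguments of `plot()`. The limits of -1 to 1 below are an assumption chosen to suit the stand-in data (which is itself assumed, not the original data set).

```r
# Stand-in data (assumed): normal cloud plus a ring of radius 0.5
set.seed(1)
n <- 10000
theta <- runif(n, 0, 2 * pi)
x <- data.frame(
  V1 = c(rnorm(n), 0.5 * sin(theta)),
  V2 = c(rnorm(n), 0.5 * cos(theta))
)

# Restrict both axes to the center of the cloud
lim <- c(-1, 1)
plot(x$V1, x$V2, xlim = lim, ylim = lim, xlab = "V1", ylab = "V2")
```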
Smaller symbols
Use smaller points:
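In base graphics, `pch = "."` draws each point as a single small dot; shrinking the default symbol with `cex` is an alternative. A sketch on stand-in data (assumed, not the original data set):

```r
# Stand-in data (assumed): normal cloud plus a ring of radius 0.5
set.seed(1)
n <- 10000
theta <- runif(n, 0, 2 * pi)
x <- data.frame(
  V1 = c(rnorm(n), 0.5 * sin(theta)),
  V2 = c(rnorm(n), 0.5 * cos(theta))
)

# Option 1: one small dot per point
plot(x$V1, x$V2, pch = ".", xlab = "V1", ylab = "V2")

# Option 2: keep the default symbol but scale it down
plot(x$V1, x$V2, cex = 0.1, xlab = "V1", ylab = "V2")
```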
Subset
Look at only a random subset of the points:
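A minimal sketch of subsetting with `sample()`. The subset size of 1000 is an assumption, as is the stand-in data.

```r
# Stand-in data (assumed): normal cloud plus a ring of radius 0.5
set.seed(1)
n <- 10000
theta <- runif(n, 0, 2 * pi)
x <- data.frame(
  V1 = c(rnorm(n), 0.5 * sin(theta)),
  V2 = c(rnorm(n), 0.5 * cos(theta))
)

# Draw 1000 row indices without replacement and plot only those rows
idx <- sample(nrow(x), 1000)
x_sub <- x[idx, ]
plot(x_sub$V1, x_sub$V2, xlab = "V1", ylab = "V2")
```

With fewer points there is less overplotting, at the cost of discarding most of the data.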
Hexagons
We can use the color of hexagons to denote the number of points in them:
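One package that implements hexagonal binning is hexbin, which has to be installed separately. The sketch below assumes that package, uses stand-in data (not the original data set), and picks `xbins = 50` arbitrarily.

```r
library(hexbin)  # install.packages("hexbin") if it is not available

# Stand-in data (assumed): normal cloud plus a ring of radius 0.5
set.seed(1)
n <- 10000
theta <- runif(n, 0, 2 * pi)
x <- data.frame(
  V1 = c(rnorm(n), 0.5 * sin(theta)),
  V2 = c(rnorm(n), 0.5 * cos(theta))
)

# Bin the plane into hexagons and count the points in each cell
hb <- hexbin(x$V1, x$V2, xbins = 50)

# The fill color of each hexagon encodes its count
plot(hb, xlab = "V1", ylab = "V2")
```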
2D kernel density estimation
We can estimate the two-dimensional density surface using the kde2d() function in the MASS package:
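A sketch of the estimation step followed by a static perspective plot with base R's `persp()`. The grid size `n = 50`, the viewing angles, and the stand-in data are assumptions.

```r
library(MASS)

# Stand-in data (assumed): normal cloud plus a ring of radius 0.5
set.seed(1)
n <- 10000
theta <- runif(n, 0, 2 * pi)
x <- data.frame(
  V1 = c(rnorm(n), 0.5 * sin(theta)),
  V2 = c(rnorm(n), 0.5 * cos(theta))
)

# Kernel density estimate evaluated on a 50 x 50 grid;
# the result is a list with components x, y (grid) and z (density)
fit <- kde2d(x$V1, x$V2, n = 50)

# Static 3D view of the estimated density surface
persp(fit$x, fit$y, fit$z, phi = 30, theta = 20,
      xlab = "V1", ylab = "V2", zlab = "density")
```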
That is only a static plot, and we can actually interact with the surface (e.g. rotating and
zooming) if we draw it with the rgl package:
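A minimal sketch with `persp3d()` from rgl, which opens an interactive window that can be rotated and zoomed with the mouse. It assumes rgl is installed and a graphics display is available; the data, grid size, and surface color are assumptions as well.

```r
library(MASS)
library(rgl)  # install.packages("rgl") if it is not available

# Stand-in data (assumed): normal cloud plus a ring of radius 0.5
set.seed(1)
n <- 10000
theta <- runif(n, 0, 2 * pi)
x <- data.frame(
  V1 = c(rnorm(n), 0.5 * sin(theta)),
  V2 = c(rnorm(n), 0.5 * cos(theta))
)

# Density estimate on a 50 x 50 grid
fit <- kde2d(x$V1, x$V2, n = 50)

# Interactive surface: drag to rotate, scroll to zoom
persp3d(fit$x, fit$y, fit$z, col = "lightblue",
        xlab = "V1", ylab = "V2", zlab = "density")
```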
Run the code below to see the surface rotating automatically if you are interested:
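One way to animate the view is `play3d()` combined with `spin3d()` from rgl. This sketch is not necessarily the original animation code; the rotation axis, speed, and duration are assumptions, and so is the stand-in data.

```r
library(MASS)
library(rgl)

# Stand-in data (assumed): normal cloud plus a ring of radius 0.5
set.seed(1)
n <- 10000
theta <- runif(n, 0, 2 * pi)
x <- data.frame(
  V1 = c(rnorm(n), 0.5 * sin(theta)),
  V2 = c(rnorm(n), 0.5 * cos(theta))
)

# Draw the density surface in an rgl window
fit <- kde2d(x$V1, x$V2, n = 50)
persp3d(fit$x, fit$y, fit$z, col = "lightblue",
        xlab = "V1", ylab = "V2", zlab = "density")

# Spin around the vertical axis at 6 revolutions per minute for 10 seconds
play3d(spin3d(axis = c(0, 0, 1), rpm = 6), duration = 10)
```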