Vistat

a reproducible gallery of statistical graphics

Demonstration of the Central Limit Theorem

  • Lijia Yu (yu@lijiayu.net), a master's candidate majoring in Bioinformatics at Beijing Institute of Genomics.

In probability theory, the Central Limit Theorem (CLT) states that, under certain conditions, the mean of a sufficiently large number of independent random variables, each with a well-defined mean and a well-defined variance, is approximately normally distributed.
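For reference, one common formal version of this statement is the classical (Lindeberg-Lévy) form. It is written here under the assumption that X_1, ..., X_n are independent and identically distributed with mean \mu and finite variance \sigma^2 > 0; the symbols are introduced for illustration and are not taken from the article. The arrow denotes convergence in distribution and needs the amsmath package.

```latex
\sqrt{n}\,\frac{\bar{X}_n - \mu}{\sigma} \;\xrightarrow{d}\; N(0, 1)
\quad \text{as } n \to \infty,
\qquad \text{where } \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i .
```

Equivalently, for large n the sample mean is approximately Normal with mean \mu and standard deviation \sigma / \sqrt{n}.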

As shown in the Bean Machine article, the CLT has a number of variants. This article shows that, as long as the conditions of the CLT are satisfied, the distribution of the sample mean is approximately Normal when the sample size n is large enough, whatever the original distribution.

In the animation package, there is a function named clt.ani(). It shows how the distribution of the sample mean changes as the sample size grows. The p-value of shapiro.test() (the Shapiro-Wilk test) is reported as a measure of normality.
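The idea behind the animation can be sketched in base R without any plotting. This is a rough approximation, not the internals of clt.ani(): the helper sample_means(), the sample sizes, and the choice of 300 replicates are all assumptions made for illustration.

```r
set.seed(1)  # fixed seed so the run is repeatable

# Hypothetical helper: draw `reps` independent samples of size n from FUN
# and return the mean of each sample.
sample_means <- function(FUN, n, reps = 300) {
  replicate(reps, mean(FUN(n)))
}

sizes <- c(2, 10, 50)  # sample sizes to compare (arbitrary choice)

# Shapiro-Wilk p-value for the simulated sample means at each sample size,
# using the exponential distribution as the population.
pvals <- sapply(sizes, function(n) {
  shapiro.test(sample_means(rexp, n))$p.value
})
names(pvals) <- paste0("n=", sizes)
print(pvals)
```

A small p-value is evidence against normality, so for a skewed population such as the exponential the p-values tend to be tiny at small n and larger as n grows, although any single run is subject to random variation.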

Classical Central Limit Theorem

With the FUN argument of clt.ani() you can select the distribution shown in the animation; the mean and sd arguments give the mean and standard deviation of that population distribution. Here is an example with the Poisson distribution.

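library(animation)
ani.options(interval = 0.5)  # pause 0.5 seconds between animation frames
par(mar = c(3, 3, 1, 0.5), mgp = c(1.5, 0.5, 0), tcl = -0.3)
lambda = 4
f = function(n) rpois(n, lambda)  # draw n Poisson(lambda) observations
# a Poisson(lambda) population has mean lambda and standard deviation sqrt(lambda)
clt.ani(FUN = f, mean = lambda, sd = sqrt(lambda))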
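A Poisson distribution with rate lambda has variance lambda, so its standard deviation is sqrt(lambda) and the sample mean of n observations should have standard deviation sqrt(lambda / n). The following base R sketch checks this numerically; the sample size of 50 and the 5000 replicates are arbitrary values chosen for illustration.

```r
set.seed(42)  # fixed seed so the run is repeatable

lambda <- 4
n <- 50  # sample size (arbitrary choice)

# 5000 simulated sample means, each from n Poisson(lambda) draws
means <- replicate(5000, mean(rpois(n, lambda)))

# Standard deviation of the sample mean predicted by the CLT
theory_sd <- sqrt(lambda / n)

# Standardized sample means; these should be roughly N(0, 1)
z <- (means - lambda) / theory_sd

print(c(empirical_sd = sd(means), theoretical_sd = theory_sd))
```

If the two printed values are close, the Normal curve with mean lambda and standard deviation sqrt(lambda / n) is a reasonable overlay for the histogram of sample means.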

When CLT does not work

The Cauchy distribution has no defined mean, variance, or higher moments, so the CLT does not apply to it: the sample mean does not settle toward a Normal distribution, however large n is.

ani.options(interval = 0.5)
par(mar = c(3, 3, 1, 0.5), mgp = c(1.5, 0.5, 0), tcl = -0.3)
f = function(n) rcauchy(n, location = 0, scale = 2)
clt.ani(FUN = f, mean = NA, sd = NA)
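The failure can also be seen numerically. The mean of n independent Cauchy variables has the same Cauchy distribution as a single observation, so its spread does not shrink as n grows. The sketch below measures the spread with the interquartile range, which for a Cauchy distribution equals twice the scale parameter; the helper iqr_of_means() and the replicate count are assumptions made for illustration.

```r
set.seed(7)  # fixed seed so the run is repeatable

# Hypothetical helper: interquartile range of `reps` simulated sample means,
# each computed from n Cauchy(location = 0, scale = 2) draws.
iqr_of_means <- function(n, reps = 2000) {
  IQR(replicate(reps, mean(rcauchy(n, location = 0, scale = 2))))
}

sizes <- c(1, 10, 100, 1000)
spread <- sapply(sizes, iqr_of_means)
names(spread) <- paste0("n=", sizes)
print(spread)
```

For a distribution that satisfies the CLT, the spread of the sample mean would fall roughly in proportion to 1 / sqrt(n). Here it stays near 2 * scale = 4 at every sample size.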

Meta

You can find the R Markdown source document in the vistat repository on GitHub.