Codes for implementing algorithms mentioned in this chapter are available at my Github repository. Each script runs the respective algorithm with example data.
1. Moments (of a distribution)
Moments are statistics that describe the data’s tendencies to cluster around certain values, or the shape of the distribution. The $n$-th moment of a continuous function $f(x)$ about a value $c$ is:
\[\mu_n = \int_{-\infty}^{\infty} (x-c)^n f(x) \mathop{dx}\]For a probability density function (pdf), the zeroth moment is obviously $1$. The first moment would be the expected value (or mean), i.e. $\mathrm{E}[X]$ (we usually assume $c = 0$ for the first moment). The second (central) moment would be the variance, i.e. $\mathrm{E}[(X-\mu)^2]$ (we usually assume $c = \mathrm{E}[X]$ for second or higher moments).
For discrete distributions, the first moment (mean) is: $\mu = \frac{1}{N} \sum_{i=1}^N x_i$. The second central moment (variance) is $\mathrm{Var}(X) = \frac{1}{N-1} \sum_{i=1}^N (x_i - \mu)^2$. We use $N-1$ instead of $N$ as denominator to account for the bias in sample variance in comparison to population variance, which is by a factor of $\frac{N-1}{N}$ (see Wikipedia). The square root of the variance is the standard deviation: $\sigma(x) = \sqrt{\mathrm{Var}(X)}$.
The third moment (skewness) is a measure of asymmetry, usually evaluated as $\mathrm{Skew}(X) = \frac{1}{N} \sum_{i=1}^N (\frac{x_i - \mu}{\sigma(x)})^3$. A positive value of skewness suggests that the distribution is right-skewed (or right-tailed), with a long tail extending in the positive $x$-axis direction.
Caution from the book: In real life it is good practice to believe in skewnesses only when they are several or many times as large as $\sqrt{\frac{15}{N}}$ (when $\mu$ is true mean) or $\sqrt{\frac{6}{N}}$ (when $\mu$ is sample mean).
The fourth moment (kurtosis) is a measure of how peaked (leptokurtic) or flat (platykurtic) of a distribution is, compared to the normal distribution. This is usually evaluated as $\mathrm{Kurt}(X) = [ \frac{1}{N} \sum_{i=1}^N (\frac{x_i - \mu}{\sigma(x)})^4 ] - 3$, where the last term ensures that a normal distribution has a kurtosis of $0$.
In practice, roundoff errors could be magnified when computing central moments, for first computing $\mu$ then subtracting it from every sample. To correct these errors, a corrected two-pass algorithm is usually used for large samples, where a second term summing up the roundoff errors is subtracted. For example, the sample variance could be calculated as:
\[\mathrm{Var}(X) = \frac{1}{N-1} \{ \sum_{i=1}^N (x_i - \mu)^2 - \frac{1}{N} [\sum_{i=1}^N (x_i - \mu)]^2 \}\]