t-distribution

The distribution

The t-distribution is a symmetric, bell-shaped distribution very similar to the Normal distribution. It is due to the British statistician William Sealy Gosset (1876-1937), who published the paper "The probable error of a mean" in 1908. It is widely known as "Student's t-distribution", after Gosset's pseudonym. It was developed for the evaluation of small samples (say n < 50), where the sample mean and standard deviation can deviate significantly from the mean and standard deviation of the whole population. The name "t" probably refers to "test". The t-distribution has a larger spread (heavier tails) than the Normal, which reflects the extra uncertainty in parameters estimated from small samples. As the number of degrees of freedom grows, it approaches the Normal distribution.

The probability density and cumulative distribution functions are unwieldy formulas, which can be found on Wikipedia. Most basic statistics textbooks have tables of critical t-values for degrees of freedom up to 50 and several significance levels. Excel has the function T.DIST().
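
For readers without a table or spreadsheet at hand, the density and the cumulative probability can also be evaluated numerically. The following is a minimal sketch using only the Python standard library. The function names t_pdf and t_cdf and the choice of Simpson's rule with 2000 steps are assumptions made for this illustration, not part of any library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # Work in logs (lgamma) so that large df does not overflow.
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry around 0.

    steps must be even for Simpson's rule.
    """
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # probability mass between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area

# Probability below the tabulated two-sided 5% value for 10 df.
p_upper = t_cdf(2.228, 10)

# Lower tail beyond -2 for 5 df; larger than the Normal's 0.0228.
p_tail = t_cdf(-2.0, 5)
```

With 10 degrees of freedom, t_cdf(2.228, 10) should come out close to 0.975, which matches the familiar table entry for a two-sided test at the 5% level.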

Degrees of freedom

Degrees of freedom are related to sample size and the number of parameters estimated in a statistical procedure. Once the mean of a sample of n independent numbers has been estimated, n - 1 degrees of freedom ("df") remain. If we know n - 1 of the values and the mean of all n values, we cannot change the n-th value at will, because then the mean would change. So only n - 1 values are free to vary. In general, the number of df is n minus the number of parameters estimated from the sample. A good explanation, in geometric terms and more, has been given by Walker (1970). If the data are not independent, the above rule yields an exaggerated df; a correction can be applied.
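
The constraint described above can be checked with a few lines of Python. This is only an illustrative sketch; the four data values are made up for the example.

```python
import statistics

sample = [4.0, 7.0, 5.0, 8.0]     # hypothetical data, n = 4
n = len(sample)
mean = statistics.mean(sample)

# Given the mean and the first n - 1 values, the last value is forced.
last = n * mean - sum(sample[:-1])

# The deviations from the mean always sum to zero, so only n - 1 are free.
deviations = [x - mean for x in sample]

# The sample variance therefore divides by n - 1, not n.
var_manual = sum(d * d for d in deviations) / (n - 1)
```

Here last reproduces the final data value, and var_manual agrees with statistics.variance(sample), which uses the same n - 1 divisor.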

T-value in the case of multivariate regression

A multivariate regression results in a list of coefficients and their standard deviations (standard errors). The overall success of the regression is measured by the fraction of the variance of the dependent variable that is explained, "R-square". Each coefficient has its own t-value, which is obtained by dividing the coefficient by its standard deviation. This provides a significance test for the individual coefficient: a table of t-values is consulted to see whether the coefficient is sufficiently different from zero.
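
As a small illustration of this test, the sketch below computes t-values from a made-up regression result. The variable names, coefficients and standard errors are hypothetical. It assumes 24 observations and 4 estimated parameters (three coefficients plus an intercept), hence 20 df, for which the tabulated two-sided 5% critical value is 2.086.

```python
# Hypothetical regression output: variable -> (coefficient, standard error)
results = {
    "x1": (2.50, 0.50),
    "x2": (-0.30, 0.40),
    "x3": (1.20, 0.30),
}

n_obs = 24                         # assumed number of observations
n_params = len(results) + 1        # coefficients plus intercept
df = n_obs - n_params              # 20 degrees of freedom
t_critical = 2.086                 # two-sided 5% table value for 20 df

# t-value = coefficient divided by its standard error
t_values = {name: coef / se for name, (coef, se) in results.items()}

# A coefficient differs significantly from zero if |t| exceeds the table value.
significant = {name: abs(t) > t_critical for name, t in t_values.items()}
```

In this made-up case x1 and x3 pass the test, while x2 does not.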

Another use of these t-values is to square them, sum the squares and normalize so that the sum equals 1.0. The normalized values then give the relative contributions of the variables in the multivariate regression, by showing which part of the R-square each of them represents.
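
The normalization can be sketched as follows. The t-values and the R-square of 0.80 are hypothetical numbers chosen for the example, and the result should be read as a rough indication of relative importance, not an exact decomposition.

```python
# Hypothetical t-values of three regression coefficients
t_values = {"x1": 5.0, "x2": -0.75, "x3": 4.0}
r_square = 0.80                    # assumed R-square of the regression

# Square the t-values and normalize so that the shares sum to 1.0.
squares = {name: t * t for name, t in t_values.items()}
total = sum(squares.values())
shares = {name: sq / total for name, sq in squares.items()}

# Part of the R-square attributed to each variable under this rule.
r_square_parts = {name: share * r_square for name, share in shares.items()}
```

Because the t-values are squared, the sign of a coefficient plays no role here; only its size relative to its standard deviation matters.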
