What Are Degrees of Freedom?
In statistics, the term "degrees of freedom" (often abbreviated "df" or "dof") refers to the number of values in the final calculation of a statistic that are free to vary. It plays a crucial role in many statistical tests and distributions, including the Chi-Square distribution, t-distribution, and F-distribution. The concept can be understood differently depending on the context, so let's explore it in a few common situations:
T-Test and ANOVA (Analysis of Variance):
In a one-sample t-test, the degrees of freedom are (n - 1), where 'n' is the sample size. These represent the number of values in the sample that are free to vary after you've calculated the sample mean.
In independent-samples t-tests or ANOVA, each group or sample contributes its own set of degrees of freedom. The total degrees of freedom for the entire analysis are the sum of the degrees of freedom across all groups.
For example, in a two-sample t-test with sample sizes 'n1' and 'n2', the degrees of freedom for each sample are (n1 - 1) and (n2 - 1), and the total degrees of freedom are (n1 - 1) + (n2 - 1) = n1 + n2 - 2.
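As a quick illustration, here is a minimal sketch (using scipy and hypothetical sample sizes) of how the pooled degrees of freedom feed into the critical t value:

```python
from scipy import stats

# Hypothetical sample sizes for the two groups
n1, n2 = 12, 15

# Degrees of freedom for each sample, and the pooled total
df1 = n1 - 1            # 11
df2 = n2 - 1            # 14
df_total = df1 + df2    # n1 + n2 - 2 = 25

# The critical t value for a two-sided test at alpha = 0.05
# depends directly on the degrees of freedom.
t_crit = stats.t.ppf(1 - 0.05 / 2, df_total)
print(f"df = {df_total}, critical t = {t_crit:.3f}")
```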
Chi-Square Test:
In the Chi-Square Test for Independence, the degrees of freedom are calculated as (r - 1) × (c - 1), where 'r' is the number of rows and 'c' is the number of columns in the contingency table.
In the Chi-Square Goodness-of-Fit Test, the degrees of freedom are (k - 1), where 'k' is the number of categories; if 'm' parameters of the hypothesized distribution are estimated from the data, the degrees of freedom drop to (k - 1 - m).
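As a sketch (assuming a hypothetical 3 × 2 contingency table), scipy.stats.chi2_contingency reports the (r - 1) × (c - 1) degrees of freedom directly:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 3 x 2 contingency table of observed counts
observed = np.array([[20, 30],
                     [25, 25],
                     [15, 35]])

chi2, p, dof, expected = chi2_contingency(observed)

# dof should equal (rows - 1) * (cols - 1) = 2 * 1 = 2
print(f"chi2 = {chi2:.3f}, p = {p:.3f}, dof = {dof}")
```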
F-Test (Analysis of Variance):
In the context of ANOVA or regression analysis, the degrees of freedom represent the number of independent pieces of information in the numerator and denominator of the F-statistic.
The numerator degrees of freedom are associated with the variability between groups or models, while the denominator degrees of freedom are associated with the variability within groups or residuals.
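To make this concrete, here is a minimal sketch (with hypothetical measurements for three groups) showing the numerator and denominator degrees of freedom in a one-way ANOVA:

```python
from scipy import stats

# Hypothetical measurements for three groups
g1 = [4.1, 5.0, 4.7, 5.3]
g2 = [5.8, 6.1, 5.5, 6.4]
g3 = [4.9, 5.2, 5.6, 5.0]

k = 3                             # number of groups
N = len(g1) + len(g2) + len(g3)   # total observations

df_between = k - 1    # numerator df: variability between groups
df_within = N - k     # denominator df: variability within groups

f_stat, p_value = stats.f_oneway(g1, g2, g3)
f_crit = stats.f.ppf(0.95, df_between, df_within)

print(f"F({df_between}, {df_within}) = {f_stat:.3f}, "
      f"p = {p_value:.3f}, critical F = {f_crit:.3f}")
```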
Linear Regression:
In simple linear regression, the fitted model uses two degrees of freedom, one for the slope coefficient and one for the intercept coefficient. With a sample of size 'n', this leaves (n - 2) residual degrees of freedom for estimating the error variance.
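A small sketch (with hypothetical x and y values) showing where the n - 2 residual degrees of freedom come from once both the slope and the intercept are estimated:

```python
import numpy as np

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

# Fit y = slope * x + intercept (two estimated parameters)
slope, intercept = np.polyfit(x, y, 1)

n = len(x)
df_resid = n - 2   # two df are "used up" by the slope and intercept

# The unbiased estimate of the residual variance divides by n - 2
residuals = y - (slope * x + intercept)
sigma2_hat = np.sum(residuals**2) / df_resid
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, "
      f"residual df = {df_resid}, variance estimate = {sigma2_hat:.4f}")
```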
Degrees of freedom are essential because they determine the critical values for statistical tests, which in turn affect how the test results are interpreted. In essence, degrees of freedom quantify how much information is available in your data to estimate or test parameters or relationships, and they determine the shape of the probability distributions, such as the t-distribution, F-distribution, and Chi-Square distribution, that are fundamental in statistical analysis.
We have seen this curious expression in two settings now: in the t-test and in the Chi-squared test. What exactly are degrees of freedom, anyway? More specifically, let's look at the situations where these words come up. The expression

sum(xi - x̄)^2

(where x̄ is the average of the x's) is associated with N - 1 degrees of freedom. Similarly, for a 2 × 2 table of counts, we always say the chi-squared statistic has one degree of freedom. The general rule is that the degrees of freedom are the number of data points to which you can assign any value. Let's see how to apply this rule. In the expression sum(xi - x̄)^2, there are N terms in the sum. Each term is the squared difference between an observation xi and the average x̄ of all the x's. Let us write these differences down.
We have d1 = x1 - x̄, d2 = x2 - x̄, ..., dN = xN - x̄.
There are N differences di, but notice that they must always sum to 0: adding up all of the values on the right-hand side gives the sum of the x's minus N times their average, and since x̄ is by definition the sum of the x's divided by N, that difference is exactly 0. The d's must sum to 0 no matter what values the x's take, as the short sketch below confirms.
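Here is a minimal numpy sketch (with hypothetical values) that checks both claims: the deviations always sum to 0, and fixing any N - 1 of them forces the last one:

```python
import numpy as np

# Hypothetical sample
x = np.array([3.2, 5.1, 4.8, 6.0, 2.9])
d = x - x.mean()

# The deviations sum to 0 (up to floating-point rounding)
print(d.sum())  # ~0.0

# Given the first N - 1 deviations, the last one is forced:
last = -d[:-1].sum()
print(last, d[-1])  # the same value
```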
So how many of the differences di can we freely choose to be any values we want and still have them sum to 0? All of them except the last one. The last one is determined by all of the others so that the total is 0. Notice also that it does not matter which d we call the "last": we can freely choose N - 1 values, and the one remaining value is determined by the rest. Now let us examine how degrees of freedom arise in the chi-squared test. As an example, let us return to the lung-cancer data from the chi-squared example.
The chi-squared statistic measures the association between rows and columns. In the present example, the association of interest is between developing lung cancer and exposure to fungicides. The test should not depend on the number of mice allocated to exposure or not, nor on the number of mice that eventually develop tumors or not: the significance level of the test should reflect only the "inside" counts of the table, not these marginal totals.
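As a sketch with hypothetical mouse counts, a 2 × 2 table has (2 - 1) × (2 - 1) = 1 degree of freedom: once the marginal totals are fixed, choosing a single inside cell determines the other three:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2 x 2 table: exposure (rows) vs. tumor status (columns)
observed = np.array([[12, 38],    # exposed:     tumor, no tumor
                     [5, 45]])    # not exposed: tumor, no tumor

chi2, p, dof, expected = chi2_contingency(observed)
print(f"dof = {dof}")  # 1

# With the margins fixed, one cell determines the rest:
row_totals = observed.sum(axis=1)   # [50, 50]
col_totals = observed.sum(axis=0)   # [17, 83]
a = observed[0, 0]                  # choose the top-left cell freely
b = row_totals[0] - a
c = col_totals[0] - a
d = row_totals[1] - c
print(np.array([[a, b], [c, d]]))  # reconstructs the full table
```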