Chi-Square Test

How to learn the Chi-Square Test?

The Chi-Square Test was first used by Karl Pearson. The Chi-Square statistic measures the size of the difference between observed and expected frequencies.

The Chi-Square (χ²) test is a statistical test used to determine whether there is a significant association or relationship between categorical variables. It is particularly useful for analyzing data where variables are not numerical but fall into distinct categories or groups. The Chi-Square test comes in two main variants: the Chi-Square Test for Independence and the Chi-Square Goodness-of-Fit Test.

Chi-Square Test for Independence:

Purpose: This test is used to assess whether two categorical variables are independent of each other, meaning there is no association or relationship between them.

Hypotheses:

Null Hypothesis (H0): The two categorical variables are independent (no association).

Alternative Hypothesis (Ha): The two categorical variables are not independent (there is an association).

Test Statistic: The Chi-Square statistic is calculated from the observed and expected frequencies of the categories in a contingency table (also known as a cross-tabulation table).

Degrees of Freedom: The degrees of freedom for the Chi-Square test for independence depend on the dimensions of the contingency table and are calculated as (r - 1) × (c - 1), where 'r' is the number of rows and 'c' is the number of columns in the table.

Significance Level: Researchers choose a significance level (alpha), commonly 0.05, before running the test. It sets the threshold for statistical significance.

Interpretation: If the calculated Chi-Square statistic is greater than the critical value from the Chi-Square distribution table (based on degrees of freedom and significance level), then the null hypothesis is rejected, indicating that there is a significant association between the two categorical variables.
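The decision rule above can be sketched in a few lines of Python. This is only an illustration: the statistic value and the table size are made-up assumptions, and the critical value comes from SciPy's chi2 distribution instead of a printed table.

```python
from scipy.stats import chi2

# Hypothetical inputs, chosen only for illustration
rows, cols = 2, 2      # dimensions of the contingency table
statistic = 8.33       # a Chi-Square statistic computed elsewhere
alpha = 0.05           # chosen significance level

# Degrees of freedom for the test for independence
dof = (rows - 1) * (cols - 1)

# Critical value: the point that leaves probability alpha in the upper tail
critical_value = chi2.ppf(1 - alpha, dof)

# Equivalent view: probability of a statistic at least this large under H0
p_value = chi2.sf(statistic, dof)

reject_null = statistic > critical_value
print(round(critical_value, 3))  # 3.841
print(reject_null)               # True
```

Comparing the statistic with the critical value and comparing the p-value with alpha always lead to the same decision.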

Chi-Square Goodness-of-Fit Test:

Purpose: This test is used to determine whether observed data fits an expected (theoretical) distribution or pattern. It is often used to assess how well the observed data aligns with a particular hypothesis or expected outcome.

Hypotheses:

Null Hypothesis (H0): The observed data fits the expected distribution.

Alternative Hypothesis (Ha): The observed data does not fit the expected distribution.

Test Statistic: The Chi-Square statistic is calculated by comparing the observed frequencies with the expected frequencies based on a theoretical distribution or model.

Degrees of Freedom: With k categories, the degrees of freedom are k - 1. If any parameters of the expected distribution are estimated from the data, one further degree of freedom is subtracted for each estimated parameter.

Significance Level: A chosen significance level (alpha) is used to determine statistical significance.

Interpretation: If the calculated Chi-Square statistic exceeds the critical value from the Chi-Square distribution table (based on degrees of freedom and significance level), the null hypothesis is rejected, suggesting that the observed data does not fit the expected distribution.
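As a rough illustration of the goodness-of-fit test, the sketch below checks whether a die looks fair. The roll counts are invented for the example, and scipy.stats.chisquare is used as one convenient way to run the test.

```python
from scipy.stats import chisquare

# Hypothetical counts for faces 1 to 6 after 60 rolls of a die
observed = [8, 9, 12, 11, 10, 10]

# A fair die is expected to show each face 60 / 6 = 10 times
expected = [10, 10, 10, 10, 10, 10]

# chisquare returns the statistic and the p-value (dof = categories - 1 = 5)
statistic, p_value = chisquare(f_obs=observed, f_exp=expected)

alpha = 0.05
reject_null = p_value < alpha
print(round(statistic, 3))  # 1.0
print(reject_null)          # False
```

Here the p-value is well above 0.05, so the invented data gives no reason to doubt that the die is fair.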

The Chi-Square test is widely used in various fields, including biology, social sciences, market research, and quality control. It is a valuable tool for exploring relationships between categorical variables, detecting deviations from expected patterns, and making inferences about populations based on sample data.

Chi-Square Test Formula

Chi-Square (χ²) = Σ [(O - E)² / E], summed over every category or cell

Here, O = Observed frequency

E = Expected frequency

Where E = (RT × CT) / N

Here, RT = The row total for the row containing the cell

CT = The column total for the column containing the cell

N = Total number of observations
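To make the formula concrete, here is a small sketch in plain Python that builds the expected frequencies from the row and column totals and then sums (O - E)² / E over all cells. The 2 × 3 table and the function names are made up for illustration.

```python
def expected_frequencies(observed):
    """Expected count for each cell: E = (RT * CT) / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


def chi_square_statistic(observed):
    """Sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_frequencies(observed)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Hypothetical observed counts: 2 rows, 3 columns
observed = [[10, 20, 30], [20, 20, 20]]

expected = expected_frequencies(observed)
statistic = chi_square_statistic(observed)
print(expected)             # [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
print(round(statistic, 3))  # 5.333
```

Note that the division by E happens inside the sum, cell by cell, and not once on the totals.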

Degrees of Freedom in the Chi-Square Test

To compare the calculated value of the Chi-Square Test with the table value, we first have to determine the degrees of freedom. The degrees of freedom are the number of classes to which values can be assigned arbitrarily without violating the restrictions placed on the data.

For example, suppose we have to choose five numbers whose total is 100. We are free to choose only four of them; the fifth is fixed by the requirement that the total be 100. If the four numbers we select independently are 25, 15, 18 and 22, then the fifth number must be 100 - (25 + 15 + 18 + 22) = 20. Here the degrees of freedom are n - 1 = 5 - 1 = 4. Degrees of freedom are denoted by the symbol ν, the Greek letter nu.

When the data is arranged in rows and columns, the degrees of freedom are calculated as:
ν = (r - 1)(c - 1)
ν (nu) = degrees of freedom
r = number of rows
c = number of columns
When the data is not arranged in rows and columns, that is, when there is a single series of n classes, the degrees of freedom are calculated as:
ν = n - 1
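The two rules, and the five-numbers example, can be written out as a short Python sketch. The function names are my own and only serve the illustration.

```python
def dof_contingency(rows, cols):
    """Degrees of freedom for data arranged in rows and columns."""
    return (rows - 1) * (cols - 1)


def dof_single_series(n_classes):
    """Degrees of freedom for a single series of n classes."""
    return n_classes - 1


# Five numbers must total 100: four are chosen freely, the fifth is forced
chosen = [25, 15, 18, 22]
fifth = 100 - sum(chosen)

print(fifth)                  # 20
print(dof_single_series(5))   # 4
print(dof_contingency(2, 2))  # 1
print(dof_contingency(3, 4))  # 6
```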

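Chi-Square Test in Python

>>> import numpy as np
>>> from scipy.stats import chi2_contingency
>>> # Observed frequencies as a 2 x 2 contingency table
>>> obs = np.array([[20, 10], [5, 15]])
>>> obs
array([[20, 10], [ 5, 15]])
>>> # Returns the statistic, the p-value, the degrees of freedom and the expected frequencies
>>> chi2_contingency(obs)
(6.75, 0.0093747684594349, 1, array([[15., 15.], [10., 10.]]))

Newer SciPy versions may display these same four values as a named result object instead of a plain tuple.

The chi-squared statistic is 6.75, with 1 degree of freedom and an associated p-value of 0.009. For a 2 × 2 table, chi2_contingency applies Yates' continuity correction by default, which is why the statistic is smaller than the value of 8.33 that the uncorrected formula gives for this table.

Since the p-value is below the usual 0.05 significance level, we reject the null hypothesis of no association between the row and column variables and conclude that the two variables are associated.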
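For a 2 × 2 table, chi2_contingency applies Yates' continuity correction unless told otherwise, so its result differs from the plain formula. The sketch below runs the same table both ways; the variable names are my own.

```python
import numpy as np
from scipy.stats import chi2_contingency

# Same 2 x 2 table of observed frequencies as in the example
obs = np.array([[20, 10], [5, 15]])

# Default behaviour: continuity correction applied because dof == 1
chi2_corrected, p_corrected, dof, expected = chi2_contingency(obs)

# correction=False gives the plain sum of (O - E)^2 / E
chi2_plain, p_plain, _, _ = chi2_contingency(obs, correction=False)

print(round(chi2_corrected, 2))  # 6.75
print(round(chi2_plain, 3))      # 8.333
print(dof)                       # 1
```

Both versions reject the null hypothesis at the 0.05 level for this table. The correction matters most when the counts are small.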