How to use a correlation test before going for regression analysis?

Vikram Satale
Mar 29, 2021

In machine learning, regression analysis is a supervised learning technique. It is usually used to predict a continuous variable.

Correlation

You might have come across situations where two variables are related to each other, such as demand and supply, height and weight, or the share prices of stocks A and B.

In statistics, the technique used to measure how strongly or weakly two variables are associated is called correlation. Going one step further, we can use this relationship to predict one variable when we know the value of the other. This is known as regression analysis.

Let us see how to use a correlation test before going for regression analysis.

Pearson correlation test

We can find the Pearson correlation coefficient (r) for the sample using the following formula.

Sample Correlation formula
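For reference, the standard definitional form of the sample Pearson correlation coefficient for n pairs (x_i, y_i), with sample means x̄ and ȳ, is shown below (an algebraically equivalent computational form using sums of x, y, xy, x² and y² is also common):

```latex
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^{2}} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^{2}}}
```

The value of r always lies between −1 and 1; values near ±1 indicate a strong linear relationship and values near 0 a weak one.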

If the population correlation coefficient (ρ) is significantly different from zero, this indicates a linear relationship between the two variables x and y, and the closer |ρ| is to 1, the stronger that relationship. When there is a strong linear relationship between two variables, we can use regression analysis. Since ρ itself is unknown, we judge it from the sample coefficient r.
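Because everything starts from the sample value r, here is a minimal sketch of the sample correlation formula in plain Python. The function name `pearson_r` is hypothetical, not from the original article. The second call illustrates why r speaks only about a linear relationship: a perfect but curved relationship (y = x²) can still give r = 0.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation coefficient r (definitional formula)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)


# A straight-line relationship gives r = 1 ...
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))   # 1.0
# ... but a perfect curved relationship (y = x**2) gives r = 0.
print(pearson_r([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # 0.0
```

On Python 3.10 and later, `statistics.correlation(x, y)` from the standard library computes the same Pearson coefficient.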

To test the significance of the population correlation coefficient (ρ), we use the Pearson correlation test as follows.

Procedure

The null and alternative hypotheses are as follows.

H0: The correlation coefficient ρ is not significant, i.e. ρ = 0.

Vs

H1: The correlation coefficient ρ is significant (ρ ≠ 0), OR ρ is significantly positive (ρ > 0), OR ρ is significantly negative (ρ < 0). Only one of these three alternatives is used in a given test, depending on the claim being tested.
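Both ways of testing these hypotheses rest on one standard statistic. Under H0, and assuming the (x, y) pairs come from a bivariate normal population, the statistic t below follows a t distribution with n − 2 degrees of freedom. Solving |t| = t_crit for r gives the matching critical value of r, which is how a Pearson critical-value table relates to an ordinary t table:

```latex
t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}}, \qquad df = n - 2,
\qquad
r_{\text{crit}} = \frac{t_{\text{crit}}}{\sqrt{t_{\text{crit}}^{2} + df}}
```

For example, with df = 10 and a two-tailed α = 0.05, t_crit ≈ 2.228, so r_crit ≈ 2.228 / √(2.228² + 10) ≈ 0.576.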

Once we have calculated the sample correlation coefficient r, we can test the claim about ρ either with the traditional method of computing a test statistic or with the Pearson correlation table of critical values. In this article, we use the Pearson correlation table of critical values. The procedure is as follows.

  1. Find the tail of the test. The tail is decided by the alternative hypothesis. If the alternative hypothesis contains the “>” sign, it is a right-tailed (one-tailed) test. If it contains the “<” sign, it is a left-tailed (one-tailed) test. If it contains the “≠” sign, it is a two-tailed test.
  2. Find the degrees of freedom, df = n − 2, where n is the number of (x, y) pairs in the sample.
  3. Choose the significance level (α).
  4. Look up the critical value in the body of the table, in the row for the degrees of freedom and the column for the chosen significance level and tail.
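Steps 1 to 4 can be sketched in Python as follows. The helper names are hypothetical, and the dictionary holds only an excerpt of a standard Pearson table (two-tailed α = 0.05, df 1 to 10), so check it against the table you actually use. A one-tailed test at level α uses the two-tailed column for 2α, which is why a one-tailed test at α = 0.025 maps onto the same column.

```python
import math

# Excerpt of a standard Pearson critical-value table:
# two-tailed alpha = 0.05, df = 1..10 (verify against your own table).
CRITICAL_R_TWO_TAILED_05 = {
    1: 0.997, 2: 0.950, 3: 0.878, 4: 0.811, 5: 0.754,
    6: 0.707, 7: 0.666, 8: 0.632, 9: 0.602, 10: 0.576,
}


def tail_of_test(alternative_sign: str) -> str:
    """Step 1: the sign in H1 decides the tail of the test."""
    tails = {">": "right", "<": "left", "≠": "two", "!=": "two"}
    if alternative_sign not in tails:
        raise ValueError(f"unknown sign in H1: {alternative_sign!r}")
    return tails[alternative_sign]


def degrees_of_freedom(n: int) -> int:
    """Step 2: df = n - 2, where n is the number of (x, y) pairs."""
    if n < 3:
        raise ValueError("the test needs at least 3 pairs")
    return n - 2


def critical_value(df: int, alpha: float, tail: str) -> float:
    """Steps 3-4: look up the critical value for the chosen alpha and tail."""
    # A one-tailed test at alpha uses the two-tailed column for 2 * alpha.
    two_tailed_alpha = alpha if tail == "two" else 2 * alpha
    if not math.isclose(two_tailed_alpha, 0.05):
        raise ValueError("this excerpt only covers the two-tailed 0.05 column")
    return CRITICAL_R_TWO_TAILED_05[df]  # KeyError if df is outside 1..10


# Example: H1 is rho != 0, with n = 12 pairs and alpha = 0.05.
tail = tail_of_test("≠")
df = degrees_of_freedom(12)
print(tail, df, critical_value(df, 0.05, tail))  # two 10 0.576
```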

Decision rule

After finding the critical value, we apply the decision rule to reach a conclusion. The decision rule is as follows.

  1. Two-tailed test: If |r| > critical value, we reject the null hypothesis and conclude that the correlation coefficient ρ is significant; otherwise, we fail to reject H0 and conclude that ρ is not significant.
  2. Right-tailed test: If r > critical value, we reject the null hypothesis and conclude that ρ is significantly positive; otherwise, we conclude that ρ is not significantly positive.
  3. Left-tailed test: If −r > critical value (that is, r < −critical value), we reject the null hypothesis and conclude that ρ is significantly negative; otherwise, we conclude that ρ is not significantly negative.
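The three decision rules translate directly into a small function. This is a sketch; the name `reject_h0` is hypothetical, and the r values in the example are made up for illustration, compared against the two-tailed critical value 0.576 (df = 10, α = 0.05).

```python
def reject_h0(r: float, critical: float, tail: str) -> bool:
    """Decision rule of the Pearson test: True means reject H0."""
    if tail == "two":
        return abs(r) > critical
    if tail == "right":
        return r > critical
    if tail == "left":
        return -r > critical
    raise ValueError(f"tail must be 'two', 'right' or 'left', not {tail!r}")


# Hypothetical sample results, df = 10, two-tailed alpha = 0.05.
print(reject_h0(-0.62, 0.576, "two"))  # True: rho is significant
print(reject_h0(0.41, 0.576, "two"))   # False: rho is not significant
```

Only when H0 is rejected (and r is reasonably far from 0) does it make sense to go on and fit a regression line.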

Refer to the following table of Pearson critical values.

Pearson Correlation Table
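If a table is not at hand, its entries can be approximated numerically. This sketch assumes what such tables are built on: under H0 (ρ = 0) with bivariate normal data, r has a density proportional to (1 − r²)^((df − 2)/2) on (−1, 1). The function `pearson_critical_value` is hypothetical, not a library API; it integrates that density with Simpson's rule (after substituting r = sin θ) and bisects for the value whose upper-tail area equals α, or α/2 for a two-tailed test.

```python
import math


def _upper_tail(c: float, df: int, steps: int = 1000) -> float:
    """P(R > c) under H0 for 0 <= c < 1, assuming bivariate normal data.

    With r = sin(theta), the null density of r becomes proportional to
    cos(theta) ** (df - 1), which is smooth even for df = 1.
    """
    lo, hi = math.asin(c), math.pi / 2
    h = (hi - lo) / steps  # steps must be even for Simpson's rule

    def integrand(theta: float) -> float:
        return math.cos(theta) ** (df - 1)

    # Composite Simpson's rule: weights 1, 4, 2, 4, ..., 2, 4, 1.
    total = integrand(lo) + integrand(hi)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(lo + i * h)
    area = total * h / 3
    # Normalising constant: the integral over (-pi/2, pi/2) is B(1/2, df/2).
    log_beta = math.lgamma(0.5) + math.lgamma(df / 2) - math.lgamma((df + 1) / 2)
    return area / math.exp(log_beta)


def pearson_critical_value(df: int, alpha: float = 0.05, tail: str = "two") -> float:
    """Critical r whose tail area under H0 is alpha (alpha / 2 if two-tailed)."""
    if df < 1 or not 0 < alpha < 1:
        raise ValueError("need df >= 1 and 0 < alpha < 1")
    target = alpha / 2 if tail == "two" else alpha
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection: the upper tail shrinks as c grows
        mid = (lo + hi) / 2
        if _upper_tail(mid, df) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


print([round(pearson_critical_value(df), 3) for df in range(1, 11)])
```

The printed list should agree with the usual two-tailed α = 0.05 column, for instance 0.997 for df = 1 and 0.576 for df = 10.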

For a more detailed version of this article, see innovativestats.blogspot.com.
