Correlation and Regression analysis: concept and uses. Test of Significance

Correlation:-
> The term correlation combines two words: 'co' (together) and 'relation' (a connection between two quantities). Two variables are correlated when, in a study of the pair, a change in one variable is accompanied by a corresponding change in the other, either in the same direction (direct) or in the opposite direction (inverse). The variables are said to be uncorrelated when movement in one variable is not associated with movement in any particular direction in the other. Correlation is a statistical technique that measures the strength of the association between pairs of variables. 
> Correlation can be either positive or negative. If the two variables move in the same direction, i.e. an increase in one variable is accompanied by an increase in the other, and vice versa, the variables are said to be positively correlated. For example, investment and profit. 
> On the contrary, if the two variables move in opposite directions, so that an increase in one variable is accompanied by a decline in the other and vice versa, the variables are said to be negatively correlated. For example, product price and demand.
Correlation Analysis:-
- Correlation analysis quantifies the association between two continuous variables, for example, between a dependent and an independent variable or between two independent variables.
- Correlation analysis estimates the sample correlation coefficient. Denoted by r, it ranges between -1 and +1 and quantifies the strength and direction of the linear association between two variables. The correlation between two variables can be positive, i.e. a higher level of one variable is related to a higher level of the other, or negative, i.e. a higher level of one variable is related to a lower level of the other.
- The sign of the coefficient of correlation shows the direction of the association. The magnitude of the coefficient shows the strength of the association.
- For example, a correlation of r = 0.8 indicates a strong positive association between two variables, while a correlation of r = -0.3 indicates a weak negative association. A correlation near zero indicates little or no linear association between two continuous variables.
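
The coefficient r described above can be computed directly from paired observations. The following is a minimal sketch of the usual Pearson formula, using only the Python standard library; the function name `pearson_r` and the price/demand figures are hypothetical and chosen only for illustration.

```python
import math

def pearson_r(xs, ys):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: demand falls as price rises (negative correlation)
price = [10, 12, 14, 16, 18]
demand = [95, 90, 82, 75, 70]
r = pearson_r(price, demand)
print(f"r = {r:.3f}")  # the sign gives the direction, the magnitude the strength
```

For this made-up data r comes out close to -1, which matches the price and demand example: a strong negative linear association.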

Regression:-
> Regression is a statistical technique, based on the average mathematical relationship between two or more variables, for estimating the change in a metric dependent variable due to a change in one or more independent variables. It plays an important role in many human activities because it is a powerful and flexible tool for estimating or forecasting events from past or present data. For example, the future profit of a business can be estimated on the basis of past records. 
> A simple linear regression has two variables, x and y, where y depends on, or is influenced by, x. Here y is called the dependent (or criterion) variable, and x is called the independent (or predictor) variable. The regression line of y on x is expressed as below: 
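y = a + bx
where, 
a = constant (the intercept, i.e. the value of y when x = 0),
b = regression coefficient (the slope, i.e. the change in y per unit change in x).
Here a and b are the two regression parameters in this equation.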
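
The text does not say how a and b are estimated. A common choice is ordinary least squares, which is assumed in the sketch below. The function name `fit_line` and the investment/profit figures are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates (a, b) for the regression line y = a + b*x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when x is constant")
    b = sxy / sxx            # regression coefficient (slope)
    a = mean_y - b * mean_x  # constant (intercept)
    return a, b

# Hypothetical past records: profit as a function of investment
investment = [1, 2, 3, 4, 5]
profit = [3, 5, 7, 9, 11]

a, b = fit_line(investment, profit)
predicted = a + b * 6  # estimated profit for an investment of 6
print(f"y = {a:.2f} + {b:.2f}x, prediction at x = 6: {predicted:.2f}")
```

Because the made-up data lie exactly on a line, the fit recovers that line. With real data the points scatter around the fitted line and the prediction is only an estimate.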
Regression Analysis:- Regression analysis refers to assessing the relationship between an outcome variable and one or more other variables. The outcome variable is known as the dependent or response variable, and the risk factors and confounders are known as predictors or independent variables. The dependent variable is denoted by “y” and the independent variables by “x” in regression analysis.

Test of Significance:-
> Once sample data has been gathered through an observational study or experiment, statistical inference allows analysts to assess evidence in favor of some claim about the population from which the sample has been drawn. The methods of inference used to support or reject claims based on sample data are known as tests of significance.
> Every test of significance begins with a null hypothesis H0. H0 represents a theory that has been put forward, either because it is believed to be true or because it is to be used as a basis for argument, but has not been proved. For example, in a clinical trial of a new drug, the null hypothesis might be that the new drug is no better, on average, than the current drug. We would write H0: there is no difference between the two drugs on average.
> The alternative hypothesis, Ha, is a statement of what a statistical hypothesis test is set up to establish. For example, in a clinical trial of a new drug, the alternative hypothesis might be that the new drug has a different effect, on average, compared to that of the current drug. We would write Ha: the two drugs have different effects, on average. The alternative hypothesis might also be that the new drug is better, on average, than the current drug. In this case we would write Ha: the new drug is better than the current drug, on average.
> The final conclusion once the test has been carried out is always given in terms of the null hypothesis. We either "reject H0 in favor of Ha" or "do not reject H0"; we never conclude "reject Ha", or even "accept Ha".
> If we conclude "do not reject H0", this does not necessarily mean that the null hypothesis is true; it only suggests that there is not sufficient evidence against H0 in favor of Ha. Rejecting the null hypothesis, then, suggests that the alternative hypothesis may be true.
> Hypotheses are always stated in terms of a population parameter, such as the mean μ. An alternative hypothesis may be one-sided or two-sided. A one-sided hypothesis claims that a parameter is either larger or smaller than the value given by the null hypothesis. A two-sided hypothesis claims that a parameter is simply not equal to the value given by the null hypothesis; the direction does not matter.
> Hypotheses for a one-sided test for a population mean take the following form:
H0: μ = k
Ha: μ > k
or
H0: μ = k
Ha: μ < k.
> Hypotheses for a two-sided test for a population mean take the following form:
H0: μ = k
Ha: μ ≠ k.
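
The difference between one-sided and two-sided alternatives shows up in how the p-value is computed. The sketch below uses a z-test for a population mean. It assumes the population standard deviation is known, which the text does not state. The function name `z_test_mean`, the sample values, the hypothesized mean k = 50 and sigma = 3 are all hypothetical.

```python
import math
from statistics import NormalDist

def z_test_mean(sample, k, sigma, alternative="two-sided"):
    """z statistic and p-value for H0: mean = k, assuming known sigma."""
    n = len(sample)
    mean = sum(sample) / n
    z = (mean - k) / (sigma / math.sqrt(n))
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    if alternative == "greater":      # Ha: mean > k
        p = 1 - std_normal.cdf(z)
    elif alternative == "less":       # Ha: mean < k
        p = std_normal.cdf(z)
    elif alternative == "two-sided":  # Ha: mean != k
        p = 2 * (1 - std_normal.cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")
    return z, p

# Hypothetical sample; H0: mean = 50, with sigma assumed to be 3
sample = [52, 48, 55, 51, 54, 50, 53, 49, 56, 52]
z, p_two = z_test_mean(sample, k=50, sigma=3, alternative="two-sided")
_, p_greater = z_test_mean(sample, k=50, sigma=3, alternative="greater")
_, p_less = z_test_mean(sample, k=50, sigma=3, alternative="less")

alpha = 0.05  # conventional significance level, chosen here for illustration
decision = "reject H0 in favor of Ha" if p_two < alpha else "do not reject H0"
print(f"z = {z:.3f}, two-sided p = {p_two:.4f}: {decision}")
```

When the sample mean lies above k, the two-sided p-value is twice the "greater" one-sided p-value, and the "less" p-value is large. When sigma is unknown and the sample is small, a t-test is the usual substitute.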
A confidence interval gives an estimated range of values which is likely to include an unknown population parameter, the estimated range being calculated from a given set of sample data.
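
As a rough illustration, a confidence interval for a population mean can be computed as follows. The sketch again assumes a known population standard deviation and a normal sampling distribution. The function name `mean_confidence_interval`, the data and sigma = 3 are hypothetical.

```python
import math
from statistics import NormalDist

def mean_confidence_interval(sample, sigma, level=0.95):
    """Confidence interval for the population mean, assuming known sigma."""
    n = len(sample)
    mean = sum(sample) / n
    # Critical value of the standard normal, about 1.96 for a 95% level
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)
    margin = z_crit * sigma / math.sqrt(n)
    return mean - margin, mean + margin

# Hypothetical sample with sigma assumed to be 3
sample = [52, 48, 55, 51, 54, 50, 53, 49, 56, 52]
low, high = mean_confidence_interval(sample, sigma=3, level=0.95)
print(f"95% confidence interval for the mean: ({low:.2f}, {high:.2f})")
```

A higher confidence level or a smaller sample gives a wider interval for the same data.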
