Essential Statistical Analysis
1
Types of Statistical Analyses
In this section our team has gathered the statistical questions that most often give students trouble while they are working on a research study. The sources used for this section have been checked thoroughly, and all of the definitions and methods presented here are reliable.
Different statistical analyses are appropriate for different types of data. It is essential that you select the appropriate statistical test, because an incorrect test can result in incorrect research conclusions. There are two broad types of statistical tests: parametric and non-parametric. Your first decision when selecting a statistical test is to determine whether a parametric or non-parametric test is appropriate. The use of a parametric test requires that three assumptions be met: the variable measured is normally distributed in the population, the data represent an interval or ratio scale, and the selection of participants is independent. Most variables examined in social science research are normally distributed, most measures used in social science research represent an interval scale, and the use of random sampling fulfills the assumption of independent selection of participants. A non-parametric test is the appropriate statistical test to use when these three assumptions are not met and when the data represent an ordinal or nominal scale. The following parametric statistical tests are described below: (a) t test; (b) analysis of variance (ANOVA), including post hoc procedures; (c) factorial analysis of variance; (d) analysis of covariance; and (e) multivariate analysis of variance.
Parametric and Non-Parametric Tests:
There are procedures that use sample statistics to estimate characteristics of the population. These characteristics of the population are called parameters, and the statistical procedures are referred to as parametric procedures. Parametric statistics are used when the researcher can assume that the population is normally distributed, has homogeneity of variance within different groups, and has data that are interval or ratio in scale. As long as the assumptions on which parametric statistics rest are, for the most part, met, the researcher uses a t test, ANOVA, ANCOVA, MANOVA, or some other parametric procedure. If these assumptions are not met (that is, if the data are not interval or ratio, or are not normally distributed), the researcher should consider using a nonparametric analog to the parametric test. For most parametric procedures, a corresponding non-parametric test can be used. The interpretation of the results is similar with both kinds of tests; what differs is the computational equation and the tables for determining the significance level of the results. Both procedures test a hypothesis and report a level of significance for rejecting the null hypothesis. In contrast to parametric tests, however, nonparametric tests do not test hypotheses about characteristics of a population. Rather, nonparametric procedures test hypotheses about relationships between categorical variables, shapes of distributions, and normality of distribution. Whereas parametric procedures use means, nonparametric techniques are concerned with frequencies, percentages, and proportions. Parametric tests are generally more powerful in detecting significant differences and are used frequently even when all assumptions cannot be met. Common nonparametric analogs to the parametric tests discussed here include the Mann–Whitney U test (for the independent samples t test), the Wilcoxon signed-rank test (for the dependent samples t test), and the Kruskal–Wallis test (for one-way ANOVA).
Correlational techniques can be used to analyze the degree of relationship between two variables. Because two variables are involved, these techniques are called bivariate correlational statistics. However, many research problems in the social sciences involve interrelationships among three or more variables. There are a number of commonly used multivariate correlational statistics: (a) multiple regression, (b) discriminant analysis, (c) canonical correlation, (d) path analysis, (e) factor analysis, and (f) structural equation modeling (Gall et al., 2007).
2
Statistical Significance and the Null Hypothesis
Inferential Statistics:
Inferential statistics deal with inferences about populations based on the results of samples (Gay, Mills, & Airasian, 2006). Most social science research deals with samples drawn from larger populations. Thus, inferential statistics are data analysis procedures for determining the likelihood that results obtained from a sample are the same results that would have been obtained for the entire population.
The Null Hypothesis
The research hypothesis typically states a difference or relationship in the expected direction. A null hypothesis states that no difference or relationship exists. The null hypothesis is preferred when applying statistical tests: you can never prove your hypothesis, only disprove it. Hypothesis testing is a process of disproving or rejecting, and the null hypothesis is best suited for this purpose (see Gall et al., 2007; Gay et al., 2006). The initial step in hypothesis testing, then, is to establish a null hypothesis. For instance, the null hypothesis for a study comparing two methods of mathematics instruction can be stated as follows:
“No significant difference exists between the mean mathematics scores of ninth grade students who receive computer mathematics instruction and ninth grade students who receive traditional mathematics instruction.”
After formulating the null hypothesis, the researcher carries out a test of statistical significance to determine whether the null hypothesis can be rejected (i.e., whether there is a true or real difference between the groups). This test enables us to make statements of the type:
“If the null hypothesis is correct, we would find this large a difference between sample means only once in a hundred experiments (p < .01). Because we have found this large a difference, the null hypothesis quite probably is false. Therefore, we will reject the null hypothesis and conclude that the difference between sample means reflects a true difference between population means” (Gall et al., 2007, p. 138).
What is Statistical Significance?
The purpose of a test of statistical significance is to determine whether the difference between two scores (or whether the characteristics of the sample differ from those of the population) is significant or is simply the result of chance fluctuation or sampling error. The basic procedure is as follows:
- To determine statistical significance, the data are analyzed via a formula and a score is obtained: a t score, a chi square score, or an F score.
- You then determine the degrees of freedom (in the simplest case, the number of subjects minus one).
- Next, compare the calculated score with the table score in the appropriate probability table, using the calculated degrees of freedom. Select the .05 level of confidence (this is the minimum level you should accept).
- If the calculated score is greater than the table value (at the .05 level), the difference in student scores is statistically significant at the .05 level of confidence and not the result of chance fluctuation. The difference is so great that it could be obtained by chance only 5% of the time; 95% of the time, the difference reflects the impact of the independent variable on the dependent variable. Hence, the null hypothesis is rejected.
- If the calculated score is less than the table value (at the .05 level), the difference in student scores is not statistically significant at the .05 level and is attributed to chance fluctuation. The difference is so small that, 95% of the time, it is the result of chance, and only 5% of the time could the independent variable have affected the dependent variable. Hence, the null hypothesis is accepted (more precisely, it is not rejected).

A brief code sketch of this table comparison follows the list.
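Printed probability tables can be replaced by a statistics library. The sketch below uses Python and SciPy with hypothetical numbers (a calculated t of 2.30 from 30 subjects); it is meant only to illustrate the logic of comparing a calculated score with the critical value at the .05 level, not to prescribe a particular software package.

```python
# A minimal sketch of the critical-value comparison described above.
# The numbers (t = 2.30, 30 subjects) are hypothetical illustrations.
from scipy import stats

calculated_t = 2.30          # score obtained from the t-test formula
df = 30 - 1                  # degrees of freedom: number of subjects minus one
alpha = 0.05                 # the .05 level of confidence

# Critical (table) value for a two-tailed test at the .05 level
critical_t = stats.t.ppf(1 - alpha / 2, df)

if abs(calculated_t) > critical_t:
    print(f"t = {calculated_t:.2f} > {critical_t:.2f}: reject the null hypothesis")
else:
    print(f"t = {calculated_t:.2f} <= {critical_t:.2f}: retain the null hypothesis")
```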
3
The t Test
In many research situations, a mean from one group is compared with a mean from another group to determine the probability that the corresponding population means are different. The most common statistical procedure for determining the level of significance when two means are compared is the t test. The t test is a formula that generates a number, and this number is used to determine the probability level (p level) of rejecting the null hypothesis. Two different forms of the equation are used in the t test, one for independent samples and one for samples that are paired, or dependent. Independent samples are groups of participants that have no relationship to each other; the two samples have different participants in each group, and the participants are usually either assigned randomly from a common population or drawn from two different populations.
Example 1:
If you are testing the difference between an experimental group and a control group mean in a posttest-only design, the independent samples t test would be the appropriate statistic. Comparing leadership styles of two groups of superintendents would also utilize an independent samples t test.
The second form of the t test can be referred to by several different names, including paired, dependent samples, correlated, or matched t test. This t test is used in situations in which the participants from the two groups are paired or matched in some way.
Example 2:
A common example of this case is the same group of participants tested twice, as in a pretest–posttest study (e.g., see Pascarella & Lunenburg, 1988). Whether the same or different subjects are in each group, as long as a systematic relationship exists between the groups it is necessary to use the dependent samples t test to calculate the probability of rejecting the null hypothesis. In the Pascarella and Lunenburg (1988) study, elementary school principals from two school districts received leadership training using Hersey and Blanchard’s situational leadership framework. Pretests and posttests were administered to the principals and a sample of their teachers before and after training to determine the effects of training on principals’ leadership effectiveness and style range. The study provided only partial support for Hersey and Blanchard’s situational leadership theory. Using dependent samples t tests, principals were perceived as more effective three years after training than before training: t(15) = 6.46, p < .01 for principals and t(59) = 3.73, p < .01 for teachers. However, no significant differences were found in principals’ effectiveness immediately following training, nor in principals’ leadership style range before and after training.
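For a pretest–posttest design of this kind, the dependent samples t test pairs each participant’s two scores. The following sketch shows the form of the analysis with SciPy; the scores are hypothetical and are not the Pascarella and Lunenburg data.

```python
# A minimal sketch of a dependent (paired) samples t test, assuming
# hypothetical pretest and posttest scores for the same ten participants.
from scipy import stats

pretest  = [62, 58, 71, 66, 70, 59, 64, 73, 68, 61]
posttest = [68, 60, 75, 70, 74, 63, 66, 78, 71, 66]

t_stat, p_value = stats.ttest_rel(posttest, pretest)
df = len(pretest) - 1  # number of pairs minus one

print(f"t({df}) = {t_stat:.2f}, p = {p_value:.3f}")
```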
Although the formulas and degrees of freedom are different for each form of the t test (for the dependent samples t test, the df is the number of pairs minus one), the interpretation and reporting of the results are the same, so once you have chosen the correct form of the test you need not worry about the details of the formula.
Example 3:
A more concrete explanation of using the t test is the following example. Suppose a researcher is interested in finding out whether there is a significant difference between high school boys and girls with respect to mathematics achievement. The research question would be: Is there a difference in the mathematics achievement (the dependent variable) of boys compared with girls (the independent variable)? The null hypothesis would be: There is no difference between boys and girls in mathematics achievement. To test this hypothesis, the researcher would randomly select a sample of boys and girls from the population of all high school students. Let us say that the sample mean for boys’ achievement is 540 and the sample mean for girls is 520. Because we assume the null hypothesis, that the population means are equal, we use the t test to show how often the difference of scores in the samples would occur if the population means were equal. If our degrees of freedom (the total sample size minus 2 for an independent samples t test) is 60 and the calculated t value is 1.29, we can see by referring to the t table that the probability of attaining this difference in the sample means, for a two-tailed test, is .20, or 20 times out of 100. We therefore accept the null hypothesis and say that there is no statistically significant difference between the mathematics achievement of high school boys and girls.
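The table lookup in this example can also be done in software. The short sketch below computes the two-tailed probability for the t value and degrees of freedom given above, assuming SciPy is available.

```python
# Reproducing the table lookup in Example 3: the two-tailed probability
# of obtaining t = 1.29 with 60 degrees of freedom.
from scipy import stats

t_value = 1.29
df = 60

p_two_tailed = 2 * stats.t.sf(abs(t_value), df)
print(f"p = {p_two_tailed:.2f}")   # approximately .20, so the null hypothesis is retained
```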
4
One-Way ANOVA (Analysis of Variance)
If a study is conducted in which two or more sample means are compared on one independent variable, then to test the null hypothesis the researcher would employ a procedure called one-way analysis of variance (ANOVA). ANOVA is simply an extension of the t test. Rather than using multiple t tests to compare all possible pairs of means in a study of two or more groups, ANOVA allows you to test the differences between all groups and to make more accurate probability statements than when using a series of separate t tests. It is called analysis of variance because the statistical formula uses the variances of the groups, not the means, to calculate a value that reflects the degree of difference among the means. Instead of a t statistic, ANOVA calculates an F statistic (or F ratio). The F is analogous to the t. It is a three- or four-digit number that is used, together with the degrees of freedom, in a table of the F distribution to find the level of significance at which you reject or fail to reject the null hypothesis. There are two degrees of freedom: the first is the number of groups in the study minus one, and the second is the total number of subjects minus the number of groups. These numbers follow the F in reporting the results of ANOVA. For example, in reporting F(4, 80) = 4.25, the degrees of freedom mean that five group means are being compared and 85 subjects are in the analysis. ANOVA addresses the question: Is there a significant difference among the population means? If the calculated F value is large enough, then the null hypothesis (that there is no difference among the groups) can be rejected with confidence that the researcher is correct in concluding that at least two means are different.
Example:
Let us assume, for example, that a researcher is comparing the quality of school life of three groups of students in urban, suburban, and rural school districts (e.g., see Lunenburg & Schmidt, 1989). The researchers in the cited study selected a random sample from each group, administered a quality of school life (QSL) instrument, and calculated the means and variances of each group. The sample group means for QSL were: urban = 18, rural = 20, and suburban = 25. The null hypothesis that is tested, then, is that the population means of 18, 20, and 25 are equal or, more correctly, that the observed differences among them are due only to sampling and measurement error. The calculated F in the study was 5.12, p < .01. Lunenburg and Schmidt (1989) concluded that at least two of the means were different, and that this conclusion will be right 99 times out of 100. Many other variables were examined in the study referenced above. The results of the ANOVA indicated only that the means were different, not where the differences occurred. Post hoc procedures are necessary to determine exactly where the differences in mean scores lie.
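A one-way ANOVA of this kind can be sketched with SciPy. The QSL scores below are hypothetical, not the Lunenburg and Schmidt data; the sketch only illustrates how the F ratio and its two degrees of freedom are obtained.

```python
# A minimal one-way ANOVA sketch in the spirit of the QSL example.
# The scores are hypothetical, invented for illustration.
from scipy import stats

urban    = [15, 17, 19, 20, 18, 16, 21, 17, 19, 18]
rural    = [19, 21, 20, 22, 18, 20, 21, 19, 22, 18]
suburban = [24, 26, 25, 23, 27, 25, 24, 26, 25, 25]

f_stat, p_value = stats.f_oneway(urban, rural, suburban)
df_between = 3 - 1                                        # number of groups minus one
df_within  = len(urban) + len(rural) + len(suburban) - 3  # subjects minus groups

print(f"F({df_between}, {df_within}) = {f_stat:.2f}, p = {p_value:.4f}")
# A significant F says only that at least two means differ; post hoc
# procedures are needed to locate the differences.
```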
5
ANCOVA (Analysis of Covariance)
Analysis of covariance (ANCOVA) is a statistical procedure used in cases similar to those in which a one-way or factorial ANOVA is used. ANCOVA has two major purposes: to adjust initial group differences statistically on one or more variables that are related to the dependent variable but uncontrolled, and to increase the likelihood of finding a significant difference between group means.
The variable that is used in ANCOVA to adjust the scores (for example, a pretest) is called the covariate or concomitant variable. Covariates are often pretest scores or results from achievement, attitude, or aptitude tests that would be related to the dependent variable. IQ scores and scores on prior standardized achievement tests, for instance, are commonly used as covariates. The second purpose of covariance analysis is to increase what is called the power of the statistical test to find differences between groups. Briefly, power is the probability of detecting a significant difference. ANCOVA is used to increase power when the sample size is low or when the researcher has reason to believe that the differences between the groups will be small. ANCOVA can be used in several situations: with two groups and one independent variable in place of a t test; with one independent variable that has more than two groups in place of one-way ANOVA; and with factorial analysis of variance. Studies can also use more than one covariate in a single ANCOVA procedure. The reporting of ANCOVA is very similar to the reporting of ANOVA. Because ANCOVA is used frequently with intact groups, without random assignment, the interpretation of results should weigh the possibility that other uncontrolled and unmeasured variables are also related to the dependent variable and hence may affect it. In other words, while statistical adjustment for the covariate can be achieved, the researcher cannot conclude that the groups are equivalent in the sense of random assignment.
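The adjustment ANCOVA performs can be sketched with the statsmodels library by regressing the dependent variable on group membership plus the covariate. The data frame, column names (group, pretest, posttest), and scores below are hypothetical; the sketch shows only the general form of the analysis.

```python
# A minimal ANCOVA sketch: adjusting posttest means for a pretest covariate.
# Column names and data are hypothetical illustrations.
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

data = pd.DataFrame({
    "group":    ["experimental"] * 5 + ["control"] * 5,
    "pretest":  [52, 48, 55, 50, 47, 51, 49, 54, 50, 46],
    "posttest": [68, 63, 72, 66, 61, 60, 58, 65, 61, 55],
})

# Posttest scores modeled from group membership with the pretest as covariate
model = smf.ols("posttest ~ C(group) + pretest", data=data).fit()
print(sm.stats.anova_lm(model, typ=2))   # F test for group, adjusted for the pretest
```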
6
Chi Square
Chi square is a nonparametric statistical test appropriate when the data are in the form of frequency counts or of percentages and proportions that can be converted into frequencies (Gay et al., 2006). These frequency counts can be placed into two or more categories. Thus, chi square is appropriate when the data are on a nominal scale (e.g., male or female, Democrat or Republican). A chi square test compares the proportions actually observed in a study to the proportions expected, to see whether they are significantly different. Expected proportions are usually the frequencies that would be expected if the groups were equal, although occasionally they may also be based on past data. The chi square value increases as the difference between observed and expected frequencies increases.
The one-dimensional chi square can be used to compare frequencies occurring in different categories or groups.
Example:
For example, you might wish to investigate whether doctoral students prefer to study alone or with others. Tabulation, based on a random sample of 100 doctoral students, might reveal that 45 prefer to study alone and 55 prefer to study with others. The null hypothesis of no preference would suggest a 50–50 split. In order to determine whether the groups were significantly different, you would compare the observed frequencies (45, 55) with the expected frequencies (50, 50) using a chi square test of significance.
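Using the observed and expected frequencies from this example, the chi square test of significance can be sketched as follows, assuming SciPy is available.

```python
# One-dimensional chi square for the study-preference example:
# observed frequencies (45, 55) against expected frequencies (50, 50).
from scipy import stats

observed = [45, 55]
expected = [50, 50]

chi2, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(f"chi square = {chi2:.2f}, p = {p_value:.3f}")
```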
The two-dimensional chi square may be used when frequencies are categorized along more than one dimension, a sort of factorial chi square.
Example:
In the study preference example above, you might select a stratified sample comprising 50 males and 50 females. Responses could then be classified by study preference and by gender, a two-way classification that would allow us to see whether study preference is related to gender. Although 2×2 applications are quite common, contingency tables may be based on any number of categories, for example, 2×3, 3×3, 2×4, and so forth. When a two-way classification is used, calculation of the expected frequencies is a little more complex, but it is not difficult.
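A sketch of the two-way case follows, with hypothetical cell counts for study preference by gender; SciPy's contingency-table routine also returns the expected frequencies it computes.

```python
# A sketch of the two-way (contingency) chi square for study preference
# by gender. The cell counts are hypothetical.
from scipy import stats

#                alone  with others
table = [[28, 22],   # males
         [17, 33]]   # females

chi2, p_value, df, expected = stats.chi2_contingency(table)
print(f"chi square({df}) = {chi2:.2f}, p = {p_value:.3f}")
print("expected frequencies:\n", expected)
```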
7
Regression and Prediction
For regression and prediction, you are assessing whether a correlation exists between two variables. You might know the score on one variable and wish to predict the score on the second, and regression is related to how well you can make that prediction. The closer the correlation coefficients are to −1 or +1, the better your predictions become. A familiar example appears on packs of cigarettes: the more a person smokes, the higher the predicted likelihood of that person developing lung cancer or other cancers.
Multiple Regression:
Multiple regression is a prediction equation that determines the correlation between a combination of two or more predictor variables and a criterion variable. For example, we might use undergraduate GPA, graduate GPA, and GRE scores (verbal, quantitative, and analytical) to predict success in graduate school. Multiple regression is one of the most widely used statistical techniques in the social sciences because of its versatility and precision. It can be used to analyze data from any of the major quantitative research designs: experimental, causal-comparative, and correlational. It can handle nominal, ordinal, interval, and ratio data. And it indicates not only whether variables are related but also the magnitude of the relationships.
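A minimal multiple regression sketch along these lines can be written with the statsmodels library. The data and column names (grad_gpa, undergrad_gpa, gre_verbal) are invented for illustration only.

```python
# A minimal multiple regression sketch: predicting a criterion (graduate GPA)
# from two predictors. Variable names and data are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.DataFrame({
    "grad_gpa":      [3.6, 3.2, 3.9, 3.4, 3.8, 3.1, 3.7, 3.5],
    "undergrad_gpa": [3.4, 3.0, 3.8, 3.2, 3.6, 2.9, 3.5, 3.3],
    "gre_verbal":    [156, 150, 164, 152, 160, 148, 158, 154],
})

model = smf.ols("grad_gpa ~ undergrad_gpa + gre_verbal", data=data).fit()
print(model.params)                       # regression (b) weights
print(f"R  = {model.rsquared ** 0.5:.2f}")
print(f"R2 = {model.rsquared:.2f}")       # coefficient of determination
```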
Example:
The use of multiple regression is illustrated in a relationship study conducted by Lunenburg and Columba (1992). The purpose of their study was to identify personality factors and education level that distinguish high-performing urban principals from average-performing urban principals (n = 179). Four independent criteria were used to measure principal performance (the dependent variable): supervisor’s ratings, paired comparison ratings, peer nomination ratings, and teacher ratings, resulting in an overall performance score. Sixteen personality factors (measured by the 16PF) and education level (master’s or doctorate) were the predictors (independent variables). Stepwise multiple regression analysis revealed that factors E (dominant), M (imaginative), Q2 (self-sufficient), A (warm), and doctoral education level were consistent predictors of superior performance of urban principals.
The first step in multiple regression usually is to compute the correlation between the best single predictor variable and the criterion variable. This procedure yields a multiple correlation coefficient (R). Because Factor E (dominant) is the best predictor, it is the first predictor entered into the multiple regression. Unless you specify otherwise, the computer program will start the multiple regression analysis with the most powerful predictor of the criterion variable. Suppose you have not specified the order in which the predictor variables are to be entered into the multiple regression analysis. In this case, after selecting the best predictor, the computer program will search for the next best predictor of the criterion variable. This second predictor is not chosen on the basis of its product-moment correlation (r) with the criterion. Rather, the second predictor is chosen on the basis of how well it improves on the prediction achieved by the first variable. What qualities should a variable have to be a good second predictor? First, it should correlate as little as possible with the first predictor variable. The second quality of a good second predictor is obvious: it should correlate as highly as possible with the criterion variable. In short, a good second predictor is one that correlates as little as possible with the first predictor and as highly as possible with the criterion. The third predictor entered in the multiple regression analysis is determined by whether it improves on the prediction made by the first two predictors. The computer program will keep adding predictor variables until none are left; each new predictor, however, contributes less to R than the preceding one, so there are rapidly diminishing returns for adding new predictors.
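The selection logic just described, entering at each step the predictor that most improves R, can be sketched as a short loop. The function below is a simplified illustration: it assumes a pandas DataFrame and statsmodels, ignores significance testing of each increment, and keeps adding predictors until none are left.

```python
# A sketch of the forward (stepwise) selection logic described above:
# at each step, add the predictor that most improves R-squared.
# Assumes a pandas DataFrame with a criterion column and several
# predictor columns; all names here are placeholders.
import statsmodels.formula.api as smf

def forward_selection(data, criterion, predictors):
    selected, remaining = [], list(predictors)
    while remaining:
        # Try each remaining predictor alongside those already selected
        trials = []
        for candidate in remaining:
            formula = f"{criterion} ~ " + " + ".join(selected + [candidate])
            r2 = smf.ols(formula, data=data).fit().rsquared
            trials.append((r2, candidate))
        best_r2, best_candidate = max(trials)
        selected.append(best_candidate)
        remaining.remove(best_candidate)
        print(f"added {best_candidate}: R2 = {best_r2:.3f}")
    return selected

# Hypothetical usage with the data frame from the earlier regression sketch:
# forward_selection(data, "grad_gpa", ["undergrad_gpa", "gre_verbal"])
```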
At this point we can consider further the meaning of R. The multiple correlation coefficient (R) is a measure of the magnitude of the relationship between a criterion variable and some combination of predictor variables. The value of R will increase with each variable that enters the multiple regression analysis. Thus, we see in Table 4.3 that the value of R to predict superior performance gradually increases from .49 to .64 as each predictor variable is added. The value of .64 represents the best prediction one can make of superior principal performance from the predictor variables listed in Table 4.3. The value of R can range from 0.00 to 1.00; negative values are not possible. The larger the R, the better the prediction of the criterion variable. If R is squared, it yields a statistic known as the coefficient of determination (R²). The 10th column of Table 4.3 shows the R² coefficient corresponding to the R in the ninth column. For example, the R² coefficient is .41, which is the square of the corresponding R coefficient (.64). R² expresses the amount of variance in the criterion variable that is explained by a predictor variable or combination of predictor variables. Each b value in the multiple regression equation is a regression weight, which can vary from −1.00 to 1.00. A regression weight (sometimes called a b weight) is a multiplier term applied to each predictor variable in a regression equation in order to maximize the predictive value of the variables. The b weights were converted to beta (β) weights. Beta weights are the regression weights in a multiple regression equation in which all the variables in the equation are in standard score form. Some researchers prefer beta weights because they form an absolute scale: a beta weight of +.40 is of greater magnitude than a beta weight of +.30, irrespective of the predictor variable with which it is associated. In contrast, the magnitude of a b weight depends on the scale of the predictor measure with which it is associated.
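One way to see the difference between b weights and beta weights is to convert the raw coefficients from the earlier multiple regression sketch to standardized form. The snippet below assumes the hypothetical model and data objects defined in that sketch; the conversion multiplies each b weight by the ratio of the predictor's standard deviation to the criterion's standard deviation.

```python
# Converting raw (b) regression weights into beta weights by standardizing.
# Assumes `model` and `data` from the multiple regression sketch above;
# beta_j = b_j * (SD of predictor j / SD of criterion).
betas = {
    name: coef * data[name].std() / data["grad_gpa"].std()
    for name, coef in model.params.items()
    if name != "Intercept"
}
print(betas)
```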
8
Discriminant Analysis
Discriminant analysis is a statistical procedure related to multiple correlation. It uses a number of predictor variables to classify subjects into two or more distinct groups, such as dropouts versus persisters, successful versus unsuccessful students, delinquents versus nondelinquents, and so on. The criterion in discriminant analysis is a person’s group membership. The procedure results in an equation, or discriminant function, in which the scores on the predictors are multiplied by weights to predict the classification of subjects into groups. When there are just two groups, the discriminant function is essentially a multiple correlation equation with the group membership criterion coded 0 or 1. But with three or more groups as the criterion, discriminant analysis goes beyond multiple correlation.
Example:
Discriminant analysis might be used to identify predictors of success in a school of education doctoral program. You could identify the variables that discriminated membership into one of two groups: those who successfully completed doctoral study and those who did not. A number of different predictors might be used: Miller Analogies Test (MAT) scores, Graduate Record Examination (GRE) scores, undergraduate GPA, graduate GPA, time lapse between the master’s degree and entrance into the doctoral program, doctoral major, age at entrance, gender, race/ethnicity, and marital status. Complex correlational analysis would produce an equation showing the variables that were significant in predicting the criterion, success or lack of success in a doctoral program.
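A sketch of such an analysis can be written with scikit-learn's linear discriminant analysis. The predictors, scores, and group codes below are hypothetical; a real study would, of course, use many more cases and predictors.

```python
# A minimal discriminant analysis sketch: classifying doctoral students as
# completers or non-completers from two predictors. Data are hypothetical.
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# predictors: [GRE score, undergraduate GPA]
X = [[155, 3.4], [148, 2.9], [162, 3.8], [150, 3.1],
     [158, 3.6], [146, 2.8], [160, 3.7], [152, 3.0]]
# group membership criterion: 1 = completed the program, 0 = did not
y = [1, 0, 1, 0, 1, 0, 1, 0]

lda = LinearDiscriminantAnalysis()
lda.fit(X, y)

print(lda.coef_)                      # weights of the discriminant function
print(lda.predict([[154, 3.2]]))      # predicted group for a new applicant
```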
9
Canonical Correlation
Canonical correlation is a generalization of multiple regression that adds more than one dependent variable to the prediction equation. Recall that a multiple correlation coefficient shows the correlation between the “best” combination of independent variables and a single dependent variable. Canonical correlation extends the analysis to more than one dependent variable. In other words, canonical correlation is an analysis with several independent variables and several dependent variables. It takes into account the X and Y scores, the relations between the X variables, between the Y variables, and between the X and Y sets of variables. The result is a canonical correlation coefficient that represents the maximum correlation possible between sets of X scores and sets of Y scores. It also indicates the relative contributions of the separate independent and dependent variables to the canonical correlation, so you can see which variables are most important to the relationships between the sets. For more information on canonical correlation, see Thompson’s Canonical Correlation Analysis (1984).
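A minimal sketch of canonical correlation using scikit-learn follows. The two small arrays of X and Y variables are hypothetical, and only the first canonical correlation is extracted.

```python
# A minimal canonical correlation sketch: the correlation between the best
# linear combination of a set of X variables and a set of Y variables.
import numpy as np
from sklearn.cross_decomposition import CCA

X = np.array([[1.0, 2.1], [2.0, 1.9], [3.1, 3.0], [4.2, 3.8], [5.0, 5.1], [6.1, 5.9]])
Y = np.array([[1.2, 0.9], [1.9, 2.2], [3.0, 2.8], [3.9, 4.1], [5.2, 4.8], [6.0, 6.2]])

cca = CCA(n_components=1)
X_c, Y_c = cca.fit_transform(X, Y)

# The first canonical correlation coefficient
r_canonical = np.corrcoef(X_c[:, 0], Y_c[:, 0])[0, 1]
print(f"canonical correlation = {r_canonical:.2f}")
```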
10
Path Analysis
Path analysis is a set of statistical procedures designed to test a hypothesized causal model about the relationships among three or more variables. Using theory and existing knowledge, the researcher proposes a causal model and then applies path analysis to determine if the causal model is consistent with the empirical data. Models inconsistent with the data are rejected, whereas those not rejected are viewed as plausible causal patterns to be subjected to further investigation.
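Simple path models are often estimated as a series of regressions, one for each variable the model treats as an effect. The sketch below assumes a hypothetical three-variable model (organizational climate affecting job satisfaction directly and through locus of control) with invented variable names and data; standardizing the variables first makes the regression coefficients path coefficients.

```python
# A sketch of a simple path analysis carried out as a series of regressions.
# The causal model, variable names, and data are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

data = pd.DataFrame({
    "climate":          [3.2, 2.8, 4.1, 3.5, 2.5, 3.9, 3.0, 4.3],
    "locus_of_control": [2.9, 2.5, 3.8, 3.1, 2.2, 3.6, 2.8, 4.0],
    "satisfaction":     [3.5, 3.0, 4.4, 3.7, 2.6, 4.1, 3.2, 4.6],
})

z = (data - data.mean()) / data.std()   # standardize so coefficients are path coefficients

# One regression per endogenous variable in the hypothesized causal model
eq1 = smf.ols("locus_of_control ~ climate", data=z).fit()
eq2 = smf.ols("satisfaction ~ climate + locus_of_control", data=z).fit()

print("path climate -> locus_of_control:     ", round(eq1.params["climate"], 2))
print("path climate -> satisfaction:          ", round(eq2.params["climate"], 2))
print("path locus_of_control -> satisfaction: ", round(eq2.params["locus_of_control"], 2))
```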
Example:
Lunenburg (1984, 1991a, 1991b), Lunenburg and Cadavid (1992), Lunenburg and O’Reilly (1974), and Lunenburg and Mankowsky (2000) tested a path model of the influences affecting teachers’ job satisfaction. The model used in the four studies hypothesized that job satisfaction is a function of teachers’ belief systems, locus of control, perceptions of organizational climate, dimensions of school bureaucratization, pupil control orientation and behavior, and several demographic variables including gender, age, father’s education, mother’s education, teacher’s academic achievement, teaching experience, teacher’s commitment to the teaching profession, and, in the later studies, race and ethnicity (Lunenburg & Cadavid, 1992; Lunenburg & Mankowsky, 2000). There were significant relationships between teachers’ job satisfaction and perceptions of teachers’ belief system, locus of control, dimensions of school bureaucratization, organizational climate, and pupil-control orientation and behavior. The researchers concluded that the prototypic profile of the dissatisfied teacher is one who has a closed-minded belief system, has an external locus of control, perceives a closed organizational climate and a high level of school bureaucratization, and has both a custodial pupil-control orientation and behavior. The analysis was done separately for samples of African American teachers and White teachers. In general, the path coefficients were similar for African American teachers and White teachers. One exception was found when satisfaction was related to academic achievement: lower-achieving White teachers tended to be more satisfied in their jobs than did their higher-achieving counterparts; for African Americans, no such difference was found. Another observed difference occurred when satisfaction was related to gender: African American men tended to be more satisfied with their jobs than African American women, but White women tended to be more satisfied than White men. This finding is consistent with a study by Culver, Wolfe, and Cross (1990). For both groups of White and African American teachers, demographic variables such as age, sex, father’s education, mother’s education, and years of teaching experience were found to be of little importance to job satisfaction.
11
Factor Analysis
Another widely used procedure based on correlation is factor analysis. This procedure analyzes the inter correlations among a large set of measures to identify a smaller number of common factors. Factors are hypothetical constructs assumed to underlie different types of psychological measures, such as intelligence, aptitude, achievement, personality, and attitude measures. Factor analysis indicates the extent to which tests or other instruments are measuring the same thing, enabling researchers to deal with a smaller number of constructs. Some factor analysis studies of intelligence tests, for example, have identified underlying verbal, numerical, spatial, memory, and reasoning factors. The first steps in factor analysis involve selecting variables to be included in the analysis and developing a correlation matrix that shows the correlation of each measure with every other measure. There may be a very large number of correlations in the matrix. The matrix is then subjected to computations with a factor analysis computer program that produces clusters of variables that inter correlate highly within the cluster but have low correlations with other clusters. These clusters are the factors, and the object is to identify a smaller number of separate under-lying factors that can account for the covariation among the larger number of variables.