Research Design in Occupational Education Copyright 1997. James P. Key. Oklahoma State University Except for those materials which are supplied by different departments of the University (ex. IRB, Thesis Handbook) and references used by permission.

SOLUTIONS TO STATISTICS PROBLEMS

Descriptive Statistics (S1)

Correlation (S2)

Regression (S3)

Inferential Statistical Terms (S4)

"t" Test (S5)

Analysis of Variance (S6)

Chi Square (S7)

Module S1

1. Describe information or data through the use of numbers
2. See Descriptive Statistics
3. graphically
4.

   Definition                Parameter Symbol   Statistic Symbol
   Variance                  σ²                 s²
   Mean                      μ                  x̄
   Sum                       Σ                  Σ
   Observation of variable                      X
   Standard deviation        σ                  s
   Number of observations                       n

5. a. Frequency Graph; b. Bar Graph or Histogram; c. Frequency Polygon; d. Frequency Curve

6. See drawings

Module S2

1. A quantifiable relationship between two variables

2. Two

4.

5. There is no significant relationship between the hours of study and the competency rating.

6. The calculated value of .935 was compared with the table value of .444 (based on .05 level of significance and 18 degrees of freedom). Since the calculated value is greater than the table value, we reject the null hypothesis.
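The decision rule above can be sketched in Python; a minimal check that takes the calculated and table values as given (the raw hours/rating pairs are not reproduced in this key, so the sample data in the function's docstring sense are left out):

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Decision rule with the values reported above (18 df implies 20 pairs,
# since df = n - 2 for a correlation):
r_calc = 0.935
r_table = 0.444                  # .05 level, 18 degrees of freedom
reject_null = abs(r_calc) > r_table   # True
```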

7. Representative sample (Random); Normal population; Interval measures; Linearity; Homoscedasticity

8. Multiple correlation describes the degree of relationship between a variable and two or more variables considered simultaneously.

9. Partial correlation describes the relationship between two variables after controlling or partialing out the confounding relationship of another variable(s).
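The first-order partial correlation described in answer 9 has a standard computing formula; a minimal sketch, with hypothetical zero-order correlations supplied for illustration only:

```python
import math

def partial_r(r12, r13, r23):
    """First-order partial correlation between variables 1 and 2,
    controlling (partialing out) variable 3."""
    return (r12 - r13 * r23) / math.sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Hypothetical inputs: r12 = .6, r13 = .5, r23 = .4
r_partial = partial_r(0.6, 0.5, 0.4)   # about .504
```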

10. Point biserial coefficient    continuous versus dichotomous
    Biserial coefficient          continuous versus dichotomized
    Phi coefficient               dichotomous versus dichotomous
    Tetrachoric coefficient       dichotomized versus dichotomized

11. eta

Module S3

1. A prediction of the levels of one variable from chosen levels of another variable

2.

   Symbol   Definition
   ŷ        Estimated y; the value on the y axis across from the point on the regression line for the predictor x value
   a        Intercept point of the regression line and the y axis
   b        Slope of the regression line
   x        Arbitrarily chosen value of the predictor variable

   Equation: ŷ = a + bx

3. a = .75; b = 1.75

ŷ for 2.5 hours = 5.125; ŷ for 4.5 hours = 8.625

Standard error of estimate = .76

From the prediction of a 5.125 competency rating from 2.5 hours of practicing and studying, we can expect to predict a competency rating ranging from 4.365 to 5.885 with approximately 68% accuracy, from 3.605 to 6.645 with approximately 95% accuracy, or from 2.845 to 7.405 with approximately 99% accuracy.
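The prediction arithmetic above can be sketched in a few lines, using the constants from answer 3:

```python
# Constants from answer 3: intercept a, slope b, and the
# standard error of estimate.
a, b, se = 0.75, 1.75, 0.76

def predict(x):
    """Predicted competency rating (y-hat) for x hours of practice/study."""
    return a + b * x

y_hat = predict(2.5)                        # 5.125
band_68 = (y_hat - se, y_hat + se)          # 4.365 to 5.885
band_95 = (y_hat - 2 * se, y_hat + 2 * se)  # 3.605 to 6.645
band_99 = (y_hat - 3 * se, y_hat + 3 * se)  # 2.845 to 7.405
```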

4. Representative sample (Random); Normal distribution; Interval measures; Linearity; Homoscedasticity

5. Multiple regression predicts the value of one variable from the values of two or more variables.

6. We must be careful about predicting beyond the data and the variables we are predicting must be like those upon which the regression equation was built or the prediction has no basis.

Module S4

1. See Inferential

2. See Inferential

3. The level of significance is the probability of a Type I error that an investigator is willing to risk in rejecting a null hypothesis.

Module S5

1. When attempting to determine if the difference between two means is greater than that expected from chance; the data are from a normal population and at least ordinal in nature

2.

3. n1 = n2 = 5

F = s²(largest)/s²(smallest) = 2.5/2.5 = 1 with 4 and 4 degrees of freedom; F.05 with 4 and 4 degrees of freedom = 6.39; since 1 < 6.39, assume s₁² = s₂²

Since sample sizes and variances are equal, either the separate variance formula or pooled variance formula may be used.

4. -4

5. df = 5 + 5 - 2 = 8

6. The calculated value of -4 was compared with the table value of ±3.355 (based on .01 level of significance and 8 degrees of freedom). Since |-4| > |3.355|, we reject the null hypothesis.
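A sketch of the pooled-variance computation behind answers 3 through 6. The individual scores are not reproduced in this key; the sample means below are hypothetical, chosen only to reproduce the reported t of -4 (with n₁ = n₂ = 5 and equal variances of 2.5, the standard error of the difference is exactly 1, so the means must differ by 4):

```python
import math

def pooled_t(m1, m2, v1, v2, n1, n2):
    """Pooled-variance t statistic for two independent means."""
    sp2 = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Hypothetical means 4 apart; variances and ns from the problem:
t_calc = pooled_t(6.0, 10.0, 2.5, 2.5, 5, 5)   # -4.0
df = 5 + 5 - 2                                 # 8
```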

Module S6

1. The purpose of ANOVA is to test for significant differences among the means of two or more groups.

2. Single classification ANOVA tests a relationship between a dependent variable and one independent variable, while multiple classification ANOVA tests a relationship between a dependent variable and two or more independent variables.

3. The variance is the average squared distance of the raw scores in a distribution of numbers from the mean of that distribution.

4. The average variance of the subgroups is compared to the variance of the total group.

5. Calculate the sums of squares of deviations of the observations from their mean within each subgroup; calculate the sums of squares for the total group by combining the subgroups; subtract the within-group sums of squares from the total-group sums of squares to derive the among-group sums of squares; divide the among and within sums of squares by their degrees of freedom to obtain their mean squares; divide the among mean square by the within mean square to obtain the calculated F value

6. among group mean square and within group mean square

7. Representative sample (Random); Normal populations; At least ordinal measurement; Homogeneous variances

8. There are no significant differences among the means of number of chinups junior high boys can do after varying weeks of practice.

   Source of Variation   Sums of Deviations Squared   df   Mean Square   F
   Among (Between)       40                           2    20            8
   Within (Error)        30                           12   2.5
   Total                 70                           14

The calculated F value of 8 was compared with the table F value of 6.93 (based on .01 level of significance and 2 and 12 degrees of freedom). Since 8 > 6.93, we reject the null hypothesis.
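The table arithmetic can be checked in a few lines (three groups of five boys are assumed here, inferred from the 2 and 12 degrees of freedom):

```python
# Sums of squares from the ANOVA table above.
ss_among, ss_within = 40, 30
df_among, df_within = 2, 12

ms_among = ss_among / df_among     # 20.0
ms_within = ss_within / df_within  # 2.5
f_calc = ms_among / ms_within      # 8.0

f_table = 6.93                     # .01 level, 2 and 12 degrees of freedom
reject_null = f_calc > f_table     # True
```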

Module S7

1. The purpose is to test the difference between an actual sample distribution and another hypothetical or previously established distribution, such as that which may be expected due to chance or probability. Nonparametric techniques are usually easier to compute and can be used on nominal data.

2.

3. A one way classification is used when the number of responses, objects, or people falls into two or more categories; a two way classification is used when the number of responses, objects, or people falls into two or more categories for two or more groups.

4. one independent variable = (r - 1), where r is number of levels of independent variable

two independent variables = (r - 1)(s - 1), where r and s are the number of levels of first and second independent variables, respectively

three independent variables = (r - 1)(s - 1)(t - 1), where r, s, and t are the number of levels of the first, second, and third independent variables, respectively

5. Data in frequency form (nominal data); Independent observations; Sample size adequate; Distribution basis must be decided on before data is collected; Sum of observed frequencies must equal sum of expected frequencies.

6. The correction for continuity (Yates correction) adjusts the calculated chi square value when there is only one degree of freedom.

7. Parametric statistics test hypotheses based on the assumption that the samples come from normally distributed populations and that there is homogeneity of variance; the level of measurement is at least interval. Nonparametric statistics test hypotheses that require neither normal distributions nor homogeneity of variance and are designed for ordinal or nominal data.

8. Ordinal or nominal

9.

                              P1      P2      P3
   Observed responses (Fo)    39      25      56
   Expected responses (Fe)    40      40      40
   Fo - Fe                    -1      -15     16
   (Fo - Fe)²                 1       225     256
   (Fo - Fe)²/Fe              .025    5.625   6.4      χ² = 12.05

Degrees of freedom = (number of levels - 1) = 2

χ².05 = 5.99

Ho = Accept or Reject? Reject, since 12.05 > 5.99
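The chi square computation above, in a few lines of Python:

```python
observed = [39, 25, 56]
expected = [40, 40, 40]

# Chi square: sum of (Fo - Fe)^2 / Fe over the three categories.
chi2 = sum((fo - fe) ** 2 / fe for fo, fe in zip(observed, expected))

chi2_table = 5.99                 # .05 level, 2 degrees of freedom
reject_null = chi2 > chi2_table   # True: 12.05 > 5.99
```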

10.

                 Observed   Expected   |Fo - Fe| - .5   (|Fo - Fe| - .5)²   (|Fo - Fe| - .5)²/Fe
   Continue      7          5          1.5              2.25                .45
   Discontinue   3          5          1.5              2.25                .45
   Total         10         10                                              χ² = .90

Degrees of freedom = (number of levels - 1) = 1

χ².05 = 3.84

Ho = Accept or Reject? Accept, since .90 < 3.84
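The corrected computation in Python:

```python
observed = [7, 3]
expected = [5, 5]

# Yates correction for continuity: reduce each |Fo - Fe| by 0.5
# because there is only one degree of freedom.
chi2 = sum((abs(fo - fe) - 0.5) ** 2 / fe
           for fo, fe in zip(observed, expected))

chi2_table = 3.84                 # .05 level, 1 degree of freedom
reject_null = chi2 > chi2_table   # False: .90 < 3.84, retain Ho
```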

11.

                       A         U         D         Row Subtotals
   Males               20 (40)   10 (15)   70 (45)   100
   Females             60 (40)   20 (15)   20 (45)   100
   Column Subtotals    80        30        90        200

   (Expected frequencies shown in parentheses.)

Degrees of Freedom = (Rows - 1)(Columns - 1) = (2-1)(3-1) = 2

χ².05 = 5.99

Ho = Accept or Reject? Reject, since the calculated χ² of approximately 51.11 exceeds 5.99
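A sketch of the full two-way computation, which the answer above leaves implicit; each expected count is (row total × column total) / grand total, matching the parenthesized values in the table:

```python
# Observed counts from the 2 x 3 table above.
observed = [[20, 10, 70],
            [60, 20, 20]]

row_totals = [sum(row) for row in observed]        # [100, 100]
col_totals = [sum(col) for col in zip(*observed)]  # [80, 30, 90]
grand = sum(row_totals)                            # 200

chi2 = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / grand  # 40, 15, 45, ...
        chi2 += (fo - fe) ** 2 / fe

df = (2 - 1) * (3 - 1)            # 2
reject_null = chi2 > 5.99         # True: chi2 is about 51.11
```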