Chapter 6: Categorical Data
From part (b), the distributions of ratings were conditioned on the gender of
the student. The computed conditional distributions show that the proportion
of favorable ratings is the same for male and female students, and the
proportion of unfavorable ratings is likewise the same for both genders. By the
definition of independence for contingency tables, we again conclude that there
is no association between a student's gender and his or her response
(favorable or unfavorable). That is, the two variables are independent of each
other.
In conclusion, if the faculty member who did the survey knew the gender of
the student, he or she would not be at an advantage over one who did not
know the gender of the student in predicting the response.
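The comparison of conditional distributions described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the table is stored as a dictionary of row counts, the counts are hypothetical (they are not the survey data from part (b)), and the helper names `conditional_distributions` and `is_independent` are illustrative only. Exact fractions are used so that equal proportions compare as exactly equal, and the same check works for tables with more than two rows or columns. With real sample data, conditional distributions are rarely exactly equal, so this check mirrors the definition rather than a formal test.

```python
from fractions import Fraction

# Hypothetical 2 x 2 table of counts (NOT the data from part (b)).
# Rows are the conditioning variable (gender); columns are the responses.
table = {
    "male": {"favorable": 30, "unfavorable": 20},
    "female": {"favorable": 45, "unfavorable": 30},
}


def conditional_distributions(table):
    """Return the distribution of responses within each row, as exact fractions."""
    result = {}
    for row, counts in table.items():
        row_total = sum(counts.values())
        result[row] = {col: Fraction(n, row_total) for col, n in counts.items()}
    return result


def is_independent(table):
    """True when every row has the same conditional distribution of responses."""
    dists = list(conditional_distributions(table).values())
    return all(d == dists[0] for d in dists[1:])


for row, dist in conditional_distributions(table).items():
    print(row, {col: str(p) for col, p in dist.items()})
print("independent:", is_independent(table))
```

With these made-up counts, both genders have 3/5 favorable and 2/5 unfavorable responses, so `is_independent` returns `True`.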
Note:
Contingency tables are not restricted to 2 × 2 classifications. Other areas
of statistics deal with much larger and more complex tables.
6-5 Simpson’s Paradox
It is generally accepted that the larger the data set, the more reliable the
inferences drawn from it. Simpson's paradox, however, casts a different light
on this view. It shows that great care must be taken with the inferences when
small data sets are combined into a larger one. Sometimes the inferences drawn
from the combined data set contradict the inferences drawn from each of the
smaller data sets. In such cases, the inference from the combined data set is
usually the misleading one, because combining the groups can hide a lurking
variable that differs between them. We will demonstrate with an illustration.
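Before turning to the Berkeley data, a short Python sketch shows how such a reversal can arise. The admissions counts, the major names "Major X" and "Major Y", and the helpers `rate` and `combined_rate` are all hypothetical, chosen only to illustrate the paradox; they are not the figures in Table 6-10.

```python
# Hypothetical admissions counts, chosen only to show Simpson's paradox;
# they are NOT the Berkeley figures in Table 6-10.
# Each entry is (number admitted, number who applied).
data = {
    "Major X": {"men": (80, 100), "women": (18, 20)},
    "Major Y": {"men": (4, 20), "women": (30, 100)},
}


def rate(admitted, applied):
    """Proportion of applicants who were admitted."""
    return admitted / applied


def combined_rate(group):
    """Admission rate for a group after pooling all majors into one table."""
    admitted = sum(major[group][0] for major in data.values())
    applied = sum(major[group][1] for major in data.values())
    return rate(admitted, applied)


# Within each major, women are admitted at the higher rate.
for name, major in data.items():
    print(f"{name}: men {rate(*major['men']):.0%}, women {rate(*major['women']):.0%}")

# After pooling the majors, the comparison reverses.
print(f"Combined: men {combined_rate('men'):.0%}, women {combined_rate('women'):.0%}")
```

In these made-up numbers, women have the higher admission rate in each major (90% vs. 80% in Major X, 30% vs. 20% in Major Y), yet the lower rate overall (48/120 = 40% vs. 84/120 = 70%). The reversal occurs because most women apply to the more selective Major Y while most men apply to Major X; the choice of major is the lurking variable that the pooled table hides.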
Illustration
For a 1973 study on gender bias in admissions to the graduate school at the
University of California, Berkeley, Table 6-10 shows the information obtained
for the five largest majors on that campus.




