CHAPTER 6
Categorical Data
You should study the topics in this chapter if you need to review
or want to learn about:
Two-way tables for a pair of categorical variables
Joint, marginal and conditional distributions of categorical variables
Graphical displays for categorical variables
Independence between categorical variables
Simpson’s paradox
In the previous chapter, we dealt with bivariate data in which both variables
were quantitative. In this chapter, we explore the relationship between
categorical (qualitative) variables.
6-1 Introduction
Scatter plots display the relationship between two quantitative variables, so
they are of no help when we look for an association between two qualitative
variables. To investigate the association between two or more qualitative
(categorical) variables, we use contingency tables instead. Each cell of such
a table holds the number of observations that fall into one particular
combination of categories. When there are just two qualitative variables, the
table is usually called a two-way contingency table, a bivariate frequency
table, or simply a two-way table.
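To make the idea of a two-way table concrete, the following minimal sketch counts how often each pair of categories occurs. The records, the variable names, and the helper `two_way_table` are all hypothetical and invented for illustration; they are not data from this chapter.

```python
from collections import Counter

# Hypothetical records, invented for illustration only:
# each pair is (gender, college classification) for one student.
records = [
    ("female", "freshman"),
    ("male", "freshman"),
    ("female", "sophomore"),
    ("female", "freshman"),
    ("male", "sophomore"),
    ("male", "sophomore"),
]


def two_way_table(pairs):
    """Return row labels, column labels, and cell counts for a two-way table."""
    counts = Counter(pairs)                 # frequency of each (row, column) pair
    rows = sorted({r for r, _ in pairs})    # categories of the first variable
    cols = sorted({c for _, c in pairs})    # categories of the second variable
    table = {r: {c: counts[(r, c)] for c in cols} for r in rows}
    return rows, cols, table


rows, cols, table = two_way_table(records)

# Print the table with one row per gender and one column per classification.
print(f"{'':10}" + "".join(f"{c:>12}" for c in cols))
for r in rows:
    print(f"{r:10}" + "".join(f"{table[r][c]:>12}" for c in cols))
```

Each printed cell is the number of students who fall into that combination of gender and classification, so the cell counts add up to the total number of records.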
Examples of categorical variables are gender, college classification, and
political affiliation. The values these variables take are qualitative, for
example male, freshman, and Democrat, respectively. A quantitative variable
such as age can also be converted to a categorical variable by classifying
the data into age groups. For example, a 24-year-old would be classified into
the category (or interval) of ages 20 – 25. Bar graphs will be used to display
the relationship between the qualitative variables.
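The conversion of a quantitative variable such as age into a categorical one can be sketched as follows. Only the 20 – 25 group comes from the text; the other groups, the inclusive bounds, the sample ages, and the helper `age_group` are assumptions made for illustration.

```python
from collections import Counter

# Assumed, non-overlapping age groups with inclusive bounds.
# Only the 20-25 group comes from the text; the others are illustrative.
AGE_GROUPS = [(20, 25), (26, 31), (32, 37)]


def age_group(age):
    """Return the label of the age group containing `age`, or "other"."""
    for low, high in AGE_GROUPS:
        if low <= age <= high:
            return f"{low}-{high}"
    return "other"


# Hypothetical ages, invented for illustration only.
ages = [24, 21, 29, 33, 24, 40]
group_counts = Counter(age_group(a) for a in ages)

# A crude text bar graph: one '#' per person in each group.
for label in sorted(group_counts):
    print(f"{label:>6} {'#' * group_counts[label]}")
```

Once ages are replaced by group labels, they can be counted and displayed like any other categorical variable. The bar lengths here are simply the group frequencies.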




