
# Course 1, Unit 2 - Patterns in Data

## Overview
Patterns in Data is an introduction to the analysis of univariate (one-variable) data. Throughout this unit, students develop tools and strategies that help them make sense of data and communicate their conclusions. The focus is first on displaying data (to observe shape, center, and variability, also called spread) and then on computing and interpreting summary statistics: measures of center (mean, median, and mode) and measures of variability (range, interquartile range, and standard deviation).

## Key Ideas from Course 1, Unit 2

• Dot plot (or number line plot): A way of organizing one-variable data. Dot plots are particularly useful when the data set is small and/or spread out. Shown below is the dot plot for the lengths of 100 male bears. (See page 76.)

• Histogram: A way of organizing one-variable data by grouping the values into intervals. For example, in the histogram below of test scores, 3 students have a score of at least 10 but less than 20, 7 students have a score of at least 20 but less than 30, and so on. (See page 79.)

• Relative frequency histogram: This type of histogram shows on the vertical axis the proportion or percentage of values that fall into each bar, rather than the frequency or count. This plot is particularly useful if the sample is very large. (See page 79.)

• Shape of the distribution: Distributions of one-variable data can be symmetric or skewed. (See page 77.)

• Center: We can use the mean, median, or mode as the measure of center, depending on which is most appropriate. Mean = (sum of the data values) / (number of data values). Median = middle data value in the ordered list. Mode = most frequently occurring data value. (See pages 84 and 94.)
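For readers who want to check the three measures of center by computer, here is a minimal Python sketch. The list `scores` is made up for illustration and does not come from the textbook; the functions are from Python's standard `statistics` module.

```python
from statistics import mean, median, multimode

# Hypothetical test scores, invented for illustration (not textbook data).
scores = [12, 25, 25, 31, 38, 44, 49]

center_mean = mean(scores)        # (sum of the data values) / (number of data values)
center_median = median(scores)    # middle value of the ordered list
center_modes = multimode(scores)  # most frequent value(s), returned as a list

print(center_mean, center_median, center_modes)
```

`multimode` returns a list because a data set can have more than one most frequent value.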

• Percentiles: Percentiles are often used to measure the position of a data value in the distribution. They are typically used only when there is a very large or infinite number of possible values, such as with heights. For example, on the growth chart for girls on page 105, a 15-year-old girl who weighs about 105 lbs is at the 25th percentile. This means that about 25% of the girls her age weigh 105 lbs or less and about 75% weigh more than 105 lbs. (See pages 103-105.)
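The idea behind a percentile can be sketched with a short Python example. The weights below are invented for illustration (they are not read from the growth chart), and the helper names `percent_below` and `percent_above` are hypothetical. Note that exact percentile definitions vary slightly between sources; this sketch simply counts the values strictly below or above a given weight.

```python
# Hypothetical weights in pounds, invented for illustration.
weights = [92, 98, 101, 108, 110, 114, 118, 121, 125, 130, 136, 142]

def percent_below(values, x):
    """Percentage of values strictly less than x."""
    return 100 * sum(1 for v in values if v < x) / len(values)

def percent_above(values, x):
    """Percentage of values strictly greater than x."""
    return 100 * sum(1 for v in values if v > x) / len(values)

below = percent_below(weights, 105)  # 3 of the 12 values are below 105
above = percent_above(weights, 105)  # 9 of the 12 values are above 105
print(below, above)
```

In this made-up group, a weight of 105 lbs sits at about the 25th percentile: a quarter of the values fall below it and three quarters fall above it.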

• Five-number summary (minimum, 1st quartile, median, 3rd quartile, maximum): Using our example, we can determine the five-number summary as follows. Put the values in order; the smallest and largest values are the minimum and maximum. Count to the middle of the ordered list; this data value is the median. The median is 31. Count to the middle of the first (lower) 50% of the data; this data value is the first quartile, Q1. Q1 is 23. Count to the middle of the second (upper) 50% of the data; this data value is the third quartile, Q3. Q3 is 40. (See page 108.)
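The counting procedure above can be sketched in Python. The data list is made up and was chosen only so that the result matches the median, Q1, and Q3 quoted above; the textbook's actual data set is not reproduced here. The function name `five_number_summary` is hypothetical, and the sketch assumes that the median itself is left out of both halves when the number of values is odd. Calculators and software sometimes use slightly different quartile conventions.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the 'middle of each half' method."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]           # lower half of the ordered data
    upper = ordered[n // 2 + n % 2 :]   # upper half (skips the median when n is odd)
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

# Hypothetical data, invented so that the median is 31, Q1 is 23, and Q3 is 40.
data = [31, 15, 40, 23, 52, 27, 34, 20, 45, 29, 38]
summary = five_number_summary(data)
print(summary)
```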

• Box plot: Use the five-number summary to make a box plot. You need a scale on the horizontal axis to make sense of the graph. The box contains the middle 50% of the data values, starting at the first quartile and ending at the third quartile. Interquartile range (IQR) = Q3 - Q1 = 40 - 23 = 17. (See pages 108-111.)

• Outliers: Data values that are far from, and separated from, the rest of the distribution. If the data are represented by a box plot, then any value that is more than 1.5 times the interquartile range above Q3 or below Q1 is drawn as a dot, separated from the other data. With IQR = 17, 1.5 × 17 = 25.5, so values above 40 + 25.5 = 65.5 or below 23 - 25.5 = -2.5 would be outliers. (See pages 113-116.)
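The 1.5 × IQR rule can be checked with a few lines of Python. This is a minimal sketch that uses the quartiles from the example above (Q1 = 23, Q3 = 40); the function names `outlier_fences` and `is_outlier` and the parameter `k` are hypothetical names chosen for illustration.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs; values beyond them count as outliers."""
    iqr = q3 - q1                      # interquartile range
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(x, q1, q3, k=1.5):
    """True if x lies more than k * IQR below Q1 or above Q3."""
    lower, upper = outlier_fences(q1, q3, k)
    return x < lower or x > upper

lower, upper = outlier_fences(23, 40)  # IQR = 17, so 1.5 * IQR = 25.5
print(lower, upper)
print(is_outlier(70, 23, 40), is_outlier(52, 23, 40))
```

With these quartiles, a value of 70 would be plotted as a separate dot, while a value of 52 would not.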

• Spread of a distribution: Spread (or variability) could be measured by the range, by the IQR (interquartile range, see above or page 108), or by the standard deviation (see pages 116-124).
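The three measures of spread can be compared on one small data set. The following Python sketch uses made-up values (not textbook data). It shows both the population standard deviation (`pstdev`, which divides by n) and the sample standard deviation (`stdev`, which divides by n - 1), because this summary does not say which version the textbook uses. The quartiles are found with the "middle of each half" method, which is an assumption about the convention.

```python
from statistics import median, pstdev, stdev

# Hypothetical data values, invented for illustration.
values = sorted([2, 4, 4, 4, 5, 5, 7, 9])
n = len(values)

data_range = values[-1] - values[0]   # maximum - minimum

lower = values[: n // 2]              # lower half of the ordered data
upper = values[n // 2 + n % 2 :]      # upper half of the ordered data
iqr = median(upper) - median(lower)   # Q3 - Q1

sd_population = pstdev(values)        # divides by n
sd_sample = stdev(values)             # divides by n - 1

print(data_range, iqr, sd_population, round(sd_sample, 3))
```

The range depends only on the two most extreme values, the IQR only on the middle half of the data, and the standard deviation on every value's distance from the mean.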