INTRO TO DATA SCIENCE: MEASURES OF CENTRAL TENDENCY—MEAN, MEDIAN AND MODE

Here we continue our discussion of using statistics to analyze data with several additional descriptive statistics, including:

  • mean—the average value in a set of values.
  • median—the middle value when all the values are arranged in sorted order.
  • mode—the most frequently occurring value.

These are measures of central tendency—each is a way of producing a single value that represents a “central” value in a set of values, i.e., a value which is in some sense typical of the others.
Let’s calculate the mean, median and mode on a list of integers. The following session creates a list called grades, then uses the built-in sum and len functions to calculate the mean “by hand”: sum calculates the total of the grades (397) and len returns the number of grades (5):

In [1]: grades = [85, 93, 45, 89, 85]

In [2]: sum(grades) / len(grades)
Out[2]: 79.4

The previous chapter mentioned the descriptive statistics count and sum, implemented in Python as the built-in functions len and sum. Like functions min and max (introduced in the preceding chapter), sum and len are both examples of functional-style programming reductions: they reduce a collection of values to a single value (the sum of those values and the number of values, respectively). In Section 3.8’s class-average example, we could have deleted lines 10–15 of the script and replaced average in line 16 with snippet [2]’s calculation.
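To make the idea of a reduction concrete, here is a minimal sketch that applies the four built-in reductions mentioned above to the same grades list. The variable names (total, count, lowest, highest, mean) are illustrative choices, not names from the original session:

```python
# Each built-in below reduces the whole list to a single value.
grades = [85, 93, 45, 89, 85]

total = sum(grades)     # sum of the values
count = len(grades)     # number of values
lowest = min(grades)    # smallest value
highest = max(grades)   # largest value

mean = total / count    # the mean calculated "by hand"
print(total, count, lowest, highest, mean)  # 397 5 45 93 79.4
```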
The Python Standard Library’s statistics module provides functions for calculating the mean, median and mode—these, too, are reductions. To use these capabilities, first import the statistics module:

In [3]: import statistics

Then, you can access the module’s functions with “statistics.” followed by the name of the function to call. The following calculates the grades list’s mean, median and mode, using the statistics module’s mean, median and mode functions:

In [4]: statistics.mean(grades)
Out[4]: 79.4

In [5]: statistics.median(grades)
Out[5]: 85

In [6]: statistics.mode(grades)
Out[6]: 85

Each function’s argument must be an iterable, in this case the list grades. To confirm that the median and mode are correct, you can use the built-in sorted function to get a copy of grades with its values arranged in increasing order:

In [7]: sorted(grades)
Out[7]: [45, 85, 85, 89, 93]
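With the values sorted, the median can also be worked out by hand. The following is a minimal sketch, not how the statistics module is implemented; median_by_hand is a hypothetical helper name introduced only for illustration, and it assumes a non-empty collection of numbers:

```python
import statistics

def median_by_hand(values):
    """Return the median of a non-empty collection of numbers (illustrative helper)."""
    ordered = sorted(values)
    middle = len(ordered) // 2
    if len(ordered) % 2 == 1:
        # Odd count: the single middle value is the median.
        return ordered[middle]
    # Even count: average the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

grades = [85, 93, 45, 89, 85]
print(median_by_hand(grades))         # odd count: 85
print(median_by_hand(grades + [93]))  # even count: 87.0

# Both results agree with the standard library's function.
print(statistics.median(grades) == median_by_hand(grades))  # True
```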

The grades list has an odd number of values (5), so median returns the middle value (85). If the list’s number of values is even, median returns the average of the two middle values. Studying the sorted values, you can see that 85 is the mode because it occurs most frequently (twice). In Python versions before 3.8, the mode function raises a StatisticsError for lists like

[85, 93, 45, 89, 85, 93]

in which there are two or more “most frequent” values. As of Python 3.8, mode instead returns the first such value it encounters (85 here), and the statistics module’s multimode function returns all of them as a list. A set of values with two modes is said to be bimodal. Here, both 85 and 93 occur twice.
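To list every most-frequent value rather than just one, you can count the occurrences directly. The following is a minimal sketch that assumes Python 3.8 or later (needed for statistics.multimode); the names values, counts, highest_count and modes are illustrative:

```python
import statistics
from collections import Counter

values = [85, 93, 45, 89, 85, 93]

# Count how many times each value occurs.
counts = Counter(values)
highest_count = max(counts.values())

# Keep every value whose count ties for the highest.
modes = [value for value, count in counts.items() if count == highest_count]
print(modes)                         # [85, 93]

# The standard library offers the same result directly (Python 3.8+).
print(statistics.multimode(values))  # [85, 93]
```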
