How to Calculate Standard Deviation in Python

Here's a quick script for calculating standard deviation in Python without downloading external libraries. I wrote it as an exercise while working on the R page in the wiki. Please leave a comment below if you can suggest improvements.

from math import sqrt


def standard_deviation(lst, population=True):
    """Calculates the standard deviation for a list of numbers."""
    num_items = len(lst)
    mean = sum(lst) / num_items
    differences = [x - mean for x in lst]
    sq_differences = [d ** 2 for d in differences]
    ssd = sum(sq_differences)

    # Note: it would be better to return a value and then print it outside
    # the function, but this is just a quick way to print out the values along
    # the way.
    if population:
        print('This is POPULATION standard deviation.')
        variance = ssd / num_items
    else:
        print('This is SAMPLE standard deviation.')
        variance = ssd / (num_items - 1)
    sd = sqrt(variance)
    # You could `return sd` here.

    print('The mean of {} is {}.'.format(lst, mean))
    print('The differences are {}.'.format(differences))
    print('The sum of squared differences is {}.'.format(ssd))
    print('The variance is {}.'.format(variance))
    print('The standard deviation is {}.'.format(sd))
    print('--------------------------')


s = [98, 127, 133, 147, 170, 197, 201, 211, 255]
standard_deviation(s)
standard_deviation(s, population=False)

Output:

This is POPULATION standard deviation.
The mean of [98, 127, 133, 147, 170, 197, 201, 211, 255] is 171.0.
The differences are [-73.0, -44.0, -38.0, -24.0, -1.0, 26.0, 30.0, 40.0, 84.0].
The sum of squared differences is 19518.0.
The variance is 2168.6666666666665.
The standard deviation is 46.56894530335282.
--------------------------
This is SAMPLE standard deviation.
The mean of [98, 127, 133, 147, 170, 197, 201, 211, 255] is 171.0.
The differences are [-73.0, -44.0, -38.0, -24.0, -1.0, 26.0, 30.0, 40.0, 84.0].
The sum of squared differences is 19518.0.
The variance is 2439.75.
The standard deviation is 49.393825525059306.
--------------------------
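
As the comments in the script note, returning the value is cleaner than printing inside the function. Below is a minimal sketch of that variant, checked against the standard library's `statistics` module (part of Python since 3.4, so still nothing to download). The name `std_dev` is made up for this illustration and is not part of the original script.

```python
from math import isclose, sqrt
import statistics


def std_dev(lst, population=True):
    """Return the standard deviation of a list of numbers.

    Hypothetical variant of standard_deviation() that returns the
    result instead of printing the intermediate values.
    """
    num_items = len(lst)
    mean = sum(lst) / num_items
    ssd = sum((x - mean) ** 2 for x in lst)
    # Population: divide by N. Sample: divide by N - 1.
    divisor = num_items if population else num_items - 1
    return sqrt(ssd / divisor)


s = [98, 127, 133, 147, 170, 197, 201, 211, 255]

# statistics.pstdev() is the population version,
# statistics.stdev() is the sample version.
print(isclose(std_dev(s), statistics.pstdev(s)))
print(isclose(std_dev(s, population=False), statistics.stdev(s)))
```

The comparison uses `isclose()` rather than `==` because the two implementations may round the last digit differently.
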
Sun, 2016-07-17 18:42

losing precision


I was writing my own program to calculate the (sample) standard deviation and ended up with similar code:

for i in lst:
    aux += (i - mean) ** 2

However, this seems to lose precision, and your solution gives more accurate results:

differences = [x - mean for x in lst]
sq_differences = [d ** 2 for d in differences]
ssd = sum(sq_differences)

Why does separating the operations give different results?

Fri, 2016-08-12 21:30

I'm not sure. Could you post your complete function? You could put a print statement in there to see what the values are, or run both versions on pythontutor.com to compare.
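
One way to run that comparison, as a minimal sketch: the snippet below computes the sum of squared differences with the accumulator loop and with the list-based version, then prints both next to `math.fsum()`, a standard-library function that limits floating-point rounding error. It assumes the accumulator `aux` starts at 0 and that `mean` is computed the same way in both versions. The comment above shows neither.

```python
from math import fsum

lst = [98, 127, 133, 147, 170, 197, 201, 211, 255]
mean = sum(lst) / len(lst)

# Accumulator loop, as in the comment (assumes aux starts at 0).
aux = 0
for i in lst:
    aux += (i - mean) ** 2

# List-based version from the original script.
differences = [x - mean for x in lst]
sq_differences = [d ** 2 for d in differences]
ssd = sum(sq_differences)

# fsum() tracks partial sums to limit rounding error; use it as a reference.
reference = fsum((x - mean) ** 2 for x in lst)

print('loop:     {!r}'.format(aux))
print('list/sum: {!r}'.format(ssd))
print('fsum:     {!r}'.format(reference))
```

If the loop and the list version disagree by more than the last digit or two on the same input, the cause may lie elsewhere in the function, for example in how `mean` is computed. Under Python 2, `sum(lst) / len(lst)` on a list of integers performs integer division and truncates the mean. From Python 3.12 onward, `sum()` uses a more accurate summation for floats, so tiny last-digit differences from a plain loop are possible there.
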