When reading reports and briefs or listening to presentations, we often come across statistics used to state a fact or support a point the author is trying to make. In those cases, the statistic is a conclusion to the process.
Descriptive statistics can play another very important role – as an investigative tool – helping you identify new opportunities and/or issues, and narrow down the scope when further research is needed.
In the previous article, I described how comparing the Mean and Median could help identify possible outliers, and how a Mode can point out the most popular feature or common behavior.
If you have already had a chance to work with analytics data, for example the number of views or interactions with your content on social media, you might have noticed that, even for an established creator, the daily stats rarely match the average. Instead, they go up and down (oscillate) in a random, most often unpredictable manner.
This poses a real challenge when trying to decide whether the changes we introduced to boost performance were truly effective or whether the uptick in results was only within the expected random variation.
Of course, this isn’t a new problem, and statistics provides us with the much-needed tools to contain the chaos.
Range, Minimum and Maximum values
The most basic measures of dispersion, Range, Minimum and Maximum, can tell you what kind of values you should expect from the dataset. Let’s look at some practical examples.
Say you are going on vacation. You might want to look up the minimum and maximum temperatures at your destination to decide what kind of clothes you should pack.
Elsewhere, your company might be running a website or online service. During the initial phase, you decided to rent an expensive server to make sure it could meet all the anticipated demand. System stability was paramount, so the max spec was the preferred way to go. A few months in, you can look at the maximum and minimum loads and check whether you still require the top-notch configuration or whether savings can be made because the load never exceeds a fraction of what the server can offer.
Similarly, if accuracy or stable performance is crucial, you can quickly spot issues with your product by calculating the range, which is the difference between the maximum and minimum values. A small range around the performance baseline would indicate stable behavior.
This is a simple approach, so it might not always be suitable, but it’s often a good starting point. If you hired a bartender to pour 45 ml shots of tequila and you find out that the pours vary between 30 and 60 ml (a range of 30 ml, or +/- 15 ml around the target), you might need to talk.
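As a minimal sketch of the bartender check above, the snippet below computes the minimum, maximum and range of a handful of pours. The pour volumes and the function name summarize_range are made up for illustration only.

```python
# Hypothetical pour volumes in millilitres; the target pour is 45 ml.
pours_ml = [45, 38, 52, 30, 60, 47, 44]


def summarize_range(values):
    """Return (minimum, maximum, range), where range = maximum - minimum."""
    lowest = min(values)
    highest = max(values)
    return lowest, highest, highest - lowest


lowest, highest, spread = summarize_range(pours_ml)
print(f"min={lowest} ml, max={highest} ml, range={spread} ml")
```

A range that is large relative to the 45 ml target is the cue to look closer. Keep in mind that the range depends only on the two most extreme values, so a single unusual pour can dominate it.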
Variance
Variance is a measure of how spread out a set of data is from its mean, or average, value. In day-to-day work, you will rarely use it on its own, since it is expressed in squared units (for example, views squared), which can be confusing. Most likely, you will be calculating the variance to get to the standard deviation, so it’s good to know where the numbers come from.
The formula for variance is:
σ² = ∑(xᵢ – μ)² / N
Where:
- σ² (sigma squared) is the variance,
- xᵢ is each individual data point,
- μ (mu) is the mean of the dataset,
- N (or n for a sample) is the total number of data points, and
- ∑ means that you repeat the calculation for every element in your dataset and sum up the results.
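A very important detail to point out: this is the formula for population variance, which applies when the dataset you’re working with is complete (all of the data that there ever will be). When you work with a sample of the data, the formula below is recommended instead. Dividing by n – 1 rather than N compensates for the tendency of a sample to underestimate the spread of the full population. Here x̄ is the sample mean and n is the sample size:
s² = ∑(xᵢ – x̄)² / (n – 1)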
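To make the difference between the two formulas concrete, here is a small sketch that computes both by hand and compares them with Python's standard statistics module (pvariance for a population, variance for a sample). The daily_views numbers and the two function names are invented for the example.

```python
import statistics

# Hypothetical daily view counts for five days.
daily_views = [120, 135, 150, 110, 160]


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)


# The mean is 135 and the squared deviations add up to 1700.
print(population_variance(daily_views))  # 1700 / 5 = 340.0
print(sample_variance(daily_views))      # 1700 / 4 = 425.0

# The standard library gives the same results.
print(statistics.pvariance(daily_views))
print(statistics.variance(daily_views))
```

With only five data points the two results differ noticeably. The gap shrinks as the dataset grows, because dividing by n and by n – 1 becomes nearly the same thing.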
Standard deviation
Standard deviation, like variance, measures the spread (dispersion) of values in a dataset relative to its mean (aka average).
If you already calculated the variance, getting the value of the standard deviation is super easy – we just need to calculate the square root of the variance!
And the full formula (shown here for the population) looks like this:
σ = √( ∑(xᵢ – μ)² / N )
The values of standard deviation are typically much easier to wrap your head around, as they are in the same units as your input data.
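A quick sketch of that last step: taking the square root of the variance and checking it against the statistics module's pstdev (population) and stdev (sample) functions. The daily_views data is hypothetical.

```python
import math
import statistics

# Hypothetical daily view counts for five days.
daily_views = [120, 135, 150, 110, 160]

# Standard deviation is the square root of the variance.
population_std = math.sqrt(statistics.pvariance(daily_views))  # sqrt(340)
sample_std = math.sqrt(statistics.variance(daily_views))       # sqrt(425)

print(f"population std dev: {population_std:.2f} views")
print(f"sample std dev:     {sample_std:.2f} views")

# The dedicated functions return the same values directly.
print(statistics.pstdev(daily_views), statistics.stdev(daily_views))
```

Unlike the variance (340 "views squared"), a result of roughly 18 views can be read directly against the mean of 135 views.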
That’s all great and easy, but how is that useful for Product development? Here are some examples:
- Analyzing customer feedback: You might collect feedback from customers on various aspects of your product, such as usability, features, and pricing. By calculating the standard deviation of the responses, you can identify which areas are most polarizing among your customers. A higher standard deviation would indicate that the responses are more spread out, meaning that some customers love that aspect of the product while others strongly dislike it. This information can help you identify areas where you might need to focus your attention and make improvements.
- Measuring product or feature performance: By calculating the standard deviation of metrics such as sales, revenue, and user engagement, you can determine how much variability there is in the data. A higher standard deviation might suggest that you need to identify the root cause of the variability and make changes to address it.
- Testing product changes: You might use A/B testing to test different versions of your product or changes to your marketing strategy. By calculating the standard deviation of the data from each test group, you can determine how much variability there is in the results. Combined with the size of each group, this variability is what a significance test uses to judge whether the difference between versions is real or just noise, and so whether you should implement the changes.
- Monitoring customer churn: You might use data on customer churn, or the rate at which customers stop using your product, to monitor customer retention and loyalty. By calculating the standard deviation of the churn rate over time, you can determine how consistent or variable the rate is. This information can help you identify trends in customer behavior and develop strategies to improve retention and reduce churn.
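To illustrate the first of these use cases, the sketch below ranks product aspects by the standard deviation of their feedback scores. The 1 to 5 ratings, the aspect names and the rank_by_spread helper are all hypothetical. The sample standard deviation is used on the assumption that the responses are a sample of all customers.

```python
import statistics

# Hypothetical 1-5 ratings collected for three aspects of a product.
ratings = {
    "usability": [4, 4, 5, 4, 4, 5, 4, 4],
    "features": [3, 4, 3, 4, 3, 4, 3, 3],
    "pricing": [1, 5, 1, 5, 2, 5, 1, 5],
}


def rank_by_spread(ratings_by_aspect):
    """Return (aspect, mean, sample std dev) tuples, most spread-out first."""
    rows = [
        (aspect, statistics.mean(scores), statistics.stdev(scores))
        for aspect, scores in ratings_by_aspect.items()
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


for aspect, mean, std in rank_by_spread(ratings):
    print(f"{aspect:10s} mean={mean:.2f} std={std:.2f}")
```

In this made-up data, pricing has a middling mean but by far the largest spread, which is the pattern you would expect from a polarizing aspect: the average alone would hide the split between fans and critics.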
My favorite way of using Standard Deviation, though, is applying it as the upper and lower limits in Control Charts, which look like the graph below and provide valuable context for the plotted data.
Control Charts are such valuable tools that I plan to dedicate a separate article to them in the near future.
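Until then, here is a rough sketch of the idea: limits are derived from a baseline period and new data points are checked against them. Setting the limits at the mean plus and minus three standard deviations is a common convention and an assumption of this example, as are the data and the function names. Dedicated control chart methods often estimate the spread differently.

```python
import statistics


def control_limits(baseline, sigmas=3):
    """Return (lower, center, upper) limits based on a baseline period.

    Assumption: limits sit `sigmas` sample standard deviations from the mean.
    """
    center = statistics.mean(baseline)
    spread = statistics.stdev(baseline)
    return center - sigmas * spread, center, center + sigmas * spread


def outside_limits(values, lower, upper):
    """Return (index, value) pairs that fall outside the limits."""
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical baseline of daily sign-ups (mean 100, sample std dev 2).
baseline = [100, 102, 98, 101, 99, 100, 103, 97]
lower, center, upper = control_limits(baseline)
print(f"lower={lower:.1f}, center={center:.1f}, upper={upper:.1f}")

# New observations to check against the limits.
new_days = [101, 95, 108, 99]
print(outside_limits(new_days, lower, upper))
```

Points inside the limits are treated as ordinary day-to-day variation, while a point outside them (the 108 here) is worth investigating.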
The tip of the iceberg
If you’ve been following me in this series from the beginning (you can find part 1 and part 2 here), you should by now know the absolute basics of descriptive statistics that will help you better understand your data and fuel data-driven decisions.
And yet, we have only just scratched the surface! In the next few posts, we will expand our knowledge with the concepts of:
- Central limit theorem
- Normal, uniform, exponential, binomial, and Poisson distributions
- Simple and compound probabilities
- Expected values
- and more!
Have you already been taking advantage of descriptive statistics in product development? Share your stories in the comments!