I’m trying to figure out the best way to understand what a "typical" starting salary is for a particular job or industry, and I keep seeing both "average" (mean) and "median" salary figures reported. It’s confusing because sometimes they’re pretty close, and other times they’re way off.
Specifically, I’m looking at entry-level data science roles. Let’s say I see a report that claims the "average starting salary" for a data scientist is $85,000 and another report that says the "median starting salary" is $78,000. What accounts for that $7,000 difference? Which one should I pay more attention to when negotiating my own salary? Are there specific situations where one is a more reliable indicator than the other? Also, is there a different calculation or method for each that is important to know?
Answer
The terms "median starting salary" and "average starting salary" (the average is also called the mean) summarize the typical starting salary of a group in two different ways. Because they respond differently to unusually high or low salaries, comparing them reveals something about the shape of the salary distribution and the presence of outliers. Here’s a breakdown of the differences:
Average (Mean) Starting Salary:
- Definition: The average starting salary is calculated by summing up all the individual starting salaries in a dataset and then dividing by the total number of salaries.
- Formula: Average = (Sum of all salaries) / (Number of salaries)
- Sensitivity to Outliers: The average is highly sensitive to extreme values or outliers. Outliers are salaries that are significantly higher or lower than the majority of the salaries in the dataset. A few very high salaries can disproportionately inflate the average, making it appear higher than what most people actually earn. Similarly, a few very low salaries can pull the average down.
- Example: Consider the following set of starting salaries: \$40,000, \$45,000, \$50,000, \$55,000, \$60,000, \$70,000, \$150,000. The average starting salary is (\$40,000 + \$45,000 + \$50,000 + \$55,000 + \$60,000 + \$70,000 + \$150,000) / 7 = \$470,000 / 7 ≈ \$67,142.86. The single \$150,000 salary pulls the average above six of the seven salaries in the set.
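The arithmetic in the example above can be checked with a short sketch using Python's standard-library `statistics` module. The salary list is the illustrative one from the example, not real survey data, and the second calculation simply drops the \$150,000 value to show how far that one salary moved the average.

```python
from statistics import mean

# Illustrative starting salaries from the example above (not real survey data)
salaries = [40_000, 45_000, 50_000, 55_000, 60_000, 70_000, 150_000]

# Average = (sum of all salaries) / (number of salaries)
avg = sum(salaries) / len(salaries)
print(f"Mean with the $150,000 salary:    ${avg:,.2f}")  # $67,142.86

# Drop the single high salary to see how far it moved the average
without_outlier = [s for s in salaries if s != 150_000]
print(f"Mean without the $150,000 salary: ${mean(without_outlier):,.2f}")  # $53,333.33
```

Removing one of seven values shifts the average by almost \$14,000, which is the outlier sensitivity described above in concrete terms.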
Median Starting Salary:
- Definition: The median starting salary is the middle value in a dataset when the salaries are arranged in ascending or descending order. In other words, it is the salary that separates the higher half of the salaries from the lower half.
- Calculation:
  - If there is an odd number of salaries, the median is the middle value.
  - If there is an even number of salaries, the median is the average of the two middle values.
- Robustness to Outliers: The median is much less sensitive to outliers than the average. Because it depends only on the value (or two values) at the center of the ordered dataset, making the highest or lowest salary even more extreme does not change it at all.
- Example: Using the same set of starting salaries as before (\$40,000, \$45,000, \$50,000, \$55,000, \$60,000, \$70,000, \$150,000), which is already in ascending order, the median is the fourth of the seven values: \$55,000. The \$150,000 outlier does not affect it; the median would still be \$55,000 if the top salary were \$1,000,000.
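The two calculation rules above can be sketched as a small Python function. The name `median_salary` is a hypothetical helper for illustration; the result is cross-checked against the standard library's `statistics.median`, which follows the same odd/even convention.

```python
from statistics import median

def median_salary(values):
    """Median using the odd/even rules above (hypothetical helper name)."""
    ordered = sorted(values)              # arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one salary")
    mid = n // 2
    if n % 2 == 1:                        # odd count: take the middle value
        return ordered[mid]
    # even count: average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

salaries = [40_000, 45_000, 50_000, 55_000, 60_000, 70_000, 150_000]
print(median_salary(salaries))                     # 55000
# Making the top salary far larger leaves the median unchanged
print(median_salary(salaries[:-1] + [1_000_000]))  # 55000
# Even count: drop the top salary, leaving six values
print(median_salary(salaries[:-1]))                # 52500.0

# Cross-check against the standard library
assert median_salary(salaries) == median(salaries)
```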
Key Differences Summarized:
| Feature | Average (Mean) | Median |
|---|---|---|
| Definition | Sum of values divided by the number of values | Middle value in an ordered dataset |
| Sensitivity to Outliers | Highly sensitive | Robust to outliers |
| Interpretation | The amount each person would receive if the total of all salaries were split equally | The point where half of the salaries are above and half are below |
Which Measure to Use?
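- Skewed Distributions: Salary data is often right-skewed (i.e., it has a long tail of high salaries), and in that case the median is usually a better measure of central tendency than the average, because it more accurately represents what a "typical" person earns. A mean above the median, as with \$85,000 versus \$78,000 in your example, is consistent with this kind of skew. However, because those two figures come from different reports, differences in who was surveyed (location, company size, how "entry-level" is defined) could also account for part of the gap.
- Symmetrical Distributions: If the distribution of salaries is relatively symmetrical (i.e., the salaries are evenly spread around the center), the average and median will be close to each other.
- Context: Consider the purpose of the analysis. If you want to understand the total compensation being paid out, the average is the relevant figure, since the average multiplied by the number of hires equals the total. However, if you want to know what a typical new graduate can expect to earn, which is the more useful anchor when negotiating your own offer, the median is generally more informative.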
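A minimal sketch of the points above, using two small hypothetical samples invented for illustration (not real salary data): in the symmetric one the mean and median coincide, while one large salary in the right-skewed one opens a gap between them. The last lines show why the mean answers "total compensation" questions: multiplying it by the head count recovers the total payroll.

```python
from statistics import mean, median

# Hypothetical samples for illustration only
symmetric = [50_000, 55_000, 60_000, 65_000, 70_000]
right_skewed = [50_000, 52_000, 55_000, 58_000, 150_000]

for label, data in [("symmetric", symmetric), ("right-skewed", right_skewed)]:
    m, med = mean(data), median(data)
    # A mean well above the median hints at a long tail of high salaries
    print(f"{label:>12}: mean=${m:,.0f}  median=${med:,.0f}  gap=${m - med:,.0f}")

# Mean x number of hires = total payroll, which the median cannot recover
total = mean(right_skewed) * len(right_skewed)
print(f"total payroll: ${total:,.0f}")
```

A mean-median gap is only a rough signal of skew; with small samples, or when the two figures come from different surveys, it should be read cautiously.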
In Conclusion:
The median and average starting salaries offer different perspectives on the central tendency of a dataset. The average is susceptible to distortion by extreme values, while the median provides a more stable and representative measure, particularly for skewed salary distributions. When comparing starting salaries, it’s important to understand which measure each source uses and the implications of that choice. It is also worth examining the distribution itself where possible (for example, the 25th and 75th percentiles or the full range) to gain a more complete understanding of the data.
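Examining the distribution can be as simple as computing quartiles. The sketch below uses `statistics.quantiles` (available since Python 3.8) on the illustrative salaries from the earlier examples. Note that Python's default "exclusive" method is only one of several quartile conventions, so a salary report may compute percentiles slightly differently.

```python
from statistics import quantiles

# Illustrative salaries from the examples above (not real survey data)
salaries = [40_000, 45_000, 50_000, 55_000, 60_000, 70_000, 150_000]

# Three cut points split the data into quarters: Q1, the median (Q2), and Q3
q1, q2, q3 = quantiles(salaries, n=4)
print(f"25th percentile: ${q1:,.0f}")
print(f"Median:          ${q2:,.0f}")
print(f"75th percentile: ${q3:,.0f}")
print(f"Range:           ${min(salaries):,.0f} to ${max(salaries):,.0f}")
```

A band such as the 25th to 75th percentile can be a more useful negotiating reference than any single number, because it shows where the middle half of offers fall.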