
HR, Big Data and the Trap of Simplification

HR decisions based on samples can mislead: how confidence intervals help prevent costly mistakes.

How disregarding uncertainty in your data can lead to costly mistakes

These days, we can observe a general trend towards more data-driven decision-making in firms. This trend also applies to areas and functions that have traditionally been less “quantitative” such as human resources.

However, the HR function is catching up quickly. There are a number of big data and analytics applications out there, and firms are also experimenting with recruiting robots based on Facebook Messenger or automated screening of candidates for training programs. Yet I regularly observe widespread knowledge gaps among HR professionals with regard to some basic statistical principles. From my perspective, one of the most important is how to deal with uncertainty in (HR) data. In particular, I observe that many managers, who often only have data from a sample and not from the entire population, make the mistake of treating their sample as if it were the population. Overlooking this point can lead to wrong decisions and potentially costly mistakes.

The “uncertainty problem” of samples

Usually, HR managers are interested in data on their whole organization, for example, data on all project teams of the firm, all employees, or all current expats. However, while we might want to get data on all employees and analyze all those people, teams, or expats, data collection might be costly and time-consuming. Therefore, HR managers often have to collect data from only a subset of these groups, i.e. a “sample”. Put simply, sampling means that you do not use all available information.


The common approach is to run some statistics on the available data and to use the results as good “estimates” for the whole population. For example, let’s assume a firm has 300 expats and has collected data on a sample of 100 of them. This year, the average satisfaction with the firm’s pre-appointment expat preparation courses was rated 3.8 out of 5 points (5 meaning “very satisfied” and 1 meaning “very unsatisfied”). During the last two years, when the firm collected data on all expats, it was 4.2 in both years. The HR manager might conclude that she has to change the prep courses because “it’s getting worse”! This is how it shows up in the chart:

Figure: bar chart “Average satisfaction” for 2015, 2016 and 2017, showing the same value in 2015 and 2016 and a lower value in 2017. Source: WU Executive Academy

Yet is this the right interpretation?

The problem with this conclusion is that it rests on the assumption that 3.8 in the sample represents 3.8 in the total population. This is not correct! If a different sample had been taken, the average satisfaction might have been 3.5 or 4.0 or even 4.5, simply because the HR manager got, entirely by chance, more or fewer disgruntled expats into the sample, which influenced the average course rating. Thus, the sample gives us only an indication of how all 300 expats evaluate the course (it tells us “3.8” this year), and there is some uncertainty around this 3.8: the true value could very well be different.

If managers do not take this uncertainty into account, they might over-react or under-react. For example, let’s assume that in reality all 300 expats have an average satisfaction level of 4.2 while our sample, entirely by chance, tells us it’s 3.8 because it happens to contain some more unhappy expats. What would the conclusion then be? Here, we might conclude that the course is still fine, since the true satisfaction level is the same as that of the entire group of expats in the last two years. However, if all expats have an average satisfaction level of 3.6 (and our sample still shows 3.8), then we might be really worried that there is something wrong with the courses.
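To make the role of chance concrete, here is a minimal simulation sketch in Python. It is not taken from the firm in the example: the 300 ratings in `POPULATION` are made up for illustration (chosen so that the true average is exactly 3.8), and the function name `simulate_sample_means` is hypothetical. The sketch repeatedly draws random samples of 100 expats and records the average of each sample.

```python
import random
import statistics

# Hypothetical population of 300 expat ratings (1-5 scale). The counts are
# made up for illustration and give a true average of exactly 3.8.
POPULATION = [1] * 15 + [2] * 30 + [3] * 45 + [4] * 120 + [5] * 90


def simulate_sample_means(population, sample_size, draws, seed=0):
    """Draw repeated random samples without replacement; return their means."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [
        statistics.mean(rng.sample(population, sample_size))
        for _ in range(draws)
    ]


sample_means = simulate_sample_means(POPULATION, sample_size=100, draws=1000)

print("true population mean:", statistics.mean(POPULATION))
print("lowest sample mean:  ", min(sample_means))
print("highest sample mean: ", max(sample_means))
```

Although the population never changes, the sample averages scatter around the true value. How wide that scatter is depends on the assumed ratings and on the sample size, so the exact numbers here should not be read as a statement about any real firm.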


So if managers simply assume that a statistic such as a sample average is the same in the total population, they might make mistakes. These mistakes can be under-reactions (the manager should change the prep course but does not) or over-reactions (the manager changes a well-functioning course although there is no clear evidence of a trend towards more dissatisfaction). Both are potentially costly and time-consuming mistakes.

The solution

In statistics, we would call a result from a sample a “point estimate”. A point estimate by itself might be a good starting point for thinking about the total population (it is a good first guess), but it does not provide any information about how “good” this estimate is: it does not take uncertainty into account.

The good news is: if your sample was taken randomly, statistics can help us understand the error caused by using a sample rather than the full population. We will never know for sure what the true population value (the average satisfaction of the entire group) is until we actually collect data from the entire population. Yet, we can still deal with this issue using confidence intervals.

Confidence intervals can also be called “range estimates”. Contrary to point estimates, a range estimate provides a whole range of potential population estimates that are likely to be true. For our example above, instead of assuming that the 3.8 average of the sample is also 3.8 in the total population, we would compute the confidence interval. Then we would base our decision-making on a statement that says that we can be 95% confident that the true population average lies somewhere in the range between 3.6 and 4.0.
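The article does not say how its interval was computed, so the following Python sketch shows one common way to do it, under stated assumptions. It uses the large-sample normal approximation (mean plus or minus z times the standard error), with an optional finite population correction for cases where the sample is a large share of the population. The function name `mean_confidence_interval` and the 100 ratings in `SAMPLE` are hypothetical; the ratings are made up so that their average is 3.8.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, confidence=0.95, population_size=None):
    """Return (low, high) for the population mean, normal approximation.

    Assumes a simple random sample that is large enough for the normal
    approximation; for small samples a t-based interval would be wider.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    # Standard error of the mean from the sample standard deviation.
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    if population_size is not None:
        # Finite population correction: shrinks the error when the sample
        # covers a large share of the population.
        standard_error *= math.sqrt((population_size - n) / (population_size - 1))
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * standard_error
    return mean - margin, mean + margin


# Hypothetical sample of 100 expat ratings (1-5 scale) with an average of 3.8.
SAMPLE = [1] * 5 + [2] * 10 + [3] * 15 + [4] * 40 + [5] * 30

low, high = mean_confidence_interval(SAMPLE)
print(f"point estimate: {statistics.mean(SAMPLE):.2f}")
print(f"95% interval:   {low:.2f} to {high:.2f}")
```

With these made-up ratings, the interval comes out at roughly 3.6 to 4.0. Passing `population_size=300` applies the finite population correction and narrows the interval slightly, because 100 of 300 expats is a sizeable share of the population.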


The point about the confidence interval is that our conclusions from the data become very different: we moved from a simple point estimate (the satisfaction of all expats is 3.8) to a range estimate (it is quite likely that expat satisfaction lies somewhere in the range between 3.6 and 4.0) and, therefore, we might make a different decision. In this case, we could conclude that the difference between 4.2 and the quite likely 4.0 of this year is not big enough to justify redesigning the course.

To sum it up, by taking random samples and computing range estimates instead of point estimates, we acknowledge that our estimate of the population is to some degree uncertain and we are better equipped to avoid costly under- or overreactions.

Prof. Phillip Nell also wrote an article about how politics can be a risk factor for businesses. Read it here.
