The Olympic Games finished a couple of days ago. Two
entire weeks of complete devotion to sport. Unfortunately I didn't get any
tickets, but I didn't fail to watch many events on TV and on the internet. I was
watching the men's decathlon and I was very impressed by the general quality of
these athletes. They have to be able to do everything: sprinting (100 m),
jumping high (high jump), far (long jump) and fast (110 m hurdles), throwing
heavy (shot put) and light (javelin) things, running longer (400 m) and even
longer (1500 m)… It became obvious in my mind that this was the quintessence of
sport: every athlete has to find the perfect balance between those different
performances to compete efficiently. This sport calls on all the qualities of a
strong man: power, endurance, flexibility, speed…
Is it really true? Is it really the most balanced athlete
who wins the decathlon competition?
I decided to test this assumption with the results of the previous
Olympic Games (Beijing,
2008). I only kept the athletes who completed all the disciplines so that
I could do the study on a data set without any missing values. I used the
scores for each discipline, which are calculated from the time or the
distance achieved by the athlete. If you are interested in those details, you
can have a look at the way they are calculated at: http://www.iaaf.org/mm/Document/Competitions/ ... _Tables_of_Athletics_2011_23299.pdf.
I was very surprised to see that the winner, Bryan Clay, who had an average of 879
points per discipline, scored below that average in the 400 meters (865 points), the high jump
(794 points) and the 1500 meters (522 points). On the contrary,
he performed very well in the 100 meters, the 110 meters hurdles and the long jump.
Thus, I started wondering whether the decathlon was not about power rather than about my
so-called balance across all the different areas.
Prasanta Chandra Mahalanobis answered this question some decades ago. In 1936 he introduced a new function to measure the
distance separating two observations. The most common distance is the Euclidean
distance. However, this distance does not take into account two important
elements. The first element is the variance of the different
variables. Indeed, consider the high jump and the pole vault:
a gap of 30 centimeters between two athletes is huge in the high jump whereas
it is a reasonable difference in the pole vault. The reason is easy to understand:
the variance in the pole vault is higher than in the high jump. Fortunately,
most of this effect of the variance is already taken into account by the
international athletics federation (which sets the scores),
although we will see that this is not perfectly true. But there is another
problem which is even more important: the correlation between the different disciplines.
For example the following graphic shows a positive correlation between shot put
and discus throw, which, if we think about it, makes sense! Thus, if we look for
the most complete athlete, there should be no cumulative rewards: we don't want to give athletes too many points when they have performed well in two very similar disciplines. On the contrary,
if two disciplines are negatively correlated, such as the 1500 meters and the 100 meters,
we want to give extra points to athletes who perform well in both of the
disciplines. The Mahalanobis distance was created for this purpose.
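The variance argument can be made concrete with a tiny sketch. The standard deviations below are hypothetical values chosen only for illustration (they are not measured from any competition); the point is simply that the same raw gap of 30 cm means something very different once it is expressed in units of spread.

```r
# Hypothetical standard deviations (in metres) across a field of athletes.
# These are illustrative assumptions, not values taken from real results.
sd_m <- c(highJump = 0.08, poleVault = 0.30)

# The same raw gap of 30 cm between two athletes in each discipline.
gap_m <- c(highJump = 0.30, poleVault = 0.30)

# Dividing by the standard deviation expresses each gap in units of spread,
# which is what a variance-aware distance does and the Euclidean one does not.
gap_in_sd <- gap_m / sd_m
print(gap_in_sd)
```

Under these assumed values the high jump gap is several standard deviations wide while the pole vault gap is about one, even though both are 30 cm.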
If S is the variance-covariance matrix of the data set,
we can formally write the Mahalanobis distance between the vectors x and y as:
$$d(x, y) = \sqrt{(x - y)^{T} S^{-1} (x - y)}$$
Once the
matrix S is computed, we can calculate the Mahalanobis score of every athlete, that is, the distance between zero and the vector of the athlete's scores in the different disciplines.
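Here is a minimal sketch of the usual definition of this distance, the square root of (x - y)' S^-1 (x - y), applied to two made-up points. The covariance matrix (correlation 0.8 between two standardised disciplines), the helper `maha` and the two points are assumptions for illustration only. It also checks the result against R's built-in `mahalanobis()`, which returns the squared distance.

```r
# Assumed covariance matrix of two standardised disciplines with
# correlation 0.8 (illustrative values, not estimated from real scores).
S <- matrix(c(1.0, 0.8,
              0.8, 1.0), nrow = 2, byrow = TRUE)

# Hypothetical helper: distance written out from the definition,
# sqrt((x - y)' S^-1 (x - y)).
maha <- function(x, y, S) {
  d <- x - y
  sqrt(drop(t(d) %*% solve(S) %*% d))
}

origin   <- c(0, 0)
both_up  <- c(1, 1)    # good in both correlated disciplines
opposite <- c(1, -1)   # good in one, bad in the other

# Both points are at the same Euclidean distance from the origin...
euclid <- c(sqrt(sum(both_up^2)), sqrt(sum(opposite^2)))
# ...but their Mahalanobis distances differ.
d_maha <- c(maha(both_up, origin, S), maha(opposite, origin, S))

# stats::mahalanobis() returns the squared distance, hence the sqrt().
d_builtin <- sqrt(mahalanobis(rbind(both_up, opposite), center = origin, cov = S))
stopifnot(isTRUE(all.equal(d_maha, unname(d_builtin))))

print(euclid)
print(d_maha)
```

The point lying along the correlation (good in both similar disciplines) gets a smaller distance than the one lying against it, which is the "no cumulative rewards" behaviour described above.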
It was unexpected to see that the gold medal would be claimed by Oleksiy
Kasyanov, who finished 7th at the Olympic Games. On the contrary,
Bryan Clay, the Olympic champion, would now rank 5th. You can find
below two tables: the first one is the ranking of the athletes according to the
Mahalanobis distance, and the second one is the official decathlon ranking. As
you can see there are many differences. Therefore, the decathlon is not the ultimate
sport of the complete athlete.
| Mahalanobis Ranking | Athlete | Mahalanobis score |
|---|---|---|
| 1 | Oleksiy Kasyanov | 790.60 |
| 2 | Andrei Krauchanka | 789.16 |
| 3 | Maurice Smith | 767.85 |
| 4 | Leonel Suárez | 754.27 |
| 5 | Bryan Clay | 742.40 |
| 6 | Yordanis García | 737.40 |
| 7 | Michael Schrader | 723.31 |
| 8 | Romain Barras | 709.31 |
| 9 | Aleksandr Pogorelov | 701.18 |
| 10 | Andres Raja | 696.00 |
| 11 | Roman Sebrle | 693.79 |
| 12 | Aleksey Drozdov | 690.95 |
| 13 | André Niklaus | 687.12 |
| 14 | Massimo Bertocchi | 681.92 |
| 15 | Jangy Addy | 681.16 |
| 16 | Mikk Pahapill | 677.04 |
| 17 | Mikalai Shubianok | 667.82 |
| 18 | Hadi Sepehrzad | 653.71 |
| 19 | Damjan Sitar | 651.63 |
| 20 | Eugene Martineau | 637.66 |
| 21 | Haifeng Qi | 631.22 |
| 22 | Aliaksandr Parkhomenka | 630.64 |
| 23 | Slaven Dizdarevic | 607.92 |
| 24 | Daniel Awde | 607.78 |

| Decathlon Ranking | Athlete | Decathlon Score |
|---|---|---|
| 1 | Bryan Clay | 8791 |
| 2 | Andrei Krauchanka | 8551 |
| 3 | Leonel Suárez | 8527 |
| 4 | Aleksandr Pogorelov | 8328 |
| 5 | Romain Barras | 8253 |
| 6 | Roman Sebrle | 8241 |
| 7 | Oleksiy Kasyanov | 8238 |
| 8 | André Niklaus | 8220 |
| 9 | Maurice Smith | 8205 |
| 10 | Michael Schrader | 8194 |
| 11 | Mikk Pahapill | 8178 |
| 12 | Aleksey Drozdov | 8154 |
| 13 | Andres Raja | 8118 |
| 14 | Eugene Martineau | 8055 |
| 15 | Yordanis García | 7992 |
| 16 | Mikalai Shubianok | 7906 |
| 17 | Aliaksandr Parkhomenka | 7838 |
| 18 | Haifeng Qi | 7835 |
| 19 | Massimo Bertocchi | 7714 |
| 20 | Jangy Addy | 7665 |
| 21 | Daniel Awde | 7516 |
| 22 | Hadi Sepehrzad | 7483 |
| 23 | Damjan Sitar | 7336 |
| 24 | Slaven Dizdarevic | 7021 |
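To put a number on how much the two rankings disagree, here is a small sketch that uses only the ranks read off the two tables. The variable names are mine, and the Spearman rank correlation is one possible summary among others, not something taken from the original analysis.

```r
# Official decathlon rank of each athlete, listed in the order of the
# Mahalanobis ranking (Kasyanov first, Awde last), read off the two tables.
decathlon_rank <- c(7, 2, 9, 3, 1, 15, 10, 5, 4, 13, 6, 12,
                    8, 19, 20, 11, 16, 22, 23, 14, 18, 17, 24, 21)
mahalanobis_rank <- seq_along(decathlon_rank)

# Places gained (positive) or lost (negative) when switching from the
# official ranking to the Mahalanobis ranking.
places_gained <- decathlon_rank - mahalanobis_rank

# Spearman rank correlation: 1 means identical rankings, 0 means unrelated.
rho <- cor(mahalanobis_rank, decathlon_rank, method = "spearman")

print(places_gained)
print(rho)
```

The first entry of `places_gained` corresponds to Kasyanov (7th to 1st) and the fifth to Clay (1st to 5th).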
The code (R):
# data and data3 are randomly generated for the example
a = rnorm(24)
data = data.frame(shotPut = a, discusThrow = 0.5 * a + 0.5 * rnorm(24))
data3 = data.frame(X1 = a, X2 = 0.5 * a + 0.5 * rnorm(24), X3 = rnorm(24),
                   X4 = rnorm(24), X5 = rnorm(24), X6 = rnorm(24))
# scatter plot of the two correlated disciplines with the regression line
lm.shotPut = lm(shotPut ~ discusThrow, data = data)
plot(data$discusThrow, data$shotPut, axes = TRUE, ann = FALSE)
abline(lm.shotPut)
title(ylab = "Score at shot put", xlab = "Score at discus throw", col.lab = rgb(0, 0, 0))
# variance-covariance matrix, then the Mahalanobis score of every row
# (distance to zero); note that mahalanobis() returns the squared distance
Sigma = cov(data3)
distance = mahalanobis(data3, 0, Sigma, inverted = FALSE)