The Olympic Games finished a couple of days ago: two entire weeks of complete devotion to sport. Unfortunately I did not get any tickets, but I did not fail to watch many events on TV and on the internet. I was watching the men's decathlon and was very impressed by the all-round quality of these athletes. They have to be able to do everything: sprint (100 m), jump high, jump fast (110 m hurdles), jump long, throw heavy things (shot put) and light ones (javelin), run longer (400 m) and even longer (1500 m)… It seemed obvious to me that this was the quintessence of sport: every athlete has to find the perfect balance between these different performances to compete effectively. The decathlon calls on all the qualities of a strong athlete: power, endurance, flexibility, speed…

Is it really true? Is it really the most balanced athlete who wins the decathlon?

I decided to test this assumption with the results of the previous Olympic Games (Beijing, 2008). I only kept the athletes who completed all the disciplines, so that the study could be done on a data set without any missing values. The observations are the scores for each discipline, which are calculated from the time or the distance achieved by the athlete. If you are interested in the details, you can have a look at the way the scores are calculated at: http://www.iaaf.org/mm/Document/Competitions/ ... _Tables_of_Athletics_2011_23299.pdf.

I was very surprised to see that the winner, Bryan Clay, who averaged 879 points per discipline, did relatively poorly in the 400 meters (865 points), the high jump (794 points) and the 1500 meters (522 points). By contrast, he performed very well in the 100 meters, the 110 meters hurdles and the long jump. So I started wondering whether the decathlon was about power rather than about the balance across all the different areas that I had imagined.

Sir Prasanta Chandra Mahalanobis answered this question decades ago. In 1936 he introduced a new function to measure the distance separating two observations. The most common distance is the Euclidean distance. However, this distance does not take into account two important elements.

The first element is the variance of the different variables. Consider the high jump and the pole vault: a gap of 30 centimeters between two athletes is huge in the high jump, whereas it is a reasonable difference in the pole vault. The reason is easy to understand: the variance of the results is higher in the pole vault than in the high jump. Fortunately, most of this variance effect is already accounted for by the international athletics federation (which sets the scoring tables), although we will see that this is not perfectly true.

But there is another problem which is even more important: the correlation between the different disciplines. For example, plotting the shot put scores against the discus throw scores shows a positive correlation, which, if we think about it, makes sense! Thus, if we are looking for the most complete athlete, there should be no cumulative rewards: we don't want to give athletes too many points when they have performed well in two very similar disciplines. Conversely, if two disciplines are negatively correlated, such as the 1500 meters and the 100 meters, we want to give extra points to athletes who perform well in both. The Mahalanobis distance was created for this purpose.
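To make the effect of correlation concrete, here is a minimal sketch in R on synthetic data (not the Beijing results). The variable names, the correlation of 0.9 and the two hypothetical athletes are assumptions chosen for illustration. Both athletes sit at the same Euclidean distance from the average, but the one who is good at one throw and bad at the other is much further away in the Mahalanobis sense than the one who is good at both. Distances are measured here from the sample mean rather than from zero, purely to keep the picture simple.

```r
# Synthetic illustration: two strongly correlated "throwing" scores.
set.seed(1)
n <- 200
shot <- rnorm(n)
discus <- 0.9 * shot + sqrt(1 - 0.9^2) * rnorm(n)  # assumed correlation of about 0.9
scores <- cbind(shot = shot, discus = discus)

S <- cov(scores)            # variance-covariance matrix
center <- colMeans(scores)  # average athlete

# Two hypothetical athletes at the same Euclidean distance from the average:
# "along" is good at both throws, "against" is good at one and bad at the other.
pts <- rbind(along = center + c(1, 1), against = center + c(1, -1))

euclid <- sqrt(rowSums(sweep(pts, 2, center)^2))
mahal <- sqrt(mahalanobis(pts, center, S))  # mahalanobis() returns the squared distance

print(round(cbind(euclid, mahal), 2))
```

The Euclidean column is identical for the two rows, while the Mahalanobis column is not: doing well in two similar disciplines is the expected pattern and earns little extra distance.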

If S is the variance-covariance matrix of the data set, we can formally write the Mahalanobis distance between the vectors x and y as:

$$d(x, y) = \sqrt{(x - y)^{T} S^{-1} (x - y)}$$
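As a sanity check of the definition, the quantity sqrt((x - y)' S^-1 (x - y)) can be computed by hand and compared with R's built-in mahalanobis() function, which returns the squared distance. This is only a sketch on made-up data: the matrix X and the seed are assumptions, not the Olympic scores.

```r
set.seed(42)
X <- matrix(rnorm(24 * 3), ncol = 3)  # 24 made-up athletes, 3 made-up disciplines
S <- cov(X)                           # variance-covariance matrix
x <- X[1, ]
y <- X[2, ]

# Direct translation of the formula
delta <- x - y
d_manual <- sqrt(drop(t(delta) %*% solve(S) %*% delta))

# Built-in function: returns the squared distance, hence the sqrt()
d_builtin <- sqrt(mahalanobis(x, center = y, cov = S))

print(c(manual = d_manual, builtin = d_builtin))
```

The two values should agree up to floating-point error.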
           

Once the matrix S is computed, we can calculate a Mahalanobis score for every athlete, namely the distance between zero and the vector of the athlete's scores in the different disciplines. Unexpectedly, the gold medal would go to Oleksiy Kasyanov, who finished 7th at the Olympic Games. Bryan Clay, the Olympic champion, would only rank 5th. Below are two tables: the first ranks the athletes by Mahalanobis distance, and the second is the official decathlon ranking. As you can see, there are many differences. Therefore, the decathlon is not the ultimate sport of the complete athlete.

| Mahalanobis Ranking | Athlete | Mahalanobis score |
|---|---|---|
| 1 | Oleksiy Kasyanov | 790.60 |
| 2 | Andrei Krauchanka | 789.16 |
| 3 | Maurice Smith | 767.85 |
| 4 | Leonel Suárez | 754.27 |
| 5 | Bryan Clay | 742.40 |
| 6 | Yordanis Garciá | 737.40 |
| 7 | Michael Shrade | 723.31 |
| 8 | Romain Barras | 709.31 |
| 9 | Aleksandr Pogorelov | 701.18 |
| 10 | Andres Raja | 696.00 |
| 11 | Roman Sebrle | 693.79 |
| 12 | Aleksey Drozdov | 690.95 |
| 13 | André Niklaus | 687.12 |
| 14 | Massimo Bertocchi | 681.92 |
| 15 | Jangy Addy | 681.16 |
| 16 | Mikk Pahapill | 677.04 |
| 17 | Mikalai Shubianok | 667.82 |
| 18 | Hadi Sepehrzad | 653.71 |
| 19 | Damjan Sitar | 651.63 |
| 20 | Eugene Martineau | 637.66 |
| 21 | Haifeng Qi | 631.22 |
| 22 | Aliaksandr Parkhomenka | 630.64 |
| 23 | Slaven Dizdarevic | 607.92 |
| 24 | Daniel Awde | 607.78 |


| Decathlon Ranking | Athlete | Decathlon Score |
|---|---|---|
| 1 | Bryan Clay | 8791 |
| 2 | Andrei Krauchanka | 8551 |
| 3 | Leonel Suárez | 8527 |
| 4 | Aleksandr Pogorelov | 8328 |
| 5 | Romain Barras | 8253 |
| 6 | Roman Sebrle | 8241 |
| 7 | Oleksiy Kasyanov | 8238 |
| 8 | André Niklaus | 8220 |
| 9 | Maurice Smith | 8205 |
| 10 | Michael Shrade | 8194 |
| 11 | Mikk Pahapill | 8178 |
| 12 | Aleksey Drozdov | 8154 |
| 13 | Andres Raja | 8118 |
| 14 | Eugene Martineau | 8055 |
| 15 | Yordanis Garciá | 7992 |
| 16 | Mikalai Shubianok | 7906 |
| 17 | Aliaksandr Parkhomenka | 7838 |
| 18 | Haifeng Qi | 7835 |
| 19 | Massimo Bertocchi | 7714 |
| 20 | Jangy Addy | 7665 |
| 21 | Daniel Awde | 7516 |
| 22 | Hadi Sepehrzad | 7483 |
| 23 | Damjan Sitar | 7336 |
| 24 | Slaven Dizdarevic | 7021 |


The code (R):

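# data and data3 are randomly generated for the example
a = rnorm(24)
# Two correlated scores, standing in for shot put and discus throw
data = data.frame(shotPut = a, discusThrow = 0.5 * a + 0.5 * rnorm(24))
# Six made-up disciplines; X1 and X2 are correlated, the others are independent
data3 = data.frame(X1 = a, X2 = 0.5 * a + 0.5 * rnorm(24), X3 = rnorm(24), X4 = rnorm(24), X5 = rnorm(24), X6 = rnorm(24))

# Regress shot put on discus throw, then plot the scores with the fitted line
lm.shotPut = lm(shotPut ~ discusThrow, data = data)

plot(data$discusThrow, data$shotPut, axes = TRUE, ann = FALSE)
abline(lm.shotPut)
title(ylab = "Score at shot put", xlab = "Score at discus throw", col.lab = rgb(0, 0, 0))

# Variance-covariance matrix, then the Mahalanobis score of each row relative to zero.
# Note: mahalanobis() returns the squared distance; apply sqrt() for the distance itself.
Sigma = cov(data3)
distance = mahalanobis(data3, center = rep(0, ncol(data3)), cov = Sigma, inverted = FALSE)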
