1. Brownian motion is certainly the most famous stochastic process (a random variable evolving over time). It was the first model used for a stock price (Louis Bachelier's thesis in 1900).


    The reason why is easy to understand: a Brownian motion path looks very similar to the historical price of a stock.
    A Brownian motion generated with R (n = 1000)


    The historical price of the FTSE Mid 250 index (source: Yahoo Finance).



    Even though it is no longer considered the true distribution of stock prices, Brownian motion is still used in finance, and methods to simulate it are really useful. As you will see, the code is strikingly easy. We use Donsker's principle.

    If Y(1), ..., Y(k) is a series of k independent standard normal variables, then we can create X(i) = (Y(1) + ... + Y(i)) / sqrt(k). The sequence X(i) is then a proxy of a Brownian motion on [0, 1], observed at times i/k.

    The code (R) : 

    size = 1000
    y = rnorm(size)

    # Donsker scaling: the i-th value is the partial sum of the
    # first i normal variables, divided by sqrt(size)
    x = cumsum(y) / sqrt(size)

    plot(x, type = 'l', xlab = 'Time', ylab = 'Profit')

  2. The merger of two insurance companies can curb the probability of ruin, by sharing the risk and the capital of the two companies.

    For example, we can consider two insurance companies, A and B. A is a well-known insurance company with a large capital, dealing with a low-variance risk. We will assume that the global risk of all its customers follows a chi-square distribution with one degree of freedom. Besides, we assume that the initial capital of A is 3 and the premium charged to its customers is 1.05. The reason why the premium is 1.05 is a bit intricate and not of great importance for this post; I may explain it in another post. The idea is that A charges a premium slightly above the expectation of a chi-square with one degree of freedom.

    The second firm, B, is in a very different situation. It is a very new insurance company with a very low capital (let's say 0.1). Its customers have claims which follow a chi-square distribution with two degrees of freedom, which means that the insured product is more risky and the variance of the risk is higher too. B is likely to charge its customers a higher premium than A: B has a low capital, and both the expectation and the variance of the claims are higher. So we consider a premium of 3.

    To sum up:


              Capital     Premium            Claims
    Firm A    k1 = 3      premium1 = 1.05    x1 ~ χ²(1)
    Firm B    k2 = 0.1    premium2 = 3       x2 ~ χ²(2)
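    As a quick sanity check of these premiums (not part of the original code): the expectation of a chi-square distribution equals its number of degrees of freedom, so each firm charges slightly more than its expected daily claim. A minimal simulation confirms it:

    ```r
    set.seed(1)
    # expected daily claim of A: E[chisq(1)] = 1, below premium1 = 1.05
    mean(rchisq(1e6, df = 1))  # close to 1
    # expected daily claim of B: E[chisq(2)] = 2, below premium2 = 3
    mean(rchisq(1e6, df = 2))  # close to 2
    ```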


    Finally, we make a last assumption: the independence of the claims. This assumption, which seems natural, is actually not so common in real insurance problems and is a real source of interesting issues.

    Every day, the firms A and B receive their premiums and pay back the claims to their customers.

    The probability of ruin is the measure of interest in risk management, in particular in the current regulatory project for insurance companies (Solvency II). Over a given number of days, company A is in a ruin situation if there has been at least one day where the total amount of claims was greater than the sum of the initial capital and the total amount of premiums received. The probability of ruin is the probability of being in such a situation. To estimate such a probability we can easily (at least in this case) simulate the model. For example, in the next plot, we can see that in one particular simulation, firm A experienced a ruin as soon as day 7.
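    For a single simulated path, the ruin check described above can be sketched compactly (an illustration only; the variable names are mine, and the full simulation appears below):

    ```r
    set.seed(7)
    horizon = 100
    # capital of A after each day: initial capital + premiums received - claims paid
    path = 3 + 1.05 * (1:horizon) - cumsum(rchisq(horizon, df = 1))
    ruined = any(path < 0)                              # TRUE if ruin occurred
    firstRuinDay = if (ruined) which(path < 0)[1] else NA
    ```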

    To estimate the probability of ruin over a given number of days, we simulate this situation by the Monte Carlo method. For 100 days with a sample of 100,000 simulations, the estimated probability of ruin of company A is 0.66; for company B, it is 0.41. What is interesting is that if the two companies merged, that is if they shared their claims, their premiums and their initial capital, the estimated probability of ruin of the merged company would be 0.23. We can see that merging the companies reduces the probability of ruin.

    A limit could be pointed out. The problem in merging two companies is that if the merged company experiences a ruin, it is a very serious problem. Indeed, it may be considered better to have one company ruined often, and very rarely both of them ruined, than a situation where the merged company is sometimes ruined. In our particular case, and this is why I chose two different profiles of firms, this is not a relevant problem: the estimated probability of simultaneous ruin of both A and B is 0.26, which is slightly greater than the estimated ruin probability of the merged company.

    The last plot is an example of a situation where the merged company would not have been ruined while both A and B would have been ruined within 100 days. 

    The red line is the company B while the black line is A and the purple line is the merged company. Both A and B have been ruined, while the merged company has not been ruined.

    The code (R):

    premium1 = 1.05
    premium2 = 3

    horizon = 100
    nSim = 100000
    # ruin1 counts the simulations where company A is ruined
    ruin1 = 0
    # ruin2 counts the simulations where company B is ruined
    ruin2 = 0
    # ruinBoth counts the simulations where both A and B are ruined
    ruinBoth = 0
    # ruinMerged counts the simulations where the merged company is ruined
    ruinMerged = 0

    for(j in 1:nSim){
      k1 = rep(0, horizon+1)
      k2 = rep(0, horizon+1)
      k = rep(0, horizon+1)
      k1[1] = 3
      k2[1] = 0.1
      k[1] = k1[1] + k2[1]
      x1 = rchisq(horizon, 1)
      x2 = rchisq(horizon, 2)

      ruinX1 = FALSE
      ruinX2 = FALSE
      ruinX = FALSE

      for(i in 1:horizon){
        k1[i+1] = premium1 + k1[i] - x1[i]
        k2[i+1] = premium2 + k2[i] - x2[i]
        k[i+1] = premium1 + premium2 + k[i] - x1[i] - x2[i]
        if(k1[i+1] < 0){ ruinX1 = TRUE }
        if(k2[i+1] < 0){ ruinX2 = TRUE }
        if(k[i+1] < 0){ ruinX = TRUE }
      }
      if(ruinX1){ ruin1 = ruin1 + 1 }
      if(ruinX2){ ruin2 = ruin2 + 1 }
      if(ruinX1 & ruinX2){ ruinBoth = ruinBoth + 1 }
      if(ruinX){ ruinMerged = ruinMerged + 1 }
    }

    # probability of ruin of A
    prob1 = ruin1 / nSim
    # probability of ruin of B
    prob2 = ruin2 / nSim
    # probability of ruin of the merged firm
    prob = ruinMerged / nSim
    # probability of ruin of both A and B
    prob3 = ruinBoth / nSim

    plot(k1, type = 'l', xlab = 'days', ylab = 'available capital',
         ylim = c(min(k1, k2, k), max(k1, k2, k)))
    lines(k2, col = 'red')
    lines(k, col = 'purple')
    abline(0, 0)


  3. How to estimate PI when we only have R and the formula for the surface of a circle (Surface = PI * r * r)?

    The estimation of this number has been one of the greatest challenges in the history of mathematics. PI is the ratio between a circle's circumference and its diameter. The first estimation I have heard about was done by an Egyptian 3700 years ago: around 3.16, which is fairly accurate for a 3700-year-old estimation.

    In this post I propose an easy way to estimate PI. Though it is far from being the most efficient estimation ever made, it is a good opportunity to introduce Monte Carlo simulation in the blog.

    Monte Carlo simulations are methods to estimate quantities by repeating a random process. The limit of these methods is the randomness in the results: by construction, they cannot mathematically prove a value, but only provide confidence intervals around it. Besides, most of these methods are very slow if we want to obtain a fairly accurate result. The optimization of Monte Carlo methods is a fascinating area which is really useful in many disciplines.
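    To make the "confidence interval" point concrete, here is a minimal sketch using the circle-in-a-square estimator of PI developed later in this post. The proportion of points falling in the circle is a binomial proportion, so its standard error gives an approximate 95% interval (the 1.96 factor comes from the normal approximation):

    ```r
    set.seed(42)
    n = 10000
    # n uniform points in the square [-1, 1] x [-1, 1]
    x = runif(n, -1, 1)
    y = runif(n, -1, 1)
    inside = (x^2 + y^2 <= 1)   # TRUE if the point lies in the unit circle
    p = mean(inside)            # proportion of hits
    piHat = 4 * p               # Monte Carlo estimate of PI
    # approximate 95% confidence interval from the binomial standard error
    se = 4 * sqrt(p * (1 - p) / n)
    c(piHat - 1.96 * se, piHat + 1.96 * se)
    ```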

    Again, the method I program here is far from being efficient; it is only a basic illustration of Monte Carlo theory, and I think one of the easiest.

    Let's explain how it works.

    As we know, if r is the radius of a circle, then the surface of this circle is PI * r * r. Therefore, if we know the radius of a circle and can estimate its surface, we can estimate PI. How can we estimate the surface of a circle with a random method? The answer is found in the square! Yes, in the square, and especially in the following figure.


    A circle in a square is our solution. Indeed, it is very easy to know the surface of a square: it is the square of the length of one of its sides. Do you see where this is leading us? What if we were able to estimate the ratio between the surface of the circle and the surface of the square? Then we would know the surface of the circle.

    Indeed, 
    surfaceCircle = (surfaceCircle / surfaceSquare) * surfaceSquare. 



    Finally, we just need to estimate (surfaceCircle / surfaceSquare), and this is where Monte Carlo is useful. If we uniformly distribute k random points in the square and count how many of them fall in the circle, then we have a proxy of the quantity surfaceCircle / surfaceSquare.

    We would obtain something like the next graph. The proportion of blue spots among all the spots is then approximately equal to surfaceCircle / surfaceSquare.


    I have done this experiment with R. The next plot shows the estimation of PI we can obtain with this method. The red line is the real value of PI; the black line is my estimator of PI as a function of the number of simulated points. As we can see, the accuracy tends to increase with the size of the sample. After 10,000 simulations, we obtain an estimation of 3.1444, which is correct to only two decimal places: a lot of simulations for a rather modest accuracy.



    Even though it gives a weak estimation here, Monte Carlo simulation is a very powerful method. The circle is a well-understood mathematical object, but many mathematical objects are not so well understood. The advantage of Monte Carlo simulation is that we do not need any analytic study of the object. This is why Monte Carlo is used in so many applied areas.

     The code (R):

    simulation = function(long){
      c = rep(0,long)
      numberIn = 0
      for(i in 1:long){
        x = runif(2,-1,1)
        if(sqrt(x[1]*x[1] + x[2]*x[2]) <= 1){
          numberIn = numberIn + 1
        }
        prop = numberIn / i
        piHat = prop *4
        c[i] = piHat
      }
      return(c)
    }

    size = 1000
    res = simulation(size)
    ini = 1
    plot(res[ini:size], type = 'l')
    lines(rep(pi, size)[ini:size], col = 'red')

  4. I was at a party last night and a guy was totally drunk. Not just the guy who has had a few drinks and speaks a bit too loud, but the one who is not very likely to remember what he has done during his night, and very likely to suffer from a huge headache today. The guy was literally walking randomly: one step to the left, another forward, then another backward. I immediately remembered one of my courses where the teacher talked about random walk processes. A random walk is a stochastic process which represents something moving randomly on a map. It could be something other than a map, but we will keep the focus on a map to make it simpler.

    Many questions may be of interest in such a model. The question I was wondering about last night is: "how long would it take this guy to get out of the garden if he was really walking randomly?". Here, we consider the guy to be on a map where he can go towards the South-East, North-East, South-West or North-West, each direction being chosen with the same probability 0.25.

    Such a random process is illustrated by the following plots.

    A few steps of a long random walk in a garden with sides 75 units long. This process needed 4192 steps to stop.


    The question was: "how long would it take this guy to get out of the garden if he was really walking randomly?". Obviously, the answer depends on the size of the garden. We will consider a square garden. The length of a side will vary, and we will measure how many steps the process needs to reach one of the sides of the garden. We then vary the size of the garden to estimate the number of steps as a function of the size. The following plot illustrates the number of steps needed according to the size of the garden. Without any surprise, the bigger the garden, the longer our poor little guy will need. Besides, the variance of the required time seems to increase with the size of the garden.

    Now I would like to quantify these observations. I will assume that the number of required steps grows as a power of the size; in other words I assume that numberStep = a * size^b and I want to estimate the pair (a, b). This can be done with a linear regression on the logarithms, since log(numberStep) = log(a) + b * log(size). In the simulation I have done, I found a = 0.759, which is not significantly different from 1 (the neutral element for multiplication), and b = 1.91. If I use these values, I can estimate the number of steps I would need, as shown by the red line in the next plot.



    The code (R):

    #The function to randomly assign the movement
    decision = function(){
      a = runif(1)
      if(a<=0.25){return(c(1,1))}
      else if(a<= 0.5){return(c(1,-1))}
      else if(a<= 0.75){return(c(-1,1))}
      else {return(c(-1,-1))}
    }

    #The simulation for different sizes of garden
    long = 150
    record = rep(0, long)
    for (size in 1:long){
      # start in the middle of the garden and walk until a side is reached
      pos = c(0, 0)
      k = 0
      while(abs(pos[1]) <= size & abs(pos[2]) <= size){
        k = k + 1
        pos = pos + decision()
      }
      record[size] = k
    }

    plot(record, type = 'l', xlab = 'Size of the garden', ylab = 'Number of steps needed')

    # Plot of one random walk process, saving one image per step
    max = 75
    position = matrix(0, nrow = 2, ncol = 1)
    k = 0
    test = FALSE
    while(!test){
      k = k + 1
      dec = decision()
      pos1 = position[1, ncol(position)] + dec[1]
      pos2 = position[2, ncol(position)] + dec[2]
      position = cbind(position, c(pos1, pos2))
      mypath = file.path("U:", "Blog", "Post10", "Plot", paste("myplot", k, ".png", sep = ""))
      png(file = mypath)
      plot(position[1,], position[2,], type = 'l', xlim = c(-max, max), ylim = c(-max, max), xlab = '', ylab = '')
      dev.off()
      if(abs(pos1) > max | abs(pos2) > max){
        test = TRUE
      }
    }

    # The power-law regression: log(numberStep) = log(a) + b*log(size)

    fit = lm(log(record) ~ log(1:long))

    exp(fit$coef[1])
    # a = 0.759
    fit$coef[2]
    # b = 1.91



  5. I'm very glad if you use, or at least sometimes read, my blog. However, I'd like to give you a list of wonderful blogs which are useful if you are interested in applied mathematics and programming for simulation.

    My favorite one is written by Arthur Charpentier, a French mathematician. His blog is called Freakonometrics and is a wonderful store of R codes and explanations of statistics:
    http://freakonometrics.blog.free.fr/

    R websites:

    -I can't do a reasonable list of websites without mentioning CRAN, the website of R: http://cran.r-project.org/. You can find nearly everything about R on this website, in particular extensive documentation about the software.

    -R-bloggers : http://www.r-bloggers.com/. This blog is a useful summary of many blogs related to R. If you are interested in R in general, or if you want to learn how to code in R or if you are interested in a particular mathematical issue, this website is a great hub of knowledge.

    -R-statistics (http://www.r-statistics.com/) is another valuable resource for examples of codes in R.

    -SAS and R (http://sas-and-r.blogspot.co.uk/). Even though this website is not only in R but also in SAS, I love this website. It made my life better when I was coding in SAS or in R.

    -Stackoverflow (http://stackoverflow.com/). This website is about programming in general. It is useful when you have a bug you can't solve. It's a forum where people help, or ask for help, with any programming problem. If you can't find someone who has already experienced the same problem as you, you can ask for help. Answers come quickly, so you don't have to wait long before being helped.

    Mathematics websites:

    -A good blog about probability and its applications is http://probabilityandstats.wordpress.com/. I like it because funny topics are analyzed with a serious approach.

    -Gowers's Weblog is certainly the most well-known blog about mathematics, therefore it has to be in my list. http://gowers.wordpress.com/





  6. I am currently doing an internship in England. Therefore, I keep alternating between French and English in my emails and other online communication. I have been surprised to see that some websites are able to recognize whether I am using French or English. For example, Facebook automatically offers to translate my posts. I was really amazed by this ability: how can a computer know which language I am using? Especially since I use a QWERTY keyboard, that is, without accents, so I never write proper accented French.

    I remembered some courses on data analysis and computer science where the issue was to determine the group to which an individual belongs.


    The problem:

    If I choose a text, how can I make my computer determine whether it is an English text or a French text without accents? We try to differentiate without accents, since accents would make it far too easy. In particular, how can I do it without access to French and English dictionaries?

    The model:

    We assume we have a sample of French and English texts which will be used as benchmarks. We call frenchText(i) the i-th French text we have, and englishText(j) the j-th English text we have. The aim is to determine if the text called TEXT is French or English.

    For every text, we count the proportion of every letter. We only take into account the 26 letters of the alphabet, in order to keep the program as simple as possible. Then we compute the average over all the French texts and over all the English texts. In this way we obtain the proportion of each letter in an "average" French (or English) text.

    Then, we count the proportions of the letters in TEXT and compute the Euclidean distance between TEXT and the average French text (we call this distance d(TEXT, averageF)) and between TEXT and the average English text (d(TEXT, averageE)). If d(TEXT, averageF) is greater than d(TEXT, averageE), we consider that TEXT is more similar to the English texts and therefore is probably English. Conversely, if d(TEXT, averageF) < d(TEXT, averageE), we consider the text to be French.

    The Euclidean distance between two vectors p and q of dimension n is d(p, q) = sqrt((p(1) - q(1))^2 + ... + (p(n) - q(n))^2).
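    As a minimal sketch (the two vectors here are made up for illustration; in the program below the vectors are the 26 letter proportions), this distance can be computed in R as:

    ```r
    euclid = function(p, q) sqrt(sum((p - q)^2))

    # two hypothetical frequency vectors
    p = c(0.10, 0.02, 0.03)
    q = c(0.08, 0.02, 0.07)
    euclid(p, q)  # about 0.0447, same result as dist(rbind(p, q))
    ```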

    What is interesting is that once we have classified a first TEXT, we can, according to the decision we made, include TEXT in the calculation of the average letter proportions of the French texts (respectively English texts) if it has been classified as French (resp. English). We can then use this more accurate average to determine the language of the next text. This is why it is called statistical learning: the more texts we process, the more likely the decision is to be right.

    It would be interesting to use a weighted average in order to take into account the probability of the event [TEXT is French] (resp. [TEXT is English]).

    The results:

    For the initialization, I have used two French texts and two English texts. Each text is small (about 100 words).

    Then I have tested the program on different French and English texts. The results are consistent with the actual languages of the texts.

    If you want to run this program, you will need the .txt files. You can find them on my Google page, https://sites.google.com/site/probaperceptionstock/, in a ZIP file called textPost5. You can unzip and save the files on your computer in any folder you want.

    Warnings:

    This method must be used carefully: if the classification of the first texts is wrong, the following results are likely to be inconsistent. Indeed, a wrong average proportion of letters affects the classification of every subsequent text.

    Besides, we use the Euclidean distance in order to keep the presentation clean and simple. However, it is far from being the best distance for such a problem; other data analysis methods may well be more relevant. For example, as I explained in a previous post (http://probaperception.blogspot.co.uk/2012/09/v-behaviorurldefaultvmlo.html), the Mahalanobis distance could be interesting in this context.

    The code (R):


    setwd("U:/Blog/Post5")
    #you will have to change this directory according to your own folder

    # counts the occurrences of a letter (lower and upper case) in a character table
    countLetter = function(lowerCase, upperCase, myTable){
      if(is.na(myTable[lowerCase] + myTable[upperCase])){
        if(is.na(myTable[lowerCase])){return(myTable[upperCase])}
        else{return(myTable[lowerCase])}
      }
      else{return(myTable[lowerCase] + myTable[upperCase])}
    }


    # returns the vector of proportions of the 26 letters in a text
    proportion = function(myText){
      myTextSplit = strsplit(myText, NULL)
      tab = table(myTextSplit)

      counts = rep(0, 26)
      for (i in 1:26){
        n = countLetter(letters[i], LETTERS[i], tab)
        # letters absent from the text count as 0
        if (!is.na(n)) counts[i] = n
      }
      return(counts / sum(counts))
    }


    # Euclidean distance between a vector and the average vector of a group
    distance = function(myVector, myGroup){
      groupMean = apply(myGroup, 2, mean)
      return(sqrt(sum((groupMean - myVector)^2)))
    }

    # classifies a text and adds it to the group it was assigned to;
    # a decision is made only when one distance exceeds the other by 10%
    choice = function(myVector, mypropF, mypropE){
      if(distance(myVector, mypropF) > 1.1 * distance(myVector, mypropE)){
        cat("the text is certainly English\n")
        mypropE = rbind(mypropE, myVector)
      }
      else if(distance(myVector, mypropE) > 1.1 * distance(myVector, mypropF)){
        cat("the text is certainly French\n")
        mypropF = rbind(mypropF, myVector)
      }
      x = vector("list", 2)
      x[[1]] = mypropF
      x[[2]] = mypropE

      return(x)
    }

    readText = function(fileName){ readChar(fileName, file.info(fileName)$size) }

    # benchmark texts: tf1, tf2 are French; te1, te2 are English
    tf1 = readText('tf1.txt')
    tf2 = readText('tf2.txt')
    te1 = readText('te1.txt')
    te2 = readText('te2.txt')

    # initial average profiles (one row per benchmark text)
    propF = rbind(proportion(tf1), proportion(tf2))
    propE = rbind(proportion(te1), proportion(te2))

    # texts to classify: tm1, tm2, tm3 are French; tm4, tm5, tm6 are English
    for (fileName in paste('tm', 1:6, '.txt', sep = '')){
      myText = readText(fileName)
      res = choice(proportion(myText), propF, propE)
      propF = res[[1]]
      propE = res[[2]]
    }

