Ramón Escobedo's homepage

Infinitely many alleles Wright-Fisher model


(manuscript in preparation -- October 2010)

This collection of movies (approx. 3 min 20 s each) describes the time evolution of
the haplotype distribution of a population of N = 5,000 individuals* with a mutation rate* μ = 0.001.

    *individual = "haplotype", e.g. 0.2938746598173644, representing a segment of mtDNA (AGTGAAC...)
    *measured in "number of substitutions per individual per generation"; it is equivalent to the probability that an individual mutates in one generation


1. Short description of the model [1]

The population size and the mutation rate are constant. Generations are non-overlapping.
A new generation is built by constructing each of its individuals (haplotypes) as the result
of two mechanisms:

    1) Random copy: a member of the previous generation is chosen at random
          and copied as a member of the new generation.
    2) Mutation: if a random number (uniform between 0 and 1) is smaller than μ, the haplotype
          obtained in 1) is replaced by a new one that has never appeared before.
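The two mechanisms above can be sketched in a few lines of Python. This is a minimal sketch under our own assumptions, not the program used to make the movies: haplotypes are integer labels taken from a counter (a hypothetical stand-in for the real-valued haplotypes described above), and `next_generation` is a name introduced here for illustration.

```python
import random


def next_generation(population, mu, rng, next_label):
    """Build one new generation of the infinitely-many-alleles Wright-Fisher model.

    population: list of haplotype labels of the current generation.
    mu: probability that a copied haplotype is replaced by a new one.
    rng: a random.Random instance.
    next_label: first label that has never been used.
    Returns (new_population, next_label).
    """
    n = len(population)
    new_population = []
    for _ in range(n):
        # 1) Random copy: choose a member of the previous generation uniformly,
        #    with replacement, and copy it.
        haplotype = population[rng.randrange(n)]
        # 2) Mutation: with probability mu, substitute a never-used haplotype.
        if rng.random() < mu:
            haplotype = next_label
            next_label += 1
        new_population.append(haplotype)
    return new_population, next_label


# Usage: 50 generations from a single founder (small N to keep it fast).
rng = random.Random(42)
population, next_label = [0] * 100, 1
for _ in range(50):
    population, next_label = next_generation(population, 0.01, rng, next_label)
print(len(set(population)), "distinct haplotypes")
```

Because parents are drawn with replacement, some haplotypes are copied several times and others not at all (the homogenizing random copy), while the counter guarantees that every mutation creates a haplotype that has never been used before.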

Example: N = 5 haplotypes of 4 characters.
    From generation 1 to generation 2, the haplotype "0004" is lost and there are no mutations
    (all five random numbers turned out to be greater than μ), so the diversity has decreased.
    From generation 2 to generation 3, the haplotype "0005" is lost by random copy, and there
    is one mutation which introduces a new haplotype, thus preserving the diversity.

         

The first mechanism tends to homogenize the population by reducing the number of haplotypes,
whereas the second mechanism tends to increase it. Surprisingly, the combination of both
mechanisms leads to a stochastic equilibrium of the mean number of haplotypes, which can be
estimated with the expression given in [1,2].
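For reference, the standard result of Ewens' sampling theory [1] for the expected number K of distinct haplotypes among the N individuals is written below. We assume θ = 2Nμ, the usual scaled mutation rate for a haploid population such as mtDNA; this choice of θ is our assumption and is not stated on this page.

```latex
E[K] = \sum_{j=0}^{N-1} \frac{\theta}{\theta + j}, \qquad \theta = 2N\mu
```

With N = 5,000 and μ = 0.001 (θ = 10) the sum is about 62.7, in line with the value of 62 given in the short discussion below.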
         
In fact, the haplotype frequencies converge to a stationary distribution, which can be
estimated with Ewens' frequency spectrum function; we used it to calculate the estimated
frequencies of the 20 most common haplotypes shown in the movies below.
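Ewens' frequency spectrum gives the expected number of alleles with frequency between x and x + dx as φ(x) dx, with φ(x) = (θ/x)(1 − x)^(θ − 1), again assuming θ = 2Nμ. How the 20 frequencies were extracted from φ is not specified here; the sketch below uses one simple possibility, purely as our assumption: the k-th expected frequency x_k is the value above which, on average, exactly k alleles lie, i.e. the integral of φ from x_k to 1 equals k. The function names are hypothetical.

```python
import math


def tail_count(x, theta, n=1000):
    """Expected number of alleles with frequency above x under Ewens' spectrum.

    Integrates phi(y) = theta / y * (1 - y) ** (theta - 1) from x to 1, after
    substituting y = exp(u) (which cancels the 1/y factor), with the composite
    Simpson rule on n (even) subintervals. Assumes theta >= 1.
    """
    a, b = math.log(x), 0.0
    h = (b - a) / n

    def f(u):
        return theta * (1.0 - math.exp(u)) ** (theta - 1.0)

    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def expected_top_frequencies(theta, k_max=20):
    """Heuristic (our assumption): x_k solves tail_count(x_k) = k."""
    result = []
    for k in range(1, k_max + 1):
        lo, hi = math.log(1e-12), 0.0            # bisection on log-frequency
        for _ in range(60):
            mid = 0.5 * (lo + hi)
            if tail_count(math.exp(mid), theta) > k:
                lo = mid                         # too many alleles above: raise x
            else:
                hi = mid
        result.append(math.exp(0.5 * (lo + hi)))
    return result


theta = 2 * 5000 * 0.001                         # assumption: theta = 2 N mu = 10
print([round(x, 4) for x in expected_top_frequencies(theta, k_max=5)])
```

Since the tail count decreases as x grows, the x_k come out in decreasing order, matching the left-to-right ordering of the horizontal blue lines.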

This is not an equilibrium in the usual sense. In reality allele frequencies are always changing;
new mutations continue to come into the population, and eventually they are always eliminated.
The term "steady state" is probably more appropriate for this kind of behavior, since the actual
alleles are not fixed at a constant frequency, but rather are entering and leaving the population.
The population remains at a steady state in the sense that the number of alleles and the level of
autozygosity remain fairly constant. If the number of alleles and the level of autozygosity do not
change much, then it is reasonable to assume that there is also a steady-state distribution of
alleles frequencies. By this we mean that the most common allele always has a frequency P1,
and the next most common has a frequency of P2, and so on. [...E]ven though the most common
allele is expected to have a frequency of P1, the "identity" of the most common allele is expected
to change with time.
Hartl, D. & Clark, A. G. Principles of Population Genetics. Sinauer Associates Inc. 2nd Ed. (1989).

Two initial conditions are considered:
    1) a single founder haplotype: 0.0000000000000001 = 10^-16,
    2) a population of N different haplotypes: from 10^-16 to N x 10^-16.

For each case, we show the evolution of the haplotype distribution
during the first:
    a) 2,000 generations (every generation is shown),
    b) 20,000 generations (shown every 10 generations),
    c) 100,000 generations (shown every 100 generations).
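The two initial conditions and the three recording schedules can be sketched as follows. This is a minimal, hypothetical driver (function names are ours): integer labels 1, 2, ..., N stand in for the values 10^-16, ..., N x 10^-16, and the copy-and-mutate step is the one described in Sec. 1. A pure-Python loop is slow at N = 5,000 over 100,000 generations (about 10^9 random draws), so the usage line uses a small population.

```python
import random


def initial_population(n, single_founder):
    """Integer labels 1..n stand in for the values 1e-16 .. n * 1e-16."""
    return [1] * n if single_founder else list(range(1, n + 1))


def simulate(n, mu, generations, every, single_founder, seed=0):
    """Return (generation, number of distinct haplotypes) every `every` generations."""
    rng = random.Random(seed)
    population = initial_population(n, single_founder)
    next_label = max(population) + 1             # first never-used label
    records = [(0, len(set(population)))]
    for t in range(1, generations + 1):
        new_population = []
        for _ in range(n):
            h = population[rng.randrange(n)]     # random copy
            if rng.random() < mu:                # mutation to a brand-new haplotype
                h = next_label
                next_label += 1
            new_population.append(h)
        population = new_population
        if t % every == 0:
            records.append((t, len(set(population))))
    return records


# Schedules a), b), c) correspond to (generations, every) =
# (2000, 1), (20000, 10) and (100000, 100). Small N here to keep it fast.
for founder in (True, False):
    print(simulate(200, 0.01, 1000, 100, founder)[-1])
```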

The movies illustrate how this equilibrium is reached from each initial condition,
and how haplotypes evolve and adjust themselves to maintain it.

2. The movies

There are two sets of movies: A. frequencies, B. ages.

Sec. A shows the allele frequencies (vertical red/green lines) sorted in decreasing order
from left to right. Only the 20 highest frequencies are shown, together with the number of
alleles at each frequency. The lines switch color from red to green and vice versa each time
the oldest allele goes extinct.

- A randomly chosen haplotype is also drawn in yellow so that its evolution can be tracked.
When the yellow haplotype goes extinct, another one is chosen at random and drawn in this
color. If no yellow line is visible, the tracked haplotype is not among the 20 most common ones.

- Horizontal blue lines denote the estimated frequency spectrum calculated with Ewens' expression.
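The ranking used in Sec. A can be sketched as follows, with integer labels standing in for haplotype values and a hypothetical function name. `collections.Counter.most_common` returns haplotypes sorted by decreasing count, which gives the left-to-right order of the red/green lines; the second return value is the total number of haplotypes present, of which only the 20 most common are drawn.

```python
from collections import Counter


def top_haplotypes(population, k=20):
    """Return (label, copies, frequency) for the k most common haplotypes,
    in decreasing order of frequency, plus the number of distinct haplotypes."""
    n = len(population)
    counts = Counter(population)                  # haplotype -> number of copies
    top = [(h, c, c / n) for h, c in counts.most_common(k)]
    return top, len(counts)


top, distinct = top_haplotypes([7, 7, 7, 7, 7, 3, 3, 3, 9, 9])
print(top, distinct)
```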

    Snapshot of the 20 largest allele frequencies at generation 5890, in decreasing order
   

In Sec. B, an alternative representation is shown in which the numerical value of each haplotype
is used as an indicator of its age, that is, of the generation in which it appeared.
The expected number of new haplotypes that appear in each generation is μN. Denoting
by A_i the numerical value of the i-th haplotype in generation t, the (stochastic) mean age of
this haplotype can be estimated from A_i, t, μN and N0, where N0 is the number of haplotypes
in the first generation (1 or N).
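A hedged sketch of this estimate, under our own assumption about how values are assigned: new haplotypes receive consecutive values in steps of 10^-16, so that A_i = m_i x 10^-16 is the m_i-th value ever used, and about μN new values are used per generation. The haplotype then appeared around generation (m_i − N0)/(μN), and its estimated age at generation t is t − (m_i − N0)/(μN). The helper below (a hypothetical name) takes the integer index m_i rather than A_i itself, to avoid rounding problems at the 10^-16 scale.

```python
def estimated_age(m, t, n0, n, mu):
    """Estimated age (in generations) at generation t of the haplotype with value m * 1e-16.

    Assumes values are assigned consecutively (1, 2, 3, ... times 1e-16), that the
    first generation uses the first n0 of them, and that about mu * n new values
    are used per generation. Hypothetical helper, not the code behind the movies.
    """
    birth = max(0.0, (m - n0) / (mu * n))    # estimated generation of appearance
    return max(0.0, t - birth)                # clipped: the estimate is stochastic


# Single founder (n0 = 1), N = 5000, mu = 0.001: about 5 new haplotypes per generation.
print(estimated_age(1, 3000, 1, 5000, 0.001))             # the founder is as old as t
print(estimated_age(1 + 5 * 2500, 3000, 1, 5000, 0.001))  # appeared near generation 2500
```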

Two interesting (new?) observations are in order:

    1. The mean age of the oldest haplotype is constant and equal to the renovation time
        (this makes sense: the renovation time is the mean number of generations until all
        individuals currently in the population have been replaced by new haplotypes
        not currently present in the population; see [1,2]).

    2. The mean number of previous generations that have at least one representative in
        the current generation is constant (we say that a haplotype represents a generation if
        it appeared precisely in that generation).

The upper central part of the movies shows, for these two quantities, both the instantaneous
value in the current generation and the mean value over all past generations.
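Both quantities can be tracked exactly in a simulation if the generation of appearance of each haplotype is stored, instead of being estimated from its value. The sketch below (hypothetical function name, small N for speed, same copy-and-mutate step as in Sec. 1) records, for each generation t, the age of the oldest haplotype, the number of past generations with at least one representative, and the running means of both. It does not by itself confirm the two observations; it only shows how they can be measured.

```python
import random


def track_ages(n, mu, generations, single_founder=True, seed=0):
    """Per generation t: (t, oldest age, represented generations, and their running means)."""
    rng = random.Random(seed)
    population = [0] * n if single_founder else list(range(n))
    birth = {h: 0 for h in population}       # haplotype -> generation of appearance
    next_label = max(population) + 1
    oldest_total = represented_total = 0
    history = []
    for t in range(1, generations + 1):
        new_population = []
        for _ in range(n):
            h = population[rng.randrange(n)]         # random copy
            if rng.random() < mu:                    # mutation: new haplotype born at t
                h = next_label
                birth[h] = t
                next_label += 1
            new_population.append(h)
        population = new_population
        # Generations that still have at least one representative.
        represented_births = {birth[h] for h in set(population)}
        oldest_age = t - min(represented_births)
        oldest_total += oldest_age
        represented_total += len(represented_births)
        history.append((t, oldest_age, len(represented_births),
                        oldest_total / t, represented_total / t))
    return history


last = track_ages(200, 0.01, 2000, seed=1)[-1]
print("oldest age: %d (mean %.1f); represented generations: %d (mean %.1f)"
      % (last[1], last[3], last[2], last[4]))
```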

    Haplotype frequencies (all) at generation 5890 ordered by their normalized age
   

The links to the movies are pictures (JPG) representing an intermediate state of the movie.

  Index of movies (each picture links to the corresponding movie):

      A. Most frequent haplotypes -- frequency spectrum
            1. Single founder initial condition
                        a) 2,000 generations (every generation)
                        b) 20,000 generations (every 10 generations)
                        c) 100,000 generations (every 100 generations)
            2. Uniform initial condition (N different haplotypes)
                        a)   b)   c)   as in 1.

      B. Haplotypes ordered by age
            1.,   2.   and   a)   b)   c):   same as before

A. Most frequent haplotypes (20 out of 62)

      A1. Single founder initial condition
            A1a) First 2,000 generations (every generation)
            A1b) From 10 to 20,000 (every 10 generations)
            A1c) From 100 to 100,000 (every 100 generations)

      A2. Uniform initial condition (N different haplotypes)
            A2a) First 2,000 generations (every generation)
            A2b) From 10 to 20,000 (every 10 generations)
            A2c) From 100 to 100,000 (every 100 generations)


B. Haplotypes ordered by age (all the haplotypes are represented)

      B1. Single founder initial condition
            B1a) First 2,000 generations (every generation)
            B1b) From 10 to 20,000 (every 10 generations)
            B1c) From 100 to 100,000 (every 100 generations)

      B2. Uniform initial condition (N different haplotypes)
            B2a) First 2,000 generations (every generation)
            B2b) From 10 to 20,000 (every 10 generations)
            B2c) From 100 to 100,000 (every 100 generations)

- In B, the vertical lines drift to the left, towards the "older haplotypes" region. This displacement
occurs because, in each generation, ages are normalized using the oldest and the youngest haplotypes.
As a haplotype already present in the population becomes older relative to the new ones, it moves
to the left. The oldest haplotype does not move; when it goes extinct, the color changes from red
to green or vice versa, indicating the renormalization (this does not happen for the single-founder
initial condition in the first 2,000 generations).
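If each haplotype's horizontal position is its value rescaled between the oldest (smallest) and the youngest (largest) value present, as the paragraph above suggests, the drift is just a property of min-max normalization. A minimal sketch under that assumption (the function name is hypothetical):

```python
def normalized_positions(values):
    """Map haplotype values to [0, 1]: 0 = oldest (smallest value), 1 = youngest.

    Assumes, as in Sec. B, that larger values mean younger haplotypes.
    """
    oldest, youngest = min(values), max(values)
    if youngest == oldest:                        # a single haplotype: nothing to scale
        return {v: 0.0 for v in values}
    span = youngest - oldest
    return {v: (v - oldest) / span for v in values}


before = normalized_positions([10, 20, 30])
after = normalized_positions([10, 20, 30, 50])   # a younger haplotype appears
print(before[20], after[20])                     # 0.5 0.25
```

A younger haplotype enlarges the denominator, so every haplotype except the oldest moves towards 0 (the left); only the extinction of the oldest haplotype changes the reference point, which is when the colors switch.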

Short discussion

The infinitely-many-alleles model was already studied by W. J. Ewens in 1979 [1], although some results
may (also) be due to Kimura, Kimura and Crow, or others. For a population of N = 5,000 individuals
and a mutation rate of μ = 0.001, the classical diffusion theory provides the following estimates:
      - mean number of haplotypes in the population: 62
      - mean number of generations for total renovation of the bank of haplotypes: 3142
        (or, equivalently, mean number of generations for founder's haplotype extinction)
      - mean sojourn*: 12
        (*mean number of generations a new haplotype remains in the population before going extinct)
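The first and third estimates can be reproduced with a short calculation, sketched here under stated assumptions. The mean number of haplotypes uses Ewens' formula E[K] = Σ_{j=0}^{N−1} θ/(θ + j) with θ = 2Nμ (our assumption for a haploid population). The mean sojourn then follows from a steady-state balance (Little's law, our choice of argument, not necessarily the derivation in [1]): about Nμ new haplotypes enter per generation and each stays T generations on average, so E[K] ≈ Nμ·T. The renovation time is not recomputed here.

```python
def iam_estimates(n, mu):
    """Return (mean number of haplotypes, mean sojourn in generations)."""
    theta = 2 * n * mu                                    # assumption: theta = 2 N mu
    mean_haplotypes = sum(theta / (theta + j) for j in range(n))  # Ewens' E[K]
    new_per_generation = n * mu                           # expected mutations per generation
    mean_sojourn = mean_haplotypes / new_per_generation   # steady state: E[K] = N mu * T
    return mean_haplotypes, mean_sojourn


k, sojourn = iam_estimates(5000, 0.001)
print(round(k, 1), round(sojourn, 1))                     # 62.7 12.5
```

Truncated to integers, these give the 62 haplotypes and the mean sojourn of 12 generations listed above.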

Essential bibliography

[1] W. J. Ewens, "Mathematical Population Genetics", 2nd ed., Springer (2004).
[2] W. J. Ewens, "Mathematical Population Genetics", Lecture Notes, Cornell University (2006).
[3] D. Hartl & A. G. Clark, "Principles of Population Genetics", 2nd ed., Sinauer Associates Inc. (1989).

