Infinitely many alleles Wright-Fisher model

(manuscript in preparation -- October 2010)

This collection of movies (approx. 3 min 20 s each) describes the time evolution of the haplotype distribution of a population of *N* individuals* with a constant mutation rate* *μ*.

*individual = "haplotype", e.g. 0.2938746598173644, representing a segment of mtDNA (AGTGAAC...)

*Measured in "number of substitutions per individual per generation"; it is equivalent to the probability of mutation per individual per generation.

1. Short description of the model [1]

The population size and the mutation rate are constant. Generations are non overlapping.

A new generation is built by constructing each of its individuals (haplotypes) as the result of two mechanisms:

1) Random copy: a member of the previous generation is chosen at random and copied as a member of the new generation.

2) Mutation: if a random number is smaller than *μ*, the haplotype obtained in 1) is replaced by a new one that has never appeared before.
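The two mechanisms translate almost line by line into code. The following is a minimal Python sketch, not the code used to produce the movies. Labels are plain integers drawn from a counter, which is an assumption: the page itself uses real numbers such as 0.2938746598173644. The name `next_generation` is hypothetical.

```python
import itertools
import random

def next_generation(pop, mu, new_label, rng=random):
    """Return the next generation of an infinite-alleles Wright-Fisher population.

    pop       -- list of haplotype labels (one entry per individual)
    mu        -- probability of mutation per individual per generation
    new_label -- zero-argument callable returning a label never used before
    """
    new_pop = []
    for _ in range(len(pop)):
        # 1) Random copy: pick a parent uniformly at random, with replacement.
        child = rng.choice(pop)
        # 2) Mutation: with probability mu, replace it by a brand-new haplotype.
        if rng.random() < mu:
            child = new_label()
        new_pop.append(child)
    return new_pop

# Example: N = 5 copies of a single founder, new labels drawn from a counter.
labels = itertools.count(start=1)      # hypothetical labelling scheme
rng = random.Random(42)
pop = [0] * 5
for _ in range(20):
    pop = next_generation(pop, mu=0.1, new_label=lambda: next(labels), rng=rng)
print(pop)
```

Parents are sampled with replacement, so some haplotypes leave several copies and others none. This is the drift that erodes diversity, while the counter guarantees that every mutation is a genuinely new allele.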

Example: *N* = 5 haplotypes of 4 characters.

From generation 1 to generation 2, the haplotype "0004" is lost and there are no mutations (all five random numbers turned out to be greater than *μ*), so the diversity has decreased.

From generation 2 to generation 3, the haplotype "0005" is lost by random copy, and one mutation introduces a new haplotype, thus preserving the diversity.

The first mechanism tends to homogenize the population by reducing the number of haplotypes, whereas the second tends to increase it. Surprisingly, the combination of both mechanisms leads to a stochastic equilibrium of the mean number of haplotypes, which can be estimated with the following expression [1,2]:
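A classical estimate of this kind is Ewens' expected number of distinct alleles among n genes, E[K] = Σ_{j=0}^{n-1} θ/(θ + j). The following sketch evaluates it under two assumptions that the text does not confirm. The first is that this is the relevant expression here. The second is that θ = 2Nμ for a haploid population of N individuals. The parameter values are hypothetical, not the ones used for the movies.

```python
def expected_alleles(n, theta):
    """Ewens' expected number of distinct alleles among n genes:
    E[K] = sum_{j=0}^{n-1} theta / (theta + j)."""
    return sum(theta / (theta + j) for j in range(n))

# Hypothetical parameters, NOT the ones used for the movies.
N, mu = 1000, 0.001
theta = 2 * N * mu   # assumed convention for a haploid population of N individuals
print(round(expected_alleles(N, theta), 2))
```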

In fact, the frequencies of the haplotypes converge to a stationary distribution, which can be estimated by the following *frequency spectrum function*, with which we have calculated the frequencies of the 20 most common haplotypes; see the movies below.
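The following sketch shows one way to turn such a spectrum into ranked frequencies P1, P2, .... It assumes the spectrum has the Ewens / Kimura and Crow form φ(x) = θ x^{-1}(1 - x)^{θ-1}, where φ(x)dx is the expected number of alleles with frequency in (x, x + dx). The rule used for P_k is an illustrative heuristic: take the frequency above which k alleles are expected. It is not necessarily how the blue lines in the movies were computed. The function names and θ are hypothetical.

```python
import math

def spectrum(x, theta):
    """Frequency spectrum phi(x) = theta / x * (1 - x)**(theta - 1)."""
    return theta / x * (1.0 - x) ** (theta - 1.0)

def expected_alleles_above(x, theta, steps=4000):
    """Expected number of alleles with frequency between x and 1.

    Midpoint rule in u = ln(y), where dy = y du, so the 1/y factor of
    phi cancels and small x values are handled accurately.
    """
    a = math.log(x)
    h = -a / steps
    total = 0.0
    for i in range(steps):
        y = math.exp(a + (i + 0.5) * h)
        total += spectrum(y, theta) * y * h
    return total

def ranked_frequency(k, theta, lo=1e-12, hi=1.0 - 1e-9):
    """Heuristic P_k: the frequency x at which the expected number of
    alleles with frequency >= x equals k (bisection on log x)."""
    for _ in range(60):
        mid = math.sqrt(lo * hi)
        if expected_alleles_above(mid, theta) > k:
            lo = mid   # too many alleles above mid, so P_k is larger
        else:
            hi = mid
    return math.sqrt(lo * hi)

theta = 10.0   # hypothetical value
print([round(ranked_frequency(k, theta), 4) for k in range(1, 6)])
```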

This is not an equilibrium in the usual sense. In reality allele frequencies are always changing; new mutations continue to come into the population, and eventually they are always eliminated. The term "steady state" is probably more appropriate for this kind of behavior, since the actual alleles are not fixed at a constant frequency, but rather are entering and leaving the population. The population remains at a steady state in the sense that the number of alleles and the level of autozygosity remain fairly constant. If the number of alleles and the level of autozygosity do not change much, then it is reasonable to assume that there is also a steady-state distribution of alleles frequencies. By this we mean that the most common allele always has a frequency P1, and the next most common has a frequency of P2, and so on. [...E]ven though the most common allele is expected to have a frequency of P1, the "identity" of the most common allele is expected to change with time.

Hartl, D. & Clark, A. G. Principles of Population Genetics. Sinauer Associates Inc. 2nd Ed. (1989).
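The steady-state picture of stable ranked frequencies P1, P2, ... with changing identities is easy to inspect on a population snapshot. The following is a minimal sketch; `ranked_frequencies` and the toy snapshots are hypothetical.

```python
from collections import Counter

def ranked_frequencies(pop, top=20):
    """Return (haplotype, frequency) pairs for the `top` most common
    haplotypes, in decreasing order of frequency: P1 >= P2 >= ..."""
    n = len(pop)
    return [(h, c / n) for h, c in Counter(pop).most_common(top)]

# Two toy snapshots: same rank profile, different leading haplotype.
early = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
late = ["c"] * 5 + ["d"] * 3 + ["a"] * 2
print(ranked_frequencies(early))
print(ranked_frequencies(late))
```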

Two initial conditions are considered:

1) a single founder haplotype: 0.0000000000000001 = 10^{-16};

2) a population of *N* different haplotypes: from 10^{-16} to *N* × 10^{-16}.
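The following sketch writes the two initial conditions as lists of haplotype values; the function names are hypothetical. Integer labels would avoid any floating-point concerns. Floats are used here only to mirror the values quoted above.

```python
UNIT = 1e-16   # spacing between consecutive haplotype values, as in the text

def single_founder(n):
    """Initial condition 1): all n individuals carry the founder haplotype 1e-16."""
    return [UNIT] * n

def uniform_start(n):
    """Initial condition 2): n different haplotypes 1e-16, 2e-16, ..., n * 1e-16."""
    return [k * UNIT for k in range(1, n + 1)]

print(len(set(single_founder(1000))), len(set(uniform_start(1000))))
```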

For each case, we represent the evolution of the haplotype distribution during the first:

a) 2,000 generations (every generation is depicted),

b) 20,000 generations (one frame every 10 generations),

c) 100,000 generations (one frame every 100 generations).

The movies illustrate how this equilibrium is reached from each initial condition, and how haplotypes evolve and adjust themselves to maintain it.

2. The movies

There are two sets of movies: A. frequencies, B. ages.

In **Sec. A**, we represent the distribution of allele frequencies (vertical red/green lines), ordered from left to right in decreasing order. Only the 20 highest frequencies are shown, together with the number of alleles at each frequency. The line colors switch from red to green and vice versa each time the oldest allele goes extinct.

- An arbitrary haplotype chosen at random is also depicted in yellow, so that its evolution can be tracked. When the yellow haplotype goes extinct, another one is chosen at random and depicted in this color. The absence of a yellow line means that the tracked haplotype is not among the 20 most common haplotypes.

- Horizontal blue lines denote the estimated frequency spectrum calculated with Ewens' expression.
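The bookkeeping behind the yellow and red/green lines can be sketched as two small update rules, applied once per generation. This is an assumed reading of the description above, not the movies' code. Choosing the new yellow haplotype uniformly among the distinct haplotypes present, rather than among individuals, is an assumption. The caller is also assumed to know which haplotype was oldest, for example from consecutive labels. The function names are hypothetical.

```python
import random

def update_tracked(tracked, pop, rng=random):
    """Yellow haplotype: keep following it while it survives; once it is
    extinct, choose another one at random among the haplotypes present."""
    if tracked in pop:
        return tracked
    # Sorting makes the choice reproducible for a seeded rng.
    return rng.choice(sorted(set(pop)))

def update_color(color, previous_oldest, pop):
    """Red/green lines: flip the color when the oldest haplotype of the
    previous generation is no longer present."""
    if previous_oldest in pop:
        return color
    return "green" if color == "red" else "red"
```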


In **Sec. B**, an alternative representation is shown in which the numerical value of each haplotype is used as an indicator of its **age**, that is, the generation in which it appeared.

The (estimated) number of new haplotypes that appear in each generation is *μN*. Denoting by *A*_{i} the numerical value of the *i*-th haplotype, the age of this haplotype can be estimated by

where *N*_{0} is the number of haplotypes in the first generation (1 or *N*).
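The following sketch shows how a haplotype's value can give its age. It assumes that new haplotypes receive consecutive values in steps of 10^{-16}, which is consistent with the initial conditions but is an assumption. Under that assumption, the creation index of a haplotype, divided by the expected *μN* new haplotypes per generation, estimates when it appeared. This is not necessarily the expression used for the movies, and the function names are hypothetical.

```python
UNIT = 1e-16   # value spacing between consecutively created haplotypes (assumed)

def appearance_generation(a_i, n0, mu, n, unit=UNIT):
    """Estimated generation in which the haplotype with value a_i appeared.

    Assumes the k-th haplotype ever created has value k * unit, that the
    first n0 values belong to generation 0, and that about mu * n new
    haplotypes appear per generation afterwards.
    """
    k = round(a_i / unit)      # creation index of the haplotype
    if k <= n0:
        return 0.0             # one of the initial haplotypes
    return (k - n0) / (mu * n)

def age(a_i, t, n0, mu, n):
    """Estimated age of the haplotype at generation t."""
    return t - appearance_generation(a_i, n0, mu, n)
```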

Two interesting (new?) observations are in order:

1. The mean age of the oldest haplotype is constant and equal to the renovation time (this makes sense: the renovation time is the mean number of generations until all individuals currently in the population have been replaced by new haplotypes not currently present in the population; see [1,2]).

2. The mean number of previous generations that have at least one representative in the current generation is constant (we say that a haplotype represents a generation if it appeared precisely in that generation).

We show the instantaneous value of these two quantities in the current generation, together with their mean value over all past generations, in the upper central part of the movies.
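Both quantities can be measured directly if each haplotype carries the generation in which it appeared. The following is a minimal sketch that also accumulates the running means shown in the movies. Labels here are (birth generation, serial) pairs rather than the numerical values used in the movies, and all names are hypothetical.

```python
import random

def oldest_age(births, t):
    """Age, at generation t, of the oldest haplotype present."""
    return t - min(births)

def represented_generations(births):
    """Number of distinct past generations with at least one representative."""
    return len(set(births))

def simulate_age_stats(n, mu, generations, seed=0):
    """Run the model from a single founder and return the means of the
    two quantities over all simulated generations."""
    rng = random.Random(seed)
    pop = [(0, 0)] * n           # label = (birth generation, serial number)
    serial = 1
    sum_oldest = sum_repr = 0
    for t in range(1, generations + 1):
        new_pop = []
        for _ in range(n):
            h = rng.choice(pop)              # random copy
            if rng.random() < mu:            # mutation: new haplotype born at t
                h = (t, serial)
                serial += 1
            new_pop.append(h)
        pop = new_pop
        births = [g for g, _ in pop]
        sum_oldest += oldest_age(births, t)
        sum_repr += represented_generations(births)
    return sum_oldest / generations, sum_repr / generations

print(simulate_age_stats(n=200, mu=0.01, generations=1000))
```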

Haplotype frequencies (*all*) at generation 5890 ordered by their normalized *age*


The links to the movies are pictures (JPG) representing an intermediate state of the movie.


Index of movies:

A. Most frequent haplotypes -- frequency spectrum

1. Single founder initial condition

a) 2,000 generations (every generation)

b) 20,000 generations (every 10th)

c) 100,000 generations (every 100th)

2. Uniform initial condition (*N* different haplotypes)

a) b) c) *idem*

B. Haplotypes ordered by age

1., 2. and a) b) c): same as before

**A. Most frequent haplotypes** (20 out of 62)

A1. Single founder initial condition

A1a) First 2,000 generations (all)

A1b) Generations 10 to 20,000 (every 10th)

A1c) Generations 100 to 100,000 (every 100th)

A2. Uniform initial condition (*N* different haplotypes)

A2a) First 2,000 generations (all)

A2b) Generations 10 to 20,000 (every 10th)

A2c) Generations 100 to 100,000 (every 100th)

**B. Haplotypes ordered by age** (all the haplotypes are represented)

B1. Single founder initial condition

B1a) First 2,000 generations (all)

B1b) Generations 10 to 20,000 (every 10th)

B1c) Generations 100 to 100,000 (every 100th)

B2. Uniform initial condition (*N* different haplotypes)

B2a) First 2,000 generations (all)

B2b) Generations 10 to 20,000 (every 10th)

B2c) Generations 100 to 100,000 (every 100th)

- In B, the vertical lines move to the left, towards the region of older haplotypes. This displacement occurs because, at each generation, the ages are normalized with respect to the oldest and the youngest haplotypes. Thus, as a haplotype already present in the population becomes older relative to the new ones, it moves to the left. The oldest haplotype does not move; when it goes extinct, the color changes from red to green or vice versa, denoting the renormalization (this does not happen for the single-founder initial condition in the first 2,000 generations).
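The leftward drift follows from a min-max normalization of the haplotype values in each generation. The following sketch assumes that larger values mean younger haplotypes and that position 0 (left) is the oldest one. The function name is hypothetical.

```python
def normalized_positions(values):
    """Map each distinct haplotype value to [0, 1]: 0 for the oldest
    (smallest value), 1 for the youngest (largest value)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return {lo: 0.0}                 # only one haplotype present
    return {v: (v - lo) / (hi - lo) for v in set(values)}

# A haplotype with value 2 drifts left once younger haplotypes appear.
print(normalized_positions([1, 2, 3]))   # 2 sits at 0.5
print(normalized_positions([1, 2, 5]))   # 2 sits at 0.25
```

When the oldest value disappears, `lo` jumps and every position is rescaled at once. This corresponds to the renormalization that the red/green color switch signals.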

[...] can also be attributed to Kimura, Kimura and Crow, or others. For a population of *N* individuals and a mutation rate *μ* (the values used in the movies), the estimates are:

- mean number of haplotypes in the population: 62

- mean number of generations for total renovation of the pool of haplotypes: 3142 (or, equivalently, mean number of generations until the founder's haplotype goes extinct)

- mean sojourn*: 12

(*mean number of generations a new haplotype remains in the population before going extinct)
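The first and third figures are linked by a balance argument borrowed from queueing theory (Little's law), which is an outside tool and not part of the text. At steady state, about *μN* new haplotypes enter per generation and each stays for the mean sojourn time. The mean number present is therefore about *μN* times the mean sojourn. The following sketch assumes the sojourn counts every generation in which a haplotype is present, including the one in which it appears. The function names are hypothetical.

```python
def implied_mu_n(mean_haplotypes, mean_sojourn):
    """Steady state (Little's law): mean_haplotypes = mu * N * mean_sojourn,
    so the number of new haplotypes per generation is their ratio."""
    return mean_haplotypes / mean_sojourn

def expected_haplotypes(mu, n, mean_sojourn):
    """The same relation, solved for the mean number of haplotypes."""
    return mu * n * mean_sojourn

print(round(implied_mu_n(62, 12), 2))   # rounded figures quoted above
```

With the rounded figures above, this gives *μN* ≈ 62/12 ≈ 5.17 new haplotypes per generation. This is an inference from those figures, not a value stated on this page.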

[2]

[3]