Observe.Think.Touch Nature

April 18, 2012

Does positive selection reduce evolutionary variances of gene frequencies?

Filed under: Discussions — hebin @ 9:29 pm

During this week’s discussion about the papers by Van Valen and another author on the misuse and overuse of hypothesis testing, one of us suggested that the hypothesis chosen as the null is usually broader, more general, and possesses less predictive power. We used the example of neutral evolution vs. positive selection. He suggested that since under the neutral model the frequency of an allele can go either up or down, while under positive selection it is biased towards going up, the latter should result in less evolutionary variance in the frequencies after t generations. My intuition led me to think otherwise. Afterwards I couldn’t resist the temptation to run some simulations on this question (another night wasted…). Here are the results, which supported my intuition but also brought up some interesting points, in my view:

First, the setup of the simulation. I ran Wright-Fisher simulations (i.e. discrete, non-overlapping generations) in a diploid population of 10,000 individuals (code attached at the end). The initial frequency is assumed to be 25%, and the fitness effect is assumed to be additive. I ran the simulation for 2,000 generations with 100 runs under each scenario, that is, 4Ns = 0 (neutral), 20 or 100 (positively selected). Note that 4Ns > 10 is usually considered to be the regime where selection dominates.
The results are summarized in a figure.

The results suggest that positive selection doesn’t necessarily produce a tighter frequency distribution after 2,000 generations. (I chose 2,000 generations so that the majority of the simulation runs do not hit the 0 or 1 boundary. This is not the case when 4Ns = 100, where most of the runs end in fixation, resulting in a small variance.)
This result can be understood from several angles. First of all, one should notice that positive selection is NOT a “more complex” model, with more parameters or a more restricted parameter space than the neutral model. In fact, for the central parameter, the scaled selection coefficient 4Ns, the neutral model contains just one point in its parameter space, namely 0, while under the positive selection model 4Ns can take any value greater than 0. The bottom line is that these two models are complementary, rather than one being a superset or subset of the other.
Another thing to notice is that the source of stochasticity has nothing to do with selection. Rather, it comes from the sampling due to finite population size. To see this, let’s go to the extreme of an infinite population, in which case one can perfectly predict the frequency of an allele under both the neutral model (it stays the same) and the positive selection model (it rises deterministically, along a roughly logistic curve). Therefore, it is clear that the stochasticity comes purely from the sampling process, and thus the variance of the frequency after t generations should not depend on the neutral vs. selection distinction as such. So what does it depend on? My intuition is that it depends on the frequency (or rather the frequencies that the allele goes through during the process, i.e. the trajectory). If you are curious, just run the attached code with different initial frequencies (I’ve tried 10% and 50%) and see the results.
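Both halves of this argument can be checked numerically. The sketch below (the function names are mine and are not part of the attached code) iterates the deterministic recursion for an infinite population with the same additive fitnesses 1, 1+s, 1+2s, and evaluates the standard expected drift variance p0(1-p0)[1-(1-1/(2N))^t] for a neutral Wright-Fisher population. It assumes Hardy-Weinberg proportions in every generation, as the simulation does.

```r
# Deterministic (infinite-population) allele frequency in the next generation,
# for genotype fitnesses 1, 1+s, 1+2s and Hardy-Weinberg proportions.
next_freq <- function(p, s) {
  w_bar <- 1 + 2 * s * p            # mean fitness
  p * (1 + s + s * p) / w_bar       # frequency of allele 1 after selection
}

# Iterate the recursion; returns frequencies for generations 0..generations.
deterministic_traj <- function(p0, s, generations) {
  p <- numeric(generations + 1)
  p[1] <- p0
  for (g in seq_len(generations)) p[g + 1] <- next_freq(p[g], s)
  p
}

# Expected variance of the allele frequency after t generations of pure drift
# (neutral Wright-Fisher model with 2N gene copies, starting frequency p0).
neutral_drift_var <- function(p0, N, t) {
  p0 * (1 - p0) * (1 - (1 - 1 / (2 * N))^t)
}

N  <- 1e4
p0 <- 0.25
neutral_final  <- deterministic_traj(p0, s = 0, generations = 2000)[2001]
selected_final <- deterministic_traj(p0, s = 20 / (4 * N), generations = 2000)[2001]
drift_var      <- neutral_drift_var(p0, N, t = 2000)
```

With N = 10,000, p0 = 0.25 and t = 2,000, the neutral drift variance comes out at about 0.018, which gives a yardstick for the variance reported in the neutral panel.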
Finally, in the fourth panel of the figure, I show results where, rather than fixing the selection coefficient (a parameter we typically don’t know), I randomly drew S = 4Ns for each simulation run from a uniform distribution between 10 and 100. As expected, this uncertainty about S greatly increased the variance.
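The increase in variance in the fourth panel is what the law of total variance predicts. Writing X for the allele frequency after t generations and S for the randomly drawn value of 4Ns (notation introduced here for illustration), the variance splits into the average within-S variance plus the variance of the S-specific means:

```latex
\[
\operatorname{Var}(X) \;=\; \mathbb{E}\big[\operatorname{Var}(X \mid S)\big] \;+\; \operatorname{Var}\big(\mathbb{E}[X \mid S]\big)
\]
```

The first term is the sampling noise seen in the fixed-S panels. The second term is extra, and it is large here because the expected frequency after 2,000 generations differs a lot between 4Ns = 10 and 4Ns = 100.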

[Code]

WFsims_multinomial = function(N_inds, initfreq, Fitness, runlength=10*N_inds)
# adapted from Jonathan Pritchard's code for a class
# This function simulates genetic drift and selection in a population by doing multinomial sampling
# N_inds = number of individuals (make this even)
# initfreq = starting count of the 1 allele (number of copies: 0<initfreq<2N_inds)
# Fitness[] = relative fitnesses of the 3 genotypes (00, 01, 11)
# the two alleles are coded here as 0 or 1
# eg try parameters:
#   WFsims_multinomial(N_inds=1000, initfreq=10, Fitness = c(1.0, 1.05, 1.10))
{
  # initialize starting array of genotypes.
  # 'Parents' is an array of length 3 that stores the number of individuals with each genotype
  # 'freq' is the current allele frequency of the 1 allele
  SaveFreqs = numeric()
  freq = initfreq/(2*N_inds)
  #plot(NA,NA,xlim=c(0,runlength), ylim=c(0,1), xlab="generation", ylab="frequency")
  for (generation in 1:runlength)
  {
    ## genotype frequencies before selection, assuming HWE
    geno_freqs = c( (1-freq)*(1-freq), 2*freq*(1-freq), freq*freq)
    ## weight by fitness; rmultinom normalizes the weights internally
    geno_weights = geno_freqs * Fitness
    Parents = rmultinom(1, N_inds, geno_weights)
    freq = (Parents[2] + 2*Parents[3])/(2*N_inds)
    #print(c(generation, freq)) #print generation and current allele freq
    SaveFreqs[generation] = freq #store allele freqs
    ## plot frequencies
    #points(generation, freq, cex=0.2)
    if (freq==1 || freq==0)
      return(SaveFreqs) #break out of the function if an allele fixes
  }
  return(SaveFreqs)
}
## plot the trajectories
png("neutral_positive_diffusion_variances.png", width=800, height=800)
layout(mat=matrix(1:4,nrow=2,byrow=TRUE))
par(mar=c(4,4,4,2),mgp=c(2,0.8,0))
plot(NA,NA,xlim=c(0,2000), ylim=c(0,1), xlab="Generation", ylab="Frequency", main="Neutral", cex.lab=1.5)
## neutral ##
pt <- numeric(0)
for(i in 1:100){
  SaveFreqs <- WFsims_multinomial(1e4, 5e3, c(1,1,1), 2000)
  lines(SaveFreqs, col=rgb(0,0,0,0.2))
  pt[i] <- SaveFreqs[length(SaveFreqs)]
}
mtext(text=paste("Variance =",round(var(pt),4)),line=.2)
## positive ##
for(S in c(20,100)){
  plot(NA,NA,xlim=c(0,2000), ylim=c(0,1), xlab="Generation", ylab="Frequency", main=paste("4Ns =",S))
  for(i in 1:100){
    s = S / 4e4
    SaveFreqs <- WFsims_multinomial(1e4, 5e3, c(1,1+s,1+2*s), 2000)
    lines(SaveFreqs, col=rgb(1,0,0,0.2))
    pt[i] <- SaveFreqs[length(SaveFreqs)]
  }
  mtext(text=paste("Variance =",round(var(pt),4)), line=.2)
}
## positive, with 4Ns drawn at random for each run ##
plot(NA,NA,xlim=c(0,2000), ylim=c(0,1), xlab="Generation", ylab="Frequency", main="4Ns ~ U[10,100]")
s <- runif(100,10,100)/4e4
for(i in 1:100){
  SaveFreqs <- WFsims_multinomial(1e4, 5e3, c(1,1+s[i],1+2*s[i]), 2000)
  lines(SaveFreqs, col=rgb(139/255,90/255,0,0.2))
  pt[i] <- SaveFreqs[length(SaveFreqs)]
}
mtext(text=paste("Variance =",round(var(pt),4)), line=.2)
dev.off()

April 16, 2012

Null hypothesis and hypothesis testing

Filed under: Discussions,Papers — hebin @ 8:54 pm

Two readings for this week’s evoprocesses discussion.

YoccozEsaBull91_pvalues

Hypotheses and Predictions Van Valen

[Notes]

First, a bit of background on the Neyman-Pearson approach to hypothesis testing vs. that of R. A. Fisher. In short, the former emphasizes fixed-level testing, i.e. a test level (type I error rate) determined before doing the experiment. Consequently, one merely reports significant or not significant, together with the level at which the test was conducted. In that school of thought, it is considered cheating to first look at the data and then determine the level of testing. (This is in fact a very common practice: one calculates the p-value, then reports the MOST SIGNIFICANT level the test can reach among several widely used thresholds, such as 0.05, 0.01 or 0.001. This is, however, cheating. Further reasons will be given below.) R. A. Fisher’s view, however, is that instead of doing a significance test, one can use the p-value merely as a measure of statistical evidence against the null hypothesis, and that by reporting the p-value rather than a conclusion based on a predetermined significance level, the interpretation is left to the readers.

One common misinterpretation of p-values is that they are the probability of the null hypothesis being true given the data. This is wrong because a p-value cannot be interpreted within the confines of the single experiment at hand. Rather, it is derived in a strictly frequentist way: imagine running the experiment over and over again under the null hypothesis, and count the frequency of obtaining a test statistic as extreme as, or more extreme than, the one observed. Since this calculation includes values more extreme than the observed one, it is wrong to apply the p-value narrowly to the current observation. In fact, in the concept of the local false discovery rate (a relative of q-values), one evaluates a small delta interval right around the observed value.
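The "run the experiment over and over again" definition can be made concrete with a small Monte Carlo sketch. The data below are made up for illustration (20 normal draws tested against a mean of 0), and the variable names are mine. The p-value is simply the fraction of null replicates whose test statistic is at least as extreme as the observed one.

```r
set.seed(1)

# Hypothetical observed data: 20 measurements whose mean we test against 0.
observed <- rnorm(20, mean = 0.5, sd = 1)
t_obs <- mean(observed) / (sd(observed) / sqrt(length(observed)))

# Re-run the "experiment" many times under the null (true mean = 0)
# and record the test statistic each time.
n_rep <- 10000
t_null <- replicate(n_rep, {
  x <- rnorm(length(observed), mean = 0, sd = 1)
  mean(x) / (sd(x) / sqrt(length(x)))
})

# Two-sided p-value: fraction of null replicates at least as extreme as observed.
p_mc <- mean(abs(t_null) >= abs(t_obs))

# For comparison, the analytical p-value from the t distribution.
p_exact <- t.test(observed, mu = 0)$p.value
```

The two numbers agree up to Monte Carlo error. Note that nothing in this calculation refers to the probability that the null hypothesis is true.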

Since hypothesis testing is by design asymmetrical with respect to the null and alternative hypotheses, it over-emphasizes the null hypothesis, which is often not the one of interest to the researcher, and this limits the biological insight one may get from analyzing the data. As Van Valen argues, treating neutral evolution as the null hypothesis makes it the preferred explanation and puts all the burden on proving that adaptation did happen. If one instead views neutral evolution and adaptation as equally likely explanations, one would elect to use methods such as likelihood ratios or confidence intervals to evaluate which hypothesis is better supported by the data. He also suggests that when one hypothesis is indeed favored a priori, one should incorporate such prior information through a Bayesian framework, no matter how crude that prior information may be. One more word related to Bayesian decision theory, inspired by the next paper I’m going to talk about: in economics and other areas, decisions are often made using a cost function, i.e. one needs to explicitly define the “cost” of making a type-I error along with the cost of making a type-II error (falsely accepting the null), and only then can one make informed decisions. In biology, especially evolutionary biology, however, the cost of making a type-I error is often undefined…

Nigel Yoccoz’s article lists several points of what he considers common malpractices of statistics in ecology and evolution. The first concerns reporting merely the statistical significance, but not the biological one. I understand this as, for example, reporting the p-value without the actual difference in the means. In my words, he is suggesting routinely reporting effect sizes along with statistical significance. Another thing he suggests is to report confidence intervals instead of significance levels. I actually think this is a good idea, as the confidence interval carries both the significance level and a measure of the effect size. There is a potential complication, however, when the number of tests is huge: the winner’s curse prevents one from estimating effect sizes from the same sample in which the candidates were chosen based on statistical tests. In such scenarios, one can only talk about significance, not effect sizes.
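The winner’s curse is easy to reproduce in a toy simulation. The sketch below uses made-up numbers (5,000 independent tests, each with the same small true effect of 0.2 and a sample size of 50) and keeps only the tests that pass a stringent threshold. The variable names and parameter values are illustrative assumptions, not taken from the article.

```r
set.seed(2)

true_effect <- 0.2    # same true mean for every test (hypothetical)
n           <- 50     # sample size per test
n_tests     <- 5000   # number of tests performed

est  <- numeric(n_tests)
pval <- numeric(n_tests)
for (i in seq_len(n_tests)) {
  x       <- rnorm(n, mean = true_effect, sd = 1)
  est[i]  <- mean(x)                      # estimated effect size
  pval[i] <- t.test(x, mu = 0)$p.value    # test against no effect
}

# Select "winners" with a stringent threshold, as in a large multiple-testing scan.
significant <- pval < 0.001

mean_all <- mean(est)               # close to the true effect
mean_sig <- mean(est[significant])  # inflated among the selected tests
```

Averaged over all tests the estimate is unbiased, but among the tests that reached significance the estimated effect is clearly larger than the true value, because only samples that happened to overshoot could pass the threshold.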

One question that remains is why Bayesian approaches are not more commonly practiced, even though they have been around for decades and have been a hot research topic in statistics.

August 25, 2011

GWAS review from Barbara Stranger

Filed under: Discussions,Papers — hebin @ 10:28 am

Stranger, B. E., Stahl, E. A. & Raj, T. Progress and promise of Genome-Wide association studies for human complex trait genetics. Genetics 187, 367-383 (2011). URL http://dx.doi.org/10.1534/genetics.110.120907.

“As with other recent GWAS discoveries, the loci validated in STAHL et al. (2010) have modest effect sizes (OR 1.1–1.3; Figure 3). On the basis of their ORs and allele frequencies, we can calculate the proportion of phenotypic variance explained in RA for each SNP under a liability threshold model (FALCONER and MACKAY 1996) and these can be assumed to sum to the total percentage of variance explained by validated RA risk alleles. Figure 3 shows that additional GWAS discoveries contribute little to the total variance explained, which seems to reach a plateau at 15–16% (RAYCHAUDHURI 2010).”

So is there still a substantial amount of “missing heritability” that is not due to a power issue?

August 12, 2011

A maybe naive question about insulin

Filed under: Discussions — hebin @ 5:58 pm

Why do we need insulin to tell us the glucose level? Since insulin is just a hormone signal, wouldn’t it be the same if our cells had glucose sensors instead of insulin sensors?

And why do we always talk about malfunction of insulin, but not of its counterpart, glucagon?

August 9, 2011

Genetic Load — an old problem revisited

Filed under: Discussions — hebin @ 10:50 am

This is an old problem for the field: it was first raised by J.B.S. Haldane in the 1950s, followed by Kimura in the 1950s and 60s, during which time Kimura also proposed his neutral theory, which is partly based on a load argument.

It is also an old problem for me: it was part of my prelim exam, for which I was supposed to understand this issue and give a comprehensive overview of it. Apparently I didn’t manage to do so at the time, almost four years ago, because when the problem was mentioned again at this year’s SMBE, I could not make sense of why there is a genetic load associated with adaptation. That’s why I’m revisiting this problem for myself here.

First I should make it clear that I can understand genetic load due to deleterious mutations: if in every generation 1 in 10 individuals carries a lethal mutation (think of deaths at birth in any species), the population must produce about 111% (1/0.9) as many offspring as its population size in order to stay at a constant size.

The idea of substitution load, as Kimura named it, can be visualized as follows:

Imagine a population of size 10: think of 10 boxes to be filled. Let’s assume the original genotype is represented by a white ball, hence 10 white balls in the 10 boxes. Now one advantageous mutation happens and turns one of the 10 white balls into a red one. If each individual merely produces one offspring, then the only way for the advantageous allele to spread is to wait for a second mutation event. But that is practically impossible, since the probability of a mutation recurring at the same site (or class of sites) out of a genome of millions to billions of sites is astronomically small. Therefore adaptation can only occur if there is a reproductive excess, i.e. each individual produces more than one offspring. This is true in almost every species on earth, but to varying degrees: a virus can produce millions of progeny, while large mammals produce on the order of 10 offspring in their lifetime. Whatever the degree, this excess gives adaptation its chance: if the red ball produces more offspring, or if its offspring survive better to adulthood, or if the surviving offspring have more success in mating and therefore more chances to reproduce, then natural selection has occurred (fertility, viability and sexual selection, respectively).

The concept of substitution load stems from calculating the exact amount of reproductive excess necessary for the “allele replacement” process to occur. A related idea, “selective death”, is the complement of the individuals that make it into the 10 boxes: it refers to the individuals that fail to survive or reproduce. They represent a “reproductive waste”, since the energy invested in them did not pay off. Kimura calculated that in order to have one replacement every two generations, each parent has to produce about 3 million offspring in a constant-size population. That is to say, each parent needs to invest in 3 million offspring, the majority of which, those of the inferior genotype, will simply be “wasted”. Since this doesn’t fit our knowledge of mammalian species (it is, however, entirely consistent with many bacteria or viruses, where one individual virus can produce thousands or millions of progeny particles), Kimura proposed that most of the fixed DNA differences between species should be selectively neutral. It is necessary to make clear that the difference between a neutrally fixed and an adaptively fixed mutation lies mainly in their time to fixation. The former takes much longer (on average 4N generations, where N is the effective population size), while the latter takes on the order of (2/s) ln(2N) generations when selection is strong, where s is the selection coefficient. The stronger the selection (2Ns needs to be greater than about 5 for us to consider the allele to be under positive selection at all), the larger this gap becomes.

The opponents of the substitution load argument raised several possibilities. One of them is that the adaptation process is accompanied by an increase in population size. Imagine that the new advantageous allele allows its carriers to exploit previously inaccessible resources. As a result their number can grow, instead of “replacing” the old ones. In this case the load will be much smaller, if it is a problem at all.

Something I still don’t understand well is how epistasis and linkage between different alleles, which may all be selected for or may include some that are selected against, affect the size of the load. This sounds like an interesting question to pursue further.

July 31, 2011

SMBE 2011 @ Kyoto — Conference Extract

Filed under: Discussions,Seminars — hebin @ 11:24 pm

Day 1 (7/27)

John Novembre – Recombination rates in admixed individuals revealed by ancestry-based inference
The basic idea is to use unrelated individuals to estimate recombination rates. Here “unrelated” is relative to family data, just like GWAS vs. linkage mapping. Essentially, the method takes individuals admixed between two ethnic groups and infers the recombination events in their genomes with an HMM that uses reference genomes from both parental populations. It is easy to collect large numbers of such individuals, unlike in family-based studies, but the inference is indirect and more model-based.

– My session –

Chao Lin: He talked about experimental work designed to test the relationship between population fitness and compensatory evolution, although the details are blurred in my mind now. I need to read further.

Hideki Innan: He talked about one of his earlier works on the rate of compensatory evolution, modeled after Kimura’s 1985 work but under a different, weak-selection parameter regime. His conclusion is that the rate of compensatory evolution is very limited.

Deepa Agashe: She works with Allan Drummond and Chris Marx. The most striking result in her work is that the strain of bacteria carrying a genetically modified enzyme, for which she changed the original composition of 50% frequent codons and 5% rare codons to 100% frequent codons, had nearly as poor fitness as the knockout strain for that enzyme. A number of other constructs also went in various unexpected directions. She has looked at mRNA secondary structure for her modified enzyme and said she didn’t find any clue there. I don’t have a good idea of what has gone wrong, but I would try all kinds of experiments on the engineered gene product at both the mRNA and protein level to discover the potential mechanism.

My presentation went reasonably well. I met Alan Moses, who acknowledged the contribution of this work, and we chatted about the importance of pursuing the question of selection on weak binding sites.

– end of my session –

(the rest are not ordered)

Alan Moses: His recent work is modeled after his previous studies on TFBS turnover, but has now turned to post-translational modifications, namely phosphorylation. He first established that phosphorylation sites have specific motifs and, something I didn’t know before, that these sites are preferentially located in “disordered regions” of a protein, where protein-protein interaction happens. The challenge here is alignment, since protein sequences in these disordered regions evolve very fast. But knowing that phosphorylation sites occur there reduces the search space, leading to a lower false positive rate. His results so far suggest a deep conservation of the phosphorylation code. He also found evidence for turnover of these sites, but has no knowledge yet of their potential selective value. He also suggests that the post-translational modification system, like enhancers, is combinatorial: there are usually multiple kinds of sites in those disordered regions, and in addition to phosphorylation sites there are motifs responsible for protein localization and degradation, which often depend on the phosphorylation state.

Andy Clark: He gave a talk on the same topic as his last talk in Chicago: the effect of deleterious mutations in an exploding human population. However, this time I feel I understood more of the talk and can start to see its value. Essentially there are two issues underlying what he talked about: (1) the super-exponential growth of the human population; (2) a sample size that is larger than the estimated Ne of the population. These two can violate the assumptions used in standard coalescent calculations. So the main effect is that many commonly used formulas and methods no longer apply to present-day human datasets, where misusing them might lead to wrong and confusing results. Another effect, which was the theme of his last talk, is that in our growing population lineages may experience little loss, so that although selection should be more efficient in a larger population, the growth, i.e. the non-equilibrium state, implies an accumulation of a large number of deleterious mutations at the tips of the coalescent tree. The effect on GWAS relates to the common disease, rare variant hypothesis: since current GWAS still rely on incomplete genotyping data, they could miss all these rare variants (singletons).

Charles Robin: from Australia. He presented a super cool study on the evolution of resistance to DDT in Drosophila melanogaster. He and others have previously found alleles that confer DDT resistance in the species. Using stocks collected before or after DDT use, he clearly demonstrated the allele frequency change, with the resistant allele rising from barely detectable to almost fixed within a span of about 50 years. Remarkable! The underlying genetics is more complicated than previously thought: the resistance effect of an allele carrying a “beagle” element is further improved by a randomly inserted P-element! And the latter allele is on the rise against the original resistant allele. They are currently using the DGRP lines to do GWAS, in order to discover modifier loci for the DDT 3 hr knockdown and 24 hr mortality traits.

Zhu Yuan: from Dmitri Petrov’s lab. As part of her thesis work, she wants to assess the quality of next-generation sequencing of pooled fly samples as a way to estimate allele frequencies. This is directly related to the current trend of combining sequencing with long-term artificial selection in order to reveal the rich dynamics of evolution. However, she showed that three biases could be introduced: (1) Differences in the amount of DNA between individuals. This error is not severe; the amount of DNA contributed by different individuals usually varies within 20%. For this she suggests pooling at least 100 individuals in order to get a reliable estimate of the allele frequency. Yet it seems that, if this is a serious concern, pooling more individuals could only work for alleles that are not rare. (2) Sequencing errors. This is the predominant source of error. For this she suggests sequencing to at least 50X genome-wide coverage and filtering out low-coverage regions.
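To get a feel for why both the number of pooled individuals and the coverage matter, here is a minimal sketch of the sampling noise alone. It deliberately ignores the biases discussed in the talk: it assumes every individual contributes equal DNA and that there are no sequencing errors, so the only noise comes from sampling 2n chromosomes into the pool and then sampling reads from the pool. The function names and parameter values are mine and purely illustrative.

```r
set.seed(3)

# Variance of the pooled allele-frequency estimate under two-stage binomial
# sampling: 2*n_ind chromosomes into the pool, then `coverage` reads from the pool.
pooled_freq_var <- function(p, n_ind, coverage) {
  p * (1 - p) * (1 / (2 * n_ind) + (1 - 1 / (2 * n_ind)) / coverage)
}

# One simulated pooled-sequencing estimate of the allele frequency.
simulate_pooled_estimate <- function(p, n_ind, coverage) {
  pool_freq <- rbinom(1, 2 * n_ind, p) / (2 * n_ind)  # frequency in the pool
  rbinom(1, coverage, pool_freq) / coverage           # frequency among reads
}

p <- 0.3
est <- replicate(20000, simulate_pooled_estimate(p, n_ind = 100, coverage = 50))

var_sim    <- var(est)                       # variance observed in simulation
var_theory <- pooled_freq_var(p, 100, 50)    # variance predicted by the formula
```

With 100 individuals and 50X coverage, most of the variance comes from the read-sampling term, which is one way to see why coverage is emphasized in addition to pool size.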

Chris Illingworth: He works with Ville Mustonen. They have developed a method for analyzing sequencing data from artificial selection experiments. See the abstract below.

Quantifying selection acting on a trait from allele frequency time-series

Chris Illingworth1, Leopold Parts1, Stephan Schiffels2, Gianni Liti3, Ville Mustonen1

1Wellcome Trust Sanger Institute, UK,

2Universität zu Köln, Germany,

3University of Nottingham, U

We present a population genetic method to analyse time-series data of allele frequencies, illustrated here using data from an artificial selection experiment in a yeast population. Our method measures the consistency of a range of proposed evolutionary scenarios with allele frequency changes observed over time. Population genetic theory is utilized to formulate equations of motion for the allele frequencies under each scenario, following which likelihoods for having observed the sequencing data under each scenario are derived. Comparison of these likelihoods gives an insight into the prevailing dynamics for the system under study. Using our method we discover that about 10% of polymorphic sites evolve non-neutrally. We further identify 37 genomic regions containing one or more driver alleles, quantify their selective advantage, get estimates of local recombination rates within the regions, and show that the dynamics of the drivers display a strong signature of fitness effects going beyond additive models of selection. The combined experimental and analytical approach we present offers a paradigm for understanding evolution in a range of systems under many different evolutionary pressures

Plenary talks

Ken Wolfe’s talk was very impressive. His group has characterized genome evolution in yeast and has worked out in detail the evolution of the mating locus. Gene gain and loss after WGD is their topic.

Daniel Hartl: Y chromosome variation could influence quantitative traits, perhaps not through genes on the Y but epigenetically.

May 30, 2011

rpr locus, grim/skl/hid and large intergenic regions with potential regulatory elements

Filed under: Discussions,Papers — hebin @ 11:22 pm

Apoptosis (reaper and other genes) — Andreas Bergmann

[Notes]

* reaper itself doesn’t have a role in eye development, but hid does. Knock-out of rpr probably wouldn’t affect eye development.

* rpr and hid mediate cell death in response to cytotoxic stimulus.

* rpr pathway (Kuranaga 2002)

* the region between rpr and grim (where our interaction SNP lies) potentially contains regulatory elements that might co-regulate rpr, grim and skl (Tan 2011)

  • The authors checked earlier modENCODE data (the newer data splits developmental stages), where they found no evidence of additional mRNA or ncRNA in the long (>90kb) and conserved intergenic region between grim and rpr.
  • Deletion or null mutation of rpr, hid or skl doesn’t lead to ectopic survival of neuroblasts, the focus of their study. However, deletion of grim mimics the phenotype of MM3/MM3, suggesting that the candidate regulatory element might belong to grim.
  • The expression regulation of the rpr, grim, skl and hid locus is probably very complex: the expression of grim and skl is often co-localized, but much less so with rpr, which sits in between the other two genes.
  • Using whole-embryo qRT-PCR, they didn’t discover obvious differences in the expression level of the three genes (grim, rpr, skl) between the wild type and MM3 homozygotes. They suggest that “the lack of differences in overall expression suggests that any potential regulatory element deleted in MM3 might be specific for a small number of cells …
  • They then used in-situ hybridization to examine the spatial patterns, which were not altered by the deletion. Quantification of the in-situ images suggests that MM3 affects the expression levels of the three genes, and does so to a larger degree for grim and skl (about 40-50% reduction) than for rpr (about 25%).

The authors found that flies of genotype MM3/MM3 as well as MM3/XR38 both exhibit proliferation of abdominal neuroblasts, suggesting that the region shared by MM3 and XR38 must contain an element NECESSARY for apoptosis of the aforementioned cell type.

[Reference]

Tan, Y. et al. Coordinated expression of cell death genes regulates neuroblast apoptosis. Development (Cambridge, England) 138, 2197-2206 (2011).

Kuranaga, E. et al. Reaper-mediated inhibition of DIAP1-induced DTRAF1 degradation results in activation of JNK in Drosophila. Nature Cell Biology 4, 705-710 (2002).

 

March 9, 2011

Michael Lässig talk and Influenza virus adaptation

Filed under: Discussions — hebin @ 11:26 am

The first part of Lässig’s talk triggered an interesting debate between him and Wen-hsiung over how many substitutions were positively selected in influenza evolution, as well as the importance of clonal interference.

Wen-hsiung’s opinion, as detailed in this paper (A. C. Shih et al. PNAS 2007), comprises the following points:

1) advantageous mutations occur quite frequently, instead of sporadically
2) most of the time a single mutation doesn’t confer enough of a fitness advantage to sweep through the population
3) multiple advantageous mutations accumulate, which speeds up the fixation process
4) as a result, despite the continuous generation of advantageous mutations and action of positive selection, fixations tend to occur in groups

A few side points during discussion with Wen-hsiung

1) there is not much hitchhiking. Note that this is not equivalent to a lack of selective sweeps. In fact, when selective sweeps occur often and rapidly, they purge the population of polymorphism and thereby deprive hitchhiking of its raw genetic variation.

In contrast, Michael’s view is that hitchhiking is responsible for an appreciable fraction of the fixations. His main evidence is that, in the same 40-year influenza sequence dataset, he observed fixations clustered in time and space (within the hemagglutinin gene), not only for nonsynonymous changes but for synonymous changes as well. His interpretation is that synonymous mutations were dragged to fixation by linked advantageous nonsynonymous ones.

(comment: is it correct to say that selective sweeps could increase the temporal variance in the Poisson process of fixation, i.e. produce clustered fixation events, but will not affect the long-run evolutionary rate? The latter is a classic result, which can be easily understood by noticing that if an advantageous mutation occurs in tight linkage with a neutral polymorphism, it can occur either on the ancestral-state background or on the derived-state background. Given that mutations are random, the probability of the advantageous mutation occurring on the derived-state background is just equal to the population frequency of the derived allele at the neutral polymorphism. Conditional on the advantageous mutation being fixed, the fixation probability of the linked neutral allele is therefore unchanged compared to what it would be on its own.)
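The argument in the comment above can be written out in one line. As a sketch, assume complete linkage, let x be the current frequency of the derived allele at the neutral site, and condition on an advantageous mutation arising on a random chromosome and sweeping to fixation (these symbols are introduced here for illustration):

```latex
\[
P(\text{derived neutral allele fixes})
  \;=\; \underbrace{x}_{\text{sweep starts on derived background}} \cdot 1
  \;+\; \underbrace{(1 - x)}_{\text{sweep starts on ancestral background}} \cdot 0
  \;=\; x
\]
```

This equals the fixation probability of a neutral allele at frequency x without any sweep. The sweep changes when fixations happen, not how many happen in the long run.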

A further point Lässig alluded to in this part of the talk is that when adaptations occur frequently, they may also bring along deleterious mutations. As a result, while the virus is constantly evolving its antigenic protein to evade the host immune system by trying out new mutations, it must also be compromising its other functions?! This is an interesting point that would merit further discussion.

In a later discussion with Michael, I brought up the question of which systems may be at evolutionary equilibrium and how to test for it. Michael suggested using the fitness flux idea he developed to examine this problem. He explained that just as fitness is about the current state, one could view fitness flux as being about a process. More specifically, he said that this measure links the forward and backward evolutionary processes. If a system is at an equilibrium satisfying detailed balance, one would expect the forward process to be as likely as the backward process (as many up moves as down moves), and therefore a zero fitness flux. However, if the system is in a non-equilibrium state, fitness flux may be used to measure its distance from equilibrium. An interesting idea to explore further.

March 7, 2010

Premature stop codon and translational readthrough

Filed under: Discussions,Papers — hebin @ 6:27 pm

Because Scott Rifkin’s recent Nature paper used three mutant lines carrying premature stop codons, I got into the question of how efficient a stop codon is and what the probability of readthrough is. Furthermore, on second thought I was quite impressed by the fact that the termination of a protein is coded in such a way: you could imagine numerous ways, not necessarily a triplet, to signal the end of translation, or even some structural change in the mRNA could do that. Why a stop “codon”? And how conserved is this signal across different phyla of life? Is the stop codon the only signal for stopping? (If that’s the case, then a nonsense mutation would be sufficient to create a genetic null mutant, which seems to be the case.) Below is what I found out:

  1. Stop codons are not recognized by a tRNA but by a protein (named Release Factor). There are three RF proteins in prokaryotes (RF1, RF2, RF3). RF1 and RF2 each recognize two of the three stop codons (RF1 reads UAA and UAG, RF2 reads UAA and UGA, so together they cover all three). RF3 helps the release. In eukaryotes there are two, eRF1 and eRF3 (from wiki/Release Factor). eRF1 recognizes all three stop codons and hydrolyses peptidyl-tRNA.
  2. Suppressor mutations in a tRNA can cause the latter to recognize a stop codon as a sense codon and thus cause translational readthrough. But because most suppressor mutations are weak and the suppressor tRNA has to compete with the RF proteins, readthrough won’t occur at full capacity.
  3. In many genes there are two stop codons in a row to ensure termination. (What percentage of genes in a species, say human, have two stop codons in a row?) A paper (I. Williams, J. Richardson, A. Starkey, and I. Stansfield. Genome-wide prediction of stop codon readthrough during translation in the yeast Saccharomyces cerevisiae. Nucl. Acids Res. 32: 6605-6616.) showed that a second stop codon is observed 3′ of the first one more often than expected.
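
The percentage question in item 3 could be answered with something like the sketch below. This is only a minimal illustration: the function names are my own, the toy sequences are made up, and it assumes you have already extracted, for each gene, the sequence immediately 3′ of the annotated stop codon in the same reading frame.

```python
# Minimal sketch: how often is the annotated stop followed by another in-frame stop?
# Function names and the toy sequences are hypothetical, not from any paper or library.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def codons_to_next_stop(downstream, max_codons=10):
    """Count in-frame sense codons read before the next stop codon.

    `downstream` is the DNA sequence immediately 3' of the annotated stop,
    in the same reading frame. Returns 0 if the very next codon is a stop
    (two stops in a row), or None if no stop is found within `max_codons`.
    """
    seq = downstream.upper()
    for k in range(max_codons):
        codon = seq[3 * k:3 * k + 3]
        if len(codon) < 3:  # ran out of sequence
            return None
        if codon in STOP_CODONS:
            return k
    return None

def fraction_tandem(downstreams):
    """Fraction of genes whose annotated stop is directly followed by a second stop."""
    hits = sum(1 for s in downstreams if codons_to_next_stop(s, max_codons=1) == 0)
    return hits / len(downstreams)

# Made-up downstream sequences, for illustration only
examples = ["TAAGCT", "GCTTGA", "GCAGCA", "TGAAAA"]
print(fraction_tandem(examples))
```

For comparison, if the four nucleotides were equally frequent and independent, a random codon would be a stop with probability 3/64, so that is the naive baseline to compare a genome-wide fraction against.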


October 23, 2009

Discussion with Dick–On applying MK test to pooled samples of linked but not interdigitated sites

Filed under: Discussions — hebin @ 6:55 pm

There are two problems here:

  1. The sites are not homogeneously interdigitated but form linked regions.
  2. Loci are pooled together.

Understanding of the problem

  1. Why would it be a problem that different regions have different genealogies?
    Think in the coalescent way: if, during the course of evolution, recombination has broken two pieces apart, they can have coalescent trees of different depths. Suppose that in one region of the genome the 6 alleles coalesce very quickly, while in another region they coalesce very deep, close to the species divergence; obviously the two regions would have very different levels of neutral polymorphism. This variance in tree depth is not going to affect divergence much if divergence far exceeds polymorphism. But that is not always true: in my case, for example, the 6 sim alleles could have a coalescent tree that is almost as deep as the two-species tree. Because I’m using lineage-specific changes to do the MK test, variance in tree depth affects polymorphism and divergence in opposite directions: the deeper the tree, the more polymorphism and the less divergence, and vice versa. Thus two neutral regions can have very different polymorphism-to-divergence ratios depending on their tree depths.

    This problem is most significant for intermediate levels of recombination between the two classes of sites. In the two extreme cases, complete linkage is equivalent to the original setting, and free recombination implies that each site is independent, so the numbers of polymorphisms and divergences would all be Poisson distributed and the test would work as well.
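
To get a feel for how much tree depth can vary between unlinked neutral regions, here is a minimal sketch of the standard neutral coalescent for 6 sampled alleles. It is not the simulation Dick and I discussed; it assumes a constant-size, panmictic population with no recombination within a region, and the function names are my own.

```python
import random
import statistics

def simulate_tmrca(n, rng):
    """Time to the most recent common ancestor of n sampled alleles.

    Standard neutral coalescent; time is in units of 2N generations
    for a diploid population of size N (an assumption of this sketch).
    """
    t = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2  # coalescence rate while k lineages remain
        t += rng.expovariate(rate)
    return t

def tmrca_summary(n=6, regions=10000, seed=1):
    """Mean and standard deviation of tree depth across independent regions."""
    rng = random.Random(seed)
    depths = [simulate_tmrca(n, rng) for _ in range(regions)]
    return statistics.mean(depths), statistics.stdev(depths)

mean_depth, sd_depth = tmrca_summary()
print(mean_depth, sd_depth)
```

The expected depth is 2(1 - 1/n), about 1.67 for n = 6, and the standard deviation is of the same order as the mean. So two independent neutral regions can easily differ severalfold in tree depth, and hence in expected polymorphism.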

Thoughts and conclusions

  1. He thinks the most problematic thing is probably the variance in the relative sizes of the enhancer and the target gene across different enhancers. Otherwise, lumping them together is not going to be a serious problem.
  2. His intuition is that after doing a lot of simulations and studies, we might come to the conclusion that the extremely significant p-value we have is still significant, although modified to some extent. So he suggests we do some simulations to get a flavor of it.
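
For reference, the pooled test itself is just a 2x2 contingency test on summed counts. A minimal sketch, using Fisher's exact test as the contingency test (my choice for this illustration) and made-up counts that are not our data:

```python
from math import comb

def pool(tables):
    """Sum per-locus 2x2 tables (a, b, c, d) cell by cell."""
    return tuple(sum(cells) for cells in zip(*tables))

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the table [[a, b], [c, d]].

    Rows are the two classes of sites; columns are polymorphic and
    fixed (divergent) differences.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x in the top-left cell given the margins
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at least as unlikely as the observed one
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))

# Hypothetical per-locus counts, for illustration only
loci = [(5, 1, 1, 3), (3, 1, 0, 2)]
pooled = pool(loci)
print(pooled, fisher_exact_two_sided(*pooled))
```

The simulations Dick suggests would replace the hypergeometric null here with the distribution of pooled tables generated under linkage and variable tree depth.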