Years ago, while I was screening table-of-contents alerts for interesting stuff (I’ve been doing this once a week for years: Friday, I’m in love with newly published science), my eye was caught by one of those high-impact, very technical human genetics papers where they find a new gene underlying some very serious disease. It turned out that the newly identified causal variant accounted for 0.6% of genetic variance. Wow. Then I said to myself: hey, what do you know about how the field of disease genetics works? Even a 0.6% effect can be important, if it can save a life.
And then came the Boyle et al. paper, a few days ago, on the ‘omnigenic’ genetic structure of complex traits. The paper shows that the genome is not merely scattered with loci controlling complex traits: it is smeared with them. Many of those loci have very small effects, and could only be detected in studies with very large sample sizes. There is nothing strange about this: the more intensely you chart the territory, the more detailed the map, and the smaller the features that appear on the map.
The vast lists of genes associated with a given trait are not particularly enriched in any functional category; probably, many genes have an impact on many characters (they are pleiotropic) and are involved in many partially overlapping regulatory networks, as suggested by the fact that many causal variants happen to be in regulatory regions.
Indirectly, this also suggests that candidate-gene strategies may have a problem (you’ll certainly find a particular category of genes to be associated with the trait: all categories are…), and so may the habit of looking for relevant variants only in coding regions (well, we all knew this, did we not?).
But let us go back under the canopy.
In trees, it is quite common that traits diverge across populations (forest scientists call them “provenances”: the word is the heritage of the strategy of planting multiple populations together to assay their performances), but gene frequencies do not. This phenomenon is well described by Kremer and Le Corre (2003, 2012 (1), 2012 (2)) and suggests that adaptation (and therefore the control of underlying adaptive traits) is highly polygenic. There you are. One can expect that, given enough power, association studies in trees will end up detecting very large numbers of small signals, too.
There are very few GWASs in trees. Fahrenkrog et al. (2016), for example, have about 18,000 genes mapped from a population of about 400 trees (I can hear the average human, or Arabidopsis, geneticist sneering).
One good reason is that to perform a GWAS you first need a G[enome], and there are not so many tree genome sequences so far. Other reasons may be less clear, except for the argument that GWAS is still relatively expensive and forestry is not the research field that attracts the largest funds. Forest GWAS studies in which millions of trees are screened are even rarer (read: non-existent). Plus, if you are looking for weak effects, you must be very careful about how you pick your sample: even slight, undetected differences in the ontogeny (developmental path: how the trees have grown) or environmental conditions can have a larger effect than weak genetic differences, which will be drowned in the background noise.
Yet, there is hope.
From the evolutionary point of view, small effects may be more relevant than when trying to predict susceptibility to a disease. First, because, as stated above, they can have a cumulative effect on fitness (even without accounting for interactions, which could amplify their effects); and second, because selection is a powerful force (we have commented on this before, haven’t we?) and can lead to major allele and phenotype frequency changes over relatively short time scales. As an exercise, you can compute the time to fixation for an advantageous allele starting at an arbitrary frequency, using the equations in Kimura and Ohta (1969) (NB: compare to the time to fixation for a neutral allele in the same configuration to check what the effect of selection really is). You can play around with the formula using this simple code that I have written in R:
#Calculations from Equations (17) and (14) in:
#Kimura M, Ohta T. 1969.
#The Average Number of Generations until Fixation of a Mutant Gene in a Finite Population.
#Genetics 61: 763–71.
#example values (placeholders: change them to explore other scenarios)
#effective population size
Ne <- 1000
#selection coefficient
s <- 0.01
#Ne * s = S
S <- Ne * s
#starting frequency of positively selected allele
p <- 0.01
# Equation (17): fixation of selected allele
#function to integrate for term J(1)
J1der <- function(csi) {
  (exp(2*S*csi)-1)*(exp(-2*S*csi)-exp(-2*S)) / (csi*(1-csi))
}
#function to integrate for term J(2)
J2der <- function(csi) {
  ((exp(2*S*csi)-1)*(1-exp(-2*S*csi))) / (csi*(1-csi))
}
#coefficient for the integrals J(1), J(2)
Jcoef <- 2 / (s*(1-exp(-2*S)))
#probability of fixation of the selected allele
uP <- (1 - exp(-2*S*p)) / (1 - exp(-2*S))
#average time to fixation under selection
t1p <- Jcoef * integrate(J1der, lower = p, upper = 1)$value + ((1-uP)/uP) * Jcoef * integrate(J2der, lower = 0 , upper = p)$value
# Equation (14): fixation of a neutral allele
#average time to fixation (neutral)
t1pNeutral <- -(1/p) * 4 * Ne * (1-p) * log(1-p)
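For readers who prefer formulas to code, the quantities the script works with can be written out as below. This is my transcription of Equations (17) and (14) as I understand them from Kimura and Ohta (1969), with \xi standing for the integration variable called csi in the code; please check it against the paper before relying on it.

```latex
\begin{align*}
S &= N_e s, \qquad u(p) = \frac{1 - e^{-2Sp}}{1 - e^{-2S}} \\
J_1 &= \frac{2}{s\,(1 - e^{-2S})} \int_p^1 \frac{(e^{2S\xi} - 1)(e^{-2S\xi} - e^{-2S})}{\xi(1-\xi)} \, d\xi \\
J_2 &= \frac{2}{s\,(1 - e^{-2S})} \int_0^p \frac{(e^{2S\xi} - 1)(1 - e^{-2S\xi})}{\xi(1-\xi)} \, d\xi \\
\bar{t}_1(p) &= J_1 + \frac{1 - u(p)}{u(p)}\, J_2 && \text{(Equation 17, with selection)} \\
\bar{t}_1(p) &= -\frac{4 N_e\,(1-p) \ln(1-p)}{p} && \text{(Equation 14, neutral)}
\end{align*}
```

Here u(p) is the probability that an allele starting at frequency p eventually reaches fixation, and both times are expressed in generations.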
Moreover, part of the ‘omnigenic’ effect is caused by linkage disequilibrium, which extends over much larger spans in humans than in most tree species. In trees, it is less likely that a variant merely correlated with a causal SNP will itself show up as a causal SNP.
Another consideration: for reasons of power and efficiency, many rare variants are eliminated from GWAS studies through the cruel MAF (minor allele frequency) filter: anything with a frequency under a certain threshold is usually thrown out. This makes sense, because some of them may be artefacts, and anyway statistical power at those loci will be low. But what if they have a large effect? In natural tree populations, rare variants with large effects may be just waiting for selection to pick them up. I, for sure, do not throw them away!
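To make the MAF filter concrete, here is a minimal sketch in R on a made-up genotype matrix. The object names (geno, maf_threshold, kept, rare), the toy genotypes and the 0.05 threshold are illustrative assumptions of mine, not taken from any particular GWAS pipeline.

```r
# Hypothetical toy genotypes: rows = 200 trees, columns = SNPs,
# entries = number of copies of the alternative allele (0, 1 or 2).
geno <- cbind(
  snp_common = rep(c(0, 1, 2, 1), times = 50),  # alternative allele frequency 0.5
  snp_rare   = c(rep(0, 198), 1, 1)             # alternative allele frequency 0.005
)

# Frequency of the alternative allele at each SNP (diploid individuals)
alt_freq <- colMeans(geno) / 2

# Minor allele frequency: the frequency of the rarer of the two alleles
maf <- pmin(alt_freq, 1 - alt_freq)

# Assumed threshold; real studies choose it according to sample size
maf_threshold <- 0.05

# SNPs that pass the filter, and SNPs that would usually be thrown out
kept <- geno[, maf >= maf_threshold, drop = FALSE]
rare <- geno[, maf <  maf_threshold, drop = FALSE]
```

Rather than discarding the columns in rare, one can set them aside and look at them separately, which is the attitude argued for above.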
And finally: first things first. Let us find major effects, if they are there (some have already started: see Sam Yeaman et al.’s Science paper as a brilliant example), and then we’ll scratch our heads over the minor ones.