Summer school, and the teaching is easy

People with diverse backgrounds and objectives meet and mix in this particular type of course

The GenTree project held its 2018 Summer School “From genotypes to phenotypes: assessing forest tree diversity in the wild” on 4-7 June 2018 in Kaunas, Lithuania. About twenty “students” (most of them Ph.D. students, but some of them established scientists) convened to learn the theory and practice of population (and quantitative) genetic analysis from five teachers (including myself).


We were warmly hosted by colleagues at Aleksandras Stulginskis University (ASU); the course was organised by fellow forest scientist Darius Danusevičius, with essential support from his students.

The course covered a variety of subjects, from the basics of population genetics theory and the coalescent to the application of multiple programs for Genotype-Environment Association and Genotype-Phenotype Association, and included a day out in the forest, where a demonstration of the use of drones for surveying forest stands was held. Very interesting, with plenty of information, although sometimes with quite a steep learning curve!

A summer school has the great advantage of letting us explore new teaching – and learning – strategies, because the goals are sometimes left ‘open’ and the teachers can adjust to the students’ needs and limits (and of course, the students adjust to the teachers’ limits!).

[Photo: students and teachers visiting Lithuanian historical sites (Credit: ASU Kaunas)]


Informal learning sessions extend beyond the official programme, very often into the night, when the traditional tools for transmitting knowledge (slides, computer scripts, whiteboards, chalk, paper and pencil) are replaced by more unconventional ones (jokes, crisps, beers). We saw multiple teaching approaches, spanning from the “zero-electronics restless teacher” (myself: only chalk and blackboard, and several kilometres walked while teaching), through the “activity-time interactive teacher” (Tanja Pyhäjärvi: having students stand up and do some exercise, then sit down and do some more exercises, this time through shinyApps), to “100% hands-on teaching” (Santi González-Martínez and Leo Sanchez, with their rich array of software packages, scripts, and datasets to put to the test) (I cannot say what Basti Richter did with his drones out in the forest; I had to leave earlier). All this was peppered with contests (spelling out population genetics laws, presenting a piece of one’s country’s popular culture, declaring one’s favourite sports team, movie, even philosopher – which actually provided some surprises: for example, I was unaware that Donald Trump was a philosopher at all, but then I am not a philosopher, so how on earth was I supposed to know?).

The mystery of number 19

At some point we thought we were close to finding some fundamental natural pattern, when the number nineteen started popping up recurrently in our lives. For example, there were nineteen of us in the bus that took us from Vilnius to Kaunas, and we concurrently learned that the first Lithuanian Republic was founded in 1919. Some of us were even reported to have drunk in excess of nineteen drinks in a single evening. After having observed that there were no public seats on the otherwise very green ASU campus, we even formulated the hypothesis that there may be only nineteen public seats throughout the whole country (a short walk in downtown Kaunas allowed us to reject the hypothesis).
In the end, we dropped the idea that the number 19 carried some important meaning; so the only universally meaningful number is still 42.

Baroque paradise

And finally, on my way back I had the opportunity to do some walking in downtown Vilnius. In my ignorance, I did not know that its city centre is a UNESCO World Heritage site, and that it harbours some very fine examples of baroque architecture. Nice place; you should all go visit it.



In GPS we trust

Of lotteries and men

Nowadays, everybody relies on GPS, and nobody is capable of reading a map anymore.
Good old forest plot maps, drawn with compasses and distances measured on the ground? Gone. Trees are mapped by GPS, with variable precision and success. And there is no way anybody can give you directions on the road. You’ve got a GPS? Use it, for Hermes’ sake.

But this post is not an old man’s rant about technology.
I’m not talking about Global Positioning System.
I’m thinking of Grant Proposal Selection.

As everybody knows, the way grant proposals are chosen for funding is a lottery (when your favourite proposal is turned down) or a very meritocratic process carried out by clever, competent reviewers (when you get funded).

Yet, it must be either one or the other, or a mix of the two.
So, while I was submitting my latest grant proposal, I wondered: is this all worth the effort? What’s the point of all the energy, stress, nights spent moving a sentence there and changing a word here, days on the phone discussing with collaborators, if it all boils down to a random outcome?
When success rates are very low, one may suspect that all the money spent by funding agencies to rank all those very good proposals is a waste: in the end, who’s in and who’s out may just be a matter of a fluke in the review process. Maybe one reviewer had a bad night, so today he’s upset and turns a mark down a notch, and out goes your great idea. It would be better to draw tickets from a lottery.

How can one check whether this is true, or not? To know whether the really good proposals are the ones that get funded, one should know beforehand which ones are good. But this is tantamount to evaluating the proposals, which brings us back to square one.

There is a way to assess the process, though: playing games. I mean: doing some modelling.

So I set out and built a simple model, mimicking the French ANR’s selection process, which proceeds in two steps: a selection on short pre-proposals, followed by a second round of selection on the full proposals that have passed the first check.
According to the data provided by the agency, in 2017 about 3500 proposals, out of about 7000, passed the pre-proposal phase, and in the end about 900 were funded, for a success rate of about 12-13%.

So I simulated “true” scores for 1000 “proposals” according to a gamma distribution.

The distribution of proposal true values looks like this:

[Figure: histTrueValues.jpg]
Then I supposed that two reviewers examined each proposal, each providing a score built by adding to the true value an “error” drawn from a Gaussian with mean zero – basically, each reviewer introduced “white noise” into the score. The final pre-proposal score was the mean of the two reviewers’ scores. True to the fact that I had introduced noise, the scores were dispersed around the true values.
[Figure: TrueValVsEvalMean.jpg]
The top half of the ranking (on the y axis, the reviewers’ scores) went on to phase two, and then the process was started all over, with a smaller error (full proposals provide more details, so it should be easier to assess their “true” value). The top 20% tier was “funded”. Then I compared the “true” scores of the winners with the final reviewers’ scores. In a perfect world, the successful proposals should be those with the best “true” values. Is this the case?

Yes and no.
The dispersion of ranks looked large: the relationship between the “true” ranking of a given proposal (x axis) and its final ranking (y axis) did not look so tight, even though there was a clear trend in favour of the best proposals:

[Figure: rankVsRank.jpg]

How many of the proposals belonging to the top 20% post-evaluation are also in the top 20% of the “true” values? Around 70%.

[Figure: selection.png]

In other terms, approximately one third of the proposals that should have been selected were turned down, and were replaced by proposals that do not belong there. Is that satisfactory? Unsatisfactory? I’ll let you decide.
What is the alternative (apart from scrapping this system for funding science altogether)? Let us suppose that we skip the second round of selection, and that we randomly draw the proposals to fund from those that have passed phase one. How do we fare?
[Figure: random.png]

Quite poorly. Only 16% of the funded projects are “good” ones. So, after all, there seems to be some value in the GPS (even though it can be as imprecise as a GPS under thick forest cover).

Of course, the outcome depends on the size of the error introduced by the evaluation process: increase it, and the funded projects will belong less and less to the group of the “good” ones. And I am not accounting for the effect of the PI’s previous record, nor for the fact that, if you have been funded before, chances are high that you’ll be funded again. Actually, a recent study – which I warmly recommend you read – shows that, when the funding system has such a “memory”, it produces large inequality and favours luck over merit!
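
If you want to check this effect without editing the full script below by hand each time, here is a minimal sketch (not part of the original script; the function name fundedOverlap, the parameter names and the noise values are my own choices) that wraps the same two-step selection into a function of the reviewers’ noise and returns the share of truly “good” projects among the funded ones:

#sketch: how does the share of "good" funded projects change with reviewer noise?
fundedOverlap <- function(sdRound1 = 2, sdRound2 = 1, nSubmitted = 1000,
                          keptFirstRound = 0.5, funded = 0.1) {
  trueVal <- rgamma(nSubmitted, shape = 1)
  #first round: mean of two noisy reviewers, keep the top half
  score1 <- trueVal + rowMeans(cbind(rnorm(nSubmitted, sd = sdRound1),
                                     rnorm(nSubmitted, sd = sdRound1)))
  pass1 <- score1 > quantile(score1, probs = 1 - keptFirstRound)
  #second round: smaller noise, fund the top fraction overall
  score2 <- trueVal + rowMeans(cbind(rnorm(nSubmitted, sd = sdRound2),
                                     rnorm(nSubmitted, sd = sdRound2)))
  score2[!pass1] <- NA
  fundedIds <- order(score2, decreasing = TRUE, na.last = TRUE)[1:(nSubmitted * funded)]
  bestIds   <- order(trueVal, decreasing = TRUE)[1:(nSubmitted * funded)]
  length(intersect(fundedIds, bestIds)) / (nSubmitted * funded)
}
#share of truly "good" projects among the funded ones, for growing noise
sapply(c(0.5, 1, 2, 4),
       function(noise) mean(replicate(50, fundedOverlap(sdRound1 = 2 * noise,
                                                        sdRound2 = noise))))

As the noise grows, the returned fractions should drift down towards what a pure lottery would give.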

The R code I used to run the simulations is posted below. You can play around with it and see what happens. Enjoy the game – if you are not busy with a GPS.

——————–

#rules of the game:
#overall funding rate is 10%
#we start with 1000 projects.
#at the first round, 50% of the proposals are selected
#at the second round, 20% of the remaining projects are
#selected for funding.
#Project "true" values are gamma-distributed.
#At each step, the reviewers' evaluation equals
#the "true" value of each project plus a white (Gaussian) noise
#noise is twice as strong in the first round as in the second
#
#generating distributions of "true" project marks
nSubmitted = 1000
excludedFirstRound = 0.5
funded = 0.1
marks.gamma <- rgamma(nSubmitted, shape = 1)
#plotting:
jpeg(filename = "histTrueValues.jpg")
hist(marks.gamma, breaks = 20, main = "True project values",
     xlab = "Project value", col = "blue")
dev.off()
#
#generating first-round evaluators' marks (by introducing noise)
evaluator1.noise <- marks.gamma + rnorm(n = nSubmitted, sd = 2)
evaluator2.noise <- marks.gamma + rnorm(n = nSubmitted, sd = 2)
#producing final marks (mean of the two reviewers)
evalMean.noise <- rowMeans(cbind(evaluator1.noise, evaluator2.noise))
#visualising the relationship:
#plotting:
jpeg(filename = "TrueValVsEvalMean.jpg")
plot(evalMean.noise ~ marks.gamma,
     xlab = "True Values",
     ylab = "Mean 1st round mark",
     pch = 21, bg = "red")
dev.off()
#building a data frame
projects.df <- data.frame(seq(1, nSubmitted, 1), marks.gamma, evalMean.noise)
names(projects.df) <- c("projId", "trueVal", "score1stRound")
#1st round of selection:
excluded1round <- which(projects.df$score1stRound <= quantile(projects.df$score1stRound,
                                                              probs = excludedFirstRound))
#generating 2nd round scores:
evaluator1.noise <- marks.gamma + rnorm(n = nSubmitted, sd = 1)
evaluator2.noise <- marks.gamma + rnorm(n = nSubmitted, sd = 1)
evalMean.noise <- rowMeans(cbind(evaluator1.noise, evaluator2.noise))
projects.df$score2ndRound <- evalMean.noise
projects.df$score2ndRound[excluded1round] <- NA
#computing ranks:
projects.df$trueRanking <- rank(-projects.df$trueVal)
projects.df$ranking2ndRound <- rank(-projects.df$score2ndRound)
projects.df$ranking2ndRound[excluded1round] <- NA

#let us have a look at the rankings based on true scores vs final rankings:
#plotting:
jpeg(filename = "rankVsRank.jpg")
plot(projects.df$ranking2ndRound ~ projects.df$trueRanking,
     xlab = "True Ranks",
     ylab = "Final evaluation ranks",
     pch = 21, bg = "aquamarine")
dev.off()
#
projectsFunded.df <- projects.df[which(projects.df$ranking2ndRound <= nSubmitted * funded), ]
projectsRandomlyFunded.df <- projects.df[sample(
  which(is.na(projects.df$ranking2ndRound) == FALSE), size = nSubmitted * funded), ]
projectsBestTrueRanks.df <- projects.df[which(projects.df$trueRanking <= nSubmitted * funded), ]
#how effective is the selection process?
length(intersect(projectsBestTrueRanks.df$projId, projectsFunded.df$projId))
length(intersect(projectsBestTrueRanks.df$projId, projectsRandomlyFunded.df$projId))
library(VennDiagram)
#plotting
venn.diagram(list(True = projectsBestTrueRanks.df$projId,
                  Selected = projectsFunded.df$projId),
             filename = "selection.png",
             imagetype = "png",
             fill = c("palegreen", "palevioletred"))
#
venn.diagram(list(True = projectsBestTrueRanks.df$projId,
                  Random = projectsRandomlyFunded.df$projId),
             filename = "random.png",
             imagetype = "png",
             fill = c("palegreen", "sandybrown"))
#
#

 

Did you say, “gradualist”?

How gradual must evolution be for an evolutionist to be called “gradualist”?

Today I was reading the interesting, and by all means very good, paper by Lowe et al. (2017) Trends Ecol. Evol. 32: 141-152, and the authors say in the Introduction:

“In the past decade, ecologists have embraced the concept of eco-evolutionary dynamics, which emphasizes the power of ecological selection to cause rapid adaptation and, likewise, for adaptive evolution to influence ecological processes in real time [10,11]. The perceived novelty of this concept appears to stem from the fast rate of interaction between ecological conditions and phenotypic adaptation, which contrasts with traditional, gradualistic models of adaptation.” (my highlight).

A quick question and comment crossed my mind.

This whole idea that evolutionary processes were “traditionally” considered gradual is a vast hoax.

Consider uncle Charlie (Darwin) himself. Where did he get (the mechanism behind) the theory of evolution by natural selection? From fossils? From rare mutations in DNA sequences?

NO. He got it from observing the people around him selecting pigeons, sheep and ornamental plants. Did that happen over millions of years? No. It happened over a few generations (of pigeons!).

So the whole idea that we realised only in the last decade that evolution can happen quickly, while “traditional” science (read: conservative, backward “standard” evolutionary biologists) thought this impossible, is plain wrong.

It is true that one (generally) only finds the things one looks for. Consequently, if we think that evolution over a few generations (or, for that matter, genetic divergence over a few tens of metres) cannot happen, then we’ll never look for it and never find it.

I invite you, reader, to take half an hour and wander through the older ecological-genetic literature, and you’ll find abundant proof that people have kept looking for – and finding! – fast evolutionary shifts for at least a century. I’m not even talking about Biston betularia or the LTEE. I’m talking about plenty of observations that have recurrently documented fast evolution everywhere.

 

It’s the demography, stupid!

Of Sharks, Giraffes and Malthus. Mostly, Malthus.

First, let me make it clear that, as far as I know, the sentence “It’s the economy, stupid!” was never used, orally or in writing, by the (Bill) Clinton campaigns. It may be a nice summary, but it was never used as such. But let us go back to our topic.

One day, I was teaching teachers who teach biology teachers how to teach biology (yes, it is a true sentence). And I asked the teachers’ teachers to spell out the mechanism of evolution by natural selection. Everybody told me: there is variation; variation is heritable; then the best individuals survive/produce more offspring and evolution happens.

That’s right. Selection is perceived as a sort of mechanism testing the ‘adequacy’ (we call it ‘adaptation’) of individuals to their environment. Somehow, we are still pretty much innately Lamarckian*, after all: we think in terms of how an individual copes with its everyday problems. While this is certainly an important component of ‘adaptation’ in general terms, and while for sure it is individuals (and not genes, or populations, or – heaven forbid – species) who survive or die, reproduce or not, this description falls short of describing the actual mechanism of evolution, because it misses an essential component.

Thinking in terms of individual properties and problems is not exactly the right way of looking at selection and adaptation. As the Australian saying goes, “when there is a shark in the water, you do not need to swim faster than the shark: you just need to swim faster than the slowest swimmer” (I’ll let you generalise to the case where there are n sharks in the water, I know you can manage; and I’m sure you know a regional version of the saying, with threats other than sharks). The point is that selection is not about an individual’s relationship to the environment, but about whether you do better or worse than somebody else.


And this brings to the fore the essential cog of selection’s machinery that many people (including biologists – except evolutionary biologists themselves) miss: demography. It is because there are always many more offspring than the environment can carry that eventually some of them die or do not reproduce.


The “fitness” part of the game is, of course, that those who exploit the available resources better, or cope better with stress, perform better from the survival / reproduction point of view and leave more offspring (the way the parents’ traits are inherited does not change a thing). If there were infinite space and resources, nobody would suffer from selection, and everything would behave according to neutral evolution. Darwin borrowed the idea from Malthus, as everybody knows, and this is the piece that makes the difference between any evolutionary hypothesis and the Modern Synthesis’ successful one** (in terms of explanatory power). It is because some individuals die or do not reproduce that there is adaptation. Somehow, we should be happy to observe (moderate amounts of) mortality (in forests there’s a lot of it) and unequal fecundity in populations, because this is how adaptation occurs.

The Malthusian piece of Darwin’s genius idea is understandably hard to swallow. As one of the teachers’ teachers exclaimed, after I had pointed out the strict necessity of the cruel Malthusian piece in the Theory: “oh, that’s so SAD”. Yes, life is unfair, but adaptive biological evolution happens only if there are winners and losers. Now, if I were in the losers’ camp, I would rather no evolution-by-selection happened at all, but this is the way it is – no social or anthropocentric judgement attached.

 

 

*Lamarck, in spite of his post-Darwin very bad press covfefe, was a true evolutionary biologist, and a clever one at that. He lacked some important pieces of understanding of how selection works, but then again, Darwin too had silly views about heritability of traits.

** I refrain from attributing the idea entirely to dear uncle Charles for two reasons. First, he lacked the mathematical formalism to model the mechanism; second, I disagree with the identification of Evolutionary theory with one person, no matter how grateful we should all be to the genius that was Charles Darwin. After all, nobody talks in terms of “Einsteinism” or “Röntgenism”, so why should we talk about “Darwinism”?

Where have all the forest geneticists gone?

Missing mass of forest population geneticists at conferences leaves me wondering why they stay home

I’m back from a couple of conferences: the ESEB meeting in Groningen and the SIBE meeting in Rome.

Both were terrific, and both allowed me to come back home with the usual mix of excitement (for the impressive amount of good science that people do, and for the truckload of good ideas I could grab) and frustration (for not having done myself all that good science!).

Among other things, I must stress the feeling of being (at 47) among the eldest at both conferences – and this is a very positive remark: of course, one gets older and thus climbs the pyramid of ages, but I reckon that evolutionary biology conference-goers are, on average, pretty young and impressively competent. This bodes well for the future of evolutionary biology!


Yet, throughout both conferences I kept wondering where all my fellow forest evolutionary biologists were hiding. Certainly, those two conferences do not focus on forests, but they do not focus on fruit flies and mice either, and I heard plenty of talks on those critters. For sure, forest trees are not “model” species, but the share taken by model species at both conferences was, globally, very small, so there cannot have been a “filter” against papers on trees. The fact is, there were very few forests across the conference landscape. Somehow, I felt slightly lonely with my forest population genetics talks and posters.


Yet – although I’ll provide no list, for fear of omitting somebody – I know plenty of forest scientists who have made major contributions to asking and answering overarching (*) evolutionary questions and to developing evolutionary theory: evolutionary biology is a relevant playground for forest geneticists. So why was I so lonely? Why is the attendance of forest geneticists, young and old, at general conferences decreasing? Are they all busy tending to their science, with nothing worth sharing in their hands? Or are their budgets, in time and money alike, shrinking so abruptly that they cannot afford those meetings any more? Or maybe they are folding back onto their own community?

To check, I had a look at the programme of the IUFRO general meeting, which will be held in Freiburg next week – IUFRO is the United Nations of forestry research, and every forest scientist goes to an IUFRO meeting every so often. And even there, although I carefully scrolled through all the symposia and checked the speaker lists, I could barely find the names of acclaimed or lesser-known forest geneticists. Essentially, our research field will not be represented there either (well, I confess: I am not attending, but I could not go to three conferences in less than a month).

Forest geneticists are deserting both general evolution / evolutionary genetics events and forest-focused meetings. Why? And – apart from forest genetics conferences – where do they go? I’d very much like to know the answers to those questions. Plus, I would like to say that it is very important, for junior and senior scientists alike, to get out of our “comfort zone” and mix with people doing (relatively speaking) entirely different things. As I said above, one comes home with a suitcase full of great ideas.

(*) It is good to fit the word overarching into a text, from time to time. It makes you feel important.

A whole biome ablaze?

And it burns, burns, burns,
The ring of fire, the ring of fire.

Mediterranean forests are burning.
All of a sudden, Portugal, France, Italy, Greece… fires – sometimes large, out-of-control, deadly wildfires – are burning all over Mediterranean forest ecosystems.

Hot temperatures, little rainfall, strong winds, and a dense human population: all factors are there for the perfect firestorm. If this is what climate change has in store for us, well, the outlook for Mediterranean forests is bleak.

[Photo: Pedrógão Grande wildfires, Portugal, June 2017]

Besides stopping climate change (ha ha ha!) and fencing humans out of forests (unlikely to work, either) what can be done?

There is only one word: MANAGEMENT (the alternative is: ashes).

My lab‘s director, Eric Rigolot, has provided some clues in an interview (in French) with the French Huffington Post website. What does he say? That we have to use managed fires to prevent big, uncontrolled wildfires. This technique is common practice on other continents, but not in Europe.

I would add: the vegetation itself (the fuel) must be managed in ways that minimise fire spread, if not ignition. This is particularly true where human beings are likely to wander, because they are, most of the time and often unwittingly, the source of fires.

Forests must be tended, must be gardened. In Europe, they stopped being wilderness a long time ago, so the potential argument that, by managing forests, we alter some fancy natural equilibrium is nonsense. It may be valid for some truly pristine biomes (if any are left), but not in Europe, not around the Mediterranean basin.

This means we are responsible for the health of our forests, including by limiting the effects of fires that we are the primary cause of.

Firs are dying, beeches are almost fine, but for how long?

Walking through an iconic mountain forest in Southern Europe leaves little hope for what is coming next.

Yesterday I was on Mont Ventoux (Southern France) to sample beech leaves for the BEECHGENOMES project.


One can see the silver firs dying there (at around 900 m a.s.l.). The understory shows the occasional fir (and more commonly, beech) sapling and seedling, but what mostly grows there is a shrub, boxwood (Buxus sempervirens), and even boxwood, when it grows in a gap, does not fare so well. What will be left of the beautiful Ventoux forest in 30 years?


Out of 166 adult beech trees belonging to the long-term survey cohort, we found “only” nine dead (“only nine”!? that’s 5.4%… and the last check was only a few years ago). Most of the others looked fine, with no visible sign of stress, but this year, with so little rainfall and several strong heatwaves, they are likely to shed their leaves in early August. Growing season over.

Not very happy, my goodness.


Of budgets and schedules

You know your grant proposal’s cost per nucleotide and per fieldwork day perfectly. But did you budget for data analyses?

Once again, I had to write a project’s final report. Once again, I found myself writing that ‘data have been produced, and we are carrying out data analyses’. This seems to be accepted as a perfectly good report. Nobody expects that, when the project is over, the data will have been analysed. Yet, we all obviously claim that science is not about accumulating data, but about producing and interpreting results*.


Why do we take it for granted that a research programme is complete without data analyses? The answer is ridiculously simple: it is because we seldom schedule or budget data analyses. In our unconscious mechanistic-positivist-reductionist-Platonic mindset (yes, you too have such a mindset; you were raised like this, as a scientist), data analyses automatically derive from first principles, so they cost no time and no effort; they are an instantaneous act of revelation of patterns and laws from the data.

This reminds me of the joke, common among physicists, about the mathematician who dies of starvation because he never actually cooks his meals: once he has verified that all the ingredients are in the cupboards, he considers the meal done. So he never eats. [I do not think mathematicians are really like this. Physicists, especially experimental physicists, do think they are.]

But when we think about it, we know perfectly well that there is no such thing as instantaneous, self-organising data analysis. Data analyses cost “blood, toil, tears and sweat”, and enormous amounts of time and money (think not only computers and licence fees, but also salaries).


Since I stopped spending my days doing silly things like pouring acrylamide gels and scoring bands on an X-ray film, or even peak profiles on a screen (that was a long time ago, luckily), data analyses take up about 80% of my working time (not counting grant proposal writing, paper writing, emptying the coffee room compost tank, and writing blog posts about time spent writing blog posts).

It should be obvious to all: a project is only over when (at least) the data analyses are over. To achieve this feat, we first need to schedule data analyses honestly.

So atone, you sinner, and go back to correctly scheduling the next six months of your activities.

 

*The publication of ‘data papers’ is becoming common, but this is a different matter: the need to publish stand-alone data sets highlights even more the need to make them available to a larger community, so that they can be more easily analysed.

Genes, genes all over the place

Genome-wide association studies show that characters are genome-wide associated. So what’s the point?

Years ago, while I was screening Table of Contents alerts for interesting stuff (I’ve been doing this once a week for years: Friday, I’m in love with newly published science), my eye was caught by one of those high-impact, very technical human genetics papers where they find a new gene underlying some very serious disease. It turned out that the newly identified causal variant accounted for 0.6% of the genetic variance. Wow. Then I said to myself: hey, what do you know about how the field of disease genetics works? Even a 0.6% effect can be important, if it can save a life.

And then came the Boyle et al. paper, a few days ago, on the ‘omnigenic’ architecture of complex traits. The paper shows that the genome is not merely scattered with, but smeared with, loci controlling complex traits. Many of those loci have very small effects, and could only be detected in studies with very large sample sizes. There is nothing strange about this: the more intensely you chart the territory, the more detailed the map, and the smaller the features that appear on the map.


The vast lists of genes associated with a given trait are not particularly enriched in any functional category; probably, many genes have an impact on many characters (they are pleiotropic) and are involved in many partially overlapping regulatory networks, as suggested by the fact that many causal variants happen to lie in regulatory regions.

Indirectly, this also suggests that candidate-gene strategies may have a problem (you’ll certainly find a particular category of genes to be associated with the trait: all categories are…), as does looking for relevant variants only in coding regions (well, we all knew this, did we not?).

But let us go back under the canopy.

In trees, it is quite common that traits diverge across populations (forest scientists call them “provenances”: the word is a heritage of the practice of planting multiple populations together to assay their performance), while gene frequencies do not. This phenomenon is well described by Kremer and Le Corre (2003, 2012 (1), 2012 (2)) and suggests that adaptation (and therefore the control of the underlying adaptive traits) is highly polygenic. There you are. One can expect that, given enough power, association studies in trees will also end up detecting very large numbers of small signals.
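
To get an intuition for how this can happen, here is a minimal toy sketch in R (the locus number, effect sizes and frequency shift are my own arbitrary numbers, not taken from Kremer and Le Corre): a hundred loci each shift their allele frequency by a small, coordinated amount, and the per-locus Fst stays tiny while the trait they jointly control diverges strongly (a large Qst):

#toy example: coordinated small allele-frequency shifts at many loci
set.seed(42)
nLoci  <- 100    #loci controlling the trait
nInd   <- 500    #diploid individuals sampled per population
effect <- 0.1    #additive effect of the "+" allele, identical at all loci
p1 <- rep(0.50, nLoci)   #allele frequencies in population 1
p2 <- p1 + 0.05          #small but coordinated shift in population 2
#per-locus Fst (Nei's Gst for two populations)
pbar <- (p1 + p2) / 2
fst  <- ((p1 - pbar)^2 + (p2 - pbar)^2) / 2 / (pbar * (1 - pbar))
mean(fst)        #~0.0025: essentially invisible to an outlier scan
#simulate genotypes (0/1/2 copies of the "+" allele) and additive trait values
geno1 <- sapply(p1, function(p) rbinom(nInd, 2, p))
geno2 <- sapply(p2, function(p) rbinom(nInd, 2, p))
z1 <- geno1 %*% rep(effect, nLoci)
z2 <- geno2 %*% rep(effect, nLoci)
#crude two-population Qst: between-pop variance / (between + 2 * within)
vb <- var(c(mean(z1), mean(z2)))
vw <- (var(z1) + var(z2)) / 2
vb / (vb + 2 * vw)   #~0.3: strong divergence of the trait itself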

There are very few GWASs in trees (Fahrenkrog et al. (2016), for example, mapped about 18,000 genes in a population of about 400 trees. I can hear the average human (or Arabidopsis) geneticist sneering).
One good reason is that to perform a GWAS you first need a G[enome], and there are not so many tree genome sequences so far. Other reasons may be less clear, except for the argument that GWAS is still relatively expensive and forestry is not the research field that attracts the largest funds. Forest GWAS studies in which millions of trees are screened are even rarer (read: non-existent). Plus, if you are looking for weak effects, you must be very careful about how you pick your sample: even slight, undetected differences in ontogeny (developmental path: how the trees have grown) or environmental conditions can have a larger effect than weak genetic differences, which will then be drowned in the background noise.

Yet, there is hope.

From the evolutionary point of view, small effects may be more relevant than when trying to predict susceptibility to a disease. First because, as stated above, they can have a cumulative effect on fitness (even without accounting for interactions, which could amplify their effects); and secondly, because selection is a powerful force (we have commented on this before, haven’t we?) and can lead to major allele and phenotype frequency changes over relatively short time scales. As an exercise, you can compute the time to fixation of an advantageous allele starting at an arbitrary frequency, using the equations in Kimura and Ohta (1969) (NB: compare it to the time to fixation of a neutral allele in the same configuration, to check what the effect of selection really is). You can play around with the formula using this simple code that I have written in R:

#Calculations from Equations (17) and (14) in:
#Kimura M, Ohta T. 1969.
#The Average Number of Generations until Fixation of a Mutant Gene in a Finite Population.
#Genetics 61: 763–71.
#
#
#sel coefficient
s<-6e-3
#effective size
Ne<-10000
#Ne * s = S
S<-Ne * s
#starting frequency of positively selected allele
p<-0.005
#
# Equation (17): fixation of selected allele
#function to integrate for term J(1)
J1der<-function(csi)
{
(exp(2*S*csi)-1)*(exp(-2*S*csi)-exp(-2*S)) / (csi*(1-csi))
}
#function to integrate for term J(2)
J2der<-function(csi)
{
((exp(2*S*csi)-1)*(1-exp(-2*S*csi))) / (csi*(1-csi))
}
#coefficient for the integrals J(1), J(2)
Jcoef<- 2 / (s*(1-exp(-2*S)))
#u(P) function
uP<-(1 - exp(-2*S*p)) / (1 - exp(-2*S))
#average time to fixation under selection
t1p<- Jcoef * integrate(J1der, lower = p, upper = 1)$value + ((1-uP)/uP) * Jcoef * integrate(J2der, lower = 0 , upper = p)$value
#
# Equation (14): fixation of a neutral allele
#average time to fixation (neutral)
t1pNeutr<- (-1/p)*(4*Ne*(1-p)*log(1-p))
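
To run the comparison suggested above, you can append a couple of lines like these (a small addition of mine, not part of the original snippet):

#compare generations to fixation with and without selection
c(selected = t1p, neutral = t1pNeutr, speedup = t1pNeutr / t1p)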

Moreover, part of the ‘omnigenic’ effect is caused by linkage disequilibrium, which extends over much larger spans in humans than in most tree species. In trees, it is therefore less likely that a variant merely correlated with a causal SNP will itself show up as a causal SNP.

Another consideration: for reasons of power and efficiency, many rare variants are eliminated from GWAS studies through the cruel MAF (minor allele frequency) threshold: anything with a frequency under a certain cut-off is usually thrown out. This makes sense, because some of those variants may be artefacts, and anyway statistical power at those loci will be low. But what if they have a large effect? In natural tree populations, rare variants with large effects may be just waiting for selection to pick them up. I, for sure, do not throw them away!
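
Just to make the filter concrete, here is a self-contained toy version in R (the simulated genotype matrix and the 5% cut-off are arbitrary choices of mine, not a recommendation):

#toy MAF filter: simulate 1000 SNPs for 200 diploid individuals,
#then flag everything below an (arbitrary) 5% minor allele frequency
set.seed(1)
trueFreq  <- runif(1000, 0.001, 0.5)
genotypes <- matrix(rbinom(200 * 1000, size = 2, prob = rep(trueFreq, each = 200)),
                    nrow = 200, ncol = 1000)
p   <- colMeans(genotypes) / 2   #observed allele frequency per SNP
maf <- pmin(p, 1 - p)            #minor allele frequency
sum(maf < 0.05)                  #how many SNPs the cruel filter would discard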

And finally: first things first. Let us find the major effects, if they are there (some have already started: see Sam Yeaman et al.’s Science paper for a brilliant example), and then we’ll scratch our heads over the minor ones.