Of budgets and schedules

You know your grant proposal’s cost per nucleotide and per fieldwork day perfectly well. But did you budget for data analyses?

Once again, I had to write a project’s final report. Once again, I found myself writing that ‘data have been produced, and we are carrying out data analyses’. This seems to be accepted as standard practice in reporting. Nobody expects the data to have been analysed by the time the project is over. Yet we all claim, obviously, that science is not about accumulating data, but about producing and interpreting results*.


Why do we take it for granted that a research program is complete without data analyses? The answer is ridiculously simple: it is because we seldom schedule or budget data analyses. In our unconscious mechanistic-positivist-reductionist-Platonic mindset (yes, you have such a mindset too: you were raised like this, as a scientist), data analyses automatically derive from first principles, so they cost no time and no effort; they are an instantaneous act of revelation of patterns and laws from the data.

This reminds me of the joke, common among physicists, about the mathematician who dies of starvation because he never actually cooks his meals: once he has verified that all ingredients are in the cupboards, he considers that the meal is done. So he never eats. [I do not think mathematicians are like this. Physicists, especially experimental physicists, do].

But when we think again, we know perfectly well that there is no such thing as instantaneous, self-organising data analysis. Data analyses cost “blood, toil, tears and sweat” and enormous amounts of time and money (think not only of computers and licence fees, but also of salaries).


Since I stopped spending my days doing silly things like pouring acrylamide gels and scoring bands on an X-ray film or even peak profiles on a screen (that was a long time ago, luckily), data analyses have taken about 80% of my working time (not counting grant proposal writing, paper writing, emptying the coffee room compost tank, and writing blog posts about time spent writing blog posts).

It should be obvious to all, but a project is only over when (at least) data analyses are over. To achieve this feat, we first need to honestly schedule data analyses.

So atone, you sinner, and go back to scheduling the next six months of your activities, correctly this time.

 

*The publication of ‘data papers’ is becoming common, but this is a different matter: the need to publish stand-alone data sets highlights even more the need to make them available to a larger community, so that they can be more easily analysed.

Genes, genes all over the place

Genome-wide association studies show that characters are genome-wide associated. So what’s the point?

Years ago, while I was screening table-of-contents alerts for interesting stuff (I’ve been doing this once a week for years: Friday, I’m in love with newly published science), my eye was caught by one of those high-impact, very technical human genetics papers where they find a new gene underlying some very serious disease. It turned out that the newly identified causal variant accounted for 0.6% of genetic variance. Wow. Then I said to myself: hey, what do you know about how the field of disease genetics works? Even a 0.6% effect can be important, if it can save a life.

And then came the Boyle et al. paper, a few days ago, on the ‘omnigenic’ genetic structure of complex traits. The paper shows that the genome is not merely scattered with loci controlling complex traits: it is smeared with them. Many of those loci have very small effects, and could only be detected in studies with very large sample sizes. There is nothing strange about this: the more intensely you chart the territory, the more detailed the map, and the smaller the features that appear on the map.


The vast lists of genes associated with a given trait are not particularly enriched in any functional category; probably, many genes have an impact on many characters (they are pleiotropic) and are involved in many partially overlapping regulation networks, as suggested by the fact that many causal variants happen to be in regulatory regions.

Indirectly, this also suggests that candidate-gene strategies may have a problem (you’ll certainly find a particular category of genes to be associated with the trait: all categories are…), as well as looking for relevant variants only in coding regions (well, we all knew this, did we not?).

But let us go back under the canopy.

In trees, it is quite common that traits diverge across populations (forest scientists call them “provenances”: the word is the heritage of the strategy of planting multiple populations together to assay their performances), but gene frequencies do not. This phenomenon is well described by Kremer and Le Corre (2003, 2012 (1), 2012 (2)) and suggests that adaptation (and therefore the control of underlying adaptive traits) is highly polygenic. There you are. One can expect that, given enough power, association studies in trees will end up detecting very large numbers of small signals, too.

There are very few GWASs in trees (Fahrenkrog et al. (2016), for example, has about 18,000 genes mapped from a population of about 400 trees. I can hear the average human (or Arabidopsis) geneticist sneering).
One good reason is that to perform a GWAS you first need a G[enome], and there are not so many tree genome sequences so far. Other reasons may be less clear, except for the argument that GWAS is still relatively expensive and forestry is not the research field that attracts the largest funds. Forest GWAS studies in which millions of trees are screened are even rarer (read: non-existent). Plus, if you are looking for weak effects, you must be very careful about how you pick your sample: even slight, undetected differences in the ontogeny (developmental path: how the trees have grown) or environmental conditions can have a larger effect than weak genetic differences, which will be drowned in the background noise.

Yet, there is hope.

From the evolutionary point of view, small effects may be more relevant than when trying to predict susceptibility to a disease: first, because, as stated above, they can have a cumulative effect on fitness (even without accounting for interactions, which could amplify their effects); and second, because selection is a powerful force (we have commented on this before, haven’t we?) and can lead to major allele and phenotype frequency changes over relatively short time scales. As an exercise, you can compute the time to fixation for an advantageous allele starting at an arbitrary frequency, using the equations in Kimura and Ohta (1969) (NB: compare it to the time to fixation for a neutral allele in the same configuration to check what the effect of selection really is). You can play around with the formula using this simple code that I have written in R:

#Calculations from Equations (17) and (14) in:
#Kimura M, Ohta T. 1969.
#The Average Number of Generations until Fixation of a Mutant Gene in a Finite Population.
#Genetics 61: 763–71.
#
#
#sel coefficient
s<-6e-3
#effective size
Ne<-10000
#Ne * s = S
S<-Ne * s
#starting frequency of positively selected allele
p<-0.005
#
# Equation (17): fixation of selected allele
#function to integrate for term J(1)
J1der<-function(csi)
{
(exp(2*S*csi)-1)*(exp(-2*S*csi)-exp(-2*S)) / (csi*(1-csi))
}
#function to integrate for term J(2)
J2der<-function(csi)
{
((exp(2*S*csi)-1)*(1-exp(-2*S*csi))) / (csi*(1-csi))
}
#coefficient for the integrals J(1), J(2)
Jcoef<- 2 / (s*(1-exp(-2*S)))
#u(P) function
uP<-(1 - exp(-2*S*p)) / (1 - exp(-2*S))
#average time to fixation under selection
t1p<- Jcoef * integrate(J1der, lower = p, upper = 1)$value + ((1-uP)/uP) * Jcoef * integrate(J2der, lower = 0 , upper = p)$value
#
# Equation (14): fixation of a neutral allele
#average time to fixation (neutral)
t1pNeutr<- (-1/p)*(4*Ne*(1-p)*log(1-p))

Moreover, part of the ‘omnigenic’ effect is caused by linkage disequilibrium, which extends over much larger spans in humans than in most tree species. In trees, it is less likely that a variant correlated with a causal SNP will also appear as a causal SNP.

Another consideration: for reasons of power and efficiency, many rare variants are eliminated from GWAS studies through the cruel MAF (minor allele frequency) filter: anything with a frequency under a certain threshold is usually thrown out. This makes sense, because some of them may be artefacts, and anyway statistical power at those loci will be low. But what if they have a large effect? In natural tree populations, rare variants with large effects may be just waiting for selection to pick them up. I, for sure, do not throw them away!
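
As a minimal sketch of what such a filter does, and of how rare variants can be set aside rather than discarded, consider a genotype matrix coded as 0, 1 or 2 copies of one allele. The toy matrix, the 10% threshold and the function name `minor_allele_freq` are all assumptions made for illustration; real pipelines have their own tools for this.

```r
# Toy genotype matrix: rows = individuals, columns = SNPs,
# entries = number of copies (0, 1, 2) of the alternate allele.
geno <- matrix(c(0, 1, 2, 1, 0, 2, 1, 1, 0, 2,   # snp1: common variant
                 0, 0, 0, 0, 0, 0, 0, 0, 0, 1,   # snp2: rare variant
                 2, 2, 2, 2, 2, 2, 2, 2, 2, 2),  # snp3: monomorphic
               nrow = 10,
               dimnames = list(NULL, c("snp1", "snp2", "snp3")))

# Minor allele frequency per SNP (diploid individuals assumed)
minor_allele_freq <- function(g) {
  alt <- colMeans(g, na.rm = TRUE) / 2
  pmin(alt, 1 - alt)
}

maf <- minor_allele_freq(geno)
threshold <- 0.1  # arbitrary value for this toy example

# Usual practice: keep only the common variants
common <- geno[, maf >= threshold, drop = FALSE]
# Alternative: keep the rare (but polymorphic) variants in a separate table
rare <- geno[, maf > 0 & maf < threshold, drop = FALSE]
```

Keeping `rare` as a separate object costs nothing, and leaves the door open to looking at those loci later with methods suited to low frequencies.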

And finally: first things first. Let us find major effects, if they are there (some have already started: see the Science paper by Sam Yeaman et al. as a brilliant example), and then we’ll scratch our heads over the minor ones.

Where the lost seedlings go

Billions of seedlings germinate and disappear every year – but they may be as important for forest dynamics as majestic adult trees.

Walk through a wild forest in spring (or during the rainy season, if you are walking through a tropical forest). You’ll walk on a carpet of tree seedlings, growing from the seeds dispersed by the trees in the canopy above your head.

It is a spectacular view, that gives you the feeling of how powerful forest dynamics are. It is also amazing to see all those tiny cubs, sitting at the feet of their enormous mothers.


But there is also a tragic side to this. Look closer: there are far fewer saplings, and even fewer sub-adults. No matter the tree species, most of those youngsters will die soon and suddenly, in a massacre that makes WWI trench warfare pale in comparison. The pyramid of ages is very steep.

What does this slaughter tell us about how forest ecosystems work? Jean-Pierre Pascal (‘JPP’), a brilliant and witty, now retired, CNRS (France) forest ecologist, used to say that it was useless to study all those naturally regenerating seedlings, which will basically all die.

And he was right: from the forest manager’s point of view, interesting things start when trees reach ‘10 cm d.b.h.’ (for the outsider: ‘d.b.h.’ is ‘diameter at breast height’, that is, at 130 cm from the ground (for a short guy like me)). And this is so exactly because only a few (wild) stems reach that size, hence it is pointless to take care of all those which don’t.

A tree can live and be fertile for several decades or even a few centuries. Every year or every few years, a dominant tree produces hundreds or thousands of seeds. If one imagines a ‘stable’ forest stand, one that does not expand or retreat (a very unlikely case indeed*), then each tree will be replaced by one tree over the course of its life. In other terms: of all those hundreds of thousands of seeds produced by one tree, how many, on average, will be left after a tree generation? ONE.
This is plainly insane. You can call it an extreme strategy – if you like understatements.
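
To put a number on it, here is a back-of-the-envelope sketch. The seed crop and the number of reproductive events are made-up round figures chosen within the ranges mentioned above, not measurements.

```r
# Assumed round figures, for illustration only
seeds_per_crop <- 1000   # seeds produced at each reproductive event
crops_per_life <- 100    # reproductive events over the tree's life

lifetime_seeds <- seeds_per_crop * crops_per_life

# In a 'stable' stand, each tree is replaced by one tree on average
survivors <- 1
survival_prob <- survivors / lifetime_seeds

cat("lifetime seeds:", lifetime_seeds,
    "; survival probability per seed:", survival_prob, "\n")
```

Under these assumptions, a seed has one chance in a hundred thousand of becoming the tree that replaces its mother.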


What may be going on there, from the population-genetic point of view? Seedlings die for a variety of reasons: they may be damaged by herbivores, they may find it hard to tap into soil resources, undergo competition from other seedlings, be attacked by fungi – the list of good reasons for a seedling to die is endless. If all this is purely random, then well, there is not much work to do for the evolutionary geneticist. But those populations are large, and we know that selection for survival can be particularly effective in large populations. So yes, selection can happen there, and the seedling may even be the essential developmental stage at which that kind of selection happens.

In an elegant modelling exercise, Oddou-Muratorio and Davi have shown that, actually, selection for survival occurs at young stages, while older cohorts undergo principally fertility selection (logically: almost nobody is left, so almost nobody can die). In the case of the beautifully named tropical tree Symphonia globulifera, we have found (the article is in preparation) that, in controlled experiments involving reciprocal transplants between habitats, ecotype differences in growth, germination and survival can be identified at ages 1-5 years. The TIPTREE project is seeking signatures of selection on short timescales (within one generation) in seedlings, while the GENTREE project is establishing a monster-size seedling survival experiment with two trees, Scots pine and silver birch. In a very elegant experiment, Antoine Kremer (INRA – BIOGECO) is using diachronic approaches to study seedling adaptation. In a classic of tree population genetics, Alistair Jump et al. have shown that allele frequencies of current adults have been influenced by climate at establishment time.

For decades, forest science has looked at properties of seedlings and saplings mostly hoping that they could help predict characters in adults. This is a much needed strategy if one wants to make forest management choices in production systems. By contrast, the study of seedling properties in themselves may be the key to the management of wild forests under climate change. Stay tuned for more information about this: surprises may come soon.

* no, there are no ‘stable’ forest stands. Forests themselves keep expanding and contracting, even in the absence of human intervention, and within forests, populations can undergo their own cycles of expansion and contraction, over long time spans.

 

The Darwinian neo-synthesis strikes back

The Charlesworths and Nick Barton remind us that no, epigenetics did not make Darwinism obsolete.


Despite the title, I do not think the Darwinian Modern synthesis (as defined by Julian Huxley in 1942) is some evil empire (titles only serve the purpose of catching your attention, right?). I do not think, either, that it is threatened by some insurgent group on a remote planet. Actually, I think that it is very healthy and solid.

Yet, every now and then some press release informs us that somebody has made some discovery that challenges Darwin’s ideas (some other people, meanwhile, at about the same rate, confirm Einstein’s relativity theory; the reason why it is so exciting when somebody challenges the one theory, and at the same time it is exciting when someone confirms the other, escapes me entirely).

Hence an article by three heavyweight figures of evolutionary and population genetics (D Charlesworth, NH Barton, B Charlesworth (2017) The sources of adaptive variation. Proc Royal Soc B 284: 20162864). The three Modern synthesis fighters probe the solidity of the evidence in favour of mechanisms that would fundamentally question the Modern synthesis. Are they relevant? Common? Do they work differently – evolution-wise – from classical sequence variation? The paper is a review of the evidence supporting the generality and impact of several non-Mendelian inheritance mechanisms, and the authors generally conclude that they are limited to a small number of particular cases or that they have little impact on trait variation. Nothing is left standing after the paper’s judgement: not the role of epigenetic* alleles in the determination of traits, not the transmissibility of epigenetic marks; and not, of course, the possibility of directed mutagenesis. All such effects are dismissed as poorly supported or of minor consequence. Case closed: “no radical revision of our understanding of the mechanism of adaptive variation is needed”, as the last sentence of the summary says.

In summary, what should we say about epigenetics, Lamarck and all the rest? First, let me tell you: Haldane and Huxley are not Darwin and Wallace. They added something fundamental to Darwin’s brilliantly right, but incomplete, idea. The Modern synthesis made a fundamental act of integration of genetic heredity, population dynamics and evolution by natural selection. Is it complete, finished? Certainly not. Is adding some new element (say, epigenetic inheritance, in spite of our trio’s skepticism) equivalent to disproving the Modern synthesis, or even Darwin – and going back to Lamarck? In other terms, should the incompleteness of a more advanced theory force us to fall back on the previous one, even more incomplete and erroneous?
And then again, speaking precisely of epigenetic inheritance: Darwin did not have any clue of how inheritance of traits worked, and yet his theory was powerfully right. Now, why should epigenetics – a variation on the theme of Mendelism and a ripple in the ocean of solid facts supporting the Modern synthesis – make a century of population and evolutionary genetics wrong?

But let me be absolutely clear: I do not overlook the importance of epigenetic inheritance in the determination of traits and fitness. After all, if I am – say – a tree, my seeds will likely fall (with some notable exceptions) all around me, and they will probably experience the same environmental conditions as I do. It is probably useful, fitness-wise, to provide my seeds with the same gene expression setup as I have, because it is likely to be the one that allowed me to successfully reproduce in the place where I live – and so it may help my progeny to cope with it. This hypothesis requires solid proof of course, but I would not dismiss it too quickly. To me, epigenetic inheritance could be viewed as a clever way to transmit gene regulation (not genetic variation) down to the next generation, and this may be adaptive, the same way phenotypic plasticity can be adaptive.


On the other hand, one may also say: how can this stuff be relevant at all? After all, if it were so important, some important deviation of observation from theory should have appeared earlier in the history of modern genetics. This is certainly a sensible argument, except for two points: (a) we are very good at ignoring small but non-zero deviations and (b) as a professor of genetics in Milan, Italy used to say a long time ago: “there can be no genetics (as a research field) without genetic variation”. In other terms, we can only research effects that lead to the segregation of traits, and we are unable by construction to spot mechanisms that lead to uniformity. Epigenetic inheritance seems to produce equal patterns in all of an individual’s progeny, and so sits right in the middle of genetics’ blind spot.

So, in the end, by what means could new discoveries really hurt the Modern synthesis? I’d say that they should prove that new things escape the fundamental forces of evolution. If they exist, vary, and evolve in spite of selection, drift, migration, recombination, and non-random mating, then we have a problem.

My guess is that this is unlikely to happen, and that we will discover that the reality of biological adaptation is more beautifully complex than we thought. If values of traits as determined by epigenetic inheritance – or the mechanism itself of epigenetic inheritance – can be proven to undergo selection for increased fitness, then this will be yet another nice addition to the Modern synthesis.

*nowadays the ‘epigenetic’ buzzword is used by some people to describe, well, gene regulation. We should not indulge in such lack of precision: good old regulation, operated by transcription factors and other proteins and RNAs and cued by the intracellular signalling of environmental factors, has nothing to do with epigenetic inheritance.

The abominable mystery of the swapped populations (well done, Mr Popper!)

Mistakes in your data can make the hypothetico-deductive method shine.

On a clear, windy day in winter 2016 (Avignon is (in)famous for its strong northerly Mistral wind), we finally got the results of the SNP genotyping of Spanish and Italian populations of Aleppo pine, performed in collaboration with our long-time partners, INIA-CIFOR in Madrid, Spain and CNR-IBBR in Florence, Italy.

Reference assembly had gone smoothly, individual samples had produced a wealth of good-quality Illumina reads. We expected clean, straightforward data analyses, as we already had good results for silver fir, European larch and Atlas cedar (this is part of the ANR-funded FLAG project).

But then, a strange signal showed up in the data. We had two groups of samples: adult trees and seedlings, and when we looked at their polymorphism patterns, they looked very different:

[Figure: SNP genotypes of the adult and seedling samples]

only a few polymorphic sites were in common, and often the two groups were fixed for different alleles (each column in the picture is a SNP site, each colour (blue, light blue, gray) one of the three possible genotypes of a bi-allelic SNP). The whole thing just did not make any sense.

The two groups were from different plates (one full plate for adults, two half-plates for the seedlings), so with my postdoc Hadrien Lalagüe we looked for possible differences in DNA quality, sequencing batch, anything technical that would explain such a result. No pattern appeared. We were desperate for a solution to the problem; otherwise we would have had to toss the whole data set.

Then, a few weeks later, we looked again at the picture. It seemed like the two groups belonged to two different, closely related species, not even to different populations.
Oh, wait a minute… did we say ‘two different species’? Perhaps, if they look like two different species, it is because they are two different species!

Can you hear that kind of ‘falsifiable hypothesis’ stuff making its way into our brains? Popper’s spook was hovering over us.

Then we had a wider look at our data set. What other species could half of the samples belong to?

One DNA plate of maritime pine had been shipped to the sequencing lab together with the Aleppo pine plates. We had our suspect number one.

[Figure: drawing of plates A, B, C1 and C2]

What was the most likely mistake? Nobody could possibly mistake two half plates for one full plate, right? So the only possible swap was between plate A and plate B.

What would be the expected polymorphism pattern? Because plate B was genotyped against an internal reference sequence, nothing wrong would appear in its data, no matter what the species was. On the contrary, plates A, C1 and C2 would now be a mix of two pine species – exactly the kind of mix that would produce a pattern of partially shared, partially non-overlapping polymorphisms.

The hypothesis held. Popper’s spook was smiling, but he expected us to do more: to put our hypothesis to experimental test.

If our hypothesis was correct (plate A contained maritime pine, not Aleppo, and plate B had Aleppo, not maritime pine), then a read-mapping with plates B, C1, C2 would yield the expected nice pattern of shared SNPs.

And ta-dah! No differences in polymorphism patterns when plate A was replaced with plate B in the read-mapping (plate A, mapped against itself, was fine, of course).

[Figure: SNP genotypes after replacing plate A with plate B]

We did a further check on the absolute number of polymorphisms that appeared within different groups of plates (more polymorphisms are expected when mixing two different species), and showed that any group including plate A (now maritime pine) had systematically more SNPs than any group made from the remaining plates (now all Aleppo pine). So we re-ran our SNP analyses and all turned out to be fine.
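
The logic of this check can be sketched with toy data. The genotype matrices below are invented (they are not our Aleppo pine data), and `count_polymorphic` is a hypothetical helper written for this post; the point is only that pooling samples fixed for different alleles inflates the number of polymorphic sites.

```r
# Count the sites at which more than one genotype is observed
# among the samples of the pooled plates
count_polymorphic <- function(...) {
  pooled <- rbind(...)
  sum(apply(pooled, 2, function(site) length(unique(site)) > 1))
}

# Toy genotypes (0/1/2): rows = samples, columns = SNP sites
plate_A  <- matrix(c(2, 2,  2, 2,  0, 1,  0, 0), nrow = 2)  # the "other species"
plate_B  <- matrix(c(0, 0,  0, 0,  0, 1,  0, 0), nrow = 2)
plate_C1 <- matrix(c(0, 0,  0, 0,  1, 0,  0, 0), nrow = 2)

n_same  <- count_polymorphic(plate_B, plate_C1)  # same species pooled
n_mixed <- count_polymorphic(plate_A, plate_C1)  # two species pooled
cat("same species:", n_same, "SNPs; mixed species:", n_mixed, "SNPs\n")
```

In this toy case the mixed group shows three polymorphic sites against one for the homogeneous group, because two sites are fixed for different alleles in the two "species".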

Popper’s spook was now looking very happily at us.

There are a few lessons to draw from this story.

First: even though we forest scientists are used to making statistical inferences, as opposed to testing hypotheses, the good old hypothetico-deductive method is still alive and kicking. It is a vital piece of our toolkit.

Second: look at your data with your eyes. The most likely hypothesis will probably pop up.

Third: never throw out an apparently bad data set too quickly. Maybe it is just begging you to look at it again, and from a different angle.

Another Ph.D. grant proposal (not in our lab)

This time I host an announcement from my brilliant colleagues in Ferrara, Italy

Opening of a PhD position to study the population genomics and conservation of three Alpine grouse species

A fully-funded 3-year PhD position is available for an enthusiastic student with a background in conservation, population or evolutionary genetics, to analyse patterns of genomic variation in three Alpine grouse species for conservation and management purposes.

The student will spend approximately half of their time in each of the two collaborating research groups, those of Giorgio Bertorelle (University of Ferrara, Italy – UNIFE) and Heidi C. Hauffe (Fondazione Edmund Mach, Trento, Italy – FEM). Barbara Crestanello, a member of the Hauffe group and an expert in tetraonid conservation genetics and genomics, will act as an additional supervisor.

The student will register at UNIFE, and academic training will include seminars and courses, as well as participation in national and international conferences.

Brief project description:
This project will focus on three charismatic alpine bird species of conservation and management concern for which data on mtDNA and STR markers for more than 200 individuals per species are already available from Trentino and surrounding regions. The main goals of the project are to a) type SNP markers for a subset of the above samples using GBS; b) compare the power of mtDNA/STRs and SNPs to reconstruct demographic history; c) identify if and how different ecological niches, reproductive systems, and hunting pressures affect the genomic variation; d) translate the results into efficient management and conservation strategies; e) use available technologies to develop SNP sets that can be used for future cost-effective conservation genomic investigations.

Informal enquiries for further details of the aims of the project should be sent to barbara.crestanello@fmach.it, heidi.hauffe@fmach.it, or giorgio.bertorelle@unife.it.

The position is for candidates with a degree equivalent to an Italian “Magistrale” degree (Master), and in an appropriate subject (e.g. Biology, Biotechnology, Mathematics). A keen interest in data analysis as applied to conservation, and preferably, at least 6 months’ experience in a basic molecular biology laboratory, are requested. Good English skills are necessary, but knowledge of Italian is not essential (although it obviously helps for living in Italy!).

The formal online application form will be available around mid-June at the site www.unife.it/studenti/dottorato/concorsi. However, interested candidates are welcome to send an application letter to giorgio.bertorelle@unife.it in advance, stating the applicant’s motivation for the position, experience and skills related to the requirements listed above, together with a full CV and contact information (including email addresses) for 2 potential referees. Please send your application file as a single pdf.

Ferrara is an ancient Medieval and Renaissance town located in North-Eastern Italy, 50 km North of Bologna and 100 km south of Venice.  Far from being shrouded in the past, Ferrara is a cyclist- and pedestrian-friendly sustainable town where young people can experience a high quality of life, take advantage of well-maintained infrastructures, and pleasantly blend in. More information can be found at http://www.unife.it/international/student-life

The campus of the Fondazione Mach is located in San Michele all’Adige in the eastern Italian Dolomites, a World Heritage Site. The Province of Trento is rated as one of the best places in Italy for outdoor recreation and overall quality of life. See also: https://www.visittrentino.info/en, and http://www.fmach.it.

Read your classics!

Incongruence between data and literature betrays some scientists’ bad habits.

THE INDIANA UNIVERSITY LIBRARY in Bloomington, IN is a stocky, ten-storey building where windows are a rarity.


In the dusk and dust of the tenth floor, one day in summer 2004, I was lucky enough to find the full series of the Proceedings of the Carlsberg Laboratories. I was looking for a rather unusual reference:

Winge O., 1923 On sex chromosomes, sex determination and preponderance of females in some dioecious plants. C. R. Trav. Lab. Carlsberg 15: 1–26.

When I eventually found it, it felt like Indiana Jones grasping the long-sought talisman.


Why on earth would I need such an esoteric paper?

A few weeks earlier, as a post-doc in Lynda Delph’s lab (a very happy time indeed), I was doing linkage mapping with AFLP markers (sounds like the Stone Age, doesn’t it?) in Silene latifolia. The data, produced by Michele Arntz, were nice and tidy, and the map was coming out beautifully. Statistical support was excellent – actually, I have never seen such enormous LOD-scores, before or since – so I was 100% confident in the map.


But something was wrong. ALL the literature on the species, which happens to be dioecious (i.e. it has separate sexes; the species, not the literature), claimed that, as in humans, the two sex chromosomes recombine only at one end (details vary, but all articles agreed on the one-end-only crossovers).

And I had two nice recombinant blocks, one on each side of the non-recombining region. Oops. Ouch.

We went through all the analyses again, checked the data and the samples, and re-ran the mapping with subsets of the mapping population. No way. The data resisted our efforts to force them to conform to the consensus evidence in the literature.

[Figure: models of X and Y chromosome recombination]

It was time to ask for expert advice. So we called the best expert we could reach, a scientist with a very long experience in all matters plant sex chromosomes (let us treat this scientist as an anonymous referee, and let us not reveal her identity). She listened to us carefully, then after a pause she said: “but that’s… HERETIC!”.

I still consider this sentence the best professional compliment I have ever received. But it did not push us forward an inch. We still had a blatant contradiction between our data and what science expected them to say, and nobody to explain why. Then Lynda, with her typical matter-of-fact, rigorous approach to science, decided we should go through the literature. ALL the literature on the subject.

Everybody quoted the 1923 Winge paper and several papers by Westergaard, published in the 1940s and 1950s (Westergaard (1946), Hereditas 32: 419-443; Westergaard (1948), Hereditas 34: 257-279; Westergaard (1958), Advances in Genetics 9: 217-281). All more recent literature quoted those papers as saying that “Silene latifolia sex chromosomes recombine at one end only”. Clearly, if there was a confrontation to be had, it was between our data and those papers.

Hence my journey in space and time to the top floor of the IU library (Westergaard’s papers were easier to obtain).

The reading of those papers revealed the complex and surprising truth. Winge did not say a word about chiasmata patterns. Westergaard had carefully characterised the chiasmata on the sex chromosomes in the 1946 paper, but some data were hard to interpret, and so he chose to describe only “unusually favourable [mitotic] plates” (what we would call today “cherrypicking”, even though I acknowledge his great honesty in declaring it). Plus, he dismissed some chiasmata observed in the supposed “differential” arms as being “rare”. And went on describing differential and homologous arms of X and Y chromosomes.

Then, fatally, he depicted in Figure 5 a summary showing X and Y chromosomes with only one homologous arm, as in the figure above, the upper pane of which is based on his. Westergaard, by all means, was a rigorous scientist, but he probably oversimplified his results in that drawing, and there we go: starting with this figure (and if you do not read the paper), you are dead sure that only one side recombines. Yet there was no contradiction between the data in the paper and our data. Unfortunately, subsequent authors likely “quoted” only that figure from the 1946 paper. The result: against the evidence, only one arm recombined now. Period.

This is serious. Such things should never happen in science. Certainly, most scientists read most of the papers they quote most of the time (for sure, compared to many politicians and opinion makers, we are diamond-grade examples of intellectual rigour, to be honest); but this story tells me that one must be suspicious when everybody quotes a very old, often hard-to-access article (I bet very few people have read many of the Sewall Wright papers they quote, for instance; of the few who have read them, most have not understood a line of them; I belong to the latter category).

So, this is my take-home message from this story: read carefully all the papers you quote. If you cannot read them, then quote some other, more recent paper (e.g. a review), not the original paper. In this case, the Chinese whispers chain must be explicit, if it has to exist at all (which I would rather it did not, anyway). Let us not convert science into a matter of ipse dixit. Let us stick to facts.

P.S. We actually published our linkage map, with sex chromosomes nicely recombining at both ends, here and here.

A PhD Thesis on the population genomics of European beech

We are hiring a graduate student to work on the genetics of local adaptation in a keystone tree species.

European beech (Fagus sylvatica L.) is a keystone tree species in European forests, making up 14-18% of total forest cover and with a range spanning from Greece to Sweden. Beech is the focus of multiple, long-lasting and intensive international research programs in ecology and population genetics. Despite this, little is known about the genetic bases of the adaptation of this species to environmental variation, no genomic reference is available and patterns of genomic diversity are virtually unexplored. Based on genome sequencing and re-sequencing data, obtained on natural populations and provenance tests throughout Europe, we propose to study patterns of adaptive diversity in coding and promoter regions, to: (a) characterise the patterns of genomic diversity shaped by adaptive processes at multiple geographical scales, from stand to region to range; (b) estimate the intensity of selection, through a combination of analytical and modelling approaches; (c) model and predict the ability of European beech to cope with climate change through adaptation.

For more information, please contact Ivan Scotti (ivan.scotti[at]inra.fr) and visit the BEECHGENOMES project post.

[Image: European beech (Fagus sylvatica) in Humlamaden]

The BEECHGENOMES project

The genomics tide reaches the shores of beech ecology.

Genome sequence variation in European beech (Fagus sylvatica L.): analysing adaptation and adaptability in an ecologically and economically major European forest tree species challenged by climate change.

The European beech (Fagus sylvatica L.) is a major keystone forest species, covering more than fifteen percent of European forests, and is a commercially important timber species. It is the focus of intensive, high-impact science in ecology, forestry, genetics and tree physiology. Notwithstanding its importance, genomic resources and a solid knowledge of the genomic bases of adaptation are lacking for the species.

The BEECHGENOMES project (2017-2020), funded by the France Génomique call and led by INRA-URFM (Ivan Scotti), will tackle three topics: (1) producing a reference genome sequence for European beech; (2) obtaining high-density variant maps, through genotyping-by-sequencing, from a large sample (>2000 trees) collected throughout Europe; (3) identifying patterns of local adaptation at multiple scales, from stand to landscape to region to range. The BEECHGENOMES project has tight links to a former program (FLAG) and to a current H2020-funded program (GENTREE), both also led by INRA-URFM. The program will be carried out by a consortium of fourteen research teams from six countries.

The map below shows the wide choice of sampling sites: intensively studied sites (blue), regional transects (yellow, purple), the latitudinal transect (red) and provenances in the provenance tests (green).

[Map of sampling sites]

We are seeking candidates for a PhD thesis on the project (see post).

Contact: Ivan Scotti, ivan.scotti[at]inra.fr.

Information, opinions, discovery.

Let us have a chat about science.

 

Do you ever feel the need to go beyond your everyday research activity, to stop and think about how science actually works?

Do you ever feel that only a part of what you do, think and find in your research activity can fit the strict frame of peer-reviewed publication and conference talk? And yet, such things have to be said and written?

I do. So I invite you to come by my campfire and have a chat around forest science.

Scroll down for the latest content.