Conifers (Gymnosperms) are an ancient group of plants that trace their origin back to at least the late Carboniferous period, about 300 million years ago. Despite their age, conifers are a relatively species-poor group of plants, with about 700 described species. This stands in stark contrast to the diversity of flowering plants (Angiosperms), where close to 300,000 species are currently described. That said, conifers are still a remarkable success story in evolutionary terms. Not only have conifers, as a group, persisted for over 300 million years, but they have also remained dominant in many ecosystems over this time, and perhaps nowhere is this more obvious than in boreal forests.

Collectively, conifers are known for their large genomes, which usually exceed 20 Gb (6-7 times larger than the human genome) and can reach up to 40 Gb in some species. Their large and complex genomes have long hampered our understanding of conifer evolution, and only relatively recently have these questions started to be addressed. The first conifer genome was published in 2013, and over the last 3-4 years a number of additional genomes have been published. Because conifer genomes are highly repetitive (>80% repetitive content), most conifer genome assemblies are still highly fragmented. Comparative mapping suggests, however, that synteny is largely conserved among species, although there is conflicting evidence as to whether conifers have gone through a whole-genome duplication. The low impact of genome duplications is further highlighted by the very low rate of polyploidy in conifers. Ignoring Gnetales, a highly diverged and basal group of plants that share features with both conifers and angiosperms and whose phylogenetic placement is still highly contentious, only three polyploid conifer species are known.

Earlier studies have suggested that sequence evolution in conifers is slow and that this is partly driven by a much lower (about an order of magnitude) per-year mutation rate in conifers compared to angiosperms. However, the mutation rate of conifers has been highly debated, and different studies have arrived at different conclusions, most likely because all studies have relied on small gene sets with little or no overlap between studies. The general view has been that conifers as a group have an extremely conserved genome structure and that this has contributed to the low speciation rate and low morphological diversity in the group. With this in mind, my former postdoc Amanda De La Torre thought it would be a good idea to revisit the question of the rate of molecular evolution in conifers now that data from a number of species have started to become available.

Together with collaborators at VIB in Ghent, we collected data from publicly available data sets with the goal of identifying homologous genes from as many species as possible. Since homology is much harder to ascertain when dealing with multi-gene families, we searched for genes that have remained single copy over the course of seed plant evolution. In the end, we identified 42 genes that were mostly single copy and that were present in 31 conifers and 34 flowering plants. In agreement with many earlier studies, it was obvious that evolutionary rates differ drastically between conifers and angiosperms: conifers (again excluding Gnetales) showed lower rates of substitution at synonymous sites (dS) than flowering plants at all 42 genes. When we instead looked at non-synonymous sites (dN), the pattern was similar but not quite as extreme: 88% of the genes still showed lower rates of sequence evolution in conifers. The main reason for these stark differences between conifers and flowering plants is that conifers indeed have much lower mutation rates, up to 7-fold lower than in flowering plants. The average mutation rate estimate we arrived at for conifers across the 42 genes (μ = 7.71×10⁻¹⁰) falls right within earlier published estimates, so our data set seems to do a good job of representing variation in the mutation rate among the genes and species we looked at.
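How an absolute rate such as μ is obtained from dS is not spelled out here. A common approach, assumed in this sketch rather than taken from the study, divides the synonymous divergence between two lineages by twice their divergence time T, because both lineages accumulate substitutions independently after they split: μ = dS / (2T). The helper `silent_site_rate` is hypothetical, and the dS and T values are made up purely to illustrate a 7-fold difference.

```python
def silent_site_rate(ds: float, divergence_time_years: float) -> float:
    """Absolute rate of silent-site divergence (per site per year).

    Assumes mu = dS / (2T): after two lineages split T years ago, each one
    accumulates substitutions independently, so the total divergence dS
    reflects 2T years of evolution.
    """
    if divergence_time_years <= 0:
        raise ValueError("divergence time must be positive")
    return ds / (2.0 * divergence_time_years)


# Hypothetical values for illustration only; not data from the study.
conifer_mu = silent_site_rate(ds=0.2, divergence_time_years=100e6)
angiosperm_mu = silent_site_rate(ds=1.4, divergence_time_years=100e6)

print(f"conifer mu      = {conifer_mu:.2e}")
print(f"angiosperm mu   = {angiosperm_mu:.2e}")
print(f"fold difference = {angiosperm_mu / conifer_mu:.1f}")
```

Dividing by 2T rather than T is the key design choice: using T alone would double every rate estimate.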

What was more surprising was that conifers showed a much higher substitution rate ratio than flowering plants. The substitution rate ratio, often denoted ω, is the ratio of substitution rates at non-synonymous and synonymous sites (ω = dN/dS). The rationale for calculating ω is that even if mutation rates vary across the genome of a species, they should be roughly similar at non-synonymous and synonymous sites within a single gene, since these sites are interspersed across the coding region. Also, ω tells us something about how strong selection is on non-synonymous mutations (assuming that synonymous sites are neutral¹). If ω is less than 1, non-synonymous sites are evolving at a lower rate than synonymous sites, and this most likely reflects that some non-synonymous mutations are under purifying selection; that is, they are deleterious or lethal and will therefore rarely, if ever, reach fixation and be counted as substitutions. In conifers ω was 0.263, whereas in flowering plants it was only 0.102, so ω was more than twice as high in conifers. There are (at least) two possible explanations for this: either purifying selection is weaker in conifers, or more sites are under positive selection (selection favouring beneficial mutations), and the larger number of such sites results in a higher substitution rate ratio in conifers.
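As a minimal sketch of the quantities just described, the snippet below computes ω = dN/dS and gives a rough reading of its value, assuming synonymous sites are effectively neutral. The function names and the ±0.05 tolerance around 1 are illustrative choices, not part of the study; the two ω values are the averages reported above.

```python
def omega(dn: float, ds: float) -> float:
    """Substitution rate ratio: omega = dN / dS."""
    if ds <= 0:
        raise ValueError("dS must be positive to compute omega")
    return dn / ds


def interpret_omega(w: float, tol: float = 0.05) -> str:
    """Rough reading of omega, assuming synonymous sites are ~neutral.

    The tolerance band around 1 is an arbitrary, illustrative choice.
    """
    if w < 1 - tol:
        return "purifying selection dominates"
    if w > 1 + tol:
        return "positive selection dominates"
    return "consistent with neutral evolution"


# Average omega values reported in the post.
conifer_omega = 0.263
angiosperm_omega = 0.102

for group, w in (("conifers", conifer_omega), ("flowering plants", angiosperm_omega)):
    print(f"{group}: omega = {w:.3f} -> {interpret_omega(w)}")

print(f"conifer / angiosperm omega ratio: {conifer_omega / angiosperm_omega:.2f}")
```

An average ω below 1 does not rule out a subset of sites with ω > 1: many strongly constrained sites plus a few positively selected ones can still average well below 1, which is why the two explanations cannot be separated from the average alone.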

Distinguishing between these two hypotheses is not all that simple. To solve this issue, Amanda went back into the population genetics literature and applied a method that can be used to estimate the underlying selection coefficients acting on individual substitutions from the distribution of ω across sites. When she applied this method to data from the 42 genes included in the study, the results showed that even if selection is on average purifying in both conifers and flowering plants, both purifying and positive selection are stronger in conifers. So there are more mutations in conifers that have very strong deleterious effects, but there are also more mutations that are strongly beneficial! This may seem counter-intuitive at first, but we think the explanation is that conifers have much larger effective population sizes than flowering plants (again, on average). Since we know from population genetics theory that selection is more efficient (relative to genetic drift) in larger populations, larger effective population sizes in conifers would explain why they simultaneously experience stronger selection on both deleterious and beneficial mutations.
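The link between selection strength, effective population size and ω can be illustrated with a standard population-genetic approximation based on Kimura's fixation probability: a mutation with scaled selection coefficient S substitutes at S / (1 − e^(−S)) times the neutral rate, with S = 4Nₑs assumed here for a diploid. This is not necessarily the exact method used in the study, and every Nₑ, s and site distribution below is hypothetical. The sketch shows two things: with the same s, a larger Nₑ pushes ω further below 1 for deleterious mutations and further above 1 for beneficial ones; and two quite different distributions of selection across sites can yield a similar mean ω.

```python
import math
from typing import Sequence


def relative_substitution_rate(S: float) -> float:
    """Substitution rate of a selected mutation relative to a neutral one.

    Uses the approximation omega(S) = S / (1 - exp(-S)), where S is the
    scaled selection coefficient (assumed here to be 4 * Ne * s).
    S < 0 (deleterious) gives omega < 1; S > 0 (beneficial) gives omega > 1.
    """
    if S == 0:
        return 1.0  # neutral limit
    if S > 0:
        return S / -math.expm1(-S)  # -expm1(-S) == 1 - exp(-S), computed stably
    # For S < 0, rewrite as -S * exp(S) / (1 - exp(S)) to avoid overflow.
    return -S * math.exp(S) / -math.expm1(S)


def omega_for(ne: float, s: float) -> float:
    """omega for a mutation with selection coefficient s at effective size ne."""
    return relative_substitution_rate(4.0 * ne * s)


def mean_omega(site_S: Sequence[float]) -> float:
    """Average omega over sites, each with its own scaled selection coefficient."""
    return sum(relative_substitution_rate(S) for S in site_S) / len(site_S)


# Same (hypothetical) selection coefficients, two hypothetical population sizes.
for ne in (1e4, 1e5):
    print(f"Ne={ne:.0e}: deleterious omega={omega_for(ne, -1e-5):.3f}, "
          f"beneficial omega={omega_for(ne, 1e-5):.3f}")

# Two hypothetical site distributions with similar mean omega:
# uniformly weak purifying selection vs. strong purifying plus some positive.
weak_purifying = [-2.3] * 100
strong_both = [-10.0] * 85 + [1.0] * 15
print(f"mean omega: {mean_omega(weak_purifying):.3f} vs {mean_omega(strong_both):.3f}")
```

Treating each site separately, rather than converting a single average ω into one S, is what allows strong purifying and strong positive selection to coexist in the same gene set.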

Figure 1. Differences in the number of synonymous substitutions per site (dS), non-synonymous substitutions per site (dN), absolute rate of silent-site divergence (μ), and substitution rate ratio (ω) among life forms, defined as angiosperm herbs (green), angiosperm shrubs/trees (light green), and gymnosperms (blue).

Another idea that has been kicked around in the literature is that life form, that is, whether a plant grows as a herb, shrub or tree, can affect rates of sequence evolution. Plant life form is largely correlated with height, and earlier studies have indeed documented correlations between substitution rates and plant height. A number of explanations have been put forward for this observation, such as longer generation times in trees and shrubs compared to herbs, differences in metabolic rate or effective population size, and rates of mitotic cell division in the apical meristem. When we analyse our data, we do indeed see such a pattern in flowering plants: trees and shrubs have slower molecular evolution than herbs (Figure 1). However, substitution rates are even lower in gymnosperms, even though gymnosperms also mainly occur as trees. We also get very different results depending on whether we include Gnetales, which include Welwitschia and Ephedra, the only non-woody gymnosperms. This suggests that life form on its own is not sufficient to explain the differences we observe between gymnosperms and flowering plants.

There are a number of interesting observations that are clearly worth looking into in greater detail in the future, and perhaps the most perplexing is the strong positive correlation we observe between genome size and ω. Even if there is a plausible explanation (that differences in effective population size between gymnosperms and flowering plants drive the efficacy of selection, so that selection is more effective on both deleterious and beneficial mutations in gymnosperms), it is clear that these questions are well worth revisiting as more data from gymnosperms become available.

De La Torre, A. R., Li, Z., Van de Peer, Y., & Ingvarsson, P. K. (2017). Contrasting Rates of Molecular Evolution and Patterns of Selection among Gymnosperms and Flowering Plants. Molecular Biology and Evolution, 34, 1363–1377. https://doi.org/10.1093/molbev/msx069


¹ This is not strictly true, since synonymous sites can also be under selection, for example due to codon usage bias. However, selection on synonymous sites is usually considered to be much weaker than at non-synonymous sites, so that they behave as ‘approximately’ neutral.