The DNA for genome sequencing of wintersweet was obtained from an accession planted in the campus of Huazhong Agricultural University. DNA was extracted and sequenced by combining three different sequencing methods that include Illumina HiSeq, 10X Genomics, and PacBio SMRT sequencing. A total of 76.96 Gb of PacBio long reads were achieved (Additional file 1: Table S1), approximately 98.83-fold high-quality sequence coverage of the 778.71 Mb genome (size estimated by k-mer frequency analysis) (Additional file 2: Fig. S1a and Additional file 1: Table S2). Flow cytometry determined an estimated haploid genome size of 805.88 Mb (Additional file 2: Fig. S1b), which was consistent with the k-mer method. After interactive error correction, the PacBio reads were assembled into primary contigs using FALCON . The primary generated contigs were then polished with Quiver, yielding 1623 contigs with an N50 length of 2.19 Mb (Table 1). The sequence error correction of the final contigs were performed using 36.48 Gb (46.85X) Illumina short reads by pilon . The consensus sequences were further scaffolded by integrating with 156.26 Gb (200.67X) 10X Genomics linked reads (Additional file 1: Table S1). The final assembly consists of 1259 scaffolds totalling 695.31 Mb with a scaffold N50 size of 4.49 Mb, covering 89.2% of the genome size estimated by genome survey (Table 1 and Additional file 1: Table S3). In order to improve the assembly, we used 93 × Hi-C data to assist the assembly correction and anchored 1027 of 1259 scaffolds into 763 super-scaffolds (Additional file 1: Table S4 and S5). All the super-scaffolds were accurately clustered and ordered into 11 pseudochromosomes (Additional file 2: Fig. S2), covering 99.42% of the original 695.31 Mb assembly, with a super-scaffold N50 of 65.35 M and a maximum scaffold length of 85.71 Mb (Additional file 1: Table S5). The number of groups corresponded well with the experimentally determined number of chromosomes in somatic cells (2n = 22) (Additional file 2: Fig. S3). In addition, 185.93 Gb Illumina sequence data was also generated and used to assemble the Calycanthus chinensis (a close relative of wintersweet belonging to the same family) genome (Additional file 1: Table S1). The size of the assembled C. chinensis draft genome was 767.4 Mb, representing ~ 92.78% of estimated genome size (Additional file 1: Table S2), with 291,991 contigs (N50 = 38.7 kb) and 241,923 scaffolds (N50 = 20.34 Mb) respectively (Additional file 1: Table S3).Table 1 Major indicators of the wintersweet genomeFull size table
To assess the genome assembly quality, we performed BUSCO and CEGMA analysis and found that 95% and 92.74% complete eukaryotic conserved genes were identified in wintersweet genome respectively (Additional file 1: Table S6), suggesting a high degree of completeness of the final assembly. In addition, the high-quality short reads generated from Illumina were mapped to the assembled genome, which exhibits excellent alignments with a mapping rate of 99.95%. Taken together, the above results indicate a high degree of contiguity and completeness of the wintersweet genome.
Based on de novo and homology-based predictions and transcriptome data, a total of 23,591 protein-coding genes were predicted with an average length of 9017 bp and an average CDS length of 1250 bp, which were comparable to that in Amborella and Lotus (Additional file 1: Table S7). The spatial distribution of these protein-coding genes along the chromosome was uneven with higher densities located at the ends of the chromosomal arms (Fig. 1b). A total of 21,940 (93.1%) predicted protein-coding possessed functional annotation (Additional file 1: Table S8). A total of 2749 non-coding RNAs (ncRNAs) including 245 ribosomal RNAs (rRNAs), 567 transfer RNAs (tRNAs), 909 microRNA, and 1028 small nuclear RNAs (snRNAs) (Additional file 1: Table S9) were also identified.Comparative evolutionary analyses of wintersweet and other typical flowering plant species
The expansion or contraction of gene families has a profound role in driving phenotypic diversity and adaptive evolution in flowering plants . In comparison with gene families in its relative species C. chinensis, wintersweet exhibited significant enrichment and reduction of 12 and 45 gene families respectively (Fig. 2a). KEGG functional enrichment analysis of the expanded gene families demonstrates that they were mainly assigned in “Sesquiterpenoid and triterpenoid biosynthesis,” “Monoterpenoid biosynthesis,” “Flavonoid biosynthesis,” and “Phenylpropanoid biosynthesis” pathways (Additional file 2: Fig. S4a and Table S10), which are responsible for the major trait (strong fragrance) specific to wintersweet.Fig. 2
Evolution of the wintersweet genome and gene families. a Phylogenetic tree of 17 plant species. The blue numbers denote divergence time of each node (MYA, million years ago), and those in brackets are 95% confidence intervals for the time of divergence between different clades. The red numbers on the branch represent bootstrap value. The pie diagram on each branch of the tree represents the proportion of gene families undergoing gain (red) or loss (green) events. The numbers below the pie diagram denote the total number of expansion and contraction gene families. Basal angiosperm (Ba). b The distribution of single-copy, multiple-copy, unique, and other orthologs in the 17 plant species. c Venn diagram represents the shared and unique gene families among five closely related magnoliids (C. praecox, C. kanehirae, L. chinense, P. nigrum, and C. chinensis). Each number represents the number of gene familiesFull size image
Defining the relationship of gene families among flowering plant species has been a powerful approach in investigating the genetic basis of plant evolution. Based on pairwise sequence similarities, we applied the predicted proteomes of wintersweet and 16 other sequenced species to identify putative orthologous gene clusters. A total of 37,137 orthologous gene families composed of 554,042 genes were identified from 17 plant species, of which 5339 clusters of genes were shared by all investigated species, representing ancestral gene families (Fig. 2b). On the other hand, 8733 gene families were present across wintersweet, C. chinensis, L. chinensis, and C. kanehirae, which most likely represent the “core” proteome of the magnoliids (Fig. 2c). There are 339 gene families containing 507 proteins specific to the wintersweet genome (Additional file 1: Table S11). Gene Ontology (GO) term enrichment analyses of wintersweet-specific genes revealed that the functional categories termed “oxidoreductase activity” and “pectinesterase activity” involved in metabolism were enriched (Additional file 2: Fig. S4b).Repetitive content and recent burst of LTR retrotransposons
In the wintersweet genome, repetitive elements occupied 45.73% of the genome, of which 96.69% were annotated as transposable elements (TEs) (Additional file 1: Table S12). Long terminal repeat retrotransposons (LTRs) were the major class of TEs that accounts for 36.2% of the assembly. Among the LTRs, the LTR/Gypsy elements were the most abundant, composing 23.3% of the genome, followed by LTR/Copia elements (8.6%, Additional file 1: Table S12). Besides the main groups of LTR elements, 3.65% of the genome was annotated as DNA elements and 3.45% as long interspersed nuclear elements, whereas the rest were assigned to other repeat families or could not be assigned (Additional file 1: Table S12). Transposable elements are unevenly distributed across the chromosomes and found to be particularly abundant in centromeric regions (Fig. 1b). Further comparative analysis of the distribution of TEs indicated a higher proportion in intergenic regions (79.19%) when compared to genic regions (16.04%) and regions adjoining genes (4.77%) (Additional file 2: Fig. S5a). Within genic regions, the TEs exhibited unequal distribution between exons and introns. 98.98% of TEs in the genic regions occurred in introns and constituted 25% of the total length of introns (Additional file 2: Fig. S5b). Comparison of gene structure with other species revealed that the average length and number of exons is similar, while the average length of introns is slightly longer and to some extent can be attributed to repeat accumulation. Moreover, the time of the LTR-RT burst in wintersweet was estimated using the 8812 putative complete LTR-RTs and revealed a peak substitution rate at around 0.03 (Additional file 2: Fig. S6). We assumed a mutation rate of 1.51 × 10− 9 per base per year , resulting in an insertion time of approximately 9.9 Ma.
In order to investigate the evolution of TEs in Magnoliids, phylogenetic trees of domains in reverse transcriptase genes were constructed for both Ty1/Copia and Ty3/Gypsy superfamily. In the phylogenetic tree of Ty3/Gypsy superfamily, the majority of LTR-RTs from wintersweet were clustered into the tork clade (Additional file 2: Fig. S7a). Compared with L. chinensis and C kanehirae, the LTR-RTs in wintersweet and C. chinensis exhibited higher diversity and abundance within the tork clade, indicating greater expansion and divergence in wintersweet and C. chinensis genome. The Copia superfamily displayed a different pattern, with four major clades consisting of elements from all these four species (Additional file 2: Fig. S7b), suggesting a conserved evolution pattern of the Copia superfamily, as described previously [21, 22].Phylogenomic placement of Magnoliids sister to eudicots
The phylogenetic relationships of Magnoliids, monocots, and eudicots have been somewhat controversial in plant taxonomy. In an effort to infer the phylogenetic position of the Magnoliids relative to monocots and eudicots, a set of 213 evaluated single-copy ortholog sets (OSCG) were first identified with OrthoMCL  using genome data from 17 flowering plant species that includes 5 monocots, 6 eudicots, 5 magnoliids, and 1 basal angiosperm. We applied both coalescent and concatenation approaches to reconstruct phylogenetic trees using the 213-gene dataset. Both coalescent and concatenation analyses yielded an identical highly supported topology with magnoliids as a sister group to eudicots after their divergence from monocots (Fig. 2a and Additional file 2: Fig. S8a). To avoid the potential errors in ortholog identification, we also used SonicParanoid  to extract single-copy genes (SSCG) from the 17 plant genomes described above. Only those genes sampled from at least 14 species were utilized for the construction of phylogenetic trees. On the basis of 216 single-copy genes, the phylogenetic trees were then similarly inferred by both coalescent and concatenation methods as those described above. The resulting species trees were topologically identical to the phylogenetic findings revealed by OrtholMCL described above (Additional file 2: Fig. S8b).
Although the same set of phylogenetic relationships among Magnoliids, monocots, and eudicots was consistently recovered, the topological conflicts were also observed among coalescent-based gene trees (Additional file 2: Fig. S8c). To estimate the discordance among gene trees in OSCG and SSCG datasets, we took advantage of the quartet score in ASTRAL  to display the proportions of gene trees in support of three different branching orders for Magnoliids, monocots, and eudicots (Additional file 2: Fig. S8d) and found that the percentages of gene trees supporting Magnoliids and eudicots together forming a sister group with monocots is higher than the other two topologies (Additional file 2: Fig. S8c). However, in the phylogenetic analyses of a concatenated sequence alignment of 38 chloroplast single-copy genes for 26 taxa, the magnoliids were placed as a sister group to the clade consisting of eudicots and monocots (Additional file 2: Fig. S9). Furthermore, the short phylogenetic branches among magnoliids, eudicots, and monocots clades, representing rapid speciation events, were also observed in these phylogenetic genes. The phylogenetic incongruence between nuclear and plastid genomes may be caused by incomplete lineage sorting (ILS), which appears more frequently during the rapid divergence of early mesangiosperms. As the inadequate taxon sampling could result in incongruent phylogeny, we improved the taxon sampling by adding additional genome data from 11 phylogenetically pivotal species and a transcriptome data set of chloranthales to reconstruct the phylogenetic tree. This approach recovered the same phylogenetic relationships among magnoliids, eudicots, and monocots (Additional file 2: Fig. S10). Thus, from these results, we believe the phylogenetic relationship proposed in our study is relatively accurate under the current dataset. Based on the high-confidence phylogenetic tree and calibration points selected from articles and TimeTree website, the divergence time between the magnoliids and the eudicots were estimated to be 113.0–153.1 Ma (95% confidence interval) (Fig. 2a), which overlaps with three recent estimates (114.6–164 Ma, 117–189 Ma, and 136.0–209.4 Ma) [13, 25, 26].Whole-genome duplication and genome evolution analysis
Whole-genome duplication (WGD) has long been regarded as the major driving force in plant evolution . To investigate WGD events during the evolutionary course of wintersweet, we first searched for genome-wide duplications and assigned them into four different modes with MCScanX analysis (Additional file 2: Fig. S11). The WGD/segmental duplication was identified as the dominant type that includes 4511 paralogous gene pairs in 265 syntenic blocks. Among these syntenic blocks, 36.09% were found to share relationships with three other blocks across the genome (Additional file 2: Fig. S12). The widespread synteny and well-maintained one-versus-three syntenic blocks suggest that two WGD events might have occurred during wintersweet genome evolution. It is well accepted that Amborella is a single living species that is the sister lineage to all other groups within the angiosperms, and there is no evidence of lineage-specific polyploidy after it diverged from the last common ancestor of angiosperms . Collinearity and synteny analysis between the wintersweet and Amborella genome also provided clear structural evidence for two WGDs in wintersweet with a 1:4 syntenic depth ratio in Amborella-wintersweet comparison (Fig. 3a). To further elucidate the polyploidy of wintersweet genomes, we performed a comparative genomic analysis of wintersweet with C. kanehirae and L. chinensis. Syntenic depth ratios of 4:4 and 2:4 were inferred in the wintersweet-Cinnamomum and wintersweet-Liriodendron comparisons, respectively (Fig. 3b). Based on the syntenic relationships between and within each species, our analyses collectively indicate that wintersweet underwent two WGD events.Fig. 3
Comparative genomics analyses. a Synteny patterns between genomic regions from wintersweet and Amborella. This pattern shows that some typical ancestral regions in the basal angiosperm Amborella have four corresponding copy regions in wintersweet. This collinear relationship is highlighted by one syntenic set shown in red and green colors. b Syntenic blocks between genomes. Dot plots of orthologs show a 4–4 chromosomal relationship between the wintersweet genome and C. kanehirae genome, and 2–4 chromosomal relationship between wintersweet genome and L. chinense genome. c Distribution of synonymous substitution levels (Ks) of syntenic orthologous (dashed curves) and paralogous genes (solid curves) after evolutionary rate correction. d Evolutionary model of the Laurales genomes. The Laurales ancestral chromosomes are represented by ten colors. Polyploidization events are shown by 3 dots of different colors, along with the chromosome fusions (Fu) and fissions (Fi). The modern structure of the Laurales genomes is illustrated at the bottom of the figure. In some regions, we could not determine which ancestral chromosome they derived from, and those regions were represented as white spacesFull size image
To estimate the timing of the two WGD events in the wintersweet genome, we characterized synonymous substitutions on synonymous nucleotide sites (Ks) between collinear homoeologs within or between wintersweet and other three species including C. chinensis, Cinnamomum kanehirae, and Liriodendron chinensis from Magnoliids. The Ks distributions of one-to-one orthologs identified between Amborella and the other four species show different Ks peaks, suggesting divergent evolutionary rates among these four species (Additional file 2: Fig. S13). After correction for evolutionary rate , the synonymous substitutions per site per year as 4.21 × 10− 9 for Laurales were calculated using the mean Ks values of syntenic blocks, resulting in the estimated time of the WGD event at approximately 77.8 million and 112.1 million years ago (Ma), respectively (Fig. 3c). Previous analysis of the genome of Cinnamomum suggested that the ancient WGD event seems shared by Magnoliales and Laurales , and the absolute dating of the identified WGD events in Liriodendron tulipifera also supported this hypothesis . In our study, we also detected two and one polyploidization events in Cinnamomum and Liriodendron respectively, but no common WGD event was shared by these two species. Furthermore, the wintersweet genome shares an ancient WGD event with Cinnamomum but not with Liriodendron. Moreover, the trees of the syntenic gene groups of wintersweet and Liriodendron vs. Amborella indicated that wintersweet and Liriodendron experienced a WGD event respectively after their divergence from a common ancestor (Additional file 2: Fig. S14 and Additional file 3: Supplementary Note 3). Thus, from these results, we conclude that the ancient wintersweet WGD event has occurred before the divergence of Calycantaceae and Lauraceae but after the divergence of Calycantaceae and Magnoliaceae.
We also used orthologous and paralogous genes derived from the intergenomic and intragenomic analysis of the wintersweet and C. chinensis as well as C. kanehirae genomes to construct a putative ancestral genome of the Laurales, and proposed an evolutionary scenario where these three lineages were derived from a putative ancestor (Fig. 3d and Additional file 2: Fig. S15), which consisted of ten chromosomes and 4216 genes. This ancestor went through a WGD event to reach a 20-chromosome intermediate and then experienced chromosomal rearrangements to form present-day karyotypes. In wintersweet, all the chromosomes underwent rearrangements and every chromosome came from at least two ancient chromosomes. A minimum of 49 chromosomal fissions and 48 chromosomal fusions were predicted to have occurred in wintersweet to reach its current structure of 11 chromosomes (Fig. 3d).Genetic basis of floral transition, floral organ specification, and early blooming in winter
Wintersweet is one of the perennial trees that bloom in the deep winter. It took approximately 10 months for C. praecox to complete its reproductive development. To investigate this whole process of the flower development that may influence the final flowering time, we first performed a systematic study on the floral ontogeny and developmental patterns by paraffin sections through observation. The results indicated that the floral bud was initiated in April, floral patterning and floral organ specification occurred from April to July, slow growth in summer, the male and female gametophytes were formed in October and December respectively, the flower bud transitioned into dormancy, then break occurred in December, and the flower bloomed in deep winter (Fig. 4a). To investigate the molecular mechanisms underlying the critical flower developmental stages, we generated and analyzed RNA-seq data for representative flower developmental stages from the timing of floral initiation to maturation.Fig. 4
Schematic depiction of the key developmental stages of flower bud and the analysis of floral organ identity and flowering-time related genes. a Flower developmental stages including floral transition, meristem specification, floral organ specification, floral bud dormancy and release, and blooming. Abbreviations for the flower bud developmental stages: undifferentiated flower bud stage (FBS1); flower primordium formation stage (FBS2); tepal primordium formation stage (FBS3); stamen primordium formation stage (FBS4); pistil primordium formation stage (FBS5); flower organ development and differentiation stage (FBS6); slow growth stage (FBS7); ovule appearance stage (FBS8); pollen formation stage (FBS9), blooming in winter. ap, shoot apex; fp, floral primordium; t, tepals; s, stamens; p, pistil; a, anther; o, ovule; po, pollen; es, embryonic sac. b The expression patterns of genes related to flowering time and floral organ identity at different flower developmental stages. c The phylogenetic tree of MADS-box genes and gene expression patterns of B-/C-function genes from various floral organs. The MADS-box proteins in wintersweet are marked by the yellow boxFull size image
Floral initiation is controlled by the spatial and temporal expression of flowering-time-related genes in multiple pathways . Many genes from these pathways have been identified and characterized in various herbaceous and perennial species and reported to have the conserved function [30,31,32]. A database of flowering-time gene networks was recently constructed in Arabidopsis thaliana . Taking advantage of this database, we identified 594 flowering-time genes in eight pathways (Additional file 1: Table S13). Analysis of RNA-seq data shown that during the floral transition the flowering-time genes related to gibberellin biosynthesis and signaling transduction pathway were significantly activated (padj