53), this is inferred to have occurred before the divergence of RaTG13 and SARS-CoV-2 and thus should not influence our inferences. Evolutionary rate estimation can be profoundly affected by the presence of recombination50. Instead, similarity in codon usage metrics between SARS-CoV-2 and the eukaryotes analyzed was correlated with the coding-sequence GC content of the eukaryote: codon usage was most similar in eukaryotes whose GC content is low, like that of the coronavirus (panel b). (… Extended Data Fig. 4). The construction of NRR1 is the most conservative, as it is the least likely to contain any remaining recombination signals.

DOI: https://doi.org/10.1038/s41564-020-0771-4

Effect of closure of live poultry markets on poultry-to-person transmission of avian influenza A H7N9 virus: an ecological study.
Andersen, K. G. nCoV-2019 codon usage and reservoir (not snakes v2).
Bruen, T. C., Philippe, H. & Bryant, D. A simple and robust statistical test for detecting the presence of recombination.
Center for Infectious Disease Dynamics, Department of Biology, Pennsylvania State University, University Park, PA, USA; Department of Microbiology, Immunology and Transplantation, KU Leuven, Rega Institute, Leuven, Belgium; Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China; State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, China; Department of Biology, University of Texas Arlington, Arlington, TX, USA; Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK; MRC-University of Glasgow Centre for Virus Research, Glasgow, UK.

Furthermore, the other key feature thought to be instrumental in the ability of SARS-CoV-2 to infect humans, a polybasic cleavage site insertion in the S protein, has not yet been seen in another close bat relative of SARS-CoV-2. After removal of A1 and A4, we named the new region A. The plots are based on maximum likelihood tree reconstructions with a root position that maximises the residual mean squared for the regression of root-to-tip divergence on sampling time. Boxes show 95% HPD credible intervals. The latter was reconstructed using IQ-TREE66 v.2.0 under a general time-reversible (GTR) model with a discrete gamma distribution to model inter-site rate variation. SARS-CoV-2 and RaTG13 are the most closely related (their most recent common ancestor nodes denoted by green circles), except in the 222-nt variable-loop region of the C-terminal domain (bar graphs at bottom).

Nature 579, 265–269 (2020).
Proc. Natl Acad. Sci. USA 113, 3048–3053 (2016).
There are outstanding evolutionary questions on the recent emergence of human coronavirus SARS-CoV-2, including the role of reservoir species, the role of recombination and its time of divergence from animal viruses. Removal of five sequences that appear to be recombinants and two small subregions of BFR A was necessary to ensure that there were no phylogenetic incongruence signals among or within the three BFRs. Conservatively, we combined the three BFRs >2 kb identified above into non-recombining region 1 (NRR1). Transparent bands of interquartile range width and with the same colours are superimposed to highlight the overlap between estimates. In the presence of time-dependent rate variation, a widely observed phenomenon for viruses43,44,52, slower prior rates appear more appropriate for sarbecoviruses, whose sampled genomes currently span about 18 years. The idea is that pangolins carrying the virus, SARS-CoV-2, came into contact with humans. Divergence time estimates based on the three regions/alignments where the effects of recombination have been removed. … 4), but also by markedly different evolutionary rates.

Boni, M. F., Zhou, Y., Taubenberger, J. K. & Holmes, E. C. Homologous recombination is very rare or absent in human influenza A virus.
Holmes, E. C. The Evolution and Emergence of RNA Viruses (Oxford Univ. …
… & Li, X. Cross-species transmission of the newly identified coronavirus 2019-nCoV.
Nature 583, 282–285 (2020).
SARS-CoV-2 is an appropriate name for the new coronavirus.
Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10.
… & Bedford, T. MERS-CoV spillover at the camel–human interface.
A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist …
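Building NRR1 as described above means combining breakpoint-free regions, trimming small subregions and dropping putative recombinants. This is essentially column and row bookkeeping on an alignment. The minimal sketch below makes several assumptions. Coordinates are 1-based and inclusive. All coordinates and sequence names are hypothetical, because the subregion boundaries and the five excluded sequences are not listed in this excerpt. `subtract` and `build_nrr` are illustrative helpers, not functions from the authors' pipeline.

```python
Interval = tuple[int, int]  # 1-based, inclusive alignment columns


def subtract(region: Interval, holes: list[Interval]) -> list[Interval]:
    """Remove hole intervals (e.g. small subregions) from a region."""
    pieces = [region]
    for hole_start, hole_end in holes:
        remaining = []
        for start, end in pieces:
            if hole_end < start or hole_start > end:  # hole does not overlap this piece
                remaining.append((start, end))
                continue
            if start < hole_start:  # keep the part left of the hole
                remaining.append((start, hole_start - 1))
            if hole_end < end:  # keep the part right of the hole
                remaining.append((hole_end + 1, end))
        pieces = remaining
    return pieces


def build_nrr(alignment: dict[str, str], regions: list[Interval],
              exclude: set[str]) -> dict[str, str]:
    """Concatenate the given columns for every sequence not in `exclude`."""
    if len({len(seq) for seq in alignment.values()}) != 1:
        raise ValueError("sequences must be aligned to a common length")
    ordered = sorted(regions)
    return {
        name: "".join(seq[start - 1:end] for start, end in ordered)
        for name, seq in alignment.items()
        if name not in exclude
    }


if __name__ == "__main__":
    # Hypothetical toy alignment and coordinates.
    aln = {"bat_1": "ACGTACGTAC", "bat_2": "ACGAACGTTC", "recomb_1": "TTTTTTTTTT"}
    region_a = subtract((1, 6), [(3, 3)])  # -> [(1, 2), (4, 6)]
    nrr = build_nrr(aln, region_a + [(8, 10)], exclude={"recomb_1"})
    print(nrr)
```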
Two exceptions can be seen in the relatively close relationship of Hong Kong viruses to those from Zhejiang Province (with two of the latter, CoVZC45 and CoVZXC21, identified as recombinants) and a recombinant virus from Sichuan for which part of the genome (region B of SC2018 in Fig. … This is notable because the variable-loop region contains the six key contact residues in the receptor-binding domain (RBD) that give SARS-CoV-2 its ACE2-binding specificity27,37. These means are based on the mean rates estimated for MERS-CoV and HCoV-OC43, respectively, while the standard deviations are set ten times higher than empirical values to allow greater prior uncertainty and avoid strong bias (Extended Data Fig. … For the DRAGEN COVID Lineage tool, the minimum accepted alignment score was set to 22, and results with scores below 22 were discarded. … 11,12,13,22,28), a signal that suggests recombination, the divergence patterns in the S protein do not show evidence of recombination between the lineage leading to SARS-CoV-2 and known sarbecoviruses. Open reading frames are shown above the breakpoint plot, with the variable-loop region indicated in the S protein. … performed codon usage analysis.

… G066215N, G0D5117N and G0B9317N)) and by the European Union's Horizon 2020 project MOOD (no. …

Virological.org http://virological.org/t/ncovs-relationship-to-bat-coronaviruses-recombination-signals-no-snakes-no-evidence-the-2019-ncov-lineage-is-recombinant/331 (2020).
… & Andersen, K. G. The evolution of Ebola virus: insights from the 2013–2016 epidemic.
A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence.
Stamatakis, A.
RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models.
Rambaut, A., Lam, T. T., Carvalho, L. M. & Pybus, O. G. Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen).
Graham, R. L. & Baric, R. S. Recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission.
Nature 503, 535–538 (2013).

The web application was developed by the Centre for Genomic Pathogen Surveillance. It compares the new genome against the large, diverse population of sequenced strains using a … We compiled a dataset including 27 human coronavirus OC43 (HCoV-OC43) genomes and ten related animal virus genomes (six bovine, three white-tailed deer and one canine virus). As informative rate priors for the analysis of the sarbecovirus datasets, we used two different normal prior distributions: one with a mean of 0.00078 and s.d. … This is not surprising for diverse viral populations with relatively deep evolutionary histories. Here, we analyse the evolutionary history of SARS-CoV-2 using available genomic data on sarbecoviruses. Over relatively shallow timescales, such differences can primarily be explained by varying selective pressure, with mildly deleterious variants being eliminated more strongly by purifying selection over longer timescales44,45,46. We focused on these three non-recombining regions/alignments for divergence time estimation. This avoids inappropriate modelling of evolutionary processes with recombination on strictly bifurcating trees, which can produce artefacts such as homoplasies that inflate branch lengths and lead to apparently longer evolutionary divergence times.

… MC_UU_1201412). … and P.L.)
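The informative rate priors described in the text are normal distributions whose standard deviations are inflated ten-fold relative to empirical estimates. The minimal sketch below uses `statistics.NormalDist` to compare such a prior with the uninflated one. It rests on several assumptions. The mean of 0.00078 is taken from the text and appears to be the MERS-CoV-based value. Units are assumed to be substitutions per site per year. The empirical s.d. is a hypothetical placeholder, because the actual value is cut off in this excerpt. The second, HCoV-OC43-based mean is not given here and is not filled in.

```python
from statistics import NormalDist

MERS_MEAN_RATE = 0.00078  # from the text; units assumed subs/site/year
EMPIRICAL_SD = 0.00005    # HYPOTHETICAL placeholder, not the study's value
INFLATION = 10            # ten-fold s.d. inflation, as described in the text


def inflated_rate_prior(mean: float, empirical_sd: float,
                        factor: float = INFLATION) -> NormalDist:
    """Normal prior whose s.d. is `factor` times the empirical s.d."""
    if empirical_sd <= 0 or factor <= 0:
        raise ValueError("s.d. and inflation factor must be positive")
    return NormalDist(mu=mean, sigma=empirical_sd * factor)


def central_interval(prior: NormalDist, mass: float = 0.95) -> tuple[float, float]:
    """Central interval holding `mass` of the prior probability."""
    tail = (1.0 - mass) / 2.0
    return prior.inv_cdf(tail), prior.inv_cdf(1.0 - tail)


if __name__ == "__main__":
    tight = NormalDist(MERS_MEAN_RATE, EMPIRICAL_SD)
    wide = inflated_rate_prior(MERS_MEAN_RATE, EMPIRICAL_SD)
    print("uninflated 95%:", central_interval(tight))
    print("inflated 95%:  ", central_interval(wide))
    # A wide normal prior puts some mass on negative rates; it is only reported here.
    print("P(rate < 0) under inflated prior:", wide.cdf(0.0))
```

Inflating the s.d. keeps the prior centred on the empirical rate while letting the data pull the estimate further. This is the stated motivation of allowing greater prior uncertainty and avoiding strong bias.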
With horseshoe bats currently the most plausible origin of SARS-CoV-2, it is important to consider that sarbecoviruses circulate in a variety of horseshoe bat species with widely overlapping species ranges57. Biazzo et al. (2021) obtained the genome sequences of 10 SARS-CoV-2 strains by nanopore sequencing of nasopharyngeal swabs in Malta and analyzed the assembled genomes with the pangolin software. The strains were assigned to lineage B.1, indicating that SARS-CoV-2 was widely spread in Europe. Combining regions A, B and C and removing the five named sequences gives us putative NRR1, an alignment of 63 sequences. We used TreeAnnotator to summarize posterior tree distributions and annotated the estimated values to a maximum clade credibility tree, which was visualized using FigTree. … stand-alone pangolin workflows or the Illumina DRAGEN COVID Lineage App (v3.5.5) with default parameters. Humans' selfish, speciesist treatment of these animals could be the very reason why the novel coronavirus exists. This boundary appears to be rarely crossed. The divergence time estimates for SARS-CoV-2 and SARS-CoV from their respective most closely related bat lineages are reasonably consistent among the three approaches we use to eliminate the effects of recombination in the alignment. Nucleotide positions for phylogenetic inference are 147–695 and 962–1,686 (first tree), 3,625–9,150 (second tree, also BFR B), 9,261–11,795 (third tree, also BFR C), 12,443–19,638 (fourth tree), and 23,631–24,633, 24,795–25,847, 27,702–28,843 and 29,574–30,650 (fifth tree). In early January, the aetiological agent of the pneumonia cases was found to be a coronavirus3, subsequently named SARS-CoV-2 by an International Committee on Taxonomy of Viruses (ICTV) Study Group4 and also named hCoV-19 by Wu et al.5. According to GISAID …

Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage (Nature).
PLoS ONE 5, e10434 (2010).
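The nucleotide positions listed above define which alignment columns feed each of the five trees. The fifth tree concatenates four separate ranges. The minimal sketch below transcribes those ranges and extracts the matching columns. It assumes the ranges are 1-based and inclusive, which the excerpt does not state. Under that convention 3,625–9,150 spans 5,526 columns, one more than the 5,525 nt given for region B, so the paper's convention may differ. The helper names are hypothetical.

```python
# Alignment ranges transcribed from the text; 1-based inclusive ends are ASSUMED.
TREE_RANGES: dict[str, list[tuple[int, int]]] = {
    "tree1": [(147, 695), (962, 1686)],
    "tree2": [(3625, 9150)],   # also BFR B
    "tree3": [(9261, 11795)],  # also BFR C
    "tree4": [(12443, 19638)],
    "tree5": [(23631, 24633), (24795, 25847), (27702, 28843), (29574, 30650)],
}


def span_length(ranges: list[tuple[int, int]]) -> int:
    """Number of alignment columns covered, assuming inclusive end points."""
    return sum(end - start + 1 for start, end in ranges)


def extract_columns(seq: str, ranges: list[tuple[int, int]]) -> str:
    """Concatenate 1-based inclusive column ranges of one aligned sequence."""
    pieces = []
    for start, end in ranges:
        if not 1 <= start <= end <= len(seq):
            raise ValueError(f"range {start}-{end} outside alignment of length {len(seq)}")
        pieces.append(seq[start - 1:end])
    return "".join(pieces)


def subalignment(alignment: dict[str, str], tree: str) -> dict[str, str]:
    """Columns used for one of the five trees, for every sequence in the alignment."""
    ranges = TREE_RANGES[tree]
    return {name: extract_columns(seq, ranges) for name, seq in alignment.items()}


if __name__ == "__main__":
    for tree, ranges in TREE_RANGES.items():
        print(tree, span_length(ranges), "columns")
```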
The canine viral genome was excluded from the Bayesian phylogenetic analyses because temporal signal analyses (see below) indicated that it was an outlier. Note that breakpoints can be shared between sequences if they are descendants of the same recombination events. Aside from RaTG13, Pangolin-CoV is the most closely related CoV to SARS-CoV-2. … 62,63), the GTR+Γ model and 100 bootstrap replicates, was inferred for each BFR >500 nt. It is clear from our analysis that viruses closely related to SARS-CoV-2 have been circulating in horseshoe bats for many decades. However, for several reasons, nucleotide sequences may be generated that cover only the spike gene of SARS-CoV-2. Because the estimated rates and divergence dates were highly similar in the three datasets analysed, we conclude that our estimates are robust to the method of identifying a genome's NRRs. We aimed to analyze 3 naso-oropharyngeal swab samples collected between August and December 2021 to describe the amino acid changes present in the sequence reads that may have a role in the emergence of new … Region B is 5,525 nt long. The pangolin tool allows a user to assign the most likely lineage (Pango lineage) to SARS-CoV-2 query sequences. Pangolin relies on a novel algorithm called pangoLEARN.

Genetics 172, 2665–2681 (2006).
Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies.
Bioinformatics 22, 2688–2690 (2006).
Host ecology determines the dispersal patterns of a plant virus.
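Pangolin writes its lineage assignments to a CSV report, and summaries such as "the strains were assigned to lineage B.1" come from tallying that report. The minimal sketch below counts query sequences per assigned lineage. It assumes the report has `taxon` and `lineage` columns and that unassigned queries carry an empty or `None` lineage. Check both assumptions against the output of the pangolin version in use. The function name and example report are hypothetical.

```python
import csv
import io
from collections import Counter


def lineage_counts(report_csv: str) -> Counter:
    """Count query sequences per assigned lineage in a pangolin-style CSV report.

    ASSUMES 'taxon' and 'lineage' columns; rows whose lineage is empty or
    'None' (treated as unassigned) are skipped.
    """
    reader = csv.DictReader(io.StringIO(report_csv))
    missing = {"taxon", "lineage"} - set(reader.fieldnames or [])
    if missing:
        raise KeyError(f"report lacks columns: {sorted(missing)}")
    counts: Counter = Counter()
    for row in reader:
        lineage = (row["lineage"] or "").strip()
        if lineage and lineage != "None":
            counts[lineage] += 1
    return counts


# Hypothetical report text for illustration only.
EXAMPLE = """taxon,lineage
sample_1,B.1
sample_2,B.1
sample_3,B.1.1.1
sample_4,None
"""

if __name__ == "__main__":
    print(lineage_counts(EXAMPLE).most_common())
```

Filtering on a quality score, as described for the DRAGEN app (scores below 22 discarded), would be one more condition inside the loop. The exact score column depends on the tool and is not assumed here.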
Divergence dates between SARS-CoV-2 and the bat sarbecovirus reservoir were estimated as 1948 (95% highest posterior density (HPD): 1879–1999), 1969 (95% HPD: 1930–2000) and 1982 (95% HPD: 1948–2009), indicating that the lineage giving rise to SARS-CoV-2 has been circulating unnoticed in bats for decades. … wrote the first draft of the manuscript, and all authors contributed to manuscript editing. a–c, Root-to-tip (RtT) divergence as a function of sampling time for three coronavirus evolutionary histories unfolding over different timescales: HCoV-OC43 (n = 37; a), MERS (n = 35; b) and SARS (n = 69; c). These residues are also in the Pangolin Guangdong 2019 sequence. In such cases, even moderate rate variation among long, deep phylogenetic branches will substantially affect expected root-to-tip divergences over a sampling time range that represents only a small fraction of the evolutionary history40. Divergence time estimates based on the HCoV-OC43-centred rate prior for the separate BFRs (Supplementary Table 3) show consistency in TMRCA estimates across the genome. If stopping an outbreak in its early stages is not possible, as was the case for the COVID-19 epidemic in Hubei, identification of origins and point sources is nevertheless important for containment purposes in other provinces and for prevention of future outbreaks. But some theories suggest that pangolins may be the source of the novel coronavirus.

Phylogenetic Assignment of Named Global Outbreak LINeages (pangolin). The pangolin web app is maintained by the Centre for Genomic Pathogen Surveillance.

volume 5, pages 1408–1417 (2020)

https://doi.org/10.1093/molbev/msaa163 (2020).
A new coronavirus associated with human respiratory disease in China.
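A 95% HPD interval, such as those quoted for the divergence dates, is the shortest interval containing 95% of the posterior mass. From MCMC output it is usually estimated as the narrowest window spanning 95% of the sorted samples. The minimal sketch below shows that standard sample-based computation. It is not presented as the exact routine used by BEAST or TreeAnnotator, and the posterior samples in the example are hypothetical.

```python
import math


def hpd_interval(samples: list[float], mass: float = 0.95) -> tuple[float, float]:
    """Shortest interval containing at least `mass` of the posterior samples."""
    if not 0 < mass <= 1:
        raise ValueError("mass must lie in (0, 1]")
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    k = max(1, math.ceil(mass * n))  # number of samples the interval must contain
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


if __name__ == "__main__":
    import random

    random.seed(1)
    # Hypothetical, skewed posterior sample of divergence dates (years).
    dates = [2019 - random.lognormvariate(3.5, 0.4) for _ in range(5000)]
    low, high = hpd_interval(dates)
    print(f"95% HPD: {low:.0f}-{high:.0f}")
```

For skewed posteriors like divergence dates, the HPD interval is generally not centred on the point estimate. This is why an estimate such as 1948 can sit well away from the middle of 1879–1999.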
In the absence of a strong temporal signal, we sought to identify a suitable prior rate distribution to calibrate the time-measured trees by examining several coronaviruses sampled over time, including HCoV-OC43, MERS-CoV and SARS-CoV genomes. Influenza viruses reassort17 but do not undergo homologous recombination within RNA segments18,19, meaning that origins questions for influenza outbreaks can always be reduced to origins questions for each of influenza's eight RNA segments. N. China corresponds to Jilin, Shanxi, Hebei and Henan provinces, and the N. China clade also includes one sequence sampled in Hubei Province in 2004. All three approaches to removal of recombinant genomic segments point to a single ancestral lineage for SARS-CoV-2 and RaTG13. Phylogenies of subregions of NRR1 depict an appreciable degree of spatial structuring of the bat sarbecovirus population across different regions (Fig. 1c). The rate of genome generation is unprecedented, yet there is currently no coherent or accepted scheme for naming the expanding … Even before the COVID-19 pandemic, pangolins had been making headlines. Because there is no single accepted method of inferring breakpoints and identifying clean subregions with high certainty, we implemented several approaches to identifying three classic statistical signals of recombination: mosaicism, phylogenetic incongruence and excessive homoplasy51. The origins we present in Fig. … Sibling lineages to RaTG13/SARS-CoV-2 include a pangolin sequence sampled in Guangdong Province in March 2019 and a clade of pangolin sequences from Guangxi Province sampled in 2017.

Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins.
… & Andersen, K. G. Pandemics: spend on surveillance, not prediction.
cov-lineages/pangolin (GitHub): Software package for assigning SARS-CoV-2 genome sequences to global lineages.
As illustrated by the dashed arrows, these two posteriors motivate our specification of prior distributions with standard deviations inflated 10-fold (light color). Of the nine breakpoints defining these ten BFRs, four showed phylogenetic incongruence (PI) signals with bootstrap support >80%, adopting previously published criteria on using a combination of mosaic and PI signals to show evidence of past recombination events19. The pangolin coronaviruses show lower similarity to SARS-CoV-2 than bat coronavirus RaTG13 across the whole genome, but higher similarity in the spike receptor-binding domain, although the similarity at either scale remains too low to implicate … For the HCoV-OC43, MERS-CoV and SARS datasets we specified flexible skygrid coalescent tree priors. An initial genomic sequence analysis found that the re-emergence of COVID-19 in New Zealand was caused by a SARS-CoV-2 virus from the (now ancestral) lineage B.1.1.1 of the pangolin nomenclature (17). Of the countries that have contributed SARS-CoV-2 data, 30% had genomes of this lineage. Current sampling of pangolins does not implicate them as an intermediate host. The histogram allows for the identification of non-recombining regions (NRRs) by revealing regions with no breakpoints. Scientists defined the pangolin lineage of this variant to be B.1.1.523, and it was originally recognized as a variant under monitoring on July 14, 2021.

July 26, 2021.

… 874850).

Boni, M. F., Lemey, P., Jiang, X. et al.
Probable Pangolin Origin of SARS-CoV-2 Associated with the COVID-19 …
Duchene, S., Holmes, E. C. & Ho, S. Y. W. Analyses of evolutionary dynamics in viruses are hindered by a time-dependent bias in rate estimates.
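The breakpoint histogram described above identifies breakpoint-free regions (BFRs) as stretches of the alignment containing no inferred breakpoint. Nine distinct breakpoints therefore yield ten BFRs. The minimal sketch below builds the histogram and splits the alignment into regions labelled A, B, C, … in decreasing order of length. Its assumptions are as follows. Breakpoints per sequence are already inferred by some recombination-detection method (not shown). A breakpoint at position b is taken to fall between columns b and b+1. The helper names are hypothetical. The bootstrap-based (>80%) phylogenetic-incongruence check is not modelled.

```python
from collections import Counter
from string import ascii_uppercase


def breakpoint_histogram(breakpoints_per_sequence: dict[str, list[int]]) -> Counter:
    """Count how many sequences carry a breakpoint at each alignment position."""
    counts: Counter = Counter()
    for positions in breakpoints_per_sequence.values():
        counts.update(set(positions))  # count each position at most once per sequence
    return counts


def breakpoint_free_regions(histogram: Counter,
                            alignment_length: int) -> dict[str, tuple[int, int]]:
    """Split columns 1..alignment_length at every breakpoint; label regions by length.

    A breakpoint at b is ASSUMED to separate columns b and b+1. The longest
    region is labelled 'A', the next longest 'B', and so on.
    """
    cuts = sorted(p for p in histogram if 1 <= p < alignment_length)
    bounds = [0] + cuts + [alignment_length]
    regions = [(lo + 1, hi) for lo, hi in zip(bounds, bounds[1:])]
    if len(regions) > len(ascii_uppercase):
        raise ValueError("more regions than single-letter labels")
    regions.sort(key=lambda r: r[1] - r[0] + 1, reverse=True)
    return {ascii_uppercase[i]: region for i, region in enumerate(regions)}


if __name__ == "__main__":
    # Hypothetical breakpoints for three sequences in a 1,200-column alignment.
    hist = breakpoint_histogram({"s1": [300, 1000], "s2": [1000], "s3": [650]})
    for label, (start, end) in breakpoint_free_regions(hist, 1200).items():
        flag = ">=500 nt" if end - start + 1 >= 500 else "<500 nt"
        print(label, start, end, flag)
```

Shared breakpoints, such as those inherited by descendants of one recombination event, raise a position's count but still produce only one cut.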
Because the SARS-CoV-2 S protein has been implicated in past recombination events or possibly convergent evolution12, we specifically investigated several subregions of the S protein: the N-terminal domain of S1, the C-terminal domain of S1, the variable-loop region of the C-terminal domain, and S2. Eight other BFRs <500 nt were identified, and the regions were named BFR A–J in order of length. A deep dive into the genetics of the novel coronavirus shows that it seems to have spent some time infecting both bats and pangolins before it jumped into humans, researchers said.

Current Overview on Disease and Health Research Vol. 6