Coronavirus, coming to a neighborhood near you! In this episode we interview Xueting Qiu, a molecular epidemiologist at the Harvard T.H. Chan School of Public Health. After a brief recap of your fave viruses, including MERS (camels!), bird flu (poultry!) and COVID19 (bats? pangolins?), we chat about how next-generation sequencing and viral genetics are rapidly shaping our understanding of how COVID19 entered into humans, how it is spreading, and how its genome could affect therapeutics and vaccines.
What is a coronavirus?
Coronaviruses (CoV) are a diverse family of single-stranded, positive-sense RNA viruses. They can cause respiratory and intestinal infections in humans and in many other animal species. Their genomes range from about 26 to 32 kilobases, the largest among known RNA viruses. The name corona comes from Latin, meaning “crown” or “halo”, and describes the characteristic appearance of the virus particles under an electron microscope. Coronaviruses are classified into four genera: alpha, beta, gamma, and delta.
When and where did the nCOV19 outbreak start?
This novel coronavirus (SARS-CoV-2) was first detected in Wuhan, China, and has spread rapidly since December 2019. The first cluster of cases was reported at a seafood market in Wuhan. Imported cases and small transmission clusters have now been reported globally; the virus has reached Europe, North America, Australia and Africa. As of Feb 18, 2020, it has caused over 73,000 confirmed infections and about 1,900 deaths worldwide.
Overall, about 14% of the illnesses were severe, which included pneumonia and shortness of breath, and about 5% were critical, marked by respiratory failure, septic shock, and multiple organ failure. The overall case fatality rate was 2.3%, and the majority of deaths were in people aged 60 and older or those with underlying medical conditions.
What are the major concerns from a medical and public health perspective?
Actually, this is not the first coronavirus we have seen. Four seasonal types are associated with mild respiratory symptoms in humans each year; they just cause the common cold. Another two, SARS, detected in 2003, and MERS, detected in 2012, can cause severe disease.
So compared to SARS and MERS, the case fatality rate is not as high, but the outbreak scale is way larger than, for example, SARS, so the situation so far is very severe. These official case numbers are likely an underestimate because of limited reporting of mild and asymptomatic cases, and the virus is clearly capable of efficient human-to-human transmission. Based on the possibility of spread to countries with weaker healthcare systems, the World Health Organization (WHO) has declared the COVID-19 outbreak a Public Health Emergency of International Concern.
From the medical perspective, one challenge is that the outbreak is right on top of flu season, which overburdens the healthcare system. Plus, there is no effective medication to treat the infection, only symptom-related supportive treatment. Antiviral medicines are under development now, but their effects have not been well evaluated.
From the public health perspective, we are concerned about how many cases go undetected and what proportion of transmission is pre-symptomatic, both of which will impact the effectiveness of our current control measures. So why is pre-symptomatic transmission so important here? It’s because if a person is infected but has no symptoms, and starts shedding the virus and infecting people, it’s very hard to identify that person and stop the spread to other people. So far, we don’t know the proportion of pre-symptomatic transmission, which is a key parameter we are trying to get from epidemiological data.
Nowadays global travel makes containing an outbreak very hard. An infectious disease in one place can easily become a global problem. So another challenge is a potential pandemic with no vaccine available. I think public health authorities in every country need to get resources ready for quarantine, contact tracing and other prevention measures.
How many samples have we sequenced of this novel coronavirus; what have we learned from it?
As of February 18, 2020, 119 full genome sequences have been deposited in GISAID (the Global Initiative on Sharing All Influenza Data). Most of them are human isolates (from China and from other countries as well), some are from environmental samples from the seafood market in Wuhan, and some are from pangolins.
First, comparative genetic analysis quickly identified the novel virus as a beta-coronavirus that’s genetically close to some SARS-like viruses circulating in bats.
One fascinating thing about next-generation sequencing is that it can recover viral sequences from environmental samples, like soil, water, or other surfaces in environmental settings. The sequences from environmental samples in the seafood market confirm that the outbreak initially started there.
Further genomic analysis has tried to estimate the key parameters of this virus’ evolution, like the evolutionary rate or the ancestral time of this outbreak. In the beginning, sequencing data can be noisy, but generally the phylogenetic reconstruction has provided accurate estimates of the viral ancestral time and mutation rate. I have been following two top groups: Andrew Rambaut posts on virological.org, and Trevor Bedford and Richard Neher developed an interactive visualization platform called Nextstrain (nextstrain.org). They have been updating their analyses as new sequences are added, and both groups are generating similar results.
The phylogenetic analysis reported limited genetic variation in the currently sampled viruses but more recent ones are showing more divergence as is expected for fast evolving RNA viruses. But the lack of diversity is indicative of a relatively recent common ancestor for all these viruses.
How has genetics informed our understanding of the animal reservoir and how it got into humans?
This is an important question because different scenarios need different control measures. If the situation is multiple introductions from an animal reservoir, controlling the animal viral sources is critical to stopping transmission. That’s what happened with avian influenza H7N9 in 2013 in Shanghai, China, which was due to multiple introductions from the poultry population into humans. In that case you have to stop the close contact between poultry and humans, so people closed all the live poultry markets in Shanghai and the surrounding areas. That stopped contact at the interface with the host, and the outbreak was rapidly controlled. But if it is a single introduction from an animal reservoir, then the animal viral source won’t be a big concern at this point and we can focus on controlling human transmission.
The analysis showed that the lack of genetic diversity in human viral samples of this novel coronavirus supports one single introduction from the animal reservoir into the human population. After the initial zoonotic event, the genome sequence data show no evidence that any non-human animal reservoir has been involved in generating new cases since January. If cases in January had been the result of new zoonotic jumps from a reservoir, we would expect more genetic diversity in our data. So far that’s not the situation, so the data support one single introduction from the animal reservoir into the human population.
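To make the "lack of genetic diversity" argument concrete, here is a minimal sketch of one simple diversity measure: the average number of differing positions between every pair of aligned sequences. The toy sequences and the function names are hypothetical and only for illustration; real analyses use full aligned genomes and phylogenetic methods rather than this simple count.

```python
from itertools import combinations
from typing import Sequence


def pairwise_differences(a: str, b: str) -> int:
    """Count positions where two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


def mean_pairwise_differences(seqs: Sequence[str]) -> float:
    """Average number of differences over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return sum(pairwise_differences(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy alignments (10 bases each), NOT real viral data.
# One recent common ancestor: samples differ by only a mutation or two.
single_intro = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGTAT", "ACGAACGTAC"]
# Several separate jumps from a diverse reservoir: samples differ much more.
multiple_intro = ["ACGTACGTAC", "ACGAACCTAT", "TCGTAGGTAC", "ACCTACGAAT"]

low = mean_pairwise_differences(single_intro)
high = mean_pairwise_differences(multiple_intro)
print(f"single introduction:    {low:.2f} differences per pair")
print(f"multiple introductions: {high:.2f} differences per pair")
```

In this toy setup the second set is far more diverse than the first, which is the kind of pattern we would expect if cases came from repeated zoonotic jumps rather than one.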
We don’t know exactly how it got into humans yet. Some coronaviruses from bats and pangolins have high similarity with the novel coronavirus. But this virus may still have one or more other non-human animal hosts. We don’t know at this point because we don’t have samples from other animals yet. With more surveillance and genetic data, we may be able to reconstruct the transmission chains among animal species and work out from which host it jumped to humans.
But with this discovery at an early stage of the outbreak, to control the outbreak we won’t worry much at this point about viral sources from animals. Instead, we can use resources to control transmission among humans.
Why is it important to understand the evolutionary rate of a virus?
Generally, people are scared about mutation. Especially with new, emerging viruses, people imagine: oh, it must be evolving so fast, it will cause high case fatality and severe outcomes. So that’s why we need to know exactly how fast a new virus evolves. We have plenty of data to do this estimation. So far the estimated evolutionary rate for the novel coronavirus is about 0.92 × 10⁻³ mutations per base pair per year (95% CI 0.33 × 10⁻³ to 1.46 × 10⁻³). So that’s not as fast as people imagine, and it’s actually similar to the mutation rates of other coronaviruses. And it’s at least 3 to 5 times slower than the flu virus.
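As a rough illustration of what this rate means in practice, the sketch below converts the per-site rate into an expected number of mutations across the whole genome. The genome length of 30,000 bases is an assumption (it sits inside the 26 to 32 kilobase range mentioned earlier), and the function name is made up for this example.

```python
# Estimated rate and 95% CI quoted in the interview (per base pair per year).
RATE_PER_SITE_PER_YEAR = 0.92e-3
RATE_CI = (0.33e-3, 1.46e-3)

# ASSUMPTION: a round genome length of 30,000 bases, chosen for illustration.
GENOME_LENGTH_BP = 30_000


def expected_mutations(rate: float, genome_length: int, years: float = 1.0) -> float:
    """Expected mutations along one lineage: rate x genome length x time."""
    return rate * genome_length * years


per_year = expected_mutations(RATE_PER_SITE_PER_YEAR, GENOME_LENGTH_BP)
per_month = expected_mutations(RATE_PER_SITE_PER_YEAR, GENOME_LENGTH_BP, years=1 / 12)
ci_low, ci_high = (expected_mutations(r, GENOME_LENGTH_BP) for r in RATE_CI)

print(f"about {per_year:.1f} mutations per genome per year")
print(f"about {per_month:.1f} mutations per genome per month")
print(f"95% CI: {ci_low:.1f} to {ci_high:.1f} per year")
```

Under these assumptions a lineage picks up on the order of two mutations a month across the entire genome, which helps explain why viruses sampled early in the outbreak look so similar to each other.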
What are other advantages of genomic data during the early stage of an emerging outbreak?
Genomic data has only been used for infectious disease in the last decade or so, because next-generation sequencing was commercialized around 2005. It’s new that we can generate viral sequences so rapidly, and it does provide a lot of advantages during the early stages of an emerging outbreak. It can provide estimates of some critical parameters for assessing the situation. For example, for this coronavirus outbreak, we have been estimating R0, the basic reproductive number, which is the average number of secondary cases infected by a single case. Another parameter is the doubling time, the time it takes for the infected population to double in size. So basically, the larger the R0 and the shorter the doubling time, the faster the disease will spread and the harder it is to contain the outbreak.
The traditional way to estimate these parameters uses epidemiological data and tracking information, which takes a lot of effort, is slow, and usually isn’t available until very late in the outbreak. But with genomic data, we can estimate the viral ancestral time and treat it as the start of the outbreak. So now you have a time point, and you have some of the cases. Even if it’s not perfect (you don’t have all the cases or know who infected whom), you don’t need the exact epidemiological information: you can estimate these parameters based on the ancestral time and the exponential growth of the virus. That’s how we get early estimates of R0 and doubling time from genomic data.
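The logic above can be sketched with a simple exponential growth model. This is a back-of-the-envelope illustration, not the phylodynamic method the groups mentioned earlier actually use. All input numbers are hypothetical, and the relation R0 = 1 + r × (mean generation interval) is one common approximation that assumes an exponentially distributed generation interval.

```python
import math


def growth_rate(cases_now: float, days_since_origin: float, cases_at_origin: float = 1.0) -> float:
    """Exponential growth rate r (per day), assuming N(t) = N0 * exp(r * t)."""
    return math.log(cases_now / cases_at_origin) / days_since_origin


def doubling_time(r: float) -> float:
    """Days needed for the number of infections to double at growth rate r."""
    return math.log(2) / r


def r0_from_growth_rate(r: float, mean_generation_interval_days: float) -> float:
    """Approximate R0 as 1 + r * Tg (ASSUMES exponential generation intervals)."""
    return 1.0 + r * mean_generation_interval_days


# HYPOTHETICAL inputs for illustration only:
# the ancestral time is 50 days ago, and there are about 1,024 infections now.
r = growth_rate(cases_now=1024, days_since_origin=50)
td = doubling_time(r)
# ASSUMPTION: a mean generation interval of 7.5 days.
r0 = r0_from_growth_rate(r, mean_generation_interval_days=7.5)

print(f"growth rate:   {r:.3f} per day")
print(f"doubling time: {td:.1f} days")
print(f"approx. R0:    {r0:.2f}")
```

Going from 1 to 1,024 infections takes ten doublings, so with 50 days since the ancestor the doubling time comes out at 5 days. The point is that one time point plus a rough case count is already enough for a first estimate.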
How have the scientific efforts around understanding this virus differed from previous epidemics like SARS?
Compared to SARS in 2003, this time we have had a relatively rapid response, and the scientific efforts for this outbreak have been in great shape. Data sharing is rapid and data collection is better: there is more epidemiological information and more records, shared online and on Twitter, with people organizing it and translating it into English immediately. It’s almost real-time sharing of data. The rapid identification of the virus is remarkable. For SARS, it took a few months after the outbreak started before the data was shared and a global effort was made. Another thing is that the journal review process that makes information available to the public is more rapid, because journals are making efforts to process papers faster than before.
Are people using the sequencing data to inform vaccine discovery?
For other pathogens, like influenza, people are using sequence data to design more broadly protective vaccines. For the novel coronavirus, I am not sure whether people are using sequencing data, but I know there are 4 ongoing vaccine design projects. We don’t know when a vaccine will come out or whether it will be effective.
Where do we go from here? What else should we be monitoring and preparing for?
First, we should continue the current prevention strategies until new findings point to other effective measures. Watch the global spread and be ready for potential outbreaks, especially in areas with stretched health care systems. As mentioned before, the proportion of pre-symptomatic transmission or asymptomatic infections is critical and should be estimated as best we can. If it becomes a pandemic, then based on R0 and the progress of the epidemic, we think it may infect more than half of the population.
For long-term prevention of future emerging viruses, we need to understand the virus’s origins clearly. Identifying the immediate non-human animal source and obtaining virus sequences from it would be the most definitive way. So the ongoing surveillance of pneumonia in humans and other animals is very important.
Any final takeaways from this outbreak so we can prepare for the future?
The history of humanity is also a history of fighting infectious diseases. Emerging diseases are unpredictable. We don’t know when it will happen again, but we know it definitely will. So we have to learn as much as we can from this one to get ready for future outbreak response. The key point is time: the earlier the professional response, the better the containment of the outbreak. We need more investment in preparing resources and training experts. It is necessary to have in place, ahead of time, sufficient funding and standard protocols for collecting samples, for accurately recording and archiving the associated epidemiological data, and for ensuring patient privacy. Being able to use better tools, including real-time sequencing technologies, is also critical.
Another thing that concerns me a lot is that misinformation about the coronavirus outbreak is spreading wildly on social media in the middle of the viral epidemic. It causes panic. But what we need the most is resources and information, not panic. Being scared is useless; being educated and being able to pick reliable information sources to follow are better choices. I would recommend following people from our center, for example Marc Lipsitch, Caroline Buckee, Bill Hanage and Michael Mina, and also experts like Andrew Rambaut and Trevor Bedford, as well as Ben Cowling and Joseph Wu in Hong Kong.
I surely believe we will control the outbreak with great efforts together, but the important things are how we can reduce the harm and protect more people during the outbreak, how we can consider more humanity and safety for those individuals while implementing control measures, and how we can learn more from this experience and do better in the future.