Xueting Qiu on the latest insights from 10,000 SARS-CoV-2 genomes

Returning champion Xueting Qiu tells us about the newest findings from analyzing 10,000+ SARS-CoV-2 genomes. We discuss what the genomes tell us about the origins and spread of the outbreak across the US, how fast the virus is mutating, whether different ‘strains’ of the virus exist, and – the question on everyone’s mind – how we can reopen the country.

Interview Highlights

The data

How many viral genomes have been sequenced at this point and what is their geographic distribution?

As of 4/17, more than 10,000 sequences have been deposited in the GISAID database. By country, the US has submitted the most so far. By region, most of the data come from Europe, North America, East Asia, and Australia.

It is amazing how fast the data have been generated. Two weeks ago, there were only about 4,500 sequences in the database.

Lineage tracing

What have the genetics taught us about how the virus got to the United States?

Comparative genomic analysis is a great resource for inferring viral sources: it can tell you which hosts, or which places, a virus came from. Since the spread is now dominated by human hosts, let’s focus on geographic spread here. To answer how the virus got to the US: if you check Nextstrain (https://nextstrain.org/ncov/global), which is a nearly real-time tracking platform, you will see that transmissions are globally connected. When we zoom in on the introductions into the US, we see that multiple introductions drove the US epidemic, and the earliest was in January. The virus took multiple paths to reach the US: there was a direct introduction from China in late January, and there were multiple introductions from the European epidemic during the course of February.

Map of global transmissions of SARS-CoV-2 from nextstrain.org as of April 17, 2020

What have the genetics taught us about how the virus is spreading within the United States?

Last time, we talked about the story in Washington state. Even before massive testing started, we inferred from the first two sequences in Washington that the virus had been spreading through local community transmission for weeks. With more retrospective epidemiological data from the Seattle Flu Study, that inference is now confirmed.

In Washington state, about 5,000 new cases were diagnosed in March, and based on the genomic data most of them are descendants of the first Washington case. We have since found its descendants in New York, California, Connecticut, Minnesota, and Wisconsin, which are among the few states to have published viral genome sequences so far. It has also spread to other regions of the world. Air travel makes it very easy for viral transmission to become a global problem.

Mutations, strains

What is the genome size of SARS-CoV-2? How many genes does it have and what are the functions of a few of the most important ones?

The full length of the SARS-CoV-2 genome is about 30,000 nucleotides, one of the largest genomes among RNA viruses. It is a non-segmented genome that encodes at least 12 functional proteins. Among these, the important ones are the surface proteins, for example the spike glycoprotein. The name describes how it looks: these are the spikes on the surface of the virus. For viral function, it contains the receptor-binding domain, which helps the virus enter human cells and initiate infection. For the human immune response, it contains important antigens that stimulate the immune system. That is why it is important for vaccine design.

Anatomy of a coronavirus, from https://www.manuelbortoletti.com/ for The Economist, March 14, 2020

How quickly is the virus mutating? Are these results in keeping with the original estimates?

The substitution rate is 0.75 × 10⁻³ substitutions per site per year, which is about 24 substitutions per year, or 2 substitutions per month, for the whole genome along a transmission chain. It is lower than the original estimate (0.92 × 10⁻³ substitutions/site/year). This is normal as more data become available. It doesn’t mean the virus is mutating more slowly; the uncertainty simply shrinks with more data and a longer sampling period.
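
The conversion from a per-site rate to a whole-genome rate can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it uses the rounded figures quoted in this interview (about 30,000 nucleotides and 0.75 × 10⁻³ substitutions per site per year), and the function name is made up for illustration. With these rounded inputs it gives roughly 22.5 substitutions per year and a little under 2 per month, in the same range as the figures quoted above; the small gap from "about 24" presumably reflects rounding in the inputs.

```python
# Back-of-the-envelope conversion of a per-site substitution rate
# to a per-genome rate, using the rounded figures quoted in the interview.

GENOME_LENGTH = 30_000            # approximate genome length in nucleotides
RATE_PER_SITE_PER_YEAR = 0.75e-3  # substitutions per site per year


def substitutions_per_genome(rate_per_site: float,
                             genome_length: int,
                             years: float = 1.0) -> float:
    """Expected substitutions along one transmission chain over `years`."""
    return rate_per_site * genome_length * years


per_year = substitutions_per_genome(RATE_PER_SITE_PER_YEAR, GENOME_LENGTH)
per_month = substitutions_per_genome(RATE_PER_SITE_PER_YEAR, GENOME_LENGTH,
                                     years=1 / 12)

print(f"{per_year:.1f} substitutions per genome per year")
print(f"{per_month:.1f} substitutions per genome per month")
```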

Where in the genome are the mutations occurring?

Not evenly. There are some hotspots in different genes. This may be related to protein function, since different regions of a protein are under different selection pressures.

Hotspots of mutations (high bars) along SARS-CoV-2 genome from nextstrain.org as of April 17, 2020

A lot of attention has been paid to different “strains” of SARS-CoV-2 in the media. What does it mean to be a different “strain”?

A strain is a genetic variant or subtype of a microorganism. Unfortunately, “strain” is an overloaded scientific term. Here we have to differentiate the concept of a strain from that of a genetic variant.

In many circumstances, every unique viral genome is counted as a separate strain. If we use this criterion, then we have already seen thousands of “strains” among the available SARS-CoV-2 genomes. But almost all of the changes in the genome do very little to affect viral function. So I would prefer to call these genetic variants.

Another definition of a strain is a functionally distinct virus genotype. What’s very tricky is that we can’t know without doing experiments whether one genetic variant behaves differently from another, especially when there is only a small handful of genetic changes between them. For example, there have been only 11 mutations to proteins that are widely distributed. These are *potentially* functionally distinct variants that deserve attention and experimental and clinical follow-up. This could be studied via cross-neutralization assays to see if sera from recovered individuals respond differently to the variants being compared.

Take flu as a more mature example. Think about H3N2: each season, we have to change the vaccine strain because genetic drift causes substantial antigenic change. (In other words, genetic variation accumulates in the virus genes that code for the surface proteins that host antibodies recognize.) We can classify the viruses into different strains based on genetic distance and on antigenic distance measured in experiments.

If you get infected with one strain, will you have immunity against the others? 

It depends on the cross-reactivity between the strains, that is, how closely related they are. Two main factors affect how quickly strains diverge: one is the viral mutation rate, and the other is the immune selection pressure in humans.

The per-base mutation rate of SARS-CoV-2 is about 2-3 fold slower than that of influenza. Seasonal coronaviruses may behave similarly to seasonal flu, in that frequent mutations to the spike protein (the protein targeted by immunity) are observed. Plus, there is not yet much immune pressure in the population to select among strains. Herd immunity takes time to establish, so it will still take years for the virus to escape from current immunity.

Are certain strains more lethal or weaker?

We don’t know yet. But I have to say that when the viruses are extremely similar, the case fatality rate is more related to other factors. That is, death does not depend solely on the viral variant: age structure, economic status, living conditions, health education, prevention measures, the healthcare system, etc. all play a role.

What are the implications of the mutation rate/strains for the development of therapeutics and vaccines?

This is a great question. As I mentioned previously, this virus evolves more slowly than flu, and there is not much immune pressure selecting it currently. We should see occasional mutations to the spike protein of SARS-CoV-2 that allow the virus to partially escape from vaccines or existing “herd” immunity, but this process will most likely take years rather than months. Below is the estimated proportion infected in some European countries: so far it is only 0.5-3%, far from the roughly 50% infection expected to be needed for herd immunity. So it will probably take the virus a few years to mutate enough to significantly hinder a vaccine. Similarly for therapeutics, resistance to antiviral medicine won’t be a concern for a few years.
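
For readers wondering where a figure like 50% comes from, one common textbook approximation (an assumption here, not something stated in the interview) is that the herd immunity threshold in a simple, evenly mixing population is 1 - 1/R0, where R0 is the basic reproduction number. Under that simplified model, a threshold of 50% corresponds to R0 = 2. The R0 values below are hypothetical inputs chosen only to illustrate the formula, not estimates for SARS-CoV-2.

```python
# Sketch of the textbook herd immunity threshold, 1 - 1/R0.
# Assumes a simple homogeneous-mixing model; the R0 values are illustrative only.


def herd_immunity_threshold(r0: float) -> float:
    """Fraction of the population that must be immune for spread to decline."""
    if r0 <= 1.0:
        # With R0 at or below 1 the outbreak does not grow, so no threshold is needed.
        return 0.0
    return 1.0 - 1.0 / r0


for r0 in (2.0, 2.5, 3.0):  # hypothetical values
    print(f"R0 = {r0}: threshold = {herd_immunity_threshold(r0):.0%}")
```

Against thresholds of this size, the 0.5-3% infected estimated for the European countries shows how far those populations are from herd immunity.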

Estimates of total number of infections in different European countries from the MRC Centre for Global Infectious Disease Analysis (https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/report-13-europe-npi-impact/)

Prevention, reopening

Is social distancing working?

Yes. There is earlier evidence from China, some evidence from a few European countries such as Germany and Italy, and the current changes in several US states. For example, in King County in Washington state, researchers correlated Facebook mobility data with the effective reproduction number (Re). (Re, or Rt, is the real-time estimate of the number of secondary infections per case under the control measures in place.) We saw that as mobility fell, Re fell as well. So both reality and data tell us that social distancing works. Because we did social distancing, we controlled the spread and didn’t see an extreme surge in cases, so people may think we overreacted. But we did not: it works, and we have prevented a worse situation.

Models of true COVID-19 infections and deaths and reproduction number in Germany, as related to government interventions from https://epiforecasts.io/covid/posts/global

What are the proposed ideas for reopening the country?

I don’t think anyone has found a good answer. The pandemic is far from done. It is still in a very early stage, since most of the population in the country remains susceptible. The goal of the current set of restrictions is not to solve the problem, but rather to solve the acute problem of keeping the numbers of patients from exceeding health care capacity.

We have now seen that the shutdown works well; the 30-day lockdown has saved many, many lives. What should we do next? There is a heavy dilemma here: if we relax restrictions, then, as we saw in the 1918 pandemic and as we are probably seeing in China, Hong Kong, and Singapore now, there’s every reason to expect a resurgence of coronavirus, and we’re back with the same problem. On the other hand, keeping these restrictions in place is economically disastrous.

We probably have to try different things. One proposed strategy is serological testing: if people have been infected, they will have protective antibodies and can go back to work. But for now I think serological testing is still more of a research tool than something ready for wide use, since we still don’t know much about immunity to this virus.

Another thing we can try is to do what China is doing: reopen businesses very cautiously, with all the preventive measures in place, including hand hygiene, masks, closed borders, and testing, tracing, and quarantine.

I think we should learn from the successful experiences of different countries and tailor them into prevention measures that fit local conditions. We cannot copy one set of measures 100%, but we are learning to use the best ones, like social distancing and wearing masks, to save lives.

At this moment, serological testing is promising, but we don’t know yet. People usually hesitate to admit that they don’t know. But admitting what we don’t know is the first step in driving ourselves to find the answers.
