Why herd immunity to COVID-19 is reached much earlier than thought

Why herd immunity to COVID-19 is reached much earlier than thought

Posted on May 10, 2020 by niclewis | 16 Comments

By Nic Lewis

A study published in March by the COVID-19 Response Team from Imperial College (Ferguson20 [1]) appears to have been largely responsible for driving government actions in the UK and, to a fair extent, in the US and some other countries. Until that report came out, the strategy of the UK government, at least, seems to have been to rely on the build up of ‘herd immunity’ to slow the growth of the epidemic and eventually cause it to peter out.

The ‘herd immunity threshold’ (HIT) can be estimated from the basic reproduction rate of the epidemic, R0 – a measure of how many people, on average, each infected individual infects. Standard simple compartmental models of epidemic growth imply that the HIT equals {1 – 1/R0}. Once the HIT is passed, the rate of new infections starts to decline, which should ensure that health systems will not thereafter be overwhelmed and makes it more practicable to take steps to eliminate the disease.

However, the Ferguson20 report estimated that relying on herd immunity would result in 81% of the UK and US populations becoming infected during the epidemic, mainly over a two-month period, based on an R0 estimate of 2.4. These figures imply that the HIT is between 50% and 60%. [2] Their report implied that health systems would be overwhelmed, resulting in far more deaths. It claimed that only draconian government interventions could prevent this occurring. Such interventions were rapidly implemented in the UK, in most states of the US, and in various other countries, via highly disruptive and restrictive enforced ‘lockdowns.’

A notable exception was Sweden, which has continued to pursue a herd immunity-based strategy, relying on relatively modest social distancing policies. The Imperial College team estimated that, after those policies were introduced in mid-March, R0 in Sweden was 2.5, with only a 2.5% probability that it was under 1.5. [3] The rapid spread of COVID-19 in the country in the second half of March suggests that R0 is unlikely to have been significantly under 2.0. [4]

Very sensibly, the Swedish public health authority has surveyed the prevalence of antibodies to the SARS-COV-2 virus in Stockholm County, the earliest in Sweden hit by COVID-19. They thereby estimated that 17% of the population would have been infected by 11 April, rising to 25% by 1 May 2020. [5] Yet recorded new cases had stopped increasing by 11 April (Figure 1), as had net hospital admissions, [6] and both measures have fallen significantly since. That pattern indicates that the HIT had been reached by 11April, at which point only 17% of the population appear to have been infected.

How can it be true that the HIT has been reached in Stockholm County — with only about 17% of the population having been infected, while an R0 of 2.0 is normally taken to imply a HIT of 50%?

The importance of population inhomogeneity

A recent paper (Gomes et al. [7]) provides the answer. It shows that variation between individuals in their susceptibility to infection and their propensity to infect others can cause the HIT to be much lower than it is in a homogeneous population. Standard simple compartmental epidemic models take no account of such variability. And the model used in the Ferguson20 study, while much more complex, appears only to take into account inhomogeneity arising from a very limited set of factors – notably geographic separation from other individuals and household size – with only a modest resulting impact on the growth of the epidemic. [8] Using a compartmental model modified to take such variability into account, with co-variability between susceptibility and infectivity arguably handled in a more realistic way than by Gomes et al., I confirm their finding that the HIT is indeed reached at a much lower level than when the population is homogeneous. That would explain why the HIT appears to have been passed in Stockholm by mid April. The same seems likely to be the case in other major cities and regions that have been badly affected by COVID-19.

Figure 1. New COVID-19 cases reported in Stockholm County, Sweden, over the 7 days up to the date shown. Note that in Sweden testing for COVID-19 infection was narrowed on 12 March, to focus on people needing hospital care, so from then on only a tiny proportion of infections were recorded as cases. This would account for the lack of growth in cases during the first week plotted. Since hospitalisation usually occurs several days after symptom onset, this change also increases the lag between infection and recording as a case. Accordingly, from mid-March on the 7-day trailing average new cases figure will reflect new infections that on average occurred approximately two weeks earlier.

The epidemiological model used

Like Gomes et al., I use a simple ‘SEIR’ epidemiological model, [9] in which the population is divided into four compartments: Susceptible (uninfected), Exposed (latent: infected but not yet infectious), Infectious (typically when diseased), and Recovered (and thus immune and harmless). This is shown in Figure 2. In reality, the Recovered compartment includes people who instead die, which has the same effect on the model dynamics. The entire population starts in the Susceptible compartment, save for a tiny proportion that are transferred to the Infectious compartment to seed the epidemic. The seed infectious individuals infect Susceptible individuals, who move to the Exposed compartment. Exposed individuals gradually transfer to the Infectious compartment, on average remaining as Exposed for the chosen latent period. Infectious individuals in turn gradually transfer to the Recovered compartment, on average remaining as Infectious for the selected infectious period.

Figure 2. SEIR compartment epidemiological model diagram.

In the case of COVID-19, the diseased (symptomatic) stage is typically reached about 5 days after infection, but an infected individual starts to become infectious about 2 days earlier. I therefore set the average latent period as 3 days. [10]

The infectious period depends mainly on the delay between infectiousness and symptoms appearing and on how quickly an individual reduces contacts with others once they become symptomatic, as well as on how infectious asymptomatic cases are. In an SEIR model, the infective period can be derived by subtracting the latent period from the generation time – the mean interval between the original infection of a person and the infections that they then cause.

The Ferguson20 model assumed a generation time of 6.5 days, slightly lower than a subsequent estimate of 7.5 days.[11] I use 7 days, which is consistent with growth rates near the start of COVID-19 outbreaks.[12] The infectious period is therefore 4 (=7 − 3) days.

I set R0=2.4, the same value Ferguson20 use. On average, while an individual is in the Infectious compartment, the number of Susceptible individuals they infect is R0 × {the proportion of the population that remains in the Susceptible compartment}.

With these settings, the progression of a COVID-19 epidemic projected by a standard SEIR model, in which all individuals have identical characteristics, is as shown in Figure 3. The HIT is reached once 58% of the population has been infected, and ultimately 88% of the population become infected.

Figure 3. Epidemic progression in an SEIR model with R0=2.4 and a homogeneous population. The time to reach the herd immunity threshold, which depends on the strength of the seeding at time zero, is arbitrary.

Modifying the basic SEIR model for variability in individual susceptibility and infectivity

The great bulk of COVID-19 transmission is thought to occur directly from symptomatic and pre-symptomatic infected individuals, with little transmission from asymptomatic cases or from the environment. [13] There is strong evidence that a small proportion of individuals account for most infections – the ‘superspreaders’.

A good measure of the dispersion of transmission – the extent to which infection happens through many spreaders or just a few – is the coefficient of variation (CV). [14] Two different estimates of this figure have been published for COVID-19. A Shenzhen-based study [15] estimated that 8.9% of cases were responsible for 80% of total infections, while a multi-country study [16] estimated that 10% were so responsible. In both cases a gamma probability distribution was assumed, as is standard for this purpose. The corresponding CV best estimates and 95% uncertainty ranges are 3.3 (3.0–5.6) and 3.1 (2.2–5.0). These figures are slightly higher than the 2.5 estimated for the 2003 epidemic of SARS. [17]

CV estimates indicate the probability of transmission of an infection. They reflect population inhomogeneity regarding individuals’ differing tendency to infect others, but it is unclear to what extent they also reflect susceptibility differences between individuals. However, since COVID-19 transmission is very largely person-to-person, much of the inhomogeneity in transmission rates will reflect how socially connected individuals are, and how close and prolonged their interactions with other individuals are. As these factors affect the probability of transmission both from and to an individual, as well as causing variation in an individual’s infectivity they should cause the same variation in their susceptibility to infection.

A common social connectivity related factor implies that an individual’s susceptibility and infectivity are positively correlated, and it is not unreasonable to assume a quite strong correlation. However, it seems unrealistic to assume, as Gomes et al. do in one case, that an individual’s infectivity is directly proportional to their personal susceptibility. (In the other case that they model, they assume that an individual’s infectivity is unrelated to their susceptibility.)

Some of the variability in the likelihood of someone infecting a susceptible individual during an interaction will undoubtedly be unrelated to social connectivity, for example the size of their viral load. Likewise, susceptibility will vary with the strength of an individual’s immune system as well as with their social connectivity. I use unit-median lognormal distributions to reflect such social-connectivity unrelated variability in infectivity and susceptibility. Their standard deviations determine the strength of the factor they represent. I model an individual’s overall infectivity as the product of their common social-connectivity related factor and their unrelated infectivity-specific factor, and calculate their overall susceptibility in a corresponding manner. [18]

I consider the cases of CV=1 and CV=2 for the common social connectivity factor that causes inhomogeneity in both susceptibility and infectivity. For unrelated lognormally-distributed inhomogeneity in susceptibility I take standard deviations of either 0.4 or 0.8, corresponding to a CV of 0.417 or 0.947 respectively. Where their gamma-distributed common factor inhomogeneity is set at 1, the resulting total inhomogeneity in susceptibility is respectively 1.17 or 1.65 when the lower or higher unrelated inhomogeneity standard deviations respectively are used; where set at 2 the resulting total inhomogeneity in susceptibility is respectively 2.17 or 2.98. The magnitude of variability in individuals’ social-connectivity unrelated infectivity-specific inhomogeneity factor does not affect the progression of an epidemic or the HIT, so for simplicity I ignore it here. [19]


Figure 4 shows the progression of a COVID-19 epidemic in the case of CV=1 for the common social connectivity factor inhomogeneity, with unrelated inhomogeneity in susceptibility having a standard deviation of 0.4. The HIT is 60% lower than for a homogeneous population, at 23.6% rather than 58.3% of the population. And 43% rather than 88% of the population ultimately becomes infected. If the standard deviation of unrelated inhomogeneity in susceptibility is increased to 0.8, the HIT becomes 18.9%, and 35% of the population are ultimately infected.

Figure 4. Epidemic progression in an SEIR model with R0=2.4 and a population with CV=1 common factor inhomogeneity in susceptibility and infectivity and also unrelated multiplicative inhomogeneity in susceptibility with a standard deviation of 0.4.

Figure 5 shows the progression of a COVID-19 epidemic in the case of CV=2 for the common social connectivity factor inhomogeneity, with unrelated inhomogeneity in susceptibility having a standard deviation of 0.8. The HIT is only 6.9% of the population, and only 14% of the population ultimately becoming infected. If the standard deviation of unrelated inhomogeneity in susceptibility is reduced to 0.4, those figures become respectively 8.6% and 17%.

Figure 5. Epidemic progression in an SEIR model with R0=2.4 and a population with CV=2 common factor inhomogeneity in susceptibility and infectivity and also unrelated multiplicative inhomogeneity in susceptibility with a standard deviation of 0.8.


Incorporating, in a reasonable manner, inhomogeneity in susceptibility and infectivity in a standard SEIR epidemiological model, rather than assuming a homogeneous population, causes a very major reduction in the herd immunity threshold, and also in the ultimate infection level if the epidemic thereafter follows an unconstrained path. Therefore, the number of fatalities involved in achieving herd immunity is much lower than it would otherwise be.

In my view, the true herd immunity threshold probably lies somewhere between the 7% and 24%, implied by the cases illustrated in Figures 4 and 5. If it were around 17%, which evidence from Stockholm County suggests the resulting fatalities from infections prior to the HIT being reached should be a very low proportion of the population. The Stockholm infection fatality rate appears to be approximately 0.4%, [20] considerably lower than per the Verity et al. [21] estimates used in Ferguson20, with a fatality rate of under 0.1% from infections until the HIT was reached. The fatality rate to reach the HIT in less densely populated areas should be lower, because R0 is positively related to population density. [22] Accordingly, total fatalities should be well under 0.1% of the population by the time herd immunity is achieved. Although there would be subsequent further fatalities, as the epidemic shrinks, it should be increasingly practicable to hasten its end by using testing and contact tracing to prevent infections spreading, and thus substantially reduce the number of further fatalities below those projected by the SEIR model in a totally unmitigated scenario.

Nicholas Lewis is an independent climate science researcher, with expertise in: Data Analysis, Probability, Statistical Inference, Climate Variability, Physics of Global Warming, Bayesian Inference, Radiative Forcing, and Global Climate Modeling Parameters

[1] Neil M Ferguson et al., Impact of non-pharmaceutical interventions (NPIs) to reduce COVID-19 mortality and healthcare demand. Imperial College COVID-19 Response Team Report 9, 16 March 2020, https://spiral.imperial.ac.uk:8443/handle/10044/1/77482

[2] A final infection rate of 81% implies, in the context of a simple compartmental model with a fixed, homogeneous population, that the ‘effective R0‘ is between 2.0 and 2.1, and that the HIT is slightly over 50%. Ferguson20 use a more complex model, so it is not surprising that the implied effective R0 differs slightly from the basic 2.4 value that Ferguson20 state they assume.

[3] Flaxman, S. et al., Estimating the number of infections and the impact of non-pharmaceutical interventions on COVID-19 in 11 European countries. Imperial College COVID-19 Response Team Report 13, 30 March 2020, https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/report-13-europe-npi-impact/

[4] Based on the Ferguson20 estimate of a mean generation time of 6.5 days, which appears to be in line with existing evidence, an R0 of 2.0 would result in a daily growth rate of 2.0^(1/6.5)= 11%. That is slightly lower than the peak growth rate in cases in late March in Stockholm County, and in early April in the two regions with the next highest number of cases, in both of which the epidemic took off slightly later than in Stockholm, and in line with the growth rate in Swedish COVID-19 deaths in early April

[5] https://www.folkhalsomyndigheten.se/contentassets/2da059f90b90458d8454a04955d1697f/skattning-peakdag-antal-infekterade-covid-19-utbrottet-stockholms-lan-februari-april-2020.pdf

[6] John Burn-Murdoch, Financial Times Research, 2 May 2020. http://web.archive.org/web/20200507075628/https:/twitter.com/jburnmurdoch/status/1256712090028576768

[7] Gomes, M. G. M., et al. Individual variation in susceptibility or exposure to SARS-CoV-2 lowers the herd immunity threshold. medRxiv 2 May 2020. https://www.medrxiv.org/content/10.1101 2020.04.27.20081893v1

[8] The 81% proportion of the population that Ferguson20 estimated would eventually become infected is only slightly lower than the 88% level implied by their R0 estimate of 2.4 in the case of a homogeneous population.

[9] https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology#The_SEIR_model [10] Gomes et al. instead set the latent period slightly longer, to 4 days and treated it as a partly infectious period, unlike in the standard SEIR model.

[11] Li Q, Guan X, Wu P, et al.: Early Transmission Dynamics in Wuhan, China, of Novel Coronavirus-Infected Pneumonia. N Engl J Med. 2020; 382(13):1199–1207.https://www.nejm.org/doi/10.1056/NEJMoa2001316

[12] Once a SEIR model has passed its start up phase, and while a negligible proportion susceptible individuals have been infected, the epidemic daily growth factor is R0^(1/generation time), or 1.10–1.13 for R0=2.0–2.4 if the generation time is 7 days.

[13] L. Ferretti et al., Science 10.1126/science.abb6936 (2020).

[14] The coefficient of variation is the ratio of the standard deviation to the mean of its probability distribution. It is usual to assume a gamma distribution for infectivity, the shape parameter of which equals 1/CV2.

[15] Bi, Qifang, et al. “Epidemiology and transmission of COVID-19 in 391 cases and 1286 of their close contacts in Shenzhen, China: a retrospective cohort study.” The Lancet Infectious Diseases 27 April 2020. https://doi.org/10.1016/S1473-3099(20)30287-5

[16] Endo, Akira, et al. “Estimating the overdispersion in COVID-19 transmission using outbreak sizes outside China.” Wellcome Open Research 5.67 (2020): 67. https://wellcomeopenresearch.org/articles/5-67

[17] Lloyd-Smith, J O et al. “Superspreading and the effect of individual variation on disease emergence.” Nature 438.7066 (2005): 355-359. https://www.nature.com/articles/nature04153

[18] For computational efficiency, I divide the population into 10,000 equal sized segments with their common social connectivity factor increasing according to its assumed probability distribution, and allocate each population segment values for unrelated variability in susceptibility and infectivity randomly, according to their respective probability distributions.

[19] A highly susceptible but averagely infectious person is more likely to be removed from the susceptible pool early in an epidemic, reducing the average susceptibility of the pool. However, no such selective removal occurs for a highly infectious person of averagely susceptibility. Therefore, as Gomes et al. point out, variability in susceptibility lowers the HIT, but variability in infectivity does not do so except to the extent that it is correlated with variability in susceptibility.

[20] On 8 May 2020 reported total COVID-19 deaths in Stockholm County were 1,660, which is 0.40% of the estimated 413,000 of its population who had been infected by 11 April 2020. COVID-19 deaths reported for Stockholm County after 8 May that relate to infections by 11 April 2020 are likely to be approximately balanced by deaths reported by 8 May 2020 that related to post 11 April 2020 infections.

[21] Verity R, Okell LC, Dorigatti I, et al. Estimates of the severity of COVID-19 disease. medRxiv 13 March 2020; https://www.medrxiv.org/content/10.1101/2020.03.09.20033357v1.

[22] Similarly, the HIT may be significantly higher in areas that are very densely populated, have much less inhomogenous populations and/or are repeatedly reseeded from other areas. That would account for the high prevalence of COVID-19 infection that has been found in, for instance, some prisons and residential institutions or in city districts.

Originally posted here, where a pdf copy is also available

This intriguing article, just out, is by a mathematician who is using the SEIR (susceptible-exposed-infectious-recovered) epidemiological model — to explore the likelihood of herd immunity in “non-homogeneous” vs “homogeneous” populations. Although he mentions genetics briefly, what needs to be added to this equation are: [a] genetic variability in susceptibility to explosure, and [b] genetic variability in response to the SARS-CoV-2 virus (clinical presentation) in each individual patient. These two factors will create “noise” in this otherwise fascinating theoretical analysis. ☹

Here are three responses to the GEITP article shared yesterday. The article [attached] sent by Dr. Kopras underscores the multifactorial nature of COVID-19 susceptibility and clinical response. As these GEITP pages keep emphasizing, any multifactorial trait represents: [a] genetics (DNA sequence variants between individuals); [b] epigenetic effects (DNA methylation, RNA-interference, histone modifications, chromatin remodeling); [c] environmental factors (lifestyle, diet, exercise, smoking history); [d] endogenous influences (age, gender, cardiopulmonary function, renal status); and [e] each person’s microbiome at that moment in time (commensal bacteria present in gut, skin and elsewhere).

For “susceptibility to infection,” the [d] endogenous influences category becomes a very important (heavily-weighted) variable. ☹


Thank you, Magnus. The time-line for number of COVID-19-related deaths [see figure below] is also very intriguing — in a country that never imposed any restrictions — no face masks, children stayed in school, and the economy was not affected:

Did Sweden’s coronavirus strategy succeed or fail? – BBC News


From: Magnus
Sent: Wednesday, July 29, 2020 2:52 AM

Thank you for that update on the likelihood of herd immunity. It has been amazing how the infection rate has decreased in Sweden. Now we can see it also on the level of new confirmed COVID-19 cases:

The reason behind this is really not understood. The Swedish people have not changed their habits at all. One can imagine, as mentioned, that cell immunity could contribute; in Stockholm it is estimated that 20% have T Cell immunity and another 20% carry antibodies. Therefore, perhaps 40% immune people in our population is sufficient to provide herd immunity.

How stable is the epigenome of the virus? One could imagine changes (in the viral epigenome), relative to the influence of the virus environment in the Swedish population.
Best, Magnus

COMMENT: Monday, July 27, 2020 6:52 PM

For those interested in the herd immunity threshold to COVID-19, here are the latest update, readjustments, and corrections by Nicholas Lewis — which further supports his earlier paper.

From: John
Sent: Wednesday, July 29, 2020 11:35 AM

Dan, … Is it possible such an approach works — if a nation implements an effective contact tracing/surveillance program early in the course of the epidemic, provides adequate testing with timely results, and gives clear health-protective messaging to the population — as opposed to allowing regions and subpopulations to undergo rapid community spread? Perhaps universally accessible healthcare is also beneficial for reducing fatalities, especially among minority and disadvantaged populations?

Additionally, Sweden did implement voluntary and binding measures to control spread and protect vulnerable populations — so, it’s not like they did nothing at all. See the Swedish strategy, “Track, trace, protection, and control measures…” with “shared responsibility” and “social distancing,” which includes legally binding measures such as a ban on visits to nursing homes.


Dr. Nebert, … Actually, Sweden has imposed several restrictions. They banned public gatherings of >50 people — back in March. There are travel bans. However, the Swedish government had more guidelines than restrictions. Which, in my opinion, is where the U.S. government went wrong.


If you compare Sweden to other countries such as Norway, there were far more “deaths per 10,000” in Sweden: https://covid19.healthdata.org/sweden
Folks — we are in the middle, if not the beginning, of data collection. Until we have better data on how and where the virus has spread, unfortunately, science will not be able to answer this question at this time. We need to WAIT and learn, once we have reliable scientific and peer-viewed solid data.

Most estimates suggest that presently we have less than 10% COVID-19 infections. That would be 35+ million Americans. Even if 5 % were enough to induce herd immunity, that would equal 17.5 million confirmed COVID-19 cases. Take care of your families and yourself.

Hi Dan: in my humble opinion, Sweden achieved the right balance between isolating the vulnerable, preserving some personal freedoms, and preventing economic collapse. Norway and Finland may have lower death numbers right now, while they impose more severe isolation measures , but I bet their numbers will be as high, as time goes on; they have just delayed the inevitable cases and deaths — similar to Florida, Texas, California, Arizona, etc as they resume living their lives with fewer restrictions.

Shut downs do not ELIMINATE the virus; they simply just DELAY THE INEVITABLE. The virus is ALREADY out there, in virtually every corner of the world. The U.S. shutdown in March for a few weeks was necessary — in order to flatten the curve so as not to overwhelm New York and New Jersey area hospitals and healthcare providers. That “experiment” also helped us figure out that infected old people should not be sent back into nursing homes (as Cuomo did). I bet that reports of outbreaks across the country still GREATLY underestimate the number of people who have been infected, but have not been tested because they are asymptomatic or minimally symptomatic.

Only when widespread extremely sensitive and reliable antibody-testing is available — will we have a better idea of how many people have actually been infected. And many more have become immune and are antibody negative. Neutralizing antibodies may be useful in testing vaccine response, but (as you know and we have discussed), T cell response (and other immune parameters) cannot be practically measured by tests available today. You don’t have to have demonstrable antibodies to be immune.

Influenza deaths in 2017-2018 were >80,000, and those numbers were not inflated for political reasons (as they are now for COVID-19). As you know , anyone who dies and HAS the SARS-CoV-2 virus — is now listed as “a COVID-19 death,” even if they have end-stage renal disease, terminal cancer, myocardial infarctions, or a stroke UNrelated to COVID. If COVID 19 is suspected but even with a negative PCR — the virus can be listed as the cause of death. For instance, just because COVID-19 may cause hypercoagulability in some individuals, didn’t old people with diabetics snd hypertension have myocardial infarctions strokes before COVID-19 appeared? Nope , now everything is “COVID-19-related.”

Death of an elderly, frail, or chronically ill individual is always tragic. However, never before in the history of our country have we shut down our businesses, schools, churches — and committed collective suicide as a nation, while erroneously thinking we will eliminate a virus which is asymptomatic or minimally symptomatic in the overwhelming majority of individuals. Herd immunity (natural; or by vaccine, if and when it becomes available) is the best way to protect the most vulnerable in our population.

Because people cannot be isolated forever, and with 100% certainty of not being infections, their lives are being ruined by this enforced social isolation. This politically-motivated hysteria of “daily tallies of (mostly minimally symptomatic) cases” on our news each day is meant to make us think that “we are all going to die, if we don’t rigidly follow these arbitrary draconian edicts issued by mayors and governors who have no knowledge of virology or science. And we, as sheep, will just follow. This is pathetic, and it is ruining so many lives — instead of “saving us.”

This entry was posted in Center for Environmental Genetics. Bookmark the permalink.