As reported in my previous post, there has been a gradual reduction in the rate of decline of cases and deaths in the UK, relative to my model forecasts. This slowing had already been noted, as I reported in my July 6th blog article, by the Office for National Statistics (ONS) and their research partners, the University of Oxford, and reported on the ONS page here.
I had adjusted the original lockdown effectiveness in my model (from 23rd March) to reflect this emerging change, but as the model had been predicting correct behaviour up until mid-late May, I will present here the original model forecasts, compared with the current reported deaths trend, which highlights the changes we have experienced for the last couple of months.
Forecast comparisons
The ONS chart which highlighted this slowing down of the decline, and even a slight increase, is here:
The Worldometers forecast for the UK has been refined recently to take account of changes in mandated lockdown measures, such as possible mask-wearing requirements, and presents several forecasts on the same chart, depending on the assumed take-up of masks going forward.
We see that, at worst, the Worldometers forecast could be for up to 60,000 deaths by November 1st, although, according to their modelling, if masks are “universal” then this is reduced to under 50,000.
Comparison of my forecast with reported data
My two charts that reveal most about the movement in the rate of decline of the UK death rate are here…
On the left, the red trend line for reported daily deaths shows they are not falling as fast as they were in about mid-May, when I was forecasting a long term plateau for deaths at about 44,400, assuming that lockdown effectiveness would remain at 83.5%, i.e. that the virus transmission rate was reduced to 16.5% of what it would be if there were no reductions in social distancing, self isolation or any of the other measures the UK had been taking.
The right hand chart shows the divergence between the reported deaths (in orange) and my forecast (in blue), beginning around mid to late May, up to the end of July.
The forecast, made back in March/April, was tracking the reported situation quite well (if very slightly pessimistically), but around mid-late May we see the divergence begin, and now as I write this, the number of deaths cumulatively is about 2000 more than I was forecasting back in April.
Lockdown relaxations
This period of reduction in the rate of decline of cases, and subsequently deaths, roughly coincided with the start of the UK Government's relaxation of some lockdown measures; we can see the relaxation schedule in detail at the Institute for Government website.
As examples of the successive stages of lockdown relaxation, in Step 1, on May 13th, restrictions were relaxed on outdoor sport facilities, including tennis and basketball courts, golf courses and bowling greens.
In Step 2, from June 1st, outdoor markets and car showrooms opened, and people could leave the house for any reason. They were not permitted to stay overnight away from their primary residence without a ‘reasonable excuse’.
In Step 3, from 4th July, two households could meet indoors or outdoors and stay overnight away from their home, but had to maintain social distancing unless they were part of the same support bubble. By law, gatherings of up to 30 people were permitted indoors and outdoors.
These steps and other detailed measures continued (with some timing variations and detailed changes in the devolved UK administrations), and I would guess that they were anticipated and accompanied by a degree of informal public relaxation, as we saw from crowded beaches and other examples reported in the press.
Two issues remained, however, even after bringing the current figures for July more into line.
One was that, as there is only one place in the model where I change the lockdown effectiveness, I had to change it from March 23rd (the UK lockdown date), and that made the intervening period of the forecast diverge until it converged again recently.
That can be seen in the right hand chart below, where the blue model curve is well above the orange reported data curve from early May until mid-July.
The long-term plateau in deaths for this model forecast is 46,400; this is somewhat lower than the model would show if I were to reduce the % lockdown effectiveness further, to reflect what is currently happening; but in order to achieve that, the history during May and June would show an even larger gap.
My forecast for the UK deaths as at July 30th, including trend line for daily reported deaths, for 83% lockdown effectiveness
The second issue is that the rate of increase in reported deaths, as we can also see (the orange curve) on the right-hand chart, at July 30th, is clearly greater than the model’s rate (the blue curve), and so I foresee that reported numbers will begin to overshoot the model again.
In the chart on the left, we see the same red trend line for the daily reported deaths, flattening to become nearly horizontal at today’s date, July 31st, reflecting that the daily reported deaths (the orange dots) are becoming more clustered above the grey line of dots, representing modelled daily deaths.
As far as the model is concerned, all this will need to be dealt with by changing the lockdown effectiveness to a time-dependent variable in the model differential equations representing the behaviour of the virus, and the population’s response to it.
This would allow changes in public behaviour, and in public policy, to be reflected by a changed lockdown effectiveness % from time to time, rather than having retrospectively to apply the same (reduced) effectiveness % since the start of lockdown.
Then the forecast could reflect current reporting, while also maintaining the close fit between March 23rd and when mitigation interventions began to ease.
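As a sketch of what that upgrade might look like, here is a minimal illustrative SIR model in Python with a time-dependent intervention effectiveness. This is not my actual model (which has more compartments), and the parameter values and relaxation date are assumptions for illustration only.

```python
# Minimal illustrative SIR model showing lockdown effectiveness as a
# time-dependent variable, rather than a single value applied from March 23rd.
# Population size, rates and the day-70 relaxation are assumed values.

def effectiveness(day):
    """Piecewise intervention effectiveness: 83.5% from lockdown (day 0),
    stepped down to 83.0% from an assumed relaxation date (day 70)."""
    return 0.835 if day < 70 else 0.830

def run_sir(days=200, n=67.0e6, i0=1000.0, beta0=0.25, gamma=0.1, dt=1.0):
    s, i, r = n - i0, i0, 0.0
    history = []
    for step in range(int(days / dt)):
        t = step * dt
        beta = beta0 * (1.0 - effectiveness(t))  # transmission scaled by intervention
        new_inf = beta * s * i / n * dt          # Susceptible -> Infected
        new_rec = gamma * i * dt                 # Infected -> Recovered
        s, i, r = s - new_inf, i + new_inf - new_rec, r + new_rec
        history.append((t, s, i, r))
    return history

history = run_sir()
```

Extending `effectiveness` to return different values over further date ranges would let the forecast reflect each relaxation step, without rewriting the history back to March 23rd.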
Lockdown, intervention effectiveness and herd immunity
In the interest of balance, in case it might be thought that I am a fan of lockdown(!), I should say that higher % intervention effectiveness does not necessarily lead to a better longer term outlook. It is a more nuanced matter than that.
PC=school and university closure, CI=home isolation of cases, HQ=household quarantine, SD=large-scale general population social distancing, SDOL70=social distancing of those over 70 years for 4 months (a month more than other interventions)
This chart provoked me to re-confirm with the authors (and as covered in the paper) the reasons for the triple combination of CI_HQ_SD being worse than either of the double combinations of measures CI_HQ or CI_SD in terms of peak ICU bed demand.
The answer (my summary) was that lockdown can be too effective, given that it is a temporary state of affairs. When lockdown is partially eased or removed, the population can be left with less herd immunity if the intervention effectiveness was too high (given that there is any herd immunity to be conferred by SARS-CoV-2 for any reasonable length of time, if at all).
Thus a lower level of lockdown effectiveness, below 100%, can be more effective in the long term.
I’m not seeking to speak to the ethics of sustaining more infections (and presumably deaths) in the short term in the interest of longer term benefits. Here, I am simply looking at the outputs from any postulated inputs to the modelled epidemic process.
I was as surprised as anyone when, in a UK Government briefing, in early March, before the UK lockdown on March 23rd, the Chief Scientific Adviser (CSA, Sir Patrick Vallance), supported by the Chief Medical Officer (CMO, Prof. Chris Whitty) talked about “herd immunity” for the first time, at 60% levels (stating that 80% needing to be infected to achieve it was “loose talk”). I mentioned this in my May 29th blog post.
The UK Government focus later in March (following the March 16th Imperial College paper) quickly turned to mitigating the effect of Covid-19 infections, as this chart sourced from that paper indicates, prior to the UK lockdown on March 23rd.
Projected effectiveness of Covid-19 mitigation strategies, in relation to the utilisation of critical care (ICU) beds
This is the imagery behind the “flattening the curve” phrase used to describe this phase of the UK (and others’) strategy.
Finally, that Imperial College March 16th paper presents this chart for a potentially cyclical outcome, until a Covid-19 vaccine or a significantly effective pharmaceutical treatment therapy arrives.
The potentially cyclical caseload from Covid-19, with interventions and relaxations applied as ICU bed demand changes
In this new phase of living with Covid-19, this is why I want to upgrade my model to allow periodic intervention effectiveness changes.
Conclusions
The sources I have referenced above support the conclusion in my model that there has been a reduction in the rate of decline of deaths (preceded by a reduction in the rate of decline in cases).
To make my model relevant to the new situation going forward, when lockdowns change, not only in scope and degree, but also in their targeting of localities or regions where there is perceived growth in infection rates, I will need to upgrade my model for variable lockdown effectiveness.
I wouldn't say that the reduction in the rate of decline of cases and deaths is evidence of a "second wave"; rather, it is the response of a very infective agent, which is still with us, infecting more people who are increasingly "available" to it, owing to the easing of some of the lockdown measures we have been using (both informally by the public and formally by Government).
To me, it is evidence that until we have a vaccine, we will have to live with this virus among us, and take reasonable precautions within whatever envelope of freedoms the Government allow us.
Model daily ratio of cumulative deaths, and the H(t) function, the natural log of that ratio
Introduction
In my most recent post, I summarised the various methods of Coronavirus modelling, ranging from phenomenological "curve-fitting" and statistical methods, to the SIR-type models which are developed from differential equations representing postulated incubation, infectivity, transmissibility, duration and immunity characteristics of the SARS-CoV-2 virus pandemic.
The phenomenological methods don’t delve into those postulated causations and transitions of people between Susceptible, Infected, Recovered and any other “compartments” of people for which a mechanistic model simulates the mechanisms of transfers (hence “mechanistic”).
Types of mechanistic SIR models
Some SIR-type mechanistic models allow for temporary (or no) immunity: in SIRS models, a recovered person may return to the susceptible compartment after a period of immunity (or immediately, if there is none).
SEIRS models allow for an Exposed compartment, for people who have been exposed to the virus, but whose infection is latent for a period, and so who are not infective yet. I discussed some options in my late March post on modelling work reported by the BBC.
My model, based on Alex de Visscher’s code, with my adaptations for the UK, has seven compartments – Uninfected, Infected, Sick, Seriously Sick, Better, Recovered and Deceased. There are many variations on this kind of model, which is described in my April 14th post on modelling progress.
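To illustrate the compartmental idea, here is a minimal SEIR sketch in Python. It is much simpler than the seven-compartment de Visscher-based model described above, and its parameter values are assumptions for illustration only.

```python
# Minimal SEIR sketch, illustrating the compartment idea only - the model
# described above has seven compartments and calibrated parameters.
# Population size and rate values here are assumed, for illustration.
def seir(days=300, n=67.0e6, e0=100.0, beta=0.3, sigma=0.2, gamma=0.1, dt=0.5):
    s, e, i, r = n - e0, e0, 0.0, 0.0
    states = []
    for _ in range(int(days / dt)):
        new_exp = beta * s * i / n * dt  # Susceptible -> Exposed (contact)
        new_inf = sigma * e * dt         # Exposed -> Infected (1/sigma = latency)
        new_rec = gamma * i * dt         # Infected -> Recovered
        s, e, i, r = (s - new_exp, e + new_exp - new_inf,
                      i + new_inf - new_rec, r + new_rec)
        states.append((s, e, i, r))
    return states

states = seir()
```

An SEIRS variant would add a further flow from Recovered back to Susceptible, representing waning immunity.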
Phenomenological curve-fitting
I have been focusing, in my review of modelling methods, on Prof. Michael Levitt’s curve-fitting approach, which seems to be a well-known example of such modelling, as reported in his recent paper. His small team have documented Covid-19 case and death statistics from many countries worldwide, and use a similar curve-fitting approach to fit current data, and then to forecast how the epidemics might progress, in all of those countries.
Because of the scale of such work, a time-efficient predictive curve-fitting algorithm is attractive, and they have found that a Gompertz function, with appropriately set parameters (three of them), can not only fit the published data in many cases, but also, via a mathematically derived display method for the curves, postulate a straight-line predictor (on suitably transformed "log" charts), facilitating rapid and accurate fitting and forecasting.
Such an approach makes no attempt to explain the way the virus works (not many models do) or to calibrate the rates of transition between the various compartments, which is attempted by the SIR-type models (although requiring tuning of the differential equation parameters for infection rates etc).
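To make the straight-line predictor idea concrete, here is a sketch of my own construction in Python (not Levitt's code), using synthetic numbers: for a Gompertz curve, the transform G(t) = -ln(ln(c(t)/c(t-1))) is exactly linear in t, so an ordinary least-squares line recovers the three parameters.

```python
import numpy as np

# For a Gompertz curve c(t) = N*exp(-exp(-k*(t - t0))), the transform
# G(t) = -ln(ln(c(t)/c(t-1))) is linear in t with slope k, so a fitted
# straight line recovers the parameters. Synthetic data, for illustration.
N_true, k_true, t0_true = 46000.0, 0.05, 40.0
t = np.arange(1, 120)
c = N_true * np.exp(-np.exp(-k_true * (t - t0_true)))  # synthetic cumulative deaths

H = np.log(c[1:] / c[:-1])  # daily log-ratio of cumulative totals
G = -np.log(H)              # linear: G(t) = k*t - k*t0 - ln(exp(k) - 1)
k_fit, intercept = np.polyfit(t[1:], G, 1)

t0_fit = -(intercept + np.log(np.exp(k_fit) - 1.0)) / k_fit
N_fit = c[-1] * np.exp(np.exp(-k_fit * (t[-1] - t0_fit)))  # plateau from one point
```

With real (noisy) reported data the line would be fitted rather than exact, but the same transform underlies the rapid fitting described above.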
In response to the forecasts from these models, then, we see many questions being asked about why the infection rates, death rates and other measured statistics are as they are, differing quite widely from country to country.
There is so much unknown about how SARS-Cov-2 infects humans, and how Covid-19 infections progress; such data models inform the debate, and in calibrating the trajectory of the epidemic data, contribute to planning and policy as part of a family of forecasts.
The problem with data
I am going to make no attempt in this paper, or in my work generally, to model more widely than the UK.
What I have learned from my work so far, in the UK, is that published numbers for cases (particularly) and even, to some extent, for deaths can be unreliable (at worst), untimely and incomplete (often) and are also adjusted historically from time to time as duplication, omission and errors have come to light.
Every week, in the UK, there is a drop in numbers at weekends, recovered by increases in reported numbers on weekdays to catch up. In the UK, the four home countries (and even regions within them) collate and report data in different ways; as recently as July 17th, the Northern Ireland government said that they won't be reporting numbers at weekends.
Across the world, I would say it is impossible to compare statistics on a like-for-like basis with any confidence, especially given the differing cultural, demographic and geographical aspects; government policies, health service capabilities and capacities; and other characteristics across countries.
The extent of the (un)reliability in the reported numbers across nations worldwide (just like the variations in the four home UK countries, and in the regions), means that trying to forecast at a high level for all countries is very difficult. We also read of significant variations in the 50 states of the USA in such matters.
Hence my reluctance to be drawn into anything wider than monitoring and trying to predict UK numbers.
Curve fitting my UK model forecast
I thought it would be useful, at least for my understanding, to apply a phenomenological curve fitting approach to some of the UK reported data, and also to my SIR-style model forecast, based on that data.
I find the UK case numbers VERY inadequate for that purpose. There is a fair expectation that we are only seeing a minority fraction (as low as 8% in the early stages, in Italy for example) of the actual infections (cases) in the UK (and elsewhere).
The very definition of what comprises a case is somewhat variable; in the UK we talk about confirmed cases (by test), but the vast majority of people are never tested (owing to a lack of symptoms, and/or not being in hospital) although millions (9 million to date in the UK) of tests have either been done or requested (but not necessarily returned in all cases).
Reported numbers of tests might involve duplication since some people are (rightly) tested multiple times to monitor their condition. It must be almost impossible to make such interpretations consistently across large numbers of countries.
Even the officially reported UK deaths data is undeniably incomplete, since the "all settings" figures the UK Government reports (which at the outset covered only hospital deaths, with care home deaths added, and retrospectively edited in, later on) are not the "excess" deaths that the UK Office for National Statistics (ONS) also track, and that many commentators follow. For consistency I have continued to use the Government reported numbers, which have been updated historically on the same basis.
Rather than using case numbers, then, I will simply make the curve-fitting vs. mechanistic modelling comparison on both the UK reported deaths and the forecasted deaths in my model, which has tracked the reporting fairly well, with some recent adjustments (made necessary by the process of gradual and partial lockdown relaxation during June, I believe).
I had reduced the lockdown intervention effectiveness in my model by 0.5% at the end of June from 83.5% to 83%, because during the relaxations (both informal and formal) since the end of May, my modelled deaths had begun to lag the reported deaths during the month of June.
This isn’t surprising, and is an indicator to me, at least, that lockdown relaxation has somewhat reduced the rate of decline in cases, and subsequently deaths, in the UK.
My current forecast data
Firstly, I present my usual two charts summarising my model’s fit to reported UK data up to and including 16th July.
UK deaths, reported vs. model, 83%, cumulative, to 16th July 2020
UK deaths, reported vs. model, 83%, cumulative and daily, to 16th July 2020
UK charts showing the fit of my model to deaths data to date
On the left we see the typical form of the S-curve that epidemic cumulative data takes, and on the right, the scatter (the orange dots) in the reported daily data, mainly owing to regular incompleteness in weekend reporting, recovered during the following week, every week. I emphasise that the blue and grey curves are my model forecast, with appropriate parameters set for its differential equations (e.g. the 83% intervention effectiveness starting on March 23rd), and are not best-fit analytical curves retro-applied to the data.
Next see my model forecast, further out to September 30th, by which time forecast daily deaths have dropped to less than one per day; I will also use this to compare with the curve-fitting approach. The cumulative deaths plateau, long term, is 46,421 deaths in this forecast.
UK deaths, reported vs. model, 83%, cumulative and daily, to 30th September
The curve-fitting Gompertz function
I have simplified the calculation of the Gompertz function, since I merely want to illustrate its relationship to my UK forecast – not to use it in anger as my main process, or to develop multiple variations for different countries. Firstly my own basic charts of reported and modelled deaths.
Cumulative, daily & 7-day avge reported UK deaths, 83%, 6th March – 16th July
Cumulative, daily & 7-day avge model UK deaths, 83%, 6th March – 16th July
The same data as above, but in a slightly different graphical format
On the left we see the reported data, with the weekly variations I mentioned before (hence the 7-day average to make the trend clearer) and on the right, the modelled version, showing how close the fit is, up to 16th July.
On any given day, the 7-day average lags the barchart numbers when the numbers are growing, and exceeds the numbers when they are declining, as it is taking 7 numbers prior to and up to the reporting day, and averaging them. You can see this more clearly on the right for the smoother modelled numbers (where the averaging isn’t really necessary, of course).
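A tiny numerical illustration of that lag effect, with made-up numbers (a rise to a peak and a symmetrical fall):

```python
import numpy as np

# Trailing 7-day average with made-up numbers: rise to a peak of 70 on
# day 7 (index 6), then a symmetrical decline back to 10.
daily = np.array([10, 20, 30, 40, 50, 60, 70, 60, 50, 40, 30, 20, 10], float)
avg7 = np.convolve(daily, np.ones(7) / 7.0, mode="valid")  # averages of days 1-7, 2-8, ...

lag_on_rise = avg7[0] < daily[6]     # average of days 1-7 (40) sits below day 7 (70)
lead_on_fall = avg7[-1] > daily[12]  # average of days 7-13 (40) sits above day 13 (10)
```

While the series rises, the trailing average sits below the day's figure; while it falls, the average sits above it, exactly as described above.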
It's also worth mentioning that the Gompertz function's analytical form allows it to fit the observed varying growth rate of this SARS-CoV-2 pandemic, with its asymmetry of a slower decline than the steeper (though sub-exponential) ramp-up, as seen in the charts above.
I now add, to the reported data chart, a graphical version including a derivation of the Gompertz function (the green line) for which I show its straight line trend (the red line). The jagged appearance of the green Gompertz curve on the right is caused by the weekend variation in the reported data, mentioned before.
Reported & daily ratio of cumulative deaths, c(t)/c(t-1), and H(t) = Ln(c(t)/c(t-1))
Adding G(t), the negative natural log of H(t), in green, with its trend line in red
Adding a Gompertz version of the reported data to see the data trend it indicates
Those working in the field would use smoothed reported data to reduce this unnecessary clutter, but this adds a layer of complexity to the process, requiring its own justifications, whose detail (and different smoothing options) are out of proportion with this summary.
But for my model forecast, we will see a smoother rendition of the data going into this process. See Michael Levitt’s paper for a discussion of the smoothing options his team uses for data from the many countries the scope of his work includes.
Of course, there are no reported numbers beyond today’s date (16th July) so my next charts, again with the Gompertz equation lines added (in green), compare the fit of the Gompertz version of my model forecast up to July 16th (on the right) with the reported data version (on the left) from above – part of the comparison purpose of this exercise.
Adding the negative natural log of H(t) for reported data with its trend line in red
Modelled daily ratio of cumulative deaths, c(t)/c(t-1), and H(t)=Ln(c(t)/c(t-1)), and G(t)=-Ln(H(t)), to July 16th
Comparing Gompertz versions of the reported data with the modelled data version
The next charts, with the Gompertz equation lines added (in green), compare the fit of my model forecast only (i.e. not the reported data) up to July 16th on the left, with the forecast out to September 30th on the right.
Modelled daily ratio of cumulative deaths, c(t)/c(t-1) and H(t)=Ln(c(t)/c(t-1)), and G(t)=-Ln(H(t)), to 16th July
Modelled daily ratio of cumulative deaths, c(t)/c(t-1), and H(t)=Ln(c(t)/c(t-1)), and G(t)=-Ln(H(t)), to 30th Sept.
Modelled deaths data, with the related Gompertz function G(t) added to the charts
What is notable about the charts is the nearly straight line appearance of the Gompertz version of the data. The wiggles approaching late September on the right are caused by some gaps in the data, as some of the predicted model numbers for daily deaths are zero at that point; the ratios (c(t)/c(t-1)) and logarithmic calculation Ln(c(t)/c(t-1)) have some necessary gaps on some days (division by 0, and ln(0) being undefined).
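That gap-handling can be sketched as follows (illustrative Python with made-up cumulative numbers, not my actual calculation): days on which the cumulative total does not change give a ratio of 1, so H(t) = 0 and G(t) = -Ln(0) is undefined, and those days are left as gaps.

```python
import numpy as np

# Computing H(t) = Ln(c(t)/c(t-1)) and G(t) = -Ln(H(t)) from a cumulative
# series, leaving NaN gaps where daily deaths are zero (ratio = 1, H = 0,
# so -Ln(0) is undefined). Made-up numbers, for illustration.
c = np.array([100.0, 150.0, 180.0, 180.0, 195.0, 200.0, 200.0])
ratio = c[1:] / c[:-1]
H = np.log(ratio)
safe_H = np.where(H > 0.0, H, 1.0)             # placeholder avoids log(0) warnings
G = np.where(H > 0.0, -np.log(safe_H), np.nan) # NaN marks the gap days
```

Charting libraries typically break the line at NaN values, which is what produces the visible gaps in the late-September portion of the chart.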
Discussion
The Gompertz method potentially allows a straight line extrapolation of the reported data in this form, instead of developing SIR-style non-linear differential equations for every country. This means much less scientific and computer time to develop and process, so that Michael Levitt’s team can process many country datasets quickly, via the Gompertz functional representation of reported data, to create the required forecasts.
As stated before, this method doesn’t address the underlying mechanisms of the spread of the epidemic, but policy makers might sometimes simply need the “what” of the outlook, and not the “how” and “why”. The assessment of the infectivity and other disease characteristics, and the related estimation of their representation by coefficients in the differential equations for mechanistic models, might not be reliably and quickly done for this novel virus in so many different countries.
When policy makers need to know the potential impact of their interventions and actions, then mechanistic models can and do help with those dependencies, under appropriate assumptions.
As mentioned in my recent post on modelling methods, such mechanistic models might use mobility and demographic data to predict contact rates, and will, at some level of detail, model interventions such as social distancing, hygiene improvements and the use of masks, as well as self-isolation (or quarantine) for suspected cases, and for people in high risk groups (called shielding in the UK) such as the elderly or those with underlying health conditions.
Michael Levitt's (and other) phenomenological methods don't do this, since they are fitting chosen analytical functions to the (cleaned and smoothed) cases or deaths data, looking for patterns in the "output" data for the epidemic in a country, rather than for the causes and implications behind the "input" data.
In Michael’s case, an important variable that is used is the ratio of successive days’ cases data, which means that the impact of national idiosyncrasies in data collection are minimised, since the same method is in use on successive days for the given country.
In reality, the parameters that define the shape (growth rate, inflection point and decline rate) of the specific Gompertz function used would also have to be estimated or calculated, with some advance idea of the plateau figure (what is called the "carrying capacity" of the related Generalised Logistic Functions (GLFs), of which the Gompertz functions form a subset).
I have taken some liberties here with the process, since my aim was simply to illustrate the technique using a forecast I already have.
Closing remarks
I have some corrective and clarification work to do on this methodology, but my intention has merely been to compare and contrast two methods of producing Covid-19 forecasts – phenomenological curve-fitting vs. SIR modelling.
There is much that the professionals in this field have yet to do. Many countries are struggling to move from blanket lockdown, through to a more targeted approach, using modelling to calibrate the changing effect of the various sub-measures in the lockdown package. I covered some of those differential effects of intervention options in my post on June 28th, including the consideration of any resulting "herd immunity" as a future impact of the relative efficacy of current intervention methods.
From a planning and policy perspective, Governments have to consider the collateral health impact of such interventions, which is why the excess deaths outlook is important, taking into account the indirect effect of both Covid-19 infections, and also the cumulative health impacts of the methods (such as quarantining and social distancing) used to contain the virus.
One of these negative impacts is on the take-up of diagnosis and treatment of other serious conditions which might well cause many further excess deaths next year, to which I referred in my modelling update post of July 6th, referencing a report by Health Data Research UK, quoting Data-Can.org.uk about the resulting cancer care issues in the UK.
Politicians also have to cope with the economic impact, which also feeds back into the nation’s health.
Hence the narrow numbers modelling I have been doing is only a partial perspective on a very much bigger set of problems.
Chart by Michael Levitt illustrating his Gompertz function curve fitting methodology
Introduction
I have been wondering for a while how to characterise the difference in approaches to Coronavirus modelling of cases and deaths, between “curve-fitting” equations and the SIR differential equations approach I have been using (originally developed in Alex de Visscher’s paper this year, which included code and data for other countries such as Italy and Iran) which I have adapted for the UK.
Part of my uncertainty has its roots in being a very much lapsed mathematician, and part is because, although I have used modelling tools before, and worked in some difficult areas of mathematical physics, such as General Relativity and Cosmology, epidemiology is a new application area for me, with a wealth of practitioners and research history behind it.
Fitting curves such as the Sigmoid and Gompertz functions (members of a family of curves known as logistic or Richards functions) to the Coronavirus cases or deaths numbers, as practised notably by Prof. Michael Levitt and his Stanford University team, has had success in predicting the situation in China, and is being applied in other localities too.
The SIR model approach, setting up a series of related differential equations (something I am more used to in other settings) that describe postulated mechanisms and rates of virus transmission in the human population (hence "mechanistic" modelling), looks beneath the surface presentation of the epidemic cases and deaths numbers and time series charts, to model the growth (or otherwise) of the epidemic based on postulated characteristics of viral transmission and behaviour.
Research literature
In researching the literature, I have become familiar with some names that crop up frequently in this area over the years.
Focusing on some familiar and frequently recurring names, rather than more recent practitioners, might lead me to fall into “The Trouble with Physics” trap (the tendency, highlighted by Lee Smolin in his book of that name, exhibited by some University professors to recruit research staff (“in their own image”) who are working in the mainstream, rather than outliers whose work might be seen as off-the-wall, and less worthy in some sense.)
In this regard, Michael Levitt's new work in the curve-fitting approach to the Coronavirus problem might be seen by others who have been working in the field for a long time as on the periphery (despite his Nobel Prize in Chemistry, awarded for computational modelling of biological molecules, and his Stanford University position as Professor of Structural Biology).
His results (broadly forecasting, very early on, using his curve-fitting methods (he used Sigmoid curves prior to the current Gompertz curves), a much lower incidence of the virus going forward, successfully so in the case of China) are in direct contrast to those of some teams working as advisers to Governments around the world, who have, in some cases, applied fairly severe lockdowns, mostly for a period of several months.
In particular the work of the Imperial College Covid response team, and also the London School of Hygiene and Tropical Medicine have been at the forefront of advice to the UK Government.
Some Governments have taken a different approach (Sweden stands out in Europe in this regard, for several reasons).
I am keen to understand the differences, or otherwise, in such approaches.
Twitter and publishing
Michael chooses to publish his work on Twitter, owing to a glitch (at least for a time) with his Stanford University laboratory's own publishing process. There are many useful links there to his work.
My own succession of blog posts (all more narrowly focused on the UK) have been automatically published to Twitter (a setting I use in WordPress) and also, more actively, shared by me on my Facebook page.
But I stopped using Twitter routinely a long while ago (after 8000+ posts) because, in my view, it is a limited communication medium (despite its reach), not allowing much room for nuanced posts. It attracts extremism at worst, conspiracy theorists to some extent, and, as with a lot of published media, many people who, following a "confirmation bias" approach, choose to read only what they think they might agree with.
One has only to look at the thread of responses to Michael’s Twitter links to his forecasting results and opinions to see examples of all kinds of Twitter users: some genuinely academic and/or thoughtful; some criticising the lack of published forecasting methods, despite frequent posts, although they have now appeared as a preprint here; many advising to watch out (often in extreme terms) for “big brother” government when governments ask or require their populations to take precautions of various kinds; and others simply handclapping, because they think that the message is that this all might go away without much action on their part, some of them actively calling for resistance even to some of the most trivial precautionary requests.
Preamble
One of the recent papers I have found useful in marshalling my thoughts on methodologies is this 2016 one by Gerardo Chowell, and it finally led me to calibrate the differences in principle between the SIR differential equation approach I have been using (but with a 7-compartment model, not just three) and the curve-fitting approach.
I had been thinking of analogies to illustrate the differences (which I will come to later), but this 2016 Chowell paper, in particular, encapsulated the technical differences for me, and I summarise that below. The Sergio Alonso paper also covers this ground.
Categorization of modelling approaches
Gerardo Chowell’s 2016 paper summarises modelling approaches as follows.
Phenomenological models
A dictionary definition – “Phenomenology is the philosophical study of observed phenomena as they appear, without any further study or explanation of their underlying causes.”
Chowell states that phenomenological approaches for modelling disease spread are particularly suitable when significant uncertainty clouds the epidemiology of an infectious disease, including the potential contribution of multiple transmission pathways.
In these situations, phenomenological models provide a starting point for generating early estimates of the transmission potential and generating short-term forecasts of epidemic trajectory and predictions of the final epidemic size.
Such methods include curve fitting, as used by Michael Levitt, where an equation (represented by a curve on a time-incidence graph, say, for the virus outbreak), with sufficient degrees of freedom, is used to replicate the shape of the observed data. Sigmoid and Gompertz functions (types of logistic, or Richards, functions) have been used for such fitting – they produce the familiar “S”-shaped curves we see for epidemics. The starting growth rate, the intermediate phase (with its inflection point) and the slowing down of the epidemic, all represented by that S-curve, can be fitted by the equation’s parametric choices (usually three or four parameters).
Chart by Michael Levitt illustrating his Gompertz function curve fitting methodology
A feature that some epidemic outbreaks share is that growth of the epidemic is not fully exponential, but is “sub-exponential” for a variety of reasons, and Chowell states that:
“Previous work has shown that sub-exponential growth dynamics was a common phenomenon across a range of pathogens, as illustrated by empirical data on the first 3-5 generations of epidemics of influenza, Ebola, foot-and-mouth disease, HIV/AIDS, plague, measles and smallpox.”
Choices of appropriate parameters for the fitting function can allow such sub-exponential behaviour to be reflected in the chosen function’s fit to the reported data, and it turns out that the Gompertz function is more suitable for this than the Sigmoid function, as Michael Levitt states in his recent paper.
Once a curve-fit to reported data to date is achieved, the curve can be used to make forecasts about future case numbers.
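As an illustrative sketch of this approach (not Michael Levitt’s actual code, and with invented parameter values), one can fit a three-parameter Gompertz curve to cumulative data by minimising squared error, then read the forecast plateau straight off the fitted parameters:

```python
import math

def gompertz(t, n_final, b, c):
    """Three-parameter Gompertz curve for cumulative counts at time t.
    n_final is the final size; the inflection point sits at t = ln(b)/c,
    where the curve passes through n_final/e."""
    return n_final * math.exp(-b * math.exp(-c * t))

# Synthetic "reported" cumulative counts for days 0-59, generated from
# known (purely illustrative) parameters so the fit can be checked.
data = [(t, gompertz(t, 44000.0, 30.0, 0.08)) for t in range(60)]

def sse(params):
    """Sum of squared errors of a candidate curve against the data."""
    return sum((gompertz(t, *params) - y) ** 2 for t, y in data)

# A crude grid search stands in here for a proper least-squares optimiser.
best = min(
    ((n, b, c)
     for n in range(40000, 50001, 1000)
     for b in (10.0, 20.0, 30.0, 40.0)
     for c in (0.04, 0.06, 0.08, 0.10)),
    key=sse,
)

# Forecasting is then pure extrapolation: the long-term plateau is simply
# the fitted n_final, and any future day can be read off the fitted curve.
forecast_day_120 = gompertz(120, *best)
```

The same three or four parameters control the initial growth rate, the inflection point and the final plateau, which is why such fits can track an epidemic’s S-curve closely without saying anything about why it has that shape.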
Mechanistic and statistical models
Chowell states that “several mechanisms have been put forward to explain the sub-exponential epidemic growth patterns evidenced from infectious disease outbreak data. These include spatially constrained contact structures shaped by the epidemiological characteristics of the disease (i.e., airborne vs. close contact transmission model), the rapid onset of population behavior changes, and the potential role of individual heterogeneity in susceptibility and infectivity.”
He goes on to say that “although attractive to provide a quantitative description of growth profiles, the generalized growth model (described earlier) is a phenomenological approach, and hence cannot be used to evaluate which of the proposed mechanisms might be responsible for the empirical patterns.
Explicit mechanisms can be incorporated into mathematical models for infectious disease transmission, however, and tested in a formal way. Identification and analysis of the impacts of these factors can lead ultimately to the development of more effective and targeted control strategies. Thus, although the phenomenological approaches above can tell us a lot about the nature of epidemic patterns early in an outbreak, when used in conjunction with well-posed mechanistic models, researchers can learn not only what the patterns are, but why they might be occurring.”
On the Imperial College team’s planning website, they state that their forecasting models (they have several for different purposes, for just these reasons I guess) fall variously into the “Mechanistic” and “Statistical” categories, as follows.
Imperial College models use a combination of mechanistic and statistical approaches.
“Mechanistic model: Explicitly accounts for the underlying mechanisms of diseases transmission and attempt to identify the drivers of transmissibility. Rely on more assumptions about the disease dynamics.
“Statistical model: Do not explicitly model the mechanism of transmission. Infer trends in either transmissibility or deaths from patterns in the data. Rely on fewer assumptions about the disease dynamics.
“Mechanistic models can provide nuanced insights into severity and transmission but require specification of parameters – all of which have underlying uncertainty. Statistical models typically have fewer parameters. Uncertainty is therefore easier to propagate in these models. However, they cannot then inform questions about underlying mechanisms of spread and severity.”
So Imperial College’s “statistical” description corresponds more closely to Chowell’s phenomenological approach, although it may not involve curve-fitting per se.
The SIR modelling framework, employing differential equations to represent postulated relationships and transitions between the Susceptible, Infected and Recovered parts of the population (at its most simple), falls into the Mechanistic model category.
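At its simplest, such a model can be written down in a few lines. The sketch below (a minimal illustration with invented parameters, not my seven-compartment model) integrates the basic SIR equations with a forward-Euler step:

```python
def sir_step(s, i, r, beta, gamma, dt):
    """One forward-Euler step of the basic SIR equations:
    dS/dt = -beta*S*I/N,  dI/dt = beta*S*I/N - gamma*I,  dR/dt = gamma*I."""
    n = s + i + r
    new_inf = beta * s * i / n * dt
    new_rec = gamma * i * dt
    return s - new_inf, i + new_inf - new_rec, r + new_rec

# Illustrative parameters only: R0 = beta/gamma = 2.5, 7-day infectious period.
beta, gamma, dt = 2.5 / 7.0, 1.0 / 7.0, 0.1
s, i, r = 66_000_000.0 - 100.0, 100.0, 0.0  # roughly UK-sized population
peak_i = 0.0
for _ in range(int(365 / dt)):  # one simulated year
    s, i, r = sir_step(s, i, r, beta, gamma, dt)
    peak_i = max(peak_i, i)
# With no interventions the epidemic grows exponentially at first, peaks
# as susceptibles are depleted, then declines; s + i + r stays constant.
```

The point of the mechanistic formulation is visible even here: beta and gamma are statements about transmission and disease duration, not about the shape of the output curve.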
Chowell makes the following useful remarks about SIR style models.
“The SIR model and derivatives is the framework of choice to capture population-level processes. The basic SIR model, like many other epidemiological models, begins with an assumption that individuals form a single large population and that they all mix randomly with one another. This assumption leads to early exponential growth dynamics in the absence of control interventions and susceptible depletion and greatly simplifies mathematical analysis (note, though, that other assumptions and models can also result in exponential growth).
The SIR model is often not a realistic representation of the human behavior driving an epidemic, however. Even in very large populations, individuals do not mix randomly with one another—they have more interactions with family members, friends, and coworkers than with people they do not know.
This issue becomes especially important when considering the spread of infectious diseases across a geographic space, because geographic separation inherently results in nonrandom interactions, with more frequent contact between individuals who are located near each other than between those who are further apart.
It is important to realize, however, that there are many other dimensions besides geographic space that lead to nonrandom interactions among individuals. For example, populations can be structured into age, ethnic, religious, kin, or risk groups. These dimensions are, however, aspects of some sort of space (e.g., behavioral, demographic, or social space), and they can almost always be modeled in similar fashion to geographic space”.
Here we begin to see the difference I was trying to identify between the curve-fitting approach and my forecasting method. At one level, one could argue that curve-fitting and SIR-type modelling amount to the same thing – choosing parameters that make the theorised data model fit the reported data.
But, whether it produces better or worse results, or with more work rather than less, SIR modelling seeks to understand and represent the underlying virus incubation period, infectivity, transmissibility, duration and related characteristics such as recovery and immunity (for how long, or not at all) – the why and how, not just the what.
The (nonlinear) differential equations are then solved numerically (rather than analytically with exact functions) and there does have to be some fitting to the initial known data for the outbreak (i.e. the history up to the point the forecast is being done) to calibrate the model with relevant infection rates, disease duration and recovery timescales (and death rates).
This makes it look similar in some ways to choosing appropriate parameters for a fitting function (Sigmoid, Gompertz or generalised logistic function, usually with three or four parameters).
But the curve-fitting approach is reproducing an observed growth pattern (one might say top-down, or focused on outputs), whereas the SIR approach is setting virological and other behavioural parameters to seek to explain the way the epidemic behaves (bottom-up, or focused on inputs).
Metapopulation spatial models
Chowell refers to metapopulation spatial models: the formulations used for the vast majority of population-based models that consider the spatial spread of human infectious diseases, and that address important public health concerns rather than theoretical model behaviour. These are beyond my scope, but could potentially address concerns about indirect impacts of the Covid-19 pandemic.
a) Cross-coupled metapopulation models
These models, which have been used since the 1940s, do not model the process that brings individuals from different groups into contact with one another; rather, they incorporate a contact matrix that represents the strength or sum total of those contacts between groups only. This contact matrix is sometimes referred to as the WAIFW, or “who acquires infection from whom” matrix.
In the simplest cross-coupled models, the elements of this matrix represent both the influence of interactions between any two sub-populations and the risk of transmission as a consequence of those interactions; often, however, the transmission parameter is considered separately. An SIR style set of differential equations is used to model the nature, extent and rates of the interactions between sub-populations.
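A minimal sketch (with an invented two-group matrix) of how such a WAIFW matrix drives the force of infection in the SIR-style equations:

```python
# Illustrative 2x2 WAIFW ("who acquires infection from whom") matrix:
# waifw[i][j] is the transmission rate to group i from group j.
# The numbers are made up for illustration, not fitted to any population.
waifw = [[0.30, 0.05],
         [0.05, 0.20]]

def force_of_infection(infectious_fraction):
    """lambda_i = sum_j waifw[i][j] * (I_j / N_j): the per-susceptible
    infection rate for each group, which then enters the differential
    equations as dS_i/dt = -lambda_i * S_i."""
    return [sum(row[j] * infectious_fraction[j] for j in range(len(row)))
            for row in waifw]

# e.g. if 1% of group 0 and 10% of group 1 are currently infectious:
lam = force_of_infection([0.01, 0.10])
```

Note that the matrix encodes only the strength of between-group contact, not the process generating it, which is exactly the distinction Chowell draws with the mobility models below.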
b) Mobility metapopulation models
These models incorporate into their structure a matrix to represent the interaction between different groups, but they are mechanistically oriented and do this by considering the actual process by which such interactions occur. Transmission of the pathogen occurs within sub-populations, but the composition of those sub-populations explicitly includes not only residents of the sub-population, but visitors from other groups.
One type of model uses a “gravity” approach for inter-population interactions, where contact rates are proportional to group size and inversely proportional to the distance between them.
Another type described by Chowell uses a “radiation” approach, which uses population data on home locations, and on job locations and characteristics, to model “travel to work” patterns, treating job locations as attractors that influence workers’ choices, and hence their travel and contact patterns.
Transportation and mobile phone data can be used to populate such spatially oriented models. Again, SIR-style differential equations are used to represent the model’s assumptions about between whom, and how, the pandemic spreads.
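The gravity idea mentioned above can be sketched in one function (constants and populations are invented for illustration):

```python
def gravity_coupling(pop_a, pop_b, distance_km, k=1e-9, decay=2.0):
    """Gravity-style interaction strength between two sub-populations:
    proportional to the product of the group sizes and inversely
    proportional to a power of the distance between them.
    k and decay are illustrative constants, not fitted values."""
    return k * pop_a * pop_b / distance_km ** decay

# Two equally sized pairs of towns: the nearer pair interacts more
# strongly; with decay=2.0, four times the distance gives 16x less coupling.
near = gravity_coupling(500_000, 500_000, 50.0)
far = gravity_coupling(500_000, 500_000, 200.0)
```

Such coupling weights would then scale the between-group transmission terms in the metapopulation differential equations, in place of a directly specified contact matrix.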
Summary of model types
We see that there is a range of modelling methods, successively requiring more detailed data, but which seek increasingly to represent the mechanisms (hence “mechanistic” modelling) by which the virus might spread.
We can see the key difference between curve-fitting (what I called a surface level technique earlier) and the successively more complex models that seek to work from assumed underlying causations of infection spread.
An analogy (picking up on the word “surface” I have used here) might refer to explaining how waves in the sea behave. We are all aware that out at sea, wave behaviour is perceived more as a “swell”, somewhat long wavelength waves, sometimes of great height, compared with shorter, choppier wave behaviour closer to shore.
I’m not talking here about breaking waves – a whole separate theory (René Thom’s Catastrophe Theory) is needed for those – but continuous waves.
A curve-fitting approach might well find a very good fit using trigonometric sine waves to represent the wavelength and height of the surface waves, even recognising that these vary with the depth of the ocean. But it would need an understanding of hydrodynamics, as described, for example, by Bernoulli’s Equation, to represent how and why the wavelength, wave height (and speed*) change depending on the depth of the water (and some other characteristics).
(*PS remember that the water moves, pretty much, up and down, in an elliptical path for any fluid “particle”, not in the direction of travel of the observed (largely transverse) wave. The horizontal motion and speed of the wave is, in a sense, an illusion.)
Concluding comments
There is a range of modelling methods, successively requiring more detailed data, from phenomenological (statistical and curve-fitting) methods, to those which seek increasingly to represent the mechanisms (hence “mechanistic”) by which the virus might spread.
We see the difference between curve-fitting and the successively more complex models that build a model from assumed underlying interactions, and causations of infection spread between parts of the population.
I do intend to cover the mathematics of curve fitting, but wanted first to be sure that the context is clear, and how it relates to what I have done already.
Models requiring detailed data about travel patterns are beyond my scope, but it is as well to set into context what IS feasible.
Setting an understanding of curve-fitting into the context of my own modelling was a necessary first step. More will follow.
References
I have found several papers very helpful in comparing modelling methods, embracing the Gompertz (and other) curve-fitting approaches, including Michael Levitt’s own recent June 30th one, which explains his methods quite clearly.
Introduction
In my previous post on June 28th, I covered the USA vs. Europe Coronavirus pandemic situations; herd immunity, and the effects of various interventions on it, particularly as envisioned by the Imperial College Covid-19 response team; and the current forecasts for cases and deaths in the UK.
I have now updated the forecasts, as it was apparent that during the month of June, there had been a slight increase in the forecast for UK deaths. Worldometers’ forecast had increased, and also the reported UK numbers were now edging above the forecast in my own model, which had been tracking well as a forecast (if very slightly pessimistically) until the beginning of June.
This might be owing both to informal public relaxation of lockdown behaviour, and also to formal UK Government relaxations in some intervention measures since the end of May.
Re-forecast
I have now reforecast my model with a slightly lower intervention effectiveness (83% instead of 83.5% since lockdown on 23rd March), and, while still slightly below reported numbers, it is nearly on track (although with the reporting inaccuracy each weekend, it’s not practical to try to track every change).
My long term outlook for deaths is now 46,421 instead of 44,397, still below the Worldometers number (which has increased from 43,962 to 47,924).
Here are the comparative charts – first, the reported deaths (the orange curve) vs. modelled deaths (the blue curve), linear axes, as of July 6th.
UK deaths, reported vs. model, 83%, cumulative, to 30th September 2020
UK deaths, reported vs. model, 83.5%, cumulative, to 30th September 2020
Cumulative charts to date, with 83% vs. 83.5% intervention effectiveness since lockdown
Comparing this pair of charts, we see that the .5% reduction in lockdown intervention effectiveness (from March 23rd) brings the forecast, the blue curve on the left chart, above the reported orange curve. On the right, the forecast, which had been tracking the reported numbers for a month or more, had started to lag the reported numbers since the beginning of June.
I present below both cumulative and daily numbers of deaths, reported vs. forecast, with a log y-axis. The scatter in the daily reported numbers (orange dots) is caused by inconsistencies in reporting at weekends, corrected during each following week.
UK deaths, reported vs. model, 83%, cumulative/daily, to 30th September 2020
UK deaths, reported vs. model, 83.5%, cumulative/daily, to 30th September 2020
Cumulative and daily forecast charts, with 83% vs. 83.5% intervention effectiveness
In this second pair of charts, we can just see that the rate of decline in daily deaths, going forward, is slightly reduced in the 83% chart on the left, compared with the 83.5% on the right.
This means that the projected plateau in modelled deaths, as stated above, is at 46,421 instead of 44,397 in my modelled data from which these charts are drawn.
It also shows that the forecast reduction to single digit (<10) deaths per day is pushed out from 13th August to 20th August, and the forecast rate of fewer than one death per day is delayed from 21st September to 30th September.
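The sensitivity of the plateau to a 0.5% change in effectiveness can be illustrated with a toy SIR-style calculation. This is only a sketch with invented parameters (a single infected compartment and a flat infection fatality rate), not my model’s actual compartments or calibration:

```python
def deaths_plateau(effectiveness, beta0=3.0 / 7.0, gamma=1.0 / 7.0,
                   ifr=0.01, s0=66_000_000.0, i0=200_000.0,
                   days=400, dt=0.1):
    """Toy post-lockdown projection: transmission is cut to
    (1 - effectiveness) of its unmitigated rate beta0, so 83.5%
    effectiveness leaves 16.5% of the original transmission. Returns
    long-run deaths as a fixed fraction (ifr) of cumulative infections.
    All parameter values are illustrative only."""
    beta = beta0 * (1.0 - effectiveness)
    s, i, cum_infected = s0 - i0, i0, i0
    for _ in range(int(days / dt)):          # forward-Euler integration
        new_inf = beta * s * i / s0 * dt
        s, i = s - new_inf, i + new_inf - gamma * i * dt
        cum_infected += new_inf
    return ifr * cum_infected

plateau_835 = deaths_plateau(0.835)  # higher effectiveness, lower plateau
plateau_830 = deaths_plateau(0.83)   # 0.5% less effective, higher plateau
```

With the effective reproduction number held below 1, the outbreak decays geometrically, but the decay is slower at 83% than at 83.5%, so the cumulative totals (and hence the plateaus) end a few percent apart – the same qualitative behaviour as the 44,397 vs. 46,421 difference above.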
ONS & PHE work on trends, and concluding comments
Since the beginning of lockdown relaxations, there has been sharpened scrutiny of the case and death numbers. This monitoring continues with the latest announcements by the UK Government, taking effect from early July (with any accompanying responses to follow from the three UK devolved administrations).
The Office for National Statistics has been monitoring case and death rates, of course, and the flattening of the decline in infections and deaths has been reported in the press recently.
July 3rd Times reporting ONS regarding trends in Covid-19 incidence rates and deaths
As the article says, any movement would firstly be in the daily number of cases, with any potential change in the deaths rate following a couple of weeks after (owing to the Covid-19 disease duration).
Source data for the reported infection rate is on the following ONS chart (Figure 6 on their page), where the latest exploratory modelling, by ONS research partners at the University of Oxford, shows that the incidence rate appears to have decreased between mid-May and early June, but has since levelled off.
Estimated numbers of new infections of the coronavirus (COVID-19), England, based on tests conducted daily since 11 May 2020
The death rate trend can be seen in the daily and 7-day average trend charts, with data from Public Health England.
Daily and 7-day average trends for deaths in the UK, The Times, July 3rd
The ONS is also tracking excess deaths, and it seems that excess deaths in 2020 in England & Wales have reduced to below the five-year average for the second consecutive week.
The figures can be seen in the spreadsheet here, downloaded from the ONS page. The following chart appears there as Figure 1, also showing that the number of deaths involving Covid-19 decreased for the 10th consecutive week.
Number of deaths registered by week, England & Wales, Dec 2019 to 26th June 2020
There are warnings, however, also reported by The Times, that there may be increased mortality from other diseases (such as cancer) into 2021, because worries about the pandemic have led to changes in patterns of use of the NHS, including GPs, with fewer people risking trips to hospital for diagnosis and/or treatment. The report from Data-can.org.uk, referred to below, highlights this.
The Times report on July 6th concerning the impact on other diseases of the pandemic
I will make any adjustments to the rate of change as we go forward, but thankfully daily numbers are still reducing at the moment in the UK, and I hope that this continues.