Publications
Most recent author-created versions
Is There a Cap on Longevity? A Statistical Review
Annual Review of Statistics and Its Application, Vol. 9, No. 1, pp. 21-45. | 2022
Léo R. Belzile, Anthony C. Davison, Jutta Gampe, Holger Rootzén, and Dmitrii Zholud
There is sustained and widespread interest in understanding the limit, if there is any, to the human life span. Apart from its intrinsic and biological interest, changes in survival in old age have implications for the sustainability of social security systems. A central question is whether the endpoint of the underlying lifetime distribution is finite. Recent analyses of data on the oldest human lifetimes have led to competing claims about survival and to some controversy, due in part to incorrect statistical analysis. This article discusses the particularities of such data, outlines correct ways of handling them, and presents suitable models and methods for their analysis. We provide a critical assessment of some earlier work and illustrate the ideas through reanalysis of semisupercentenarian lifetime data. Our analysis suggests that remaining life length after age 109 is exponentially distributed and that any upper limit lies well beyond the highest lifetime yet reliably recorded. Lower limits to 95% confidence intervals for the human life span are about 130 years, and point estimates typically indicate no upper limit at all.
Human mortality at extreme age
Royal Society open science, Vol. 8, No. 9, pp. 202097. | 2021
Léo R. Belzile, Anthony C. Davison, Holger Rootzén and Dmitrii Zholud
We use a combination of extreme value statistics, survival analysis and computer-intensive methods to analyse the mortality of Italian and French semi-supercentenarians. After accounting for the effects of the sampling frame, extreme-value modelling leads to the conclusion that constant force of mortality beyond 108 years describes the data well and there is no evidence of differences between countries and cohorts. These findings are consistent with use of a Gompertz model and with previous analysis of the International Database on Longevity and suggest that any physical upper bound for the human lifespan is so large that it is unlikely to be approached. Power calculations make it implausible that there is an upper bound below 130 years. There is no evidence of differences in survival between women and men after age 108 in the Italian data and the International Database on Longevity, but survival is lower for men in the French data.
Rejoinder to discussion of the paper “Human life is unlimited – but short”
Extremes, Vol. 21, No. 3, pp. 415-424. | 2018
Rootzén, H. and Zholud, D.
What can be learned from data about human survival at extreme age? In this rejoinder we give our views on some of the issues raised in the discussion of our paper Rootzén and Zholud (2017).
Human life is unlimited - but short
Extremes, Vol. 20, No. 4, pp. 713-728. | 2017
Rootzén, H. and Zholud, D.
Does the human lifespan have an impenetrable biological upper limit which ultimately will stop further increase in life lengths? This question is important for understanding aging, and for society, and has led to intense controversies. Demographic data for humans has been interpreted as showing existence of a limit, or even as an indication of a decreasing limit, but also as evidence that a limit does not exist. This paper studies what can be inferred from data about human mortality at extreme age. We show that in western countries and Japan and after age 110 the probability of dying is about 47% per year. Hence there is no finite upper limit to the human lifespan. Still, given the present stage of biotechnology, it is unlikely that during the next 25 years anyone will live longer than 128 years in these countries. Data, remarkably, shows no difference in mortality after age 110 between sexes, between ages, or between different lifestyles or genetic backgrounds. These results, and the analysis methods developed in this paper, can help testing biological theories of ageing and aid confirmation of success of efforts to find a cure for ageing.
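The arithmetic behind a constant yearly death probability can be sketched as follows. This is an illustration only, not the paper's estimation method: it assumes the quoted figure of 0.47 per year holds exactly and independently for every year after age 110, and the function name `survival_beyond` is hypothetical.

```python
import math

# Assumption for illustration: a constant yearly death probability after
# age 110, taken from the figure quoted in the abstract.
YEARLY_DEATH_PROB = 0.47


def survival_beyond(age, start_age=110, q=YEARLY_DEATH_PROB):
    """Probability that a person alive at start_age is still alive at `age`.

    Under a constant yearly death probability q, surviving k further years
    has probability (1 - q) ** k, a geometric (discrete exponential) law.
    """
    years = age - start_age
    if years < 0:
        raise ValueError("age must be at least start_age")
    return (1.0 - q) ** years


# Equivalent constant force of mortality (hazard rate) per year.
hazard_per_year = -math.log(1.0 - YEARLY_DEATH_PROB)

if __name__ == "__main__":
    for age in (111, 115, 120, 128):
        print(age, survival_beyond(age))
    print("hazard per year:", hazard_per_year)
```

Under this assumption the survival probability is tiny at high ages but never exactly zero, which is the sense in which a constant death probability implies no finite upper limit.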
Tail Estimation for Window Censored Processes
Technometrics, Vol. 58, No. 1, pp. 95-103. | 2016
Rootzén, H. and Zholud, D.
This paper develops methods to estimate the tail and full distribution of the lengths of the 0-intervals in a continuous time stationary ergodic stochastic process which takes the values 0 and 1 in alternating intervals. The setting is that each of many such 0-1 processes have been observed during a short time window. Thus, the observed 0-intervals could be non-censored, right censored, left censored or doubly censored, and the lengths of 0-intervals which are ongoing at the beginning of the observation window have a length-biased distribution. We exhibit parametric conditional maximum likelihood estimators for the full distribution, develop maximum likelihood tail estimation methods based on a semi-parametric generalized Pareto model, and propose goodness of fit plots. Finite sample properties are studied by simulation, and asymptotic normality is established for the most important case. The methods are applied to estimation of the length of off-road glances in the 100-car study, a big naturalistic driving experiment.
Efficient estimation of the number of false positives in high-throughput screening
Biometrika, Vol. 102, No. 3, pp. 695-704. | 2015
Rootzén, H. and Zholud, D.
This paper develops new methods to handle false positives in High-Throughput Screening experiments. The setting is very highly multiple testing problems where testing is done at extreme significance levels and with low degrees of freedom, and where the true null distribution may differ from the theoretical one. We answer the question 'How many of the positive test results are false?' by showing that the conditional distribution of the number of false positives, given that there is in all r positives, approximately has a binomial distribution, and find efficient estimators for its success probability parameter. Furthermore, we provide efficient methods for estimation of the true null distribution resulting from a preprocessing method, and techniques to compare it with the theoretical null distribution. Analysis is based on a simple polynomial model for the tail of the distribution of p-values. We provide asymptotics which motivate this model, exhibit properties of estimators of the parameters of the model, and point to model checking tools. The methods are tried out on two large genomic studies and on an fMRI brain scan experiment.
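As a rough illustration of the binomial approximation described above, the sketch below computes the distribution of the number of false positives among r positives for a given success probability. The values of `r` and `p` are hypothetical, and the paper's estimators for the success probability are not reproduced here.

```python
from math import comb


def false_positive_pmf(k, r, p):
    """Binomial probability of exactly k false positives among r positives."""
    if not 0 <= k <= r:
        return 0.0
    return comb(r, k) * p**k * (1.0 - p) ** (r - k)


def prob_at_most(k, r, p):
    """Probability of at most k false positives among r positives."""
    return sum(false_positive_pmf(j, r, p) for j in range(k + 1))


if __name__ == "__main__":
    r, p = 20, 0.1  # hypothetical values, not taken from the paper
    print("expected number of false positives:", r * p)
    print("P(at most 3 false positives):", prob_at_most(3, r, p))
```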
Tail approximations for the Student t-, F-, and Welch statistics for non-normal and not necessarily i.i.d. random variables
Bernoulli, Vol. 20, No. 4, pp. 2102-2130. | 2014
Zholud, D.
We present a detailed study of the asymptotic behavior of the distribution of the tails of these, perhaps, most commonly used statistical tests under non-standard conditions, that is, releasing the underlying assumptions of normality, independence and identical distribution and considering a more general case where one only assumes that the vector of data has a continuous joint density. We determine asymptotic expressions for P(T > u) as u tends to infinity for this case. The approximations are particularly accurate for small sample sizes and may be used, for example, in the analysis of High-Throughput Screening experiments, where the number of replicates can be as low as two to five and often extremely high significance levels are used. We give numerous examples and complement our results by a thorough investigation of the convergence speed - both theoretically, by deriving exact bounds for absolute and relative errors of the approximations, and by means of a simulation study.
Extreme Value Analysis of Huge Datasets: Tail Estimation Methods in High-Throughput Screening and Bioinformatics
PhD Thesis, University of Gothenburg. ISBN: 978-91-628-8354-6. | 2011
Zholud, D.
The thesis presents results in Extreme Value Theory with applications to High-Throughput Screening and Bioinformatics. The methods described in the thesis, however, are applicable to statistical analysis of huge datasets in general. The main results are covered in four papers.
The first paper develops novel methods to handle false rejections in High-Throughput Screening experiments where testing is done at extreme significance levels, with low degrees of freedom, and when the true null distribution may differ from the theoretical one. We introduce efficient and accurate estimators of False Discovery Rate and related quantities, and provide methods of estimation of the true null distribution resulting from data preprocessing, as well as techniques to compare it with the theoretical null distribution. Extreme Value Statistics provides a natural analysis tool: a simple polynomial model for the tail of the distribution of p-values. We exhibit the properties of the estimators of the parameters of the model, and point to model checking tools, both for independent and dependent data. The methods are tried out on two large scale genomic studies and on an fMRI brain scan experiment.
The second paper gives a strict mathematical basis for the above methods. We present asymptotic formulas for the distribution tails of, probably, the most commonly used statistical tests, under non-normality, dependence, and non-homogeneity, and derive bounds for the absolute and relative errors of the approximations.
In papers three and four we study high-level excursions of the Shepp statistic for the Wiener process and for a Gaussian random walk. The application areas include finance and insurance, and sequence alignment scoring and database searches in Bioinformatics.
Extremes of Shepp statistics for Gaussian random walk
Extremes, Vol. 12, No. 1, pp. 1-17. | 2009
Zholud, D.
We derive asymptotic behavior of the probability of high-level excursion for the maximal increment of a Gaussian random walk. The motivation for writing this paper comes from the problem of finding similarities between long biological sequences in Bioinformatics; however, the result might also have applications in other areas, such as finance and insurance.
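To make the quantity concrete, here is a brute-force sketch of the maximal increment of a simulated Gaussian random walk. It assumes that increments are taken over windows of at most `max_window` steps, which may differ from the exact definition used in the paper. The function names are hypothetical, and the paper's asymptotic formula is not implemented.

```python
import random


def max_increment(steps, max_window):
    """Largest value of S[j] - S[i] over 0 <= i < j with j - i <= max_window.

    S is the random walk of partial sums of `steps`, starting at S[0] = 0.
    Setting max_window = len(steps) gives the unrestricted maximal increment.
    """
    sums = [0.0]
    for x in steps:
        sums.append(sums[-1] + x)
    n = len(steps)
    best = float("-inf")
    for i in range(n):
        for j in range(i + 1, min(i + max_window, n) + 1):
            best = max(best, sums[j] - sums[i])
    return best


def simulate(n_steps, max_window, seed=0):
    """Maximal increment of one simulated walk with standard normal steps."""
    rng = random.Random(seed)
    steps = [rng.gauss(0.0, 1.0) for _ in range(n_steps)]
    return max_increment(steps, max_window)


if __name__ == "__main__":
    print(simulate(n_steps=1000, max_window=20))
```

Repeating the simulation over many seeds and counting how often the result exceeds a high level u gives an empirical estimate of the excursion probability, although such estimates become unreliable for very high levels.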
Extremes of Shepp statistics for the Wiener process
Extremes, Vol. 11, No. 4, pp. 339-351. | 2008
Zholud, D.
We derive asymptotic behavior of the probability of high-level excursion for the maximal increment of the Wiener process. The result is essential for deriving the corresponding asymptotic formula for maximal increments of a Gaussian random walk, and also has potential applications in finance and insurance.
On the limit distribution of multiscale test statistics for nonparametric curve estimation
Mathematical Methods of Statistics, Vol. 15, No. 1, pp. 20-25. | 2006
Dümbgen, L., Piterbarg, V.I. and Zholud, D.
We prove continuity of the limit distribution function of certain multiscale test statistics which are used in nonparametric curve estimation, e.g. in testing qualitative hypotheses (about an unknown regression function) such as nonpositivity, monotonicity or concavity.