Jonas Ranstam's website

Tips for statistical reviewers

An ideal manuscript starts with a specific research question and ends with an empirically supported answer. In practice, however, many manuscripts are difficult to read, with unclear explanations linking aims and interpretations. Many investigators are more interested in developing dogmas than evidence, and statistical methodology and results are, in many cases, used to disguise subjective opinions. Conflicts of interest are common: the status of the investigator, the affiliated research organisation, and the scientific journal all depend on publications and impact factors.

The primary purpose of having a manuscript reviewed statistically is to ensure that the limitations of the chosen research question, study design, data collection, statistical analysis and interpretation of the analysis results are clearly described to the reader. The ICMJE recommendation to authors is: "Link the conclusions with the goals of the study but avoid unqualified statements and conclusions not adequately supported by the data." To identify errors of omission as well as of commission, the reviewer must also critically evaluate the empirical support for the authors' conclusions.

Reviewers and authors

One typical reviewer mistake is to give overly specific and detailed review comments. This may be driven by a wish to help, but it increases the risk of authorship issues and conflicts of interest; the different responsibilities of authors and reviewers should be respected. From a formal viewpoint, the reviewer's tasks are assigned by the editor-in-chief, to whom all review comments should be addressed, even if the corresponding author is copied on them. The reviewer is usually also asked to provide confidential comments to the editor. Recommendations about rejecting, revising, or accepting a manuscript should be directed only to the editor.

General statistical problems

The most common mistake a statistical reviewer is likely to find is the misinterpretation of p-values and statistical significance. The reasons are not philosophical, such as differences between the Fisher and the Neyman-Pearson approaches to hypothesis testing. On the contrary, the problem is much simpler: few medical investigators grasp the difference between description and inference, i.e. between describing findings in a sample and quantifying the uncertainty of these findings when they are generalised beyond the sample in which they were observed. Sampling variation and sampling uncertainty are crucial phenomena to consider in empirical research, but the uncertainty measures, confidence intervals and p-values, are often misinterpreted: confidence intervals as dispersion measures, and p-values as indicators that either show practically relevant differences (p < 0.05) or provide evidence of "no difference" (NS).
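
As a minimal sketch (with simulated, hypothetical measurements), the following Python snippet illustrates the distinction: the sample standard deviation, a dispersion measure, stabilises as the sample grows, while the width of the 95% confidence interval of the mean, an uncertainty measure, shrinks towards zero.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    for n in (20, 200, 2000):
        sample = rng.normal(loc=100, scale=15, size=n)  # simulated measurements
        mean = sample.mean()
        sd = sample.std(ddof=1)                         # dispersion of observations
        sem = sd / np.sqrt(n)                           # uncertainty of the mean
        lo, hi = stats.t.interval(0.95, df=n - 1, loc=mean, scale=sem)
        print(f"n={n:5d}  SD={sd:5.1f}  95% CI width={hi - lo:5.1f}")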

However, the ICMJE recommendations state: "distinguish between clinical and statistical significance" and "quantify findings and present them with appropriate indicators of measurement error or uncertainty (such as confidence intervals). Avoid relying solely on statistical hypothesis testing, such as P values, which fail to convey important information about effect size and precision of estimates." As statistical and clinical significance are two different things, a p-value does not help the reader judge the practical relevance of a finding. An interval estimate of the effect size, showing the plausible range of effects, and a consideration of what can be regarded as clinically important are necessary.
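
A minimal sketch of this point, with invented numbers and an assumed minimal clinically important difference (MCID), might look as follows; with a large sample, a small effect can be statistically significant while the entire confidence interval lies below the MCID.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    treated = rng.normal(52.0, 10.0, 400)  # hypothetical outcome scores
    control = rng.normal(50.0, 10.0, 400)

    diff = treated.mean() - control.mean()
    se = np.sqrt(treated.var(ddof=1) / 400 + control.var(ddof=1) / 400)
    lo, hi = diff - 1.96 * se, diff + 1.96 * se  # approximate 95% CI
    p = stats.ttest_ind(treated, control).pvalue

    MCID = 5.0  # assumed minimal clinically important difference
    print(f"difference = {diff:.1f}, 95% CI ({lo:.1f} to {hi:.1f}), p = {p:.4f}")
    if p < 0.05 and hi < MCID:
        print("statistically significant, yet too small to be clinically important")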

Another general mistake is to treat p-values as a sort of independent analysis block that can be selected post hoc to show empirical support for a finding, depending on whether or not the block shows statistical significance. Post hoc testing, in the sense of testing what has already been observed, should not be confused with prespecified testing. While it may be theoretically possible to design a confirmatory experiment so that one statistical test is sufficient to answer the research question, most real investigations are exploratory and require a series of linked research questions. The analysis strategy may then need to be explained to the reader to show the empirical support, but this is rarely done, perhaps because of a wish to avoid revealing mistakes. For example, what is the reason for presenting p-values in Table 1 when reporting trials or observational studies?
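
The inflation caused by post hoc selection is easy to demonstrate. In the following minimal simulation (all settings hypothetical), ten independent tests of true null hypotheses are performed and the smallest p-value is selected; the chance of "finding" statistical significance is then about 40%, not 5%.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n_sim, n_tests, n = 2000, 10, 30
    hits = 0

    for _ in range(n_sim):
        pvals = [stats.ttest_ind(rng.normal(size=n), rng.normal(size=n)).pvalue
                 for _ in range(n_tests)]
        hits += min(pvals) < 0.05  # post hoc selection of "the significant test"

    print(f"P(at least one p < 0.05 among {n_tests} null tests) ~ {hits / n_sim:.2f}")
    print(f"theoretical value: {1 - 0.95 ** n_tests:.2f}")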

Specific statistical problems

Several technical problems tend to appear in manuscript after manuscript. The terminology used often reveals methodological ignorance. "Assessing" and "determining" effects are commonly mentioned instead of the more appropriate "estimating" or "evaluating". Misunderstood technical terms, such as "multivariate" instead of "multiple" or "multivariable" and "quartile" instead of "quarter", are ubiquitous. Authors also often use technical terms such as "correlation" and "incidence" in nontechnical ways. The ICMJE recommendation is to "Avoid nontechnical uses of technical terms", and statistical reviewers have a professional responsibility to care about the integrity and coherence of statistical terminology.

The statistics section usually includes a description of how normality has been checked, but the reason for the check is hardly ever given, and in many cases it is meaningless. Some authors use non-parametric methods when analysing "non-parametric data". No such thing exists; rather, non-parametric null hypotheses are usually tested using distribution-free methods.
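
A minimal sketch (with skewed, simulated data) of the correct terminology: the Wilcoxon-Mann-Whitney test is a distribution-free method used to test a non-parametric null hypothesis; the data themselves are neither "parametric" nor "non-parametric".

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    group_a = rng.exponential(scale=1.0, size=40)  # skewed outcomes
    group_b = rng.exponential(scale=1.5, size=40)

    # distribution-free test of a non-parametric null hypothesis
    u, p = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
    print(f"Mann-Whitney U = {u:.0f}, p = {p:.3f}")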

The inability to define the correct unit of analysis is a special problem with potentially important consequences. For example, instead of analysing patients, some authors analyse knees, hips or fingers. Traditional statistical methods are based on the assumption of statistically independent observations, which is at variance with clustering within patients. Ignoring the clustering overestimates the number of degrees of freedom and leads to erroneous p-values and confidence intervals. Mixed models or GEE may be helpful for an adequate analysis.
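
As a minimal sketch (simulated data; the variable names score, treated and patient are invented for illustration), a GEE with an exchangeable working correlation treats the two knees of each patient as correlated instead of statistically independent:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(5)
    n_patients = 100
    patient = np.repeat(np.arange(n_patients), 2)      # two knees per patient
    treated = rng.integers(0, 2, size=2 * n_patients)  # knee-level exposure
    patient_effect = np.repeat(rng.normal(0, 2, n_patients), 2)
    score = 50 + 3 * treated + patient_effect + rng.normal(0, 2, 2 * n_patients)
    df = pd.DataFrame({"score": score, "treated": treated, "patient": patient})

    # knees clustered within patients; exchangeable working correlation
    gee = smf.gee("score ~ treated", groups="patient", data=df,
                  cov_struct=sm.cov_struct.Exchangeable())
    print(gee.fit().summary())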

Logistic regression models appear in various studies. These models are often developed using techniques that may be adequate for predicting individual outcomes, but the results are presented and interpreted as average risk factor effects. The accuracy of a prediction model should, however, be evaluated in terms of sensitivity and specificity, and an evaluation of causal effects requires adjustments based on explicit assumptions about cause-effect relationships. In addition, the results of logistic regression models are presented as odds ratios but are usually interpreted as risk ratios, which may be misleading when the outcome is common. Other regression methods can then be more helpful.
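
A minimal numeric sketch (hypothetical risks) of the last point: when the outcome is common, the odds ratio overstates the risk ratio.

    risk_exposed, risk_unexposed = 0.60, 0.40  # assumed outcome risks

    risk_ratio = risk_exposed / risk_unexposed
    odds = lambda p: p / (1 - p)
    odds_ratio = odds(risk_exposed) / odds(risk_unexposed)

    print(f"risk ratio = {risk_ratio:.2f}")  # 1.50
    print(f"odds ratio = {odds_ratio:.2f}")  # 2.25, not a risk ratio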

Accept or reject

Whether to recommend acceptance or rejection of a manuscript may be a difficult question, but it seems reasonable to require compliance with the ICMJE recommendations for acceptance. Noncompliance can perhaps be remedied with a revision, but for some manuscripts, the best advice may be to start over from scratch. Such a review outcome may be disappointing for the author, but it should not be taken personally. Statistical reviewing is about the evaluation of evidence. A well-performed statistical review can improve a manuscript substantially and help the author avoid publishing embarrassing mistakes.



This section of the website is still under development.