# 48

Statistics

*Alan Cook, MD, MS*

*University of Texas at Tyler, Tyler, TX, USA*

*A federal healthcare payer is considering incentive payments to trauma centers based on patient survival. The leadership wants to make sure the comparison is fair, so they are developing a risk adjustment model. They plan to incorporate injury severity into the model. They are choosing between the Injury Severity Score (ISS) (Baker, et al., 1974) and the Trauma Mortality Prediction Model (TMPM) (Osler, et al., 2008). One method of comparing the ISS and TMPM is to estimate the area under the receiver operating characteristic curve (AUROC) for each model with death as the outcome. The results are shown in Figure 48.1.*

*Which of the following statements is the most correct?*

A. *The payer should choose the TMPM because its AUROC is higher than that of the ISS*.

B. *The payer should choose the ISS because its AUROC is lower than that of the TMPM*.

C. *The horizontal line represents the “perfect” AUROC score*.

D. *The Y‐axis can also be considered the False Negative Rate*.

E. *The ISS can discriminate survivors from fatalities 16.9% of the time (1 – AUROC)*.

The receiver operating characteristic curve originated in radio signal detection as a way to measure the discrimination of signal from noise. It later gained traction in psychology, then in radiology and medical decision‐making. The area under the curve (AUROC) can be interpreted as a measure of a model’s ability to discriminate patients with an outcome from those without it. Here, the outcome is death from traumatic injury. The graph plots sensitivity (y‐axis) against 1 − specificity (x‐axis) for each point computed by the model or at predetermined cut points. Equivalently, it plots the true‐positive rate (y‐axis) against the false‐positive rate (x‐axis). The AUROC for the ISS indicates that the ISS can discriminate survivors from fatalities 83.1% of the time, whereas the TMPM can do so 87.5% of the time. As such, the TMPM compares favorably with the ISS. The closer the curve is to the point [0, 1] (upper left corner), the greater the area under the curve and the better the discrimination. The diagonal line represents an AUROC of 0.5, where the model predicts no better than a coin toss.
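The interpretation of the AUROC as "discriminating survivors from fatalities X% of the time" can be illustrated with a minimal sketch. It computes the AUROC as the proportion of (fatality, survivor) pairs in which the fatality received the higher severity score, counting ties as one half. The function name `auroc` and the scores are hypothetical and are not taken from the ISS or TMPM analysis in the figure.

```python
def auroc(scores_died, scores_survived):
    """Proportion of (died, survived) pairs ranked correctly; ties count 0.5."""
    wins = 0.0
    for died in scores_died:
        for survived in scores_survived:
            if died > survived:
                wins += 1.0
            elif died == survived:
                wins += 0.5
    return wins / (len(scores_died) * len(scores_survived))


# Hypothetical severity scores (illustrative only, not real patient data).
died = [34, 41, 25, 50]
survived = [9, 16, 25, 4, 29]

print(f"AUROC = {auroc(died, survived):.3f}")
```

A score that separates the two groups perfectly yields 1.0, and a score that carries no information yields about 0.5, which corresponds to the diagonal line on the plot.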

**Answer: A**

Hanley J, McNeil B. The meaning and use of the area under a receiver operating characteristic (ROC) curve. *Radiology*. 1982; 143:29–36.

*An academic trauma surgeon is studying the effect of cigarette smoking on the actual versus predicted inspiratory volume in elders with rib fractures. Trauma centers are required to record the chronic medical conditions of the patients, including smoking. Other pertinent clinical data will be included in the final, published article*.

*Select the best answer from the following:*

A. *The term for the proportion of patients who smoke is “incidence.”*

B. *The proper study design is a randomized trial in this study and all others*.

C. *This would be considered a cohort study as elderly patients with rib fractures are selected for the study, and the exposure is whether or not they are an active smoker*.

D. *This study would not require participant consent since no intervention is involved*.

E. *The study is considered a longitudinal study because the patients are elderly and if they smoke have likely done so for a long time*.

This study should be considered a cohort study because the patients are selected for the study and the outcome is compared according to smoking status (the exposure) (choice C). If the study sought to describe the proportion of smokers among elderly trauma patients, it could be considered a prevalence study or a cross‐sectional study. Prevalence, not incidence, is the measure of the number of subjects in a population who have a condition at the time of the study and can be thought of as a “snapshot‐in‐time,” much like a survey or poll (choice A). While a randomized study is considered the epitome of study designs, it is not feasible in all studies. Here, we cannot ethically randomize elderly patients to smoke or not, and the physiological phenomenon of interest is deterioration of pulmonary function, which takes a significant amount of time to accumulate before an effect would be manifest clinically. The long follow‐up time would be prohibitively resource intensive (choice B). Although no intervention is being studied, the study would require consent from the participants, as the protocol will entail medical testing and include the analysis of other clinical data. Since no new intervention is involved, the study may qualify for expedited review by the Institutional Review Board (choice D). If the study began with a group of teenagers and followed their pulmonary function over their lifetime at 5‐year intervals, the study would be a longitudinal study (choice E). The quintessential cohort study is the Framingham Heart Study.

**Answer: C**

Dawber TR, Meadors GF, Moore Jr FE. Epidemiological approaches to heart disease: the Framingham study. *American Journal of Public Health and the Nations Health*. 1951; 41(3):279–86.

*Confounding is a term frequently bandied about in research. Select the item below, which accurately describes confounding*.

A. *A confounder is a variable (G) directly affected by two other variables, namely, the exposure (X) and the outcome (Y)*.

B. *The effects of a confounder cannot be mitigated in an observational study*.

C. *Including the confounder in the analysis exaggerates the effect sizes of the variables of interest, i.e. the odds ratio for the outcome due to the exposure is larger than the “true” effect*.

D. *A confounder is equally distributed between groups of subjects in a study*.

E. *Confounding describes the circumstance when the measure of effect of the exposure on the outcome is primarily due to a third variable related to both the exposure and the outcome*.

Confounding results when a third variable is responsible for the effect of the exposure (X) on the outcome (Y). The third variable, the confounder, is related to the exposure and the outcome. A classic example is the effect of alcohol consumption on lung cancer. The effect may be accounted for by the confounder of smoking. Oftentimes, people smoke while drinking or drink in places where smoking is present, like bars. Here, smoking is not equally distributed among the alcohol drinkers. Confounding is shown schematically in the following diagram:

The effects of confounding can be mitigated or adjusted for by including the confounding variable in a multivariable model, for example. The result of including smoking status in the analysis would likely mitigate or completely negate any observed association between alcohol consumption and lung cancer.
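As a rough illustration of adjustment, the sketch below compares a crude odds ratio with a Mantel‐Haenszel odds ratio stratified by smoking status. Stratification is used here as a simpler stand‐in for the multivariable model mentioned above. The counts are invented so that alcohol has no association with lung cancer within either smoking stratum; the function names are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """OR for a 2 x 2 table: a, b = exposed with/without outcome; c, d = unexposed."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Pooled OR across strata, each given as a tuple (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Invented counts: (drinkers with cancer, drinkers without,
#                   non-drinkers with cancer, non-drinkers without)
smokers = (40, 160, 10, 40)
non_smokers = (2, 48, 8, 192)

# Collapsing the strata ignores smoking and gives the crude table.
crude = tuple(s + n for s, n in zip(smokers, non_smokers))

print(f"Crude OR:            {odds_ratio(*crude):.2f}")
print(f"Smoking-adjusted OR: {mantel_haenszel_or([smokers, non_smokers]):.2f}")
```

In these made‐up data most drinkers are smokers, so the crude odds ratio is well above 1 even though the stratum‐specific odds ratios are both 1.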

**Answer: E**

Williamson EJ, Aitken Z, Lawrie J, Dharmage SC, Burgess JA, Forbes AB. Introduction to causal diagrams for confounder selection. *Respirology*. 2014; 19(3):303–11.

*The randomized controlled trial is regarded as the epitome of study designs. However, not all studies are suitable for this particular design. Please select the most correct answer from the following regarding randomization*.

A. *Randomized trials are relatively inexpensive compared to other study designs*.

B. *Randomized trials, like other study designs, cannot infer causality between the exposure or intervention and the outcome*.

C. *Randomization essentially guarantees significant p‐values for the effect of the intervention on the outcome*.

D. *Randomization removes the potential for selection bias in terms of allocating patients to the intervention or control group*.

E. *Randomization is the most ethical approach to clinical research*.

Randomization is the step in a randomized controlled trial where the participants are allocated to intervention or control groups by a formal, chance‐based procedure. A trial that includes a randomization step as part of the protocol is, by definition, a prospective study that requires an intervention, follow‐up time, and staff to collect data. These characteristics of randomized trials tend to make them expensive compared to other study designs (choice A). A key strength of randomization is the removal of selection bias in the allocation of subjects to intervention and control groups (choice D). The removal of selection bias and the prospective nature of the study provide strong justification to infer causality between the intervention and outcome of a study (choice B). The randomization process tends to produce relatively balanced groups of participants in terms of characteristics important to the study. However, since p‐values are influenced by the effect size of the intervention and the number of observations in the analysis, the mere act of randomization does not assure statistical significance (choice C). Finally, not all studies lend themselves to a random allocation of subjects to the treatment or control groups. This is the case when treatments have become established as the standard of care despite the lack of prospective randomized trials.
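The dependence of the p‐value on sample size can be sketched with a two‐sided z‐test for two proportions. The example assumes the same hypothetical event rates (20% versus 30%) in a small and a large trial; the function name and the counts are illustrative and do not come from any study.

```python
import math


def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value from a pooled z-test comparing two proportions."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    pooled = (events_a + events_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # For a standard normal variable, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Same hypothetical effect size (20% vs 30% event rate), different trial sizes.
p_small = two_proportion_p_value(10, 50, 15, 50)
p_large = two_proportion_p_value(100, 500, 150, 500)

print(f"50 per arm:  p = {p_small:.3f}")
print(f"500 per arm: p = {p_large:.5f}")
```

With identical event rates, only the larger trial falls below the conventional 0.05 threshold, which is why randomization alone cannot guarantee a significant result.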

**Answer: D**

Greenland S. Randomization, statistics, and causal inference. *Epidemiology*. 1990; 1(6):421‐9.

*An investigator wishes to study the effect of chlorhexidine oral rinse, compared with saline rinse, on the incidence of ventilator‐associated pneumonia (VAP, yes/no) at any point in the patient’s ICU stay. The other risk factors for VAP include patient age, gender, injury severity, GCS on ED arrival, days of mechanical ventilation, traumatic brain injury (yes/no), face fractures (yes/no), etc. Since the incidence of VAP can be confounded by the presence or absence of other factors, several variables must be controlled for simultaneously to estimate the effect of chlorhexidine oral rinse*.

*The proper test for this analysis is:*

A. *Paired Student’s t‐test*

B. *Multivariable logistic regression*

C. *Multivariable linear regression*

D. *Cox proportional hazard ratio*

E. *The Mann‐Whitney U test*

In a previous question, we discussed the phenomenon of confounding. One method of adjusting for one or more confounders in the analysis phase of a study is to control for them in a multivariable model. Here, the outcome of interest is binary, VAP (yes/no). Therefore, the paired Student’s *t*‐test, which compares the means of a variable for a group of individuals measured before and after an intervention, like subjects’ weight before and after a diet change, is not appropriate (choice A). The Mann‐Whitney U test is another name for the Wilcoxon Rank Sum test, which compares the distribution of a variable between two independent groups of subjects when the variable is not normally distributed. Additionally, the Mann‐Whitney U test is a bivariate test and cannot accommodate the nine variables necessary to the study (choice E). The Cox proportional hazards model is a multivariable model that incorporates a time‐to‐event component, e.g. the number of ICU days until discharge or death. The study in question is simply interested in whether or not VAP develops, not how long it takes to develop (choice D). Multivariable linear regression would be an appropriate multivariable model if the outcome of interest were continuous and linear, like hospital length of stay (choice C). Multivariable logistic regression is the model of choice for the analysis at hand (choice B). The logistic regression model is used to describe the relationship between a binary outcome variable, VAP, and a set of independent predictor variables, whether they are continuous, categorical, or binary. The results are reported as odds ratios with 95% CIs and p‐values for the predictors.
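A minimal sketch of a multivariable logistic regression follows, assuming simulated patients with only two binary predictors (chlorhexidine rinse and traumatic brain injury). The "true" effects are invented, the model is fitted by plain gradient ascent rather than by a statistics package, and the function names are hypothetical. Confidence intervals and p‐values require standard errors, which are not computed here.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def simulate_patients(n, seed=0):
    """Simulate hypothetical ICU patients; the effects below are invented."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        chlorhexidine = rng.randint(0, 1)  # exposure of interest
        tbi = rng.randint(0, 1)            # one additional risk factor
        log_odds = -1.0 - 1.0 * chlorhexidine + 1.2 * tbi
        vap = 1 if rng.random() < sigmoid(log_odds) else 0
        X.append([1.0, chlorhexidine, tbi])  # leading 1.0 is the intercept
        y.append(vap)
    return X, y


def fit_logistic(X, y, learning_rate=1.0, n_iter=800):
    """Estimate coefficients by gradient ascent on the mean log-likelihood."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        gradient = [0.0] * p
        for row, outcome in zip(X, y):
            predicted = sigmoid(sum(b * v for b, v in zip(beta, row)))
            for j in range(p):
                gradient[j] += (outcome - predicted) * row[j]
        beta = [b + learning_rate * g / n for b, g in zip(beta, gradient)]
    return beta


X, y = simulate_patients(800)
beta = fit_logistic(X, y)
# Exponentiating a coefficient gives the adjusted odds ratio for that predictor.
print(f"Adjusted OR, chlorhexidine: {math.exp(beta[1]):.2f}")
print(f"Adjusted OR, TBI:           {math.exp(beta[2]):.2f}")
```

Each coefficient is adjusted for the other predictor in the model, which is how the approach controls for several risk factors at once.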

**Answer: B**

Peng C‐YJ, So T‐SH. Logistic regression analysis and reporting: a primer. *Understanding Statistics: Statistical Issues in Psychology, Education, and the Social Sciences*. 2002; 1(1):31–70.

*A group of investigators conducted a multicenter survey study of 27 trauma centers across three states. They hypothesized that the number of trauma and acute care surgery (TACS) consults and activations was associated with the number of cases done by TACS residents per month. The data are as follows:*

| | Mean (SD) | Minimum, maximum | Median (IQR) |
|---|---|---|---|
| Activations | 199.7 (110.5) | 60, 408 | 160 (200) |
| Resident cases | 22.1 (6.8) | 12, 32 | 23.2 (11.6) |

*Describe the variable activations in terms of data type*.

A. *Binary*

B. *Nominal*

C. *Ordinal*

D. *Continuous*

E. *Ratio*

Numerical data can take several forms. The type of numerical data in a variable can determine the appropriate tests of significance and regression model to choose. The simplest type is binary or dichotomous. Binary data contain two mutually exclusive values like 1 or 0 for alive or dead (choice A). Nominal data represent categories of a phenomenon like blood type, for example 1 = A+, 2 = A−, …, 8 = O−. There is no quantitative difference between the categories: A+ ≠ 2 × A−. Moreover, the blood types aren’t ordered. The numeric values are contiguous only as a matter of convenience (choice B). Ordinal data can be placed in meaningful order, e.g. the order of finishers in a race (1st place, 2nd place, and so on). However, there is no information about how far apart the runners finished (0.01 seconds between 1st and 2nd place, 0.07 seconds between 2nd and 3rd place) (choice C). If the variable was named “Total Time” and contained each racer’s course time in milliseconds, the variable would be considered continuous, just like the variable “Activations.” Note that continuous data are presented as discrete values rounded to a convenient decimal place (choice D). Most biometric data belong on the ratio scale. The ratio scale is like the continuous numeric scale with the limitation that it includes zero but does not include negative numbers (choice E).

**Answer: D**

Barkan H. Statistics in clinical research: important considerations. *Annals of Cardiac Anaesthesia*. 2015; 18(1):74.

*One fundamental analytic technique is to compare dichotomous variables and outcomes using a 2 × 2 contingency table. Only one of the following can be computed from this construct*.

A. *Spearman’s correlation coefficient*

B. *Odds ratio*

C. *Cox Hazard Ratio*

D. *Pearson’s rho*

E. *β coefficient*

The 2 × 2 contingency table is a fundamental construct in biostatistics. It can represent a test result (positive or negative) and the disease state (present or absent), a risk factor and the disease, etc.

| | Disease | No disease |
|---|---|---|
| Exposure | *a* | *b* |
| No exposure | *c* | *d* |

Several measures can be calculated from the 2 × 2 table. Relative risk (RR) is the ratio of the risk of an outcome in the exposed to that in the unexposed:

$$\mathrm{RR} = \frac{a/(a+b)}{c/(c+d)}$$

The odds ratio (OR) is the ratio of the odds of an outcome in the exposed to the odds in the unexposed:

$$\mathrm{OR} = \frac{a/b}{c/d} = \frac{ad}{bc}$$

Sensitivity and specificity are common terms in scientific literature. Sensitivity is the proportion of true positive cases (*a*) among all who have the disease (*a + c*), while specificity is the proportion of true negative cases (*d*) among all who do not have the disease (*b + d*).
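The measures derived from a 2 × 2 table can be sketched in a few lines. The cell labels follow the table above (*a* and *b* are the exposed or test‐positive subjects with and without disease; *c* and *d* are the unexposed or test‐negative subjects with and without disease). The counts and the function name are hypothetical and serve only as an illustration.

```python
def two_by_two_measures(a, b, c, d):
    """Common measures from a 2 x 2 table.

    a: exposed (or test positive) with disease
    b: exposed (or test positive) without disease
    c: unexposed (or test negative) with disease
    d: unexposed (or test negative) without disease
    """
    return {
        "relative_risk": (a / (a + b)) / (c / (c + d)),
        "odds_ratio": (a * d) / (b * c),
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
    }


# Hypothetical counts for illustration only.
for name, value in two_by_two_measures(a=30, b=70, c=10, d=90).items():
    print(f"{name}: {value:.3f}")
```

Note that the odds ratio and the relative risk differ for the same table; they approximate each other only when the outcome is rare.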