Studies assessing how well different methods of measuring the same underlying quantity agree with each other. Typical analyses are Bland-Altman methods (see Bland-Altman) including plotting differences versus the average of each pair, and calculating limits of agreement.
Probability of rejecting the null hypothesis when the null is true in the population; probability of making a false-positive conclusion when conducting 1 or more tests; equal to the significance level or type I error
The hypothesis that one wishes to claim; the opposite of the null hypothesis
Analysis of variance (ANOVA)
Statistical method used to compare 2 or more groups on the mean of the outcome variable; equivalent to a 2-sample t-test if only 2 groups
A relationship between 2 variables; assessed in many ways, depending on study design and type of variables
Probability of not rejecting the null hypothesis when it is false; probability of making a false-negative conclusion; type II error; 1 minus power
See normal distribution
Bland-Altman method of assessing agreement between 2 methods of measurement; for example, includes plotting the difference between methods on the outcome versus the average of each pair, and calculating limits of agreement. The 95% limits of agreement are the mean difference +/− 1.96 × SD of the difference.
Method used to maintain an overall type I error (or alpha), say at 0.05, by setting the significance criterion for each test to alpha divided by the number of tests.
Case control study
Study assessing the association between an exposure and outcome that is designed by first identifying subjects with and without the outcome and then assessing whether or not they had the exposure.
A variable that is categorized, either ordinal (eg, Likert scale) or nominal (eg, gender, blood type).
Making conclusions that one variable causes the outcomes in another, as opposed to the variables simply being associated.
Central limit theorem
Key statistical theorem that states that the mean of repeated samples from a population will approximately equal the mean of the population, and the sampling distribution of the mean will have a normal distribution (bell-shaped curve).
Test used to compare 2 or more groups on a binary outcome variable.
A prospective study following a group or groups of patients over time.
Confidence interval (95%)
Interval defined as an estimate or estimated effect +/− 1.96 × its standard error; 95% of such intervals contain the true value being estimated.
A variable, such as blood pressure or body mass index (BMI), that is measured on a continuum
Means “distortion”. A variable that is associated with both the exposure and the outcome, and that should therefore be adjusted for when assessing association between exposure and outcome to prevent distortion of the effect of interest.
A measure of the linear relationship between 2 variables.
Cox proportional hazards regression model
Regression method used to compare groups on time to an event, especially when not all patients have the event during follow-up. Main assumption is that the hazard functions (function defined by percent having event over time) are parallel for the groups being compared. A hazard ratio is estimated to compare groups on survival.
Association between 2 variables at a fixed moment in time.
Practice of conducting many tests in a single dataset with the rather unfocused goal of searching for something that will be significant.
Data that is not independent, and often represents repeated measurements on the same subjects or units.
The outcome variable in a regression model.
Measures that assess how well a variable of interest can discriminate the truth, and measured by sensitivity, specificity, positive and negative predictive value, as well as the area under the receiver operating characteristic curve (for continuous or ordinal predictor).
Variable that represents counts, such as number of infections for a subject.
Variability; the degree to which units differ from each other on some outcome.
The true difference or effect divided by its standard deviation.
A quantity measured in a sample that tries to capture the truth in the given population.
Medical practice which allows itself to be informed by rigorous research results.
The independent variable of interest, and which a researcher desires to associate with an outcome.
Display of data that bins patients into equal-sized bars based on an outcome of interest and graphs the bars either horizontally or vertically.
Data that are not correlated with each other—typically from different subjects.
Explanatory variable in a model, also sometimes called the predictor variable or exposure variable.
Making a decision about an association or treatment effect in a population of interest from data on a sample from that population
Difference between the 3rd and 1st quartiles (75th and 25th percentiles) of a variable
Interval scaled variable
The quantitative variable for which a common distance between any 2 values has the same meaning.
An interaction is present if the relationship between 2 variables (say exposure A and outcome B) is different for different levels of a third variable (interacting variables C).
Statistical model with continuous dependent variable
Method used to display survival or failure curves for time-to-event data and compare curves using log-rank or Wilcoxon tests
Generalization of the Wilcoxon Rank Sum test to more than 2 groups
Limits of agreement (95%)
Mean bias (or difference) between 2 methods of measurement ±1.96 × standard deviation of the difference. Shows where 95% of differences are expected to fall.
Statistical model with binary dependent variable
Test used to compare 2 or more survival curves (see Kaplan-Meier)
Test used to compare 2 paired or correlated proportions
Average of a set of data points; equal to the sum of the values divided by the sample size.
Middle value among a set of data values sorted from smallest to largest. For an even number of observations, the median is the average of the 2 middle points.
Quantitative synthesis of results of more than 1 research study on the same topic
Most common value in a sample
Multiple comparisons procedures
Method used to control the type I error at a nominal level (usually 5%) when multiple comparisons are performed or multiple outcomes are assessed in the same study.
Multiple testing problem
The phenomenon that repeated testing increases the chance of false-positive findings.
A statistical model that contains more than 1 independent or explanatory variable.
A statistical model that contains more than 1 dependent or outcome variable, such as in a repeated measures model.
Negative predictive value
The probability of the true disease status being negative given that the test result or predictor is negative.
A type of variable that is not continuous, discrete or ordinal—but only a name and without any inherent ordering, such as gender or blood type.
Non-normally distributed data
Data that does not follow a bell-shaped or Gaussian curve distribution.
Normally distributed data
Data that does have a bell-shaped or Gaussian distribution.
A bell-shaped or Gaussian distribution defined by a mean and standard deviation.
The research hypothesis that researchers want to reject, and typically represents no association for the research question of interest.
A study in which the independent variable or exposure variable is not under the control of the researcher.
The ratio of the odds of an outcome in one group versus another. An odds ratio expresses the association between an exposure variable and an outcome variable, but does not imply causal inference between the 2 variables.
A data variable that consists of a limited number of ordered categories, such as ASA physical status or a Likert scale.
Probability value. A P-value gives the probability of observing a result as extreme or more extreme than the one observed in a research study if the null hypothesis were in fact true.
A statistical test used to compare 2 dependent samples on a continuous outcome. Oftentimes the dependent samples represent measurements on the same patients under 2 different scenarios, such as before and after an intervention.
A measure of the linear association between 2 continuous or ordinal variables. The square of the Pearson correlation (R squared) represents the proportion of variance in one variable explained by the other.
The group of subjects or units that are the target of a research study; ie, the subjects, or units that one wishes to generalize to.
Positive predictive value
The probability that the true status is positive given that the test or predictor value is positive.
Power of a test
The probability of rejecting the null hypothesis for a given statistical method under a particular alternative hypothesis treatment effect.
A statistical model built particularly for the purpose of predicting individual patient values, and typically assessed for model fit (calibration) and how well the model can discriminate among the outcome values or explain the variance in the outcome.
Propensity score methods
Statistical methods used to control confounding by first modeling the probability of having the exposure as a function of potentially confounding variables, and then either matching exposed versus unexposed on that risk score (the propensity score) or weighting inversely on it when assessing the association between exposure and outcome.
A measure of central location defined as the number of events divided by the number of patients or subjects. Equivalent to the mean of a binary variable with values 0 and 1.
The data values corresponding to the 25th and 75th percentiles of a sample are referred to as the first and third quartiles.
Research design in which the experimental units are randomly assigned to receive 1 of the 2 or more interventions being assessed, thus removing confounding or selection bias.
The difference between the largest and smallest data value in a sample.
Rejecting the null
The decision to disavow the null hypothesis (typically of no association) based on a statistical test that gives a small P-value, or else based on a confidence interval for the association of interest that does not contain the null hypothesis value.
The ratio of 2 proportions, typically estimated in a randomized study when comparing 2 groups on the outcome of interest.
The scientific hypothesis upon which researchers build a research study.
Repeated measures ANOVA
A statistical method that includes repeated measurements on the same subjects, or units, and accounts for the likely correlation within those subjects or units when either comparing times or comparing groups over time.
The statistic that estimates the proportion of the variance in the outcome variable that is explained by 1 or more predictor variables in a linear regression, and is equal to the square of the Pearson correlation for simple linear regression.
The particular set of subjects, or units, that are measured in a research study; we make inference on the population of interest from the data in the sample.
Sample size calculation/justification
The calculation giving either the number of required subjects, or the power to detect a difference in a research study. Components needed to calculate a sample size are the treatment effect of interest, estimated variability of the primary outcome, significance level, and power.
A graph plotting 2 continuous variables—one on the vertical axis and the other on the horizontal axis—to visually assess their association.
The probability that truly diseased patients will test positive.
The P value criterion used to indicate whether the null hypothesis will be rejected or not.
The type I error or probability of at least 1 false-positive finding in a research hypothesis or set of hypotheses.
A method to assess association between 2 quantitative variables using the rankings of the data values as opposed to the actual data values.
The probability of truly non-diseased patients testing negative.
Roughly speaking, the average deviation from the mean in a sample; the square root of the variance.
Standard error of the mean
The estimated variability of the mean of a group or of the difference between groups. For a single group, the standard error is equal to the standard deviation divided by the square root of N
A quantitative measurement on a sample whose goal it is to estimate the same quantity in the population of interest.
Alpha (type I error) and beta (type II error)
Statistical method used to compare 2 independent samples on a continuous outcome, or else to compare a single mean to a constant
The signal-to-noise ratio used in every statistical test, such as the difference in means divided by the standard error of the difference
The difference between groups or the association of interest. Can be defined in many ways.
In a sample, the measurement of how much the individual units differ from each other on an outcome of interest.
Wilcoxon rank sum test
Mann-Whitney or Wilcoxon-Mann-Whitney test: compares groups on the ranks of the data. Equivalent to a t-test on the ranks. Does not directly compare medians.
Wilcoxon signed ranks test
A nonparametric test used to compare 2 dependent samples on a continuous or ordinal outcome variable using the ranks of the differences instead of the actual values.
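Several of the glossary quantities above (standard error of the mean, 95% confidence interval, 95% limits of agreement, and the Bonferroni criterion) reduce to a few lines of arithmetic. The following is a minimal sketch using only the Python standard library. The function names are illustrative and not from the text, and the confidence interval uses the large-sample 1.96 multiplier given in the glossary rather than a t-based multiplier.

```python
import math
import statistics


def standard_error(values):
    """Standard error of the mean: sample SD divided by the square root of N."""
    return statistics.stdev(values) / math.sqrt(len(values))


def confidence_interval_95(values):
    """Large-sample 95% CI: mean +/- 1.96 standard errors."""
    mean = statistics.mean(values)
    half_width = 1.96 * standard_error(values)
    return mean - half_width, mean + half_width


def limits_of_agreement(method_a, method_b):
    """Bland-Altman 95% limits: mean difference +/- 1.96 SD of the paired differences."""
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)
    return bias - 1.96 * sd_diff, bias + 1.96 * sd_diff


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance criterion that keeps the overall type I error at alpha."""
    return alpha / n_tests
```

Note that the limits of agreement use the SD of the differences (where individual differences fall), whereas a confidence interval uses the standard error (how precisely the mean is estimated).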
38.2 Types of Data
Appreciation of the various types of data is an important step in understanding which statistical test would be most appropriate for a given situation. The main types of data are interval or continuous, such as creatinine or blood pressure; ordinal or ranked data, such as American Society of Anesthesiologists (ASA) class (I, II, III, IV) or a Likert scale response (eg, satisfaction with care from low to high as 1, 2, 3, 4, 5); and categorical or nominal, such as male/female, alive/dead, or red/white/blue/green. Categorical data with 2 categories (male/female) are also called binary data. Making a continuous variable (say, age) into a binary variable (say, < 50, ≥ 50) always loses information and therefore gives a less powerful analysis, although it is occasionally the best way to answer a specific research question. Counts such as number of children in a family or number of postoperative infections are called discrete measurements, and can often be analyzed using the same statistical methods as truly continuous data.
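The loss of information from dichotomizing a continuous variable can be illustrated with a small simulation. This is a sketch on made-up data: the age distribution, the linear effect of age on a hypothetical outcome, and the noise level are all assumptions chosen for illustration. A weaker observed correlation is used here as one visible symptom of the lost information.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation computed from first principles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=5000, seed=1):
    """Correlate a simulated outcome with age, then with age cut at 50."""
    rng = random.Random(seed)
    age = [rng.gauss(50, 12) for _ in range(n)]            # assumed age distribution
    outcome = [0.5 * a + rng.gauss(0, 10) for a in age]    # assumed linear effect plus noise
    age_binary = [1 if a >= 50 else 0 for a in age]        # dichotomized: < 50 vs >= 50
    return pearson(age, outcome), pearson(age_binary, outcome)


r_continuous, r_binary = simulate()
print(f"continuous age: r = {r_continuous:.2f}; dichotomized age: r = {r_binary:.2f}")
```

In this setup the dichotomized predictor treats a 51-year-old and an 89-year-old as identical, so its association with the outcome is attenuated relative to the continuous version.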
38.3 Descriptive Statistics
Summary statistics such as mean (standard deviation) for “normally distributed” data and median (25th%, 75th%) for non-normally distributed data are very useful ways to report study results. But any statistical analysis should also include plots of the data to visualize the relationship(s) that a statistical model is trying to express. It is a good rule of thumb that one should not report statistical results that cannot be visualized to some degree in a graphical display. A boxplot showing median, mean, quartiles, and range of data is an excellent way to display continuous data, and much better than simply plotting the mean and standard deviation (SD) (or standard error of the mean [SEM]) with a so-called “detonator” plot. In addition, if the data set is quite small, it is good to report a listing of the data points for each observation.
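As a rough illustration of the two reporting styles, the sketch below computes mean (SD) and median (25th, 75th percentiles) for a small, made-up, right-skewed sample. The data values and the function name are hypothetical, and quartile values depend on the interpolation method (this uses the default of Python's `statistics.quantiles`).

```python
import statistics


def summarize(values):
    """Return mean, SD, median, and first/third quartiles of a sample."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # cut points for 4 equal-probability groups
    return {
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
        "median": statistics.median(values),
        "q1": q1,
        "q3": q3,
    }


# Hypothetical length-of-stay values (days), skewed to the right by one long stay
length_of_stay = [1, 2, 2, 3, 3, 4, 5, 8, 21]
summary = summarize(length_of_stay)
print(f"mean (SD): {summary['mean']:.1f} ({summary['sd']:.1f})")
print(f"median (Q1, Q3): {summary['median']} ({summary['q1']}, {summary['q3']})")
```

With skewed data like these, the single long stay pulls the mean well above the median, which is why the median and quartiles are the more representative summary.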
38.4 Normal Distribution
The frequency distribution of many variables is naturally a bell-shaped curve: symmetrically distributed, with a higher concentration of data near the central value and less data farther from the center (Fig. 38.1). Examples are age, body mass index (BMI), blood pressure, height, weight, and log-transformed length of stay (actual LOS is skewed to the right and therefore not symmetric). Gaussian distribution is another name for the “normal” distribution. Data following a true normal distribution have specific properties: they are defined by 2 parameters, a mean and standard deviation (SD); data are symmetrically distributed around the mean; ~68% of data points lie within 1 SD of the mean; and the mean equals the median.
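The 68% figure can be checked directly from the normal cumulative distribution function. The sketch below uses `statistics.NormalDist` from the Python standard library; the function name and the example mean and SD (120 and 15) are illustrative assumptions, not values from the text.

```python
from statistics import NormalDist


def fraction_within(k, mean=0.0, sd=1.0):
    """Fraction of a normal distribution lying within k SDs of its mean."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)


print(f"within 1 SD:    {fraction_within(1):.4f}")
print(f"within 1.96 SD: {fraction_within(1.96):.4f}")
# The fraction does not depend on the particular mean and SD chosen
print(f"within 1 SD (mean 120, SD 15): {fraction_within(1, mean=120, sd=15):.4f}")
```

The same calculation with k = 1.96 gives the roughly 95% coverage that underlies the 1.96 multiplier in 95% confidence intervals and limits of agreement.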