Measurement of Pain


Mark P. Jensen



Introduction

Valid and reliable pain assessment is essential for successful pain care. Adequate assessment is also necessary to determine the efficacy of pain treatments in clinical trials and for understanding the mechanisms of those effects. The clinician or researcher who wishes to use the most useful measures and strategies for pain assessment is faced with a large, and growing, number of options and decisions. The purpose of this chapter is to make those decisions easier.

Of course, in clinical settings, pain assessment in a broader sense should involve a detailed psychological and medical exam and often also involves the administration of general measures of physical and psychological function. These issues are addressed in detail in other chapters of this volume on different pain conditions. The focus of this chapter is on self-report and observational measures that are specific to the experience and impact of pain.

The chapter begins with a brief discussion of several important issues that clinicians and researchers need to consider when choosing from among pain measures and designing pain assessment procedures, including (1) evaluating the reliability, validity, and utility of pain measures; (2) determining the number of pain problems to assess; (3) choosing the pain domain(s) to assess; and (4) selecting the time period of assessment (e.g., current pain experience vs. recall of pain over the last day, week, or longer). The bulk of the chapter then reviews the available psychometric information regarding measures of six pain domains: pain intensity, pain affect, pain quality, pain site, pain’s temporal characteristics, and pain interference. Next, the chapter briefly discusses strategies for assessing pain in special populations (e.g., infants and young children or other patients who might have difficulty expressing themselves verbally). It ends with a summary of recommendations.


VALIDITY, RELIABILITY, AND UTILITY IN THE CONTEXT OF PAIN ASSESSMENT

No measure is perfect. No one measure assesses all pain domains, nor is any single measure useful in all settings and with all populations. Moreover, because of the imperfection of available instruments, it is theoretically possible to modify any existing measure to improve it further or to develop new and better measures to replace existing ones. As a result, new pain assessment procedures and measures are constantly being developed and published. Thus, the clinician or researcher seeking to find the best measure for his or her needs should not only be aware of the existing pain assessment literature but should also know how to evaluate new measures as they are published.1 The following section seeks to facilitate this task by briefly summarizing the three key issues that should be considered when evaluating any pain measure: validity, reliability, and utility.


Validity

Validity refers to the appropriateness, meaningfulness, and usefulness of a measure for a specific purpose. It is generally seen as the most important consideration in the evaluation of a measure.2 Validity always needs to be evaluated with respect to the specific purpose a measure or instrument will be used for; measures are not inherently “valid” or “invalid” in and of themselves. For example, a hammer is not inherently valid. It is valid (useful) for driving nails into wood but invalid for washing dishes.

Rarely, if ever, can the validity of a measure be determined with a single study. Rather, support for the validity of a measure is usually established over time and with a series of studies. When evaluating the validity of a potential measure, the clinician or investigator should consider content, construct, and criterion validity.


Content Validity

Content validity concerns the degree to which the items of a measure represent a defined universe or domain of interest. For example, if a measure of a patient’s usual pain or average pain over the last month is needed, then a single rating of current pain would not usually be considered to have content validity for assessing this construct because pain can vary so much from one moment to another. Similarly, if a measure of the impact of pain on a patient’s life is needed, and a measure includes items that ask only about pain’s impact on sleep and mobility (but not other important daily activities), the measure would not generally be viewed as adequately representing the domain of pain interference. Thus, a critical question that every test user should ask is whether or not a potential measure assesses or represents all of the key components of the domain of interest. If the measure does not meet this standard, it does not have content validity.


Construct Validity

Construct validity refers to how well the items of a measure perform as measures of the domain or construct of interest. Two measures can have similar content validity—that is, both may contain items that assess the critical components of some pain construct—but have different construct validity. For example, if two measures ask about pain interference with the same set of activities, yet respondents are asked to indicate the extent of interference with each activity using different response levels (e.g., yes/no response in one measure vs. 0-to-10 scales in the second), the latter measure may evidence more precision than the former. The more precise measure may represent the construct better and have more construct validity than the less precise one, despite the fact that the two measures have the same content validity.

Similarly, if the language used in the items of one measure is clear and succinct and the language used in another measure is confusing and complex, the former measure would likely contain less error than the latter measure, and the scores from the former measure would therefore be more likely to better represent the construct of interest. Thus, factors other than content validity will impact how the scores obtained from different measures behave, especially with respect to their associations with other important pain-related measures and the precision with which they represent the domain of interest. Evidence for the construct validity of a measure generally comes from studies that demonstrate strong associations between a measure’s score and other measures of the same construct or related constructs and weak to moderate associations with measures of other constructs.



Criterion Validity

Criterion validity refers to a measure’s associations with one or more key outcome criteria. Usually, the most important criterion of a pain measure is the responsivity of the measure to the effects of a pain treatment, or to changes in pain over time, because pain measures are most often used for detecting these differences and changes. Pain measures that are proposed to be used as outcome measures in clinical trials should therefore have evidence that they are able to detect treatment effects or show expected changes in pain over time.

But not all pain measures are designed to assess treatment efficacy. A number of measures of pain quality, for example, as described later in this chapter, were designed to distinguish among different types of pain (e.g., neuropathic vs. nociceptive). The validity of such measures should be determined by their ability to perform the task they were designed for or that they will be used for; their validity as measures of treatment efficacy needs to be of concern only when or if they are being considered for that specific purpose.


Reliability

Reliability refers to the extent to which the score from a test is free from errors of measurement. Many factors, other than a patient’s experience of pain, could potentially influence his or her response to a pain measure or scale. Such factors could include the specific assessment setting (e.g., home vs. clinic), assessment burden (e.g., single assessment vs. a daily diary), the person administering the measure (e.g., research assistant, nurse, spouse, primary health care provider), other subjective experiences and feelings (e.g., being more or less fatigued or upset), motivational factors (e.g., desiring to appear stoic, wanting a prescription for a specific medication), ethnicity or culture, and previous learning experiences (e.g., the consequences of reporting of higher vs. lower pain levels), among many others. The variability in a pain score (the “variance”) that is associated with these other factors, and that is not associated with the specific domain of interest, is considered error variance. Although no measure is 100% reliable, the best measures demonstrate relatively little influence of these other factors and potential sources of error.

Higher error variance means lower reliability. Unlike validity, which is considered with respect to the proposed use of the measure, and therefore varies depending on context, reliability is usually considered to exist within a measure. However, it is also possible for a measure to be more reliable in some settings or with some populations than in others. For example, as described in more detail later in this chapter, Visual Analogue Scales (VASs) of pain intensity (where the respondent is asked to make a mark on a line that represents the perceived magnitude of pain) have been found to be more difficult for patients with cognitive deficits than for patients who do not exhibit cognitive deficits. VAS measures, then, are now considered to be inadequately reliable in populations at risk for cognitive deficits, although evidence indicates that they may be adequately reliable in otherwise healthy adults. Thus, it is important that the reliability of any measure be established for the specific population with whom the measure will be used or at least in samples of individuals who are similar to the population with whom the measure will be used.


Utility

Finally, issues of reliability and validity need to be considered in light of a measure's utility, given that there is often a trade-off among these. For example, to maximize the content validity of a measure of pain interference, one would want the measure to assess the pain interference of all, or nearly all, of the possible (100s? 1,000s?) activities a person could engage in. Such a measure, although it would have clear content validity, would not be practical; no one would use it. Similarly, to maximize the content validity of a measure of a patient's usual pain over the course of the last month, one might ask the patient to report on his or her current pain every hour for 30 days and then average those responses into a single index of average pain. But few patients would be willing to perform this assessment task, and the costs of ensuring complete data for such a measure would be prohibitive for most clinicians and many researchers. Deciding on which measure(s) to use for a particular application often comes down to selecting the measure that is both adequately valid and practical.


HOW MANY PAIN PROBLEMS SHOULD BE ASSESSED?

Patients often have more than one pain problem. For example, the majority of individuals with spinal cord injury have chronic pain, and the majority of these report pain at more than one site.3 Clinicians and those researchers who do not limit their sample to the (few) patients with only one pain problem are faced with the difficult task of determining the number of pain problems to assess in any one patient or study participant. If only one “primary” pain problem is assessed at a clinic visit, but on the next visit, a different “primary” pain problem emerges as the most distressing, then it would be very difficult to track the effects of pain treatment from one clinic visit to the next.

Similarly, researchers who limit the number of pain problems assessed to just one primary problem run the risk of underestimating the magnitude of pain and its impact in their research findings. On the other hand, it is not practical to assess every pain problem in every patient seen in the clinic or in every participant of a research study. These considerations suggest that, in many situations, patients should have the opportunity to report on more than one pain problem but not necessarily be required or expected to report on every pain problem that they have at every assessment point.

But how many pain problems should be assessed? Two? Five? More? One approach to deal with this issue is to begin by assessing pain “in general”; for example, asking patients to consider all of their pain problems together when rating the overall average magnitude or intensity of their pain and the impact of pain on their lives. This is a practical solution, especially for assessing pain interference, because it may be very difficult for patients to identify the unique contribution of each different pain problem to interference with different activities. Moreover, assessing global pain intensity and interference allows the clinician or researcher to have a single primary measure of these two key pain domains, making analyses and tracking over time easier.

However, limiting assessment to only pain “in general” may oversimplify assessment and also interfere with determining the true effects of pain treatment. For example, if a pain treatment reduces the pain associated with one pain problem (e.g., headache) but not another (e.g., low back pain or a neuropathic pain condition), the specific effect of the treatment on headache pain might be less noticeable or even lost altogether if a measure of “general” pain intensity is used. So, in many situations, allowing for the assessment of more than one pain problem would be useful.

Unfortunately, however, there is not yet a clear consensus in the field concerning the best number of pain problems to assess. In the clinical setting, it probably makes sense to assess as many of the pain problems as are of concern to the patient. If the patient experiences eight unique pain problems and views each as a significant problem that contributes to dysfunction, then perhaps each of these should be assessed, at least at the initial evaluation, and then tracked at subsequent clinic visits as appropriate.


When determining how many pain problems to assess in a research study, the number of problems that should be assessed would vary as a function of the research question(s) being asked and the specific population being studied. One reasonable option would be to select the number of pain problems to assess that would capture the majority of patients in the population. For example, in persons with spinal cord injury, it has been recommended that investigators should consider assessing basic information (such as pain location and intensity) for up to three presenting pain problems.4,5 In this instance, three was chosen as a way to balance the need for a thorough assessment against the need to minimize assessment burden, keeping in mind that the majority of persons with spinal cord injury and pain report three or fewer pain problems.3 Although it is unlikely that a single upper limit of pain problems can be identified that should be assessed in every research project and with every patient population, each investigator would do well to consider this issue when developing assessment protocols.


WHICH PAIN DOMAIN(S) SHOULD BE ASSESSED?

Clinicians and researchers have long recognized that pain is a multidimensional experience that includes a number of measurable qualities such as intensity, affect (global bothersomeness of the pain experience as well as the impact of pain on emotional functioning), sensory quality, spatial quality (location), temporal quality, and impact on or interference with daily activities.6,7 Although the focus of pain assessment in many clinical and research settings has often been, and continues to be, on pain intensity,8 there is now a well-established recognition and interest in the assessment of pain’s other domains.9

It is important that clinicians and investigators consider assessing more than just pain intensity for a number of important reasons. First, limiting assessment to only pain intensity leaves clinicians and researchers in the difficult position of having limited information about the presenting pain problem(s). In a clinical situation, changes in a pain domain not assessed might end up being critical for understanding the effects of a pain treatment (e.g., if pain qualities are not assessed, and a treatment reduces the “aching” and “deep” qualities of a pain problem, but perhaps not average pain intensity overall, or if a treatment produces a decrease in the impact of pain on sleep or other areas of functioning, even when there has been a minimal impact on pain intensity). For this reason, clinicians and researchers interested in assessing pain should at least consider all of the pain domains when determining which ones to assess and perhaps only avoid those domains they are certain will not be important to treatment (for clinicians) or understanding (for researchers).

One of the factors that might determine the selection of fewer domains and measures (e.g., perhaps choosing to assess just pain intensity and pain quality) over a more comprehensive assessment (e.g., including measures of pain site, pain interference, and the temporal qualities of pain as well as perhaps more general measures of psychological and physical functioning) is whether the pain problem being assessed is more acute or chronic. Acute pain, which may be defined as pain resulting from current or very recent damage to tissue, includes pain from medical procedures (e.g., injections, lumbar punctures, surgery) as well as both major and minor physical injuries. Because acute pain problems tend to resolve quickly in most individuals, their impact tends to be transitory. In this situation, and if the focus of treatment is on just one or two pain domains (e.g., pain intensity, mood), then it may be appropriate to assess only one or two pain domains.

Chronic pain, on the other hand, tends to be more complex than acute pain. It also tends to influence a large variety of quality of life domains (e.g., employment status, sleep quality, psychological status, social functioning). Patients’ responses to chronic pain can also be quite variable, as can its impact. For chronic pain, then, in both clinical and research settings, and in order to ensure a thorough understanding of the pain problem, more pain domains, and perhaps more measures that assess these domains, are often required.


RECALL RATINGS VERSUS SUMMARY SCORES FROM MULTIPLE RATINGS USING DIARIES

Often, the clinician or researcher wishes to have a measure of a patient's usual pain during a specific period of time. A single measure or rating of current pain is unlikely to be an adequate index of usual pain, given that pain can vary from one moment to another. So what is the best way to assess usual pain? The three most common methods to assess pain intensity in clinical trials are to (1) ask respondents to provide multiple ratings of current pain on pain diaries during the epoch of interest (e.g., four times per day for 7 days) and then compute the arithmetic mean of the ratings obtained, (2) assess pain once per day on several days, asking respondents to provide a recall rating of their average pain in the previous 24 hours, and then compute the arithmetic mean of these 24-hour recalled average pain ratings to create a composite score representing usual pain, or (3) assess pain just once but ask respondents to provide recall ratings of their average pain over the entire epoch of interest. Each approach has strengths and weaknesses. In addition, pain assessment experts have not yet reached a clear consensus on which option should be recommended.
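To make these three scoring approaches concrete, the following minimal sketch (written in Python, using hypothetical 0-to-10 ratings) shows how each score might be computed; the data and variable names are illustrative only.

```python
# Minimal sketch (hypothetical 0-to-10 ratings) contrasting the three scoring
# approaches described above. Values and variable names are illustrative only.
from statistics import mean

# (1) Multiple ratings of current pain, e.g., 4 diary entries per day for 7 days
diary_current_pain = [
    [5, 6, 4, 5], [6, 7, 5, 6], [4, 5, 5, 4], [5, 5, 6, 5],
    [6, 6, 7, 6], [5, 4, 5, 5], [6, 5, 6, 6],
]
usual_pain_diary = mean(r for day in diary_current_pain for r in day)

# (2) One 24-hour recall rating of "average pain" per day, averaged over the week
daily_24h_recall = [5, 6, 5, 5, 6, 5, 6]
usual_pain_24h_composite = mean(daily_24h_recall)

# (3) A single recall rating of average pain over the whole epoch (e.g., past week)
usual_pain_7day_recall = 6

print(usual_pain_diary, usual_pain_24h_composite, usual_pain_7day_recall)
```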

Support for the first approach, using multiple ratings of current pain from daily diary data (most often obtained electronically by asking patients to access a Web site to provide ratings or via automated telephone calls to patients), comes from (1) studies that have demonstrated a biasing impact of recent pain and worst pain (also known as "end" and "peak" effects) on recall ratings10,11,12 and (2) the increasing accessibility of computer-based and electronic diary technology, which facilitates data collection.13,14 When diaries are required, electronic, Web-based, or phone diaries are preferred over paper-and-pencil ones, given the common finding that respondents use paper-and-pencil diaries inappropriately.15,16 Because of the strengths of diary approaches, there has been an increase over time in the use of this approach to collect data in clinical trials.17,18,19,20

However, there are also good reasons that make some investigators hesitant to embrace an approach that requires multiple daily diary entries to compute pain intensity scores, especially for use in clinical trials. First, as a practical issue, daily diary data can be expensive to collect. The financial cost of the hardware and software management associated with data collection via electronic diaries may be beyond the means of some investigators. Related to this, there is also a cost in terms of patient assessment burden. Some procedures ask respondents to provide only one assessment per day,15,21,22,23 but it has been more common to ask patients to provide three23 or more (four to six17,18,24) ratings per day. This requires a significant effort on the part of patients, which may lower compliance with the task. To the extent that less costly recall ratings (that require just a few or even just one assessment) may be adequately valid, investigators may save substantial resources and significantly decrease the patient assessment burden if recall ratings are used instead of diary ratings.

A second problem with diary data requiring many ratings is that using this approach will result in missing data. The reported percentages of missing data points from electronic diary studies range from 6%16 to 25%.25 The reported rates of study participants who provide incomplete data (i.e., at least some missing data during the study period) can range from 17%15 to 46%.21 The primary reason reported for missing electronic data is that the patient did not hear the alarm or cue asking for the assessment.26 Other reasons given include the alarm going off at an inconvenient time, the participant being too busy to respond, technical difficulties with a computer, emotional reasons, and pain being too severe at the time of assessment.26 When data are missing, investigators need to either remove subjects from the analyses (which limits the generalizability of the findings and runs the risk of resulting in findings that overstate the impact of treatment) or use some approach to impute the missing data (i.e., estimate what the missing ratings might have been, had all subjects provided complete data). A variety of data imputation procedures can be used for clinical trial data, such as "last observation carried forward," which involves taking the most recent rating obtained and replacing all missing values with that rating.17 A more conservative approach is to replace missing values with pretreatment ratings. However, if the treatment being examined is effective, this approach can underestimate the treatment effect size. Regardless of the approach used, however, data imputation adds error; imputed data are estimates only. In fact, the error added by the need to impute missing scores could potentially be greater than that associated with recall bias, so the use of diary data over recall ratings could potentially result in a more costly and effort-intensive assessment procedure that is ultimately also less accurate.
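As a simple illustration of the two imputation strategies just mentioned, the following sketch (hypothetical data and function names; None marks a missed assessment) applies last observation carried forward and baseline replacement to one participant's ratings.

```python
# Sketch of the two imputation strategies mentioned above, applied to one
# participant's weekly 0-to-10 ratings. None marks a missed assessment.
# Data and function names are hypothetical.

def impute_locf(ratings, baseline):
    """Last observation carried forward: replace each missing value with
    the most recent available rating (falling back to baseline)."""
    filled, last = [], baseline
    for r in ratings:
        last = r if r is not None else last
        filled.append(last)
    return filled

def impute_baseline(ratings, baseline):
    """More conservative option: replace every missing value with the
    pretreatment (baseline) rating."""
    return [r if r is not None else baseline for r in ratings]

weekly = [7, 6, None, 5, None, None]        # post-treatment ratings with dropout
print(impute_locf(weekly, baseline=8))      # [7, 6, 6, 5, 5, 5]
print(impute_baseline(weekly, baseline=8))  # [7, 6, 8, 5, 8, 8]
```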

A third issue is that the use of electronic diaries limits the subjects who can participate in a study. For example, in one electronic diary study that approached 52 possible participants, 6 refused participation outright, 1 did not have the motor ability to hold the computer stylus, and 5 had visual problems that interfered with their ability to read the computer display.21 Because electronic diaries restrict participation to those who are able and willing to use them, their use in clinical trials limits the generalizability of the study findings.

Perhaps the strongest argument against the strategy of computing average pain from multiple ratings of current pain over time is that research increasingly supports the conclusion that recall ratings of pain intensity are adequately valid for most research purposes. Although research does indicate that there can be both peak and end effects that bias recall ratings, these effects tend to be small.10,12 Moreover, research indicates that the correlations between recalled average pain (in the previous 7 days) and actual average pain during that same period (as assessed by diaries) are strong (correlation coefficients range from 0.68 to 0.9927,28,29,30,31,32,33)—well within a range that indicates they carry valid variance as measures of average or usual pain. In short, recall ratings reflect actual average pain and are therefore valid indicants of that pain. In addition, and perhaps most critically, the research finding that provides the most support for the validity of recalled pain ratings as outcome measures in pain clinical trials is indisputable: Recall ratings are responsive to the effects of pain treatments known to impact pain. Hundreds, if not thousands, of clinical trials have shown that effective treatments for chronic pain result in reductions in recall ratings of average pain. Thus, there is adequate evidence to support the use of recall ratings as valid measures of pain intensity in clinical and research settings.

That said, two important questions remain unresolved. First, if a researcher chooses to obtain multiple ratings of recalled pain over a relatively small recall period (e.g., daily ratings of average pain in the past 24 hours) and compute the arithmetic mean of these ratings to create a composite score representing usual pain, how many of these recall ratings are needed to create an adequately valid measure of a patient's average pain? Second, might a single rating of recalled pain over a relatively larger recall period (e.g., a single rating of recalled average pain in the past week) be as adequately valid as a composite score made up of multiple 24-hour recall ratings?

Unfortunately, to date, relatively little research has been performed to address these two questions, despite the significant implications of the answers for designing and conducting pain clinical trials. Only three studies were identified that address the first question. In the first of these, investigators compared the reliability and sensitivity (i.e., ability to detect significant treatment effects) of pain intensity scores made up of one to nine 24-hour recall ratings of pain intensity using data from a clinical trial evaluating the efficacy of oxymorphone extended-release for the treatment of low back pain.34 As would be expected based on psychometric theory, the reliability of the outcome measure reflecting average pain intensity increased as the number of ratings used to compute that measure increased from one to nine. However, and unexpectedly, this increase in reliability was not associated with improvements in the ability of the composite score to detect treatment effects. In fact, the first single 24-hour recall rating was about as sensitive for detecting treatment effects as the composite scores made up of 2 to 9 ratings.34 In a commentary on this finding, other researchers noted that these results may have been due to the unusually strong test-retest reliability of the 24-hour recall ratings in the study sample.35 To address this question further, these other investigators performed a similar set of analyses using data from a clinical trial evaluating the efficacy of a behavioral intervention for pain associated with osteoarthritis.35 They found that the individual 24-hour recall ratings of average pain intensity evidenced a great deal of variability in their ability to detect treatment effects, with effect sizes ranging from medium (d = .34, P = NS) to large (d = .88, P < .001). Thus, had only a single 24-hour recall rating been used, the actual treatment effect might have been under- or overestimated and deemed either significant or nonsignificant, depending on the single rating used. On the other hand, the effect sizes for composite scores—which were .51, .55, and .66 for composites made up of two, three, and seven ratings—were more stable. Also, all of the composite scores were statistically significant (Ps = .0005 to .007).35 The third study that addressed this question was a secondary analysis of a pilot study (N = 10 individuals with spinal cord injury and chronic pain) which examined the ability of pain intensity scores to detect changes in pain intensity from before to after 12 sessions of neurofeedback treatment.36 In this study, the single-item ratings (of current pain and 24-hour recalled least, average, and worst pain) also evidenced a great deal of variability in effect sizes (ranging from 0.08 to 1.03). The effect sizes became more stable when two ratings were averaged (range, 0.16 to 0.38) and were only slightly more stable than this when four ratings were averaged (range, 0.17 to 0.42). These investigators concluded that just two ratings (e.g., two ratings of 24-hour recalled average pain) may be adequate for assessing pain outcomes in clinical trials. However, they also noted that it is possible that in some populations—especially populations where pain can vary markedly from one day to the next, such as in individuals with chronic headache—more than two ratings might be needed to ensure adequate reliability and validity.
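To illustrate the kind of computation involved, the following sketch shows how a standardized effect size (Cohen's d, using a simple pooled standard deviation) might be calculated from a single rating versus from a composite of several averaged ratings. The ratings are entirely hypothetical and are not intended to reproduce the results of the studies cited above.

```python
# Sketch (hypothetical data) of computing a standardized effect size from a
# single 24-hour recall rating versus a composite of several averaged ratings.
from statistics import mean, stdev

def cohens_d(pre, post):
    """Standardized mean change using a simple pooled standard deviation."""
    pooled_sd = ((stdev(pre) ** 2 + stdev(post) ** 2) / 2) ** 0.5
    return (mean(pre) - mean(post)) / pooled_sd

# Each row: one participant's first three daily 24-hour recall ratings (0-10)
pre_ratings  = [[7, 6, 8], [5, 6, 5], [8, 7, 7], [6, 6, 7]]
post_ratings = [[5, 4, 6], [4, 5, 4], [6, 6, 5], [5, 4, 5]]

d_single    = cohens_d([p[0] for p in pre_ratings], [p[0] for p in post_ratings])
d_composite = cohens_d([mean(p) for p in pre_ratings], [mean(p) for p in post_ratings])
print(round(d_single, 2), round(d_composite, 2))
```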

With respect to the second question, regarding the potential of a 7-day recall rating for providing valid ratings of average pain during the previous 7 days, although there are findings showing a tendency for people to overestimate recalled pain over the past week,25,37 no study was identified which directly compared the validity of a single 7-day pain recall rating with composite scores made up of multiple diary ratings. Thus, this critical question remains unanswered at this point in time.

Given the current state of knowledge, what can or should researchers do with respect to assessing average or usual pain intensity? The answer probably depends on the resources available to the researcher. Certainly, there is evidence that a single 7-day recall rating of pain intensity can be sensitive to changes in pain with treatment (e.g., Hans et al.,38 Branco et al.39). Therefore, 7-day recall ratings can be considered when resources are limited. However, given that psychometric theory predicts greater reliability (and therefore, ultimately, validity) when more measures are combined into composite scores, and until research is performed to compare single 7-day recall ratings with various combinations of 24-hour recall ratings, it would probably be wise for investigators to consider obtaining more than one rating (e.g., three or four 24-hour recall ratings obtained within a specified number of days) and combining these recall ratings into a single composite score.


Measuring Pain’s Domains


MEASURING PAIN INTENSITY

The single pain domain assessed most often in clinical and research settings is pain intensity, or the magnitude of felt pain.40 The three most commonly used scales to assess pain intensity are (1) the VAS, (2) the Numerical Rating Scale (NRS), and (3) the Verbal Rating Scale (VRS) (Fig. 20.1). The results from research across many different pain populations yield fairly consistent findings concerning the psychometric properties of these measures6,9,41 and may be summarized as follows:



  • Each of these measures is adequately valid and reliable as a measure of pain intensity in most settings.


  • For both VAS and 0-to-10 NRSs, changes (decreases) between 30% and 35% appear to indicate a meaningful change in pain to patients across patient populations.


  • For 0-to-10 NRSs, the rating chosen has a specific meaning in terms of the impact of pain on function. In most samples, ratings in the 1 to 4 range have a minimal impact on function and can be viewed as representing "Mild" pain. Once ratings reach 5 or 6, patients report that pain has a greater impact on function; these ratings can be viewed as "Moderate" pain. Ratings ranging from 7 to 10 have the greatest impact on function and can be viewed as representing "Severe" pain. (A brief sketch illustrating these cut points follows this list.)


  • When examined, single-item measures of pain intensity appear to have adequate test-retest stability (often, but not always, greater than 0.80) over short periods of time.






    FIGURE 20.1 The Visual Analogue Scale, Numerical Rating Scale, and Verbal Rating Scale.


  • There are fairly consistent differences between available measures in terms of their failure rates. VASs usually show higher failure rates than NRSs and VRSs, and NRSs tend (when differences are found) to show slightly higher failure rates than VRSs, probably related to the increased complexity of matching a sensation to a line length versus a number or verbal descriptor.


  • In terms of preferences, patients in Western countries tend to prefer VRSs and NRSs over VASs. Patients from China prefer VRSs over NRSs.42 Whether this finding generalizes to patients in other non-Western countries remains to be seen.
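As noted in the list above, 0-to-10 NRS ratings are commonly interpreted using mild (1 to 4), moderate (5 to 6), and severe (7 to 10) bands. The following minimal sketch applies these cut points; the function name and the handling of a rating of 0 are illustrative choices.

```python
# Categorize a 0-to-10 NRS rating using the mild/moderate/severe bands
# described above (1-4 mild, 5-6 moderate, 7-10 severe). The function name
# and the explicit handling of 0 ("no pain") are illustrative choices.

def categorize_nrs(rating: int) -> str:
    if not 0 <= rating <= 10:
        raise ValueError("NRS ratings must be between 0 and 10")
    if rating == 0:
        return "No pain"
    if rating <= 4:
        return "Mild"
    if rating <= 6:
        return "Moderate"
    return "Severe"

print([categorize_nrs(r) for r in (0, 3, 5, 8)])
# ['No pain', 'Mild', 'Moderate', 'Severe']
```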


Recommendations for Assessing Pain Intensity

Given the empirical support for the validity and reliability of VASs, NRSs, and VRSs as measures of pain intensity, any of these could reasonably be considered as options in most clinical settings or as outcome measures in clinical trials. Primarily because of (1) differences in failure rates between these measures in some populations (supporting NRSs and VRSs over VASs),41 (2) the evidence that some people can differentiate more than just four or five levels of pain between "No pain" and "Extreme pain,"43,44 and (3) the potential benefits of standardizing pain intensity assessment to allow for increased comparisons between studies, the field has recently moved toward recommending that clinicians and researchers consider first using the 0-to-10 NRS (see Fig. 20.1) over other pain intensity measures, at least in Western countries and cultures.9

Of course, there may be times when the 0-to-10 scale may not be appropriate. This scale requires the respondent to match his or her pain experience to a number, a task that may not be that easy for the very young, the extremely elderly, or individuals who are very ill. In these cases, and perhaps others, alternative pain intensity measures may be needed (see “Measuring Pain in Special Populations” section). Moreover, research suggests that 0-to-10 scales are not preferred by patients in China, as these individuals tend to describe their pain severity using verbal descriptions.42 Additional research in non-Western countries regarding the utility and validity of the different pain intensity measures is needed to help determine which scale(s) might be most appropriate for cross-cultural research.


MEASURING PAIN AFFECT

The affective quality of pain includes both the general unpleasantness and/or bothersomeness of the pain sensation as well as the many varieties of affect (fear, anger, sadness, frustration, feelings of hopelessness) that pain can produce—especially as it becomes chronic. The most common measures of general, global pain unpleasantness are single-item rating scales (VASs, NRSs, and VRSs) that use endpoints that reflect extreme levels of unpleasantness (e.g., for a 0-to-10 NRS or 100 mm VAS, “not bad at all” for the 0 rating or 0-mm mark and “the most unpleasant feeling possible for me” for the 10 rating or 100-mm mark).45 In general, these measures have proven useful in highly controlled laboratory studies that seek to differentiate intensity from affective components of pain.45,46

On the other hand, outside of the laboratory setting, patients appear to treat single-item VAS, NRS, and VRS measures of pain unpleasantness much like measures of pain intensity so that the two are often indistinguishable from one another in clinical populations.47,48 Moreover, one might question the content validity of single-item measures of affect, given the complex and multidimensional nature of emotional experience.

Pain affect can also be assessed using multiple-item scales, the most common of which are the Affective subscale of the McGill Pain Questionnaire (MPQ)49 and its associated short form, the Short-Form McGill Pain Questionnaire (SF-MPQ)50 (Fig. 20.2). The original MPQ contains 78 descriptors that are categorized into 20 subgroups, 5 of which assess the impact of pain on affect. The five affective domains are tension (assessed using "tiring" and "exhausting" descriptors), autonomic (assessed using "sickening" and "suffocating" descriptors), fear (assessed using "fearful," "frightful," and "terrifying" descriptors), punishment (assessed using "punishing," "grueling," "cruel," "vicious," and "killing" descriptors), and affective miscellaneous (assessed using "wretched" and "blinding" descriptors).49 When administered the MPQ, respondents are asked to circle or mark, within each subgroup, the single descriptor that most accurately reflects or describes their pain. Descriptors are then ranked according to their position in the word set. The Pain Rating Index (PRI), which can be computed for each of the four primary MPQ subscales, including the Affective subscale, is the sum of the rank values of these descriptors.
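The following sketch illustrates the PRI scoring logic just described. The subgroups and word lists shown are abbreviated examples rather than the full MPQ word sets, and the responses are hypothetical.

```python
# Sketch of the MPQ Pain Rating Index (PRI) logic described above: within each
# subgroup the respondent marks one descriptor, that descriptor's rank (its
# position in the word set) is its value, and the ranks are summed per scale.
# The subgroups and word lists below are abbreviated and illustrative only.

affective_subgroups = {
    "tension":    ["tiring", "exhausting"],
    "fear":       ["fearful", "frightful", "terrifying"],
    "punishment": ["punishing", "grueling", "cruel", "vicious", "killing"],
}

def pri(subgroups, selections):
    """Sum the rank values (1-based position) of the selected descriptors.
    Subgroups with no descriptor selected contribute 0."""
    total = 0
    for name, words in subgroups.items():
        chosen = selections.get(name)
        if chosen is not None:
            total += words.index(chosen) + 1
    return total

# A hypothetical set of responses
print(pri(affective_subgroups, {"tension": "exhausting", "fear": "terrifying"}))  # 2 + 3 = 5
```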






FIGURE 20.2 The Short-Form McGill Pain Questionnaire. (Reprinted with permission from Melzack R. The short-form McGill Pain Questionnaire. Pain 1987;30[2]:191-197.)

The short form of the MPQ (SF-MPQ) contains 15 descriptors, four of which come from the MPQ Affective subscale (“tiring-exhausting,” “sickening,” “fearful,” and “punishing-cruel”).50 However, unlike the MPQ, which requires respondents to select a single descriptor from each category list that best describes their pain, respondents to the SF-MPQ are allowed to rate the severity of each item individually on a 4-point Likert scale (0 = None to 3 = Severe). A severity or intensity score can then be calculated for the Affective subscale (as well as for Sensory and Total scale scores; see “Measuring Pain Quality” section). Research has shown that the correlations between the corresponding scales on the MPQ and SF-MPQ are high (rs range, 0.68 to 0.92).50,51,52
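A minimal sketch of this scoring scheme, assuming hypothetical item ratings and an illustrative function name, might look as follows.

```python
# Sketch of SF-MPQ scoring as described above: each of the 15 descriptors is
# rated 0 (none) to 3 (severe); subscale scores are the sums of the relevant
# item ratings. The item ratings below are hypothetical.

SENSORY_ITEMS = ["throbbing", "shooting", "stabbing", "sharp", "cramping", "gnawing",
                 "hot-burning", "aching", "heavy", "tender", "splitting"]
AFFECTIVE_ITEMS = ["tiring-exhausting", "sickening", "fearful", "punishing-cruel"]

def sf_mpq_scores(ratings):
    """ratings: dict mapping descriptor -> 0..3 severity; unrated items count as 0."""
    sensory = sum(ratings.get(item, 0) for item in SENSORY_ITEMS)
    affective = sum(ratings.get(item, 0) for item in AFFECTIVE_ITEMS)
    return {"sensory": sensory, "affective": affective, "total": sensory + affective}

print(sf_mpq_scores({"aching": 3, "throbbing": 2, "tiring-exhausting": 2, "fearful": 1}))
# {'sensory': 5, 'affective': 3, 'total': 8}
```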

There is a substantial amount of data supporting the validity of the MPQ and SF-MPQ Affective subscales. First, like the other MPQ and SF-MPQ scales, the Affective subscale has been shown to be responsive to pain treatment.53,54,55,56 Additional support for the validity of the MPQ Affective subscale as a measure of the affective component of pain, specifically, was reported by Ahles and colleagues,57 who found that this scale was more strongly associated with measures of psychological distress than with measures of pain intensity. Also, Kremer and colleagues58 reported that patients with cancer report a greater affective component of their pain on the MPQ Affective subscale than patients with low back pain, consistent with the hypothesis that cancer pain may be associated with higher levels of affect (e.g., be more worrisome and cause more fear) than low back pain.


Recommendations for Assessing Pain Affect

Although single-item measures of pain affect or pain bothersomeness have demonstrated validity in highly controlled laboratory studies, supporting their use in this setting, they have shown less discrimination (from single-item measures of pain intensity) in clinical populations. Thus, with clinical populations, when an index of pain affect is needed, clinicians and researchers should strongly consider administering the MPQ or SF-MPQ Affective items. The MPQ Affective subscale, having a longer history than the SF-MPQ, has more empirical support for its reliability and validity. However, given the strong associations between the MPQ and SF-MPQ scales, their high degree of item overlap, and the relative brevity and greater simplicity of the SF-MPQ for scoring, adequate evidence exists to support the use of the SF-MPQ as well.


MEASURING PAIN QUALITY

The experience of pain consists of much more than its magnitude or intensity and affective components. Pain is also often described using a number of different qualities, such as “burning,” “aching,” and “tender,” among many others. Although historically, clinicians and researchers have focused on pain intensity as the single most important pain domain to assess,40 there has been an upsurge of interest in the assessment of pain qualities. The two primary purposes of such measures are (1) to help diagnose the pain problem and (2) to more thoroughly describe the pain experience and determine the effects of pain treatments on that experience.


Using Pain Quality Measures as Diagnostic Aides

A growing body of research supports the conclusion that different pain qualities are associated with different causes, sources, or types of pain. In one study supporting this conclusion, Chang and colleagues59 induced skin pain and muscle pain in human subjects through the use of intracutaneous and intramuscular injection of capsaicin into the left forearm, respectively. Although ratings of global pain intensity were very similar for both the skin and muscle pain, capsaicin injection into skin and muscle produced distinctly different pain qualities, as described by the subjects. When capsaicin was injected into the skin, subjects described their pain as sharp, cutting, and burning; pain induced by intramuscular capsaicin injection was described as throbbing, pulsing, and tingling. The results of this study support the idea that different pain mechanisms or sources of pain produce different pain sensations and that these differences can be captured reliably through the assessment of specific pain qualities. Also, it is generally thought that different nociceptors and fibers underlie different pain sensations, with the myelinated Aδ fibers responsible for localized "sharp," "stinging," and "shooting" pain, and the unmyelinated C fibers responsible for less localized dull pain sensations.60,61,62

The four most commonly used measures of pain quality that have been developed specifically to assist in the diagnosis or classification of pain include the (1) Leeds Assessment of Neuropathic Symptoms and Signs (LANSS),63 (2) Self-Report Leeds Assessment of Neuropathic Symptoms and Signs (S-LANSS),64 (3) Neuropathic Pain Diagnostic Questionnaire (DN4),65 and (4) painDETECT.66 Two of these measures (the LANSS and DN4) have both patient self-report and clinician examination items, and two (the S-LANSS and painDETECT) include only patient self-report items.


Leeds Assessment of Neuropathic Symptoms and Signs

The LANSS63 was the first measure designed specifically to distinguish neuropathic from nociceptive pain. The self-report component of the measure consists of five items that ask respondents to indicate, yes or no, if their pain could be described as (1) "[consisting of] strange, unpleasant sensations … like pricking, tingling, pins and needles"; (2) "[making] … the skin in the painful area look different from normal … like mottled or looking more red …"; (3) "[making] … the affected skin abnormally sensitive to touch …"; (4) "[coming] … on suddenly and in bursts for no apparent reason … like electric shocks, jumping and bursting …"; and (5) "[feeling] … as if the skin temperature in the painful area has changed abnormally … like hot and burning…." The sensory testing component asks a clinician to test for allodynia (by lightly stroking a nonpainful and the painful area with cotton wool) and to test for altered pinprick threshold (by comparing the patient response to a 23G needle mounted inside of a syringe barrel placed gently on the skin in a nonpainful area and then in the pain area). Each response is weighted, and the weights of all positive responses are summed to create a total score, with a score of less than 12 indicating that neuropathic mechanisms are unlikely to be contributing to the patient's pain, and a score of 12 or greater (out of a total possible score of 24) indicating that neuropathic mechanisms are likely to be contributing.
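The following sketch illustrates the weighted-sum-and-cutoff logic of the LANSS score. The item labels and weights shown here are placeholders chosen for illustration only; they are not the published LANSS item weights.

```python
# Sketch of the LANSS total-score logic described above: each positive item
# contributes its published weight, and a total of 12 or more (out of 24)
# suggests that neuropathic mechanisms are likely contributing to the pain.
# The item labels and weights below are placeholders, NOT the published values.

HYPOTHETICAL_WEIGHTS = {
    "dysesthesia": 5, "skin_changes": 5, "touch_sensitivity": 3,
    "paroxysmal_pain": 2, "thermal_sensations": 1,
    "exam_allodynia": 5, "exam_altered_pinprick": 3,
}

def lanss_total(positive_items, weights=HYPOTHETICAL_WEIGHTS):
    """Sum the weights of the items answered positively."""
    return sum(weights[item] for item in positive_items)

score = lanss_total({"dysesthesia", "touch_sensitivity", "exam_allodynia"})
print(score, "neuropathic mechanisms likely" if score >= 12 else "neuropathic mechanisms unlikely")
```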


Self-report Leeds Assessment of Neuropathic Symptoms and Signs

One potential drawback to the LANSS, that could limit its use in some clinical and research settings, is that it requires a trained clinician to administer. To address this limitation, a self-report version of the LANSS (S-LANSS) has been developed.64 The S-LANSS includes the same five pain quality items of the LANSS. However, the sensory items were modified to allow patients to self-administer them by gently rubbing the painful and a nonpainful area with their index finger for the allodynia item and gently pressing the painful and a nonpainful area with a fingertip to assess static allodynia.


Neuropathic Pain Diagnostic Questionnaire

The 10-item Neuropathic Pain Diagnostic Questionnaire (DN4)65 was designed to discriminate between neuropathic and nonneuropathic pain. It is administered by a clinician and begins by asking patients if they do or do not experience their pain as having burning, painful cold, or electric shock qualities. Patients are then asked to indicate if they do or do not experience tingling, pins and needles, numbness, or itching in the same area that they experience pain. Finally, and similar to the LANSS, the evaluating clinician determines if hypoesthesia (decreased sensitivity) to touch or to pinprick exists in the painful area and whether lightly brushing the area elicits pain. The item responses are summed to yield a score that can range from 0 to 10. A score of 4 or greater is used to classify the respondent as having possible neuropathic pain.


painDETECT

The painDETECT (PD)66 consists of nine self-report items assessing seven sensory qualities (burning, tingling, sensitivity to light touch, electrical, sensitivity to temperature changes, numbness, sensitivity to light pressure), the temporal pattern of pain (e.g., persistent with slight fluctuations), and the spatial pattern of pain (i.e., whether or not it radiates). The responses are scored and weighted to yield a total that can range from 0 to 38. Scores of 19 or more are used to classify the respondent as likely to have a neuropathic component to their pain, scores of 12 or less as unlikely to have a neuropathic component, and scores from 13 to 18 as an ambiguous result.
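The following sketch applies the painDETECT classification bands described above to a precomputed total score; the function name is an illustrative choice.

```python
# Sketch of the painDETECT classification bands described above, applied to a
# precomputed total score (0 to 38). The function name is illustrative.

def classify_paindetect(total_score: int) -> str:
    if not 0 <= total_score <= 38:
        raise ValueError("painDETECT total scores range from 0 to 38")
    if total_score >= 19:
        return "neuropathic component likely"
    if total_score <= 12:
        return "neuropathic component unlikely"
    return "ambiguous (13-18)"

print([classify_paindetect(s) for s in (8, 15, 24)])
# ['neuropathic component unlikely', 'ambiguous (13-18)', 'neuropathic component likely']
```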


Strengths and Weaknesses of Pain Quality Measures as Diagnostic Aids

A growing body of research has examined the performance of the LANSS, S-LANSS, DN4, and painDETECT, and this work has been summarized in review articles.67,68 These reviews have noted that evidence supports the ability of each of these measures to distinguish neuropathic from nonneuropathic pain in many patient groups, but that their accuracy (including sensitivity, or ability to identify someone as having neuropathic pain, and specificity, or ability to identify someone as not having neuropathic pain) can vary a great deal as a function of the clinical population being considered. Among the measures that include clinician ratings, the DN4's overall accuracy has been shown to be the best on average (with sensitivity ranging from 76% to 100% and specificity ranging from 45% to 92%) for many populations (see Canadian Agency for Drugs and Technologies in Health67). As indicated by these ranges, specificity tends to be lower than sensitivity with the DN4 and has been reported to be particularly low in individuals with leprosy (45%), patients with mixed chronic pain conditions (57%), and patients with a history of breast tumor resection (60%), even when sensitivity is good to excellent in these same conditions (100%, 87%, and 90%, respectively) (see Canadian Agency for Drugs and Technologies in Health67). In addition, the DN4 lacks accuracy for individuals with failed back syndrome (sensitivity: 62%; specificity: 44%; see Canadian Agency for Drugs and Technologies in Health67). On the other hand, the DN4 was reported to be superior to the LANSS and painDETECT for classifying patients with cancer and spinal cord injury as having neuropathic pain or not (see Canadian Agency for Drugs and Technologies in Health67). The tendency for the DN4 to be more accurate for classifying patients was also noted in a review by Mathieson and colleagues.68

The available research evidence does not provide strong support for the use of one of the self-report measures (i.e., S-LANSS or painDETECT) over the other, when a self-report measure is needed67; either could be selected and used. However, it is important to remember that all of these measures are screening questionnaires only and should not be used in an attempt to provide a definitive diagnosis of neuropathic pain.69


Pain Quality Scales as Descriptive and Outcome Measures

Pain quality measures may also be used to describe the pain associated with different pain conditions as well as to identify the effects of pain treatments on various qualities of the pain experience. To the extent that different pain qualities are linked to different pain mechanisms, then understanding the effects of treatments on those qualities may be used to better understand the mechanisms of those treatments. In addition, given the evidence (reported later) that different pain treatments have different effects on various pain qualities, pain clinicians could potentially use pain quality assessment for helping to select from among different treatment options. For example, clinicians may offer patients reporting their pain as primarily “aching” those treatments shown to impact “aching” pain most effectively, while providing patients who describe their pain as “electrical” with treatments that have been shown to reduce “electrical” pain sensations.70,71

To date, six measures have been developed to assess pain quality and have been used as outcome measures in clinical trials. They include (1) the MPQ,49 (2) the SF-MPQ,50 (3) the Revised Short-Form McGill Pain Questionnaire (SF-MPQ-2),72 (4) the Neuropathic Pain Symptom Inventory (NPSI),73 (5) the Neuropathic Pain Scale (NPS),74 and (6) the Pain Quality Assessment Scale (PQAS)75 and its slight modification, the Revised Pain Quality Assessment Scale (PQAS-R).76


McGill Pain Questionnaire

The MPQ was introduced previously in the context of assessing pain affect. In addition to assessing the affective component of pain, the 78 MPQ descriptors can be scored to assess sensory pain (10 sensory categories, such as temporal, punctate pressure, and thermal pain, assessed using 42 descriptors), evaluative pain (one category, assessed using 5 descriptors), and miscellaneous pain (four categories that do not clearly fall into sensory or affective components, assessed using 17 descriptors).49

As described previously, respondents are asked to select from each of the 20 categories (the number of descriptors listed per category varies from 2 to 6) the single descriptor that best describes their pain, and the rank values of the selected descriptors are summed to compute sensory, affective, evaluative, miscellaneous, and total scores.

Support for the usefulness of the MPQ comes from the fact that it has been used in hundreds of studies and has been translated into at least 20 languages.77 Moreover, a three-factor (sensory, affective, and evaluative domains) structure of the MPQ has been confirmed in two studies,78,79 although the high degree of association among these subscales suggests some limitations in the discriminative validity of the different MPQ scales.78 In further support of the measure's validity, the MPQ scales have proven responsive to changes produced by pain treatments, supporting their use as outcome measures.56,80,81

A number of studies have examined the reliability of the MPQ. In populations of patients with cancer pain, studies have found that responses to the MPQ are generally consistent over the time span of several days.82,83,84 In a study with patients with low back pain, Love and colleagues83 found adequate test-retest stability for the MPQ scale scores (Total: r = 0.83; Sensory: r = 0.76; Affective: r = 0.78) over the course of several days.

Despite the many strengths of the MPQ, it also has some important limitations. First, although it has been reported that someone who is very familiar with the measure can complete it in only 5 minutes, the MPQ includes a large number of descriptors that are rarely used by individuals with pain; the inclusion of 78 descriptors suggests a very high degree of content validity, but so many descriptors may not be needed to adequately describe pain quality in many populations.

A second limitation of the MPQ concerns the way it is scored. Although it probably makes sense to combine multiple affective responses into a composite Affective subscale, there are limitations in combining a large number of different sensory descriptors into a composite Sensory subscale. Primary among these is the possibility that such a procedure does not allow investigators to detect the impact of treatments on specific unique pain quality domains or descriptors. Thus, a significant effect of a pain treatment on the MPQ Sensory subscale could reflect modest effects on many different pain qualities or a large effect on just a few. One of the important reasons to assess pain quality in clinical trials is to determine the effects of treatment on specific pain qualities; scoring the MPQ descriptors into composite scales does not allow for this.

Also, because it is unlikely that pain treatments impact all pain qualities in the same way, the use of composite pain quality scores, based on many different items (recall that the MPQ Sensory subscale assesses 10 quality domains using 42 descriptors), runs the risk of reducing one's ability to detect significant effects. When a composite measure used in a clinical trial includes items that are not affected by treatment, or are affected only minimally, the effect size for the total scale is reduced. Indeed, when differences in responsivity to treatment are found, the MPQ scale scores tend to be less responsive to treatment effects than single-item pain intensity ratings.85,86,87


Short-Form McGill Pain Questionnaire

The SF-MPQ was developed in order to balance the need for pain quality data against the need to minimize assessment burden (see Fig. 20.2).50 As previously mentioned in the “Measuring Pain Affect” section, the SF-MPQ consists of 15 descriptors, each of which can be rated on a 4-point severity scale from none to severe. Eleven of the descriptors assess sensory pain (throbbing, shooting, stabbing, sharp, cramping, gnawing, hot-burning, aching, heavy, tender, and splitting), and, as described previously, four items assess affective pain.
