UK General Population Utility Values for the SIDECAR-D Instrument Measuring the Impact of Caring for People With Dementia

Objectives: Dementia affects many people, with numbers expected to grow as populations age. Many people with dementia receive informal/family/unpaid care, for example, from a spouse or child, which may affect carer quality of life. Measuring the effectiveness of health/social care interventions for carers requires a value measure of the quality-of-life impact of caring. This motivated development of the Scales Measuring the Impact of Dementia on Carers-D (SIDECAR-D) instrument. This study aimed to obtain general population values for SIDECAR-D to aid incorporating the impact of caring in economic evaluation. Methods: Members of the UK general public completed a best-worst scaling object case survey, which included the 18 SIDECAR-D items and EQ-5D-3L descriptions. Responses were analyzed using scale-adjusted finite mixture models. Relative importance scores (RISs) for the 18 SIDECAR-D items formed the SIDECAR-D relative scale measuring the relative impact of caring. The SIDECAR-D tariff, on the full health = 1, dead = 0 scale, was derived by rescaling EQ-5D-3L and SIDECAR-D RISs so the EQ-5D-3L RISs equaled anchored valuations of the EQ-5D-3L pits state from a visual analog scale task. Results: Five hundred ten respondents completed the survey. The model had 2 parameter and 3 scale classes. Additive utility decrements of SIDECAR-D items ranged from -0.05 to -0.162. Utility scores ranged from 0.95 for someone affirming 1 item to -0.297 for someone affirming all 18. Conclusion: SIDECAR-D is a needs-based scale of the impact on quality of life of caring for someone with dementia, with a valuation tariff to support its use in economic evaluation.


Introduction
Dementia is a syndrome linked to a range of progressive brain diseases causing gradual decline in mental abilities and increasing functional difficulties. People with dementia often have care provided by family or friends. Being a caregiver can place a large burden on people in time and resources, disruption to daily life, and psychological effects. 1 Dementia mainly affects those aged over 65, and increased life expectancy means that globally the number of people diagnosed is expected to rise from 47 million in 2015 to 3 times that by 2050. 2 As the number of people with dementia increases, so too does the societal burden, meaning greater need for support and services targeting people with dementia and their carers.
By carer we refer to any person providing care for someone with dementia who is not formally employed to do so. This definition encompasses the terms informal carer, family carer, and unpaid carer used elsewhere.
Efficient and optimal allocation of resources in this area requires preference-based quality-of-life measures, for both people with dementia and their carers. The relationship between severity of dementia and impact on a carer's quality of life is nonlinear. For example, a worsening of the condition may lead to greater provision of formal support, or the person moving into a nursing home, reducing carer burden. Carers may also adjust to their caring role over time, meaning their quality of life may improve even as severity of dementia increases. 3 Thus, the impact on carers cannot be extrapolated from knowledge about people with dementia: carers' quality of life must be measured directly.
Assigning values to the impact of caring allows comparison of different individuals, of the same individuals at different time points, and of the effectiveness of carer-targeted interventions. In addition, interventions targeted at people with dementia themselves may improve quality of life for their carers, termed spillovers. Recent National Institute for Health and Care Excellence and US panel guidance 4,5 on cost-effectiveness highlights the need to develop methods for valuing spillovers to formally consider them within economic evaluation, to determine the true overall benefit of interventions, and make decisions that maximize benefits from scarce healthcare resources. 6 In many health systems, including the UK National Health Service, 7 the preferred measure of health benefit is quality-adjusted life-years (QALYs), a composite endpoint comprising both quality and length of life. Quality is captured in utility or preferences for health states where 1 is defined as full health, 0 is defined as equivalent to dead, and negative values represent health considered worse than dead. Thus, 1 QALY is the equivalent of living 1 year in full health.
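The QALY definition above can be illustrated with a minimal sketch. It assumes utility is constant within each period and applies no discounting; the function name and the example profile are hypothetical and are not taken from the study.

```python
def qalys(periods):
    """Sum utility x duration (in years) over consecutive periods.

    `periods` is a list of (utility, years) pairs. Utility is on the
    full health = 1, dead = 0 scale and may be negative.
    """
    return sum(utility * years for utility, years in periods)


# Hypothetical profile: 2 years at utility 0.8, then 3 years at utility 0.5.
profile = [(0.8, 2.0), (0.5, 3.0)]
total = qalys(profile)  # 0.8 * 2 + 0.5 * 3
```

One year lived at utility 1 gives exactly 1 QALY, matching the definition in the text.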
Because caring for someone with dementia can have an impact on all aspects of an individual's life, generic health measures such as EQ-5D may detect certain aspects of health-related caring impact, for example, preventing people from performing their usual activities. Health-related quality-of-life measures have been used with family carers in economic evaluations, including in dementia. 8 Nevertheless, there is concern they capture only limited aspects of the impact of caring on quality of life. 9,10 Measures including care-related items, such as relationships, fulfillment, and support, might be more appropriate for capturing carer quality of life. Many such general measures exist, 11,12 but few have preference-based scoring algorithms for use in economic evaluation.
The Adult Carers Quality of Life questionnaire 13 has been successfully employed in assessing carer services, 14 including for carers of people with dementia. 15 Nevertheless, Adult Carers Quality of Life questionnaire items have not been valued, meaning the relative impact of each cannot be assessed. The CarerQoL 16 and Carer Experience Scale (CES) 17 have been valued using choice-based methods. Nevertheless, the outcome scores cannot be obviously compared to, or summed with, patient QALYs, because they are not on the full health = 1, dead = 0 scale. The Dementia Quality of Life Scale for Older Family Carers 18 is specific to dementia carers, but only a subset (older family members). This neglects many younger carers, for example, those caring for a parent, for a spouse with young-onset dementia, or for friends. Valuation is also not available for the Dementia Quality of Life Scale for Older Family Carers.
There is hence a need for a scale assessing the impact on quality of life of caring for someone with dementia that reflects their experiences, summarizes the relative severity of the quality-of-life impact, and generates QALY values for economic assessment. The DEmentia Carers Instrument DEvelopment project created the SIDECAR (Scales measuring the Impact of DEmentia on CARers) instrument. SIDECAR was developed using a needs-led approach, with items generated from carers only, using exact phrases where possible. Development and evaluation followed Consensus-Based Standards for the Selection of Health Measurement Instruments, 19 including interviews (N = 42), 20,21 to generate an item pool and psychometric evaluation of the initial item pool. 22 This resulted in 3 instruments, SIDECAR-D (18 items), which measures the direct impact of caring for someone with dementia; SIDECAR-I (10 items), which measures the indirect effect; and SIDECAR-S (11 items), which measures support and informational needs. 22 This article details a valuation study of SIDECAR-D. Two measures are presented, the SIDECAR-D relative scale (SIDECAR-D-RS), which measures the relative impact of caring for someone with dementia on a 0 to 100 scale, and the SIDECAR-D tariff, which gives the impact on a scale with full health = 1 and dead = 0. The SIDECAR-D-RS is an easy-to-understand relative scale that can be used to compare groups of carers or one group over time. The SIDECAR-D tariff, being anchored to the scale usually used in analyzing health and care interventions, may be used to compare carer interventions with interventions in other fields, and to evaluate carer spillovers. Different ranges for the measures were deliberately chosen to emphasize that they are on different scales.

Survey Development
SIDECAR-D items are scored using a binary agree/disagree response format. The time frame reference is "today." Examples are "I don't take very good care of myself" and "I often feel I want to escape my caring responsibilities." The full list of items is not publicly available, although SIDECAR is free for use in public health, social care, voluntary sector, and not-for-profit organizations following registration (to use SIDECAR, please register with the University of Leeds Fast Licensing Platform at www.licensing.leeds.ac.uk).
The valuation exercise used best-worst scaling object case (BWS-OC) (also known as BWS case 1). It elicits relative preference for many items by presenting a small subset, using the simple task of survey participants choosing which item is best (or alternative term depending on context) and which is worst. Although the related method of BWS profile case has been used for valuation 17,23 and some BWS-OC studies in healthcare exist, 24 we are not aware of a study using BWS-OC to value a survey instrument.
Studies valuing items on a full health = 1, dead = 0 scale often use time trade-off (TTO) 25 or standard gamble (SG). 26 BWS-OC was instead chosen here because it suits SIDECAR-D. In TTO and SG, survey participants are shown health states or vignettes constructed from the items being valued. In the former, they identify the length of life in full health that they consider equivalent to 10 years in the state being valued. In the latter, they state the highest probability of instant death they would risk to live 10 years in full health rather than live 10 years in the state being valued.
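Under the conventional reading of these two tasks for states considered better than dead, the responses map to utilities as sketched below. This is a textbook convention assumed here for illustration, not a formula given in this article; the symbols x and p and the example numbers are hypothetical.

```latex
% TTO: x = years in full health judged equivalent to 10 years in the state
u_{\mathrm{TTO}} = \frac{x}{10},
\qquad \text{e.g. } x = 7 \;\Rightarrow\; u_{\mathrm{TTO}} = 0.7

% SG: p = highest probability of instant death the respondent would accept;
% at indifference, u = (1 - p) \cdot 1 + p \cdot 0
u_{\mathrm{SG}} = 1 - p,
\qquad \text{e.g. } p = 0.2 \;\Rightarrow\; u_{\mathrm{SG}} = 0.8
```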
Both methods were unsuitable here. Completing SIDECAR-D entails asking individuals to agree or disagree with a list of statements, making it difficult to create descriptions of states without using negations of the statements that do not form part of the instrument. It would be possible to create vignettes by listing the statements agreed with, but there are 18 items, some comprising lengthy sentences originating in voices of carers themselves, reflecting the realities and nuances of their experiences. Thus, vignettes for severe SIDECAR-D states would be overly long and difficult for respondents to take in. In addition, the tradeoff between quality and length of life is not as intuitive for carers as it is for patients, because their own condition is not being valued. Caring for someone means that to some extent someone is dependent on them, which may also influence their willingness to trade the possibility of death. Finally, there are $2^{18} = 262\,144$ SIDECAR-D states, making valuing any meaningful subset of them using TTO or SG impractical.
Another alternative is discrete choice experiments (DCEs), 27 in which individuals choose which of 2 options they prefer. Discrete choice experiments can value a large number of states, and there is some evidence that they are more reliable than BWS profile case. 28 Although they could in theory be employed here, choosing between 2 potentially lengthy SIDECAR-D vignettes would present an impractical burden for survey participants.
The five EQ-5D level 3 descriptions were added to the 18 SIDECAR-D items and a statistical design was constructed using the crossdes package for R. The design balanced, in order of priority, (1) the number of times each item appears, (2) the number of times each pair of items appears together, and (3) the number of times each item appears in a given position. The design had 6 versions, each with 8 questions. In each question, participants were shown 6 items and indicated which would have the most negative and which the least negative impact on their quality of life. For an example, see the survey instrument which is included as Supplemental Materials found at https://doi.org/10.1016/j.jval.2020.04.1827.
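The three balance criteria can be checked by simple counting. The sketch below is not the study's design or the crossdes API; it is a hypothetical helper applied to a small textbook design (7 items in 7 sets of 3) to show what item and pair balance mean.

```python
from collections import Counter
from itertools import combinations


def balance_counts(design):
    """Count item, pair, and (position, item) occurrences across choice sets."""
    item_counts = Counter()
    pair_counts = Counter()
    position_counts = Counter()
    for choice_set in design:
        item_counts.update(choice_set)
        pair_counts.update(frozenset(p) for p in combinations(choice_set, 2))
        position_counts.update(enumerate(choice_set))  # keys are (position, item)
    return item_counts, pair_counts, position_counts


# Toy balanced incomplete block design: every item appears 3 times and
# every pair of items appears together exactly once.
toy_design = [
    (1, 2, 3), (1, 4, 5), (1, 6, 7),
    (2, 4, 6), (2, 5, 7), (3, 4, 7), (3, 5, 6),
]
items, pairs, positions = balance_counts(toy_design)
items_balanced = len(set(items.values())) == 1
pairs_balanced = len(pairs) == 21 and len(set(pairs.values())) == 1
# Position balance is NOT achieved in this toy layout: item 1 is always first.
positions_balanced = len(set(positions.values())) == 1 and len(positions) == 21
```

In the toy layout the first two criteria hold and the third does not, which mirrors the priority ordering described in the text: position balance is the last criterion to be satisfied.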
Participants also completed 2 visual analog scale (VAS) tasks, rating health states on a scale from 100 (the best health you can imagine) to 0 (the worst health you can imagine). A screenshot is included in the survey instrument provided as Supplemental Materials found at https://doi.org/10.1016/j.jval.2020.04.1827. Participants rated their own health today, then rated 11111, 33333, and dead. From this task a valuation of 33333 on the full health = 1, dead = 0 scale was obtained and used to anchor valuations. Finally, participants answered questions about themselves (age, sex, etc.) to assess the sample's representativeness. For details, see the survey instrument provided as Supplemental Materials found at https://doi.org/10.1016/j.jval.2020.04.1827.
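A minimal sketch of how a VAS rating of 33333 could be placed on the full health = 1, dead = 0 scale follows. The linear rescaling shown is the conventional approach and is an assumption here, because the article does not print the formula; the function names and ratings are hypothetical. The consistency check encodes the condition used in the analysis that full health is rated above both 33333 and dead.

```python
def is_logical(vas_11111, vas_33333, vas_dead):
    """True when full health is rated above both 33333 and dead."""
    return vas_11111 > vas_33333 and vas_11111 > vas_dead


def anchor_33333(vas_11111, vas_33333, vas_dead):
    """Linearly rescale so that 11111 maps to 1 and dead maps to 0."""
    if not is_logical(vas_11111, vas_33333, vas_dead):
        raise ValueError("illogical VAS responses cannot be anchored")
    return (vas_33333 - vas_dead) / (vas_11111 - vas_dead)


# Hypothetical respondent who rates 33333 above dead.
better_than_dead = anchor_33333(95, 20, 10)   # (20 - 10) / (95 - 10)

# Hypothetical respondent who rates 33333 below dead: the value is negative.
worse_than_dead = anchor_33333(95, 5, 15)     # (5 - 15) / (95 - 15)
```

Rating 33333 below dead is still treated as logical, and simply yields a negative anchored value.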
The survey was tested with 10 carers, including carers for people with dementia, and the wording of questions and instructions was refined in response to feedback. The survey was administered online, through a survey company, and collected a representative UK general public sample from an existing panel. The survey was piloted with 50 respondents. Responses were gathered between August 22, 2018, and September 2, 2018, with a recruitment target of 500 (including 50 from the pilot, because no further changes were made). Formal power calculations were not performed; however, the sample size was considered more than adequate to estimate robust statistical models based on past experience and previous literature. 24

Analysis
The decision utility to a participant of selecting item $i$ as having the most negative impact on quality of life is modeled as
$$u_i = V_i + \varepsilon_i = \sum_a \beta_a x_{ia} + \varepsilon_i,$$
with $\beta_a$ a vector of coefficients, $x_{ia}$ a vector of dummy variables indicating whether item $i$ contains attribute $a$, and $\varepsilon_i$ an independent and identically distributed error term following a Gumbel distribution. The decision utility of selecting item $j$ as the least negative impact is $u_j = -(V_j + \varepsilon_j)$. Note that decision utilities are on a different scale than quality-of-life utility because it is convenient to model individuals receiving positive decision utility payoffs for identifying the most negative quality-of-life impact. It is assumed that individuals first select the item having the most negative impact, then the item having the least negative impact. The probability of a given choice is then
$$P(i \text{ most negative}, j \text{ least negative}) = \frac{e^{sV_i}}{\sum_k e^{sV_k}} \cdot \frac{e^{-sV_j}}{\sum_{k \neq i} e^{-sV_k}},$$
where $s \geq 0$ is the response scale indicating how much responses are explained by the deterministic part of the model and how much by the random part. 29
Coefficients were transformed to relative importance scores (RISs) using Orme's method. 30 Following Zhang et al, 31 the link between the relative importance, $I_a$, of an attribute in BWS-OC and level coefficients is modeled as [equation missing from the source text].
The anchoring approach described above requires individuals to give logically consistent VAS responses, that is, $\mathrm{VAS}_{11111} > \mathrm{VAS}_{33333}$ and $\mathrm{VAS}_{11111} > \mathrm{VAS}_{\mathrm{dead}}$ (though it does not preclude $\mathrm{VAS}_{\mathrm{dead}} > \mathrm{VAS}_{33333}$, that is, 33333 considered worse than dead). Thus respondents giving illogical responses were excluded from analysis.
Scale-adjusted finite mixture models 32 were estimated, which allow for $n_P$ classes of response parameters and $n_S$ scale parameters, with the first normalized to 1. The probability of belonging to parameter class $p$ is $e^{\theta_p} / \sum_{i=1}^{n_P} e^{\theta_i}$, and the probability of belonging to scale class $s$ is $e^{\phi_s} / \sum_{i=1}^{n_S} e^{\phi_i}$, with the $\theta$s and $\phi$s parameters to estimate and $\theta_1 = \phi_1 = 0$.
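The sequential choice model described above (most negative chosen first by a logit over all items shown, least negative chosen second by a logit over the remaining items) can be sketched as follows. The function name and the numeric values of V are hypothetical; this is an illustration of the probability expression, not the estimation code used in the study.

```python
import math


def best_worst_probability(V, i, j, s=1.0):
    """P(item i picked as most negative, then item j as least negative).

    V is the list of deterministic utilities of the items shown;
    s >= 0 is the response scale.
    """
    if i == j:
        raise ValueError("the same item cannot be both most and least negative")
    # First choice: logit over all items shown.
    p_most = math.exp(s * V[i]) / sum(math.exp(s * v) for v in V)
    # Second choice: logit on negated utilities over the remaining items.
    denom = sum(math.exp(-s * v) for k, v in enumerate(V) if k != i)
    p_least = math.exp(-s * V[j]) / denom
    return p_most * p_least


# Hypothetical utilities for a question showing 6 items.
V = [1.2, 0.4, -0.3, 0.0, 0.9, -1.1]
n = len(V)
total_probability = sum(
    best_worst_probability(V, i, j, s=2.0)
    for i in range(n) for j in range(n) if i != j
)
# With s = 0 responses are purely random: each ordered pair has 1 / (n * (n - 1)).
random_choice = best_worst_probability(V, 0, 5, s=0.0)
```

Summing over every ordered pair of distinct items gives 1, and setting s = 0 removes all influence of V, which is the sense in which the scale parameter governs how deterministic responses are.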
Sixteen models were estimated with between 1 and 4 preference and between 1 and 4 scale classes. The final preferred model was the one minimizing the Bayesian Information Criterion.
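Model selection over a grid of preference and scale classes can be sketched as below, using the standard definition BIC = k ln(n) - 2 ln(L). The log-likelihoods, parameter counts, and sample size are made-up placeholders, not results from this study, and whether n should count respondents or choices is an assumption left to the analyst.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_model(fits, n_obs):
    """Return the key of the fit with the smallest BIC.

    `fits` maps (n_preference_classes, n_scale_classes) to
    (log_likelihood, n_params).
    """
    return min(fits, key=lambda key: bic(*fits[key], n_obs))


# Placeholder fits for three of the candidate class combinations.
hypothetical_fits = {
    (1, 1): (-1000.0, 5),
    (2, 2): (-950.0, 12),
    (4, 4): (-945.0, 30),
}
best = select_model(hypothetical_fits, n_obs=200)
```

In this toy example the largest model has the highest log-likelihood but loses on BIC because of its parameter penalty.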
The SIDECAR-D tariff, anchored to the full health = 1, dead = 0 scale, is calculated by taking the mean anchored valuations for each parameter and scale class weighted by probability of class membership. Similarly, the SIDECAR-D relative scale, giving the relative impact on an individual on a 0 to 100 scale, is found by taking a weighted mean over classes of RIS regarding only SIDECAR-D items. Note that SIDECAR-D-RS and the SIDECAR-D tariff have different endpoints as well as moving in opposite directions: a high relative scale score implies a large quality-of-life impact, whereas a high tariff value implies a low impact.
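A minimal sketch of the two steps described above follows: averaging per-class item values by class-membership probability, then scoring a respondent by adding the decrements of the items they affirm to 1. The item labels, the per-class values, and the middle decrement are hypothetical; only the smallest (-0.05) and largest (-0.162) decrements and the class probabilities 0.41 and 0.59 correspond to figures reported in the article.

```python
def class_weighted_mean(values_by_class, class_probs):
    """Average per-class item values using class-membership probabilities."""
    n_items = len(values_by_class[0])
    return [
        sum(p * vals[k] for p, vals in zip(class_probs, values_by_class))
        for k in range(n_items)
    ]


def tariff_score(decrements, affirmed):
    """Utility = 1 plus the (negative) decrements of the affirmed items."""
    return 1.0 + sum(decrements[item] for item in affirmed)


# Hypothetical anchored values for two items in two preference classes.
weighted = class_weighted_mean([[-0.10, -0.20], [-0.05, -0.10]], [0.41, 0.59])

# Hypothetical three-item tariff using illustrative labels.
decrements = {"A": -0.05, "B": -0.10, "C": -0.162}
one_item = tariff_score(decrements, ["A"])
all_items = tariff_score(decrements, ["A", "B", "C"])
no_items = tariff_score(decrements, [])
```

Affirming only the item with the smallest decrement gives 0.95, the top of the range reported in the abstract for someone affirming 1 item, and affirming nothing gives 1.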
Statistical significance was judged at the 5% level after adjustment using Holm's sequential Bonferroni correction. 33 Design and analysis were conducted using R version 3.3.1, with models estimated using the Choice Modelling Centre Code for R version 1.1. 34

Results
Five hundred ten respondents completed the survey, of whom 38 (7.45%) were excluded for illogical VAS responses, leaving 472 for analysis. Table 2 summarizes the analysis sample's BWS-OC choices. Five of the top 6 differences between the most negative and the least negative responses are for EQ-5D items. These were chosen as having the most negative impact on quality of life far more often than they were chosen as having the least negative impact. At the other end of the scale, several items were chosen as having the least negative impact far more often than the most negative impact. This variation indicates that participants understood the tasks and could make meaningful distinctions between the impacts of different items. Table 3 gives the models' Bayesian Information Criterion. The optimal fit had 2 preference and 3 scale classes. Table 4 gives the model coefficients. The second scale class has the largest scale parameter, that is, lowest error variance (14.6), followed by the third (7.61), then the first (normalized to 1). Respondents were most likely to belong to the second class (probability = 0.44), then the first (probability = 0.31), then the third (probability = 0.25). There are differences between the 2 parameter classes. In particular, in class 2 the EQ-5D items have the 5 largest coefficients, whereas in class 1, there is a SIDECAR-D item in the top 5, and usual activities has only the tenth highest coefficient. Respondents were most likely to belong to parameter class 2 (probability = 0.59). Table 5 shows the SIDECAR-D relative scale and tariff, with Figure 1 showing utility decrements for SIDECAR-D items and EQ-5D level 3 descriptions. The largest utility decrement for an item is

Discussion
This study successfully assigned values to SIDECAR-D items. Values of some items on the full health = 1, dead = 0 scale are considerable. For example, item 5, "I dread the future," has a value of -0.162, a greater impact than EQ-5D-3L level 3 descriptions for self-care and usual activities. Even SIDECAR-D items of lower magnitude are comparable to level 2 coefficients from the EQ-5D five-level value set for England. 36 Previous exercises valuing CES and the ICEpop CAPability measure (ICECAP), both aimed at a population with some overlap with this one, also found large utility decrements. 17,23,37 Nevertheless, CES/ICECAP define the lowest state in the descriptive system as 0; thus values are not comparable, and because decrements must sum to -1, large decrements are found by construction.
This highlights that the UK general population recognizes that caring for someone with dementia can have a large impact on the carer's life. Items generated from the voices of carers resonate with and are understood by the general public, reaffirming the value of a bottom-up, carer-led process in developing a measure.
Comparing the conceptual nature of BWS-OC to alternatives, TTO may be thought of as giving a valuation for a particular state, whereas DCE responses can be analyzed to provide a valuation of a descriptive system, both from the perspective of an average member of the population. BWS-OC responses, when analyzed, give the relative importance of a set of items, again from the average population perspective.
Comparing what these values mean, BWS-OC is sometimes used solely as a user-friendly alternative to ranking items, or as a replacement for a VAS/rating scale. 38 Nevertheless, it is possible to extract much more information from responses, because BWS-OC can be grounded in random utility theory, 39 in the same way as DCEs. Conceptually, this means participants' responses to BWS-OC tasks reflect the same underlying utility scale as DCE, TTO, and SG tasks. Empirical evidence supports BWS-OC tasks measuring the same utilities as DCEs (up to a scale transformation). 31,40 Values from BWS-OC are thus a utility-based measure, satisfying the National Institute for Health and Care Excellence's requirements for measures suitable for use in health technology assessment. 4
This study compares the specific impact on carers, represented by SIDECAR-D, and general health impacts, represented by the EQ-5D. The conceptual validity of doing so rests on both reflecting the same underlying latent utility scale. SIDECAR-D hence contrasts with, for example, ICECAP, which seeks to measure capability. 41 Previous studies have illustrated the feasibility of relating condition-specific measures and the EQ-5D. An example is exercises mapping the cancer-specific measures European Organisation for Research and Treatment of Cancer (EORTC) and Functional Assessment of Cancer Therapy-General (FACT-G) to the EQ-5D. 42-44 Another example is bolt-ons to EQ-5D, which add another, condition-specific, dimension, for example, vision. 45 In exercises valuing bolt-ons, survey participants are presented with combinations of EQ-5D and specific items, analogous to the exercise presented here.
Quality-adjusted life-years calculated using the SIDECAR-D tariff are potentially very different from those generated from health-related instruments such as the EQ-5D, yet both are on the same scale. Thus, in principle at least, interventions targeted at carers for people with dementia or spillovers from interventions for people with dementia may be directly and meaningfully compared to health-focused interventions. This is potentially of great benefit, especially given recognition of the importance of addressing the needs of a household or family to sustain caring for people with dementia in the community. There are also movements toward greater integration of health and social care, 46,47 implying a greater need to compare disparate interventions to optimally allocate resources. It is hoped measures such as SIDECAR-D can aid such comparisons.
There were several limitations to this study. Some SIDECAR-D items may capture some of the essence of EQ-5D descriptions using more vivid language. For example, item 15, "caring prevents me from fulfilling my other activities," sounds much like the EQ-5D description "I am unable to perform my usual activities." Although being "prevented" from doing something is not the same as being "unable" to do something, it is not clear that all participants made that distinction. Thus using SIDECAR-D in combination with EQ-5D could result in double-counting of quality of life. Nevertheless, double-counting remains a complex issue and depends on how a decision maker chooses to integrate the measures. In addition, SIDECAR-D items were each effectively valued in isolation. Hence, valuations might differ if a combination of several items were valued together as a profile.
The respondent sample is reasonably representative of the UK population, and there is no evidence that removing some responses from the analysis systematically excludes any particular section of society. Nevertheless, certain groups, in particular ethnic minorities, are not present in large numbers, and thus results may not accurately reflect their views.
A concern about using a general population sample might be that they were more familiar with EQ-5D items valuing general health than with carer-specific SIDECAR-D items. It is true that non-carers may value SIDECAR-D items differently than carers who have experienced them. Nevertheless, this is a well-known phenomenon, and is not limited to this instrument. 48 In addition, very few respondents will have had personal experience of the EQ-5D level 3 descriptions. SIDECAR-D items are written in plain language, and reflect the words of ordinary carers. Thus they should be clear even to those without direct experience of caring, lowering the probability that general population respondents neglected them owing to unfamiliarity. The data bear this out: SIDECAR-D items were ascribed meaningful utility decrements.
There is much potential for future research building on this study. The 2 SIDECAR-D scoring systems (relative scale and tariff) could be used to evaluate services and interventions targeted at carers for people with dementia. Another potential use is evaluating services and interventions targeted at people with dementia themselves. For example, enabling people with dementia to be more independent can impact the burden of caring, thus improving carer quality of life. Existing tools and methods can struggle to capture such spillovers, 8 thus SIDECAR-D may be a useful instrument to measure the wider impact of interventions.
The usefulness of SIDECAR-D in evaluation is an empirical question requiring further research. It will be important in the future to examine the validity of SIDECAR-D for ability to discriminate between groups of carers with established differences in quality of life. Other issues are how SIDECAR-D performs in practice in feasibility and whether interventions are capable of changing carer responses. It should also be investigated whether SIDECAR-D captures quality of life sufficiently alone, or whether specific health-related instruments (eg, EQ-5D) should be administered as well. Finally, future work should address to what extent SIDECAR-D captures spillovers, and to what extent the interventions that it is used to evaluate tend to be cost-effective. Future research could address the validity of using SIDECAR to evaluate services/interventions for non-dementia carers, although SIDECAR was designed for carers for people with dementia, and at present is intended for use in this population only. The participants whose voices were used in the development of the measure were carers for people with dementia, and thus their concerns and priorities may not be the same as for other carers. Nevertheless, only 1 item refers specifically to dementia ("almost all of my conversations are about dementia or caring"), meaning it would be simple to adapt for other carer populations.
Finally, this study only assigns values to SIDECAR-D, and the development of SIDECAR demonstrated that the indirect impact of caring, and support and informational needs are captured by SIDECAR-I and SIDECAR-S. Future studies could value the other 2 instruments to help capture the full burden of caring for someone with dementia.

Conclusion
This study presents an exercise valuing an instrument assessing the impact on quality of life of people who care for someone with dementia. Two scoring systems are the result, the SIDECAR-D relative scale measuring the relative impact on a scale from 0 to 100, and the SIDECAR-D tariff, measuring the impact on a scale with full health = 1 and dead = 0.
[Figure 1: utility decrements for SIDECAR-D items and EQ-5D level 3 descriptions. Error bars show 95% confidence intervals.]