Dopamine and motivational state drive dynamics of human decision making

The mesolimbic dopaminergic system exerts a crucial influence on normal motivated behaviour, but the mechanism of this action in dynamic situations where decisions evolve over time remains unclear. In such circumstances, the current (foreground) reward accrual rate needs to be compared continuously with potential rewards that could be obtained elsewhere (background reward rate) in order to determine the opportunity cost of staying or leaving. We hypothesised that dopamine levels specifically modulate the influence of background, but not foreground, reward information in a decision-making task that requires dynamic comparison of these variables for optimal behaviour, and that this effect would be disrupted in individuals with loss of motivation (apathy). We developed a human foraging task based on a normative theory of animal behaviour (marginal value theorem), in which participants decide when to leave locations in which rewards decreased over time in order to pursue greater returns in their environment. People’s decisions to move from current locations conformed closely to foraging principles. Pharmacological manipulation of dopamine D2 receptor activity in healthy individuals using the agonist cabergoline significantly modulated background, but not foreground, reward sensitivity. In a separate study, this same effect was observed in patients with Parkinson’s disease, dependent on the presence of apathy. Using an ecologically derived framework, we demonstrate a specific mechanism by which dopamine modulates dynamic human decision-making, and how impairment of this mechanism can contribute to pathological loss of motivation.

The mesolimbic dopaminergic system plays a crucial role in motivating behaviour towards goals and has been closely linked to neural circuits which convey information about rewards (1-6). Several experiments across species have demonstrated a crucial role for dopamine in overcoming costs to obtain rewards (2,3,7,8) and in learning about reward outcomes to update future behaviour (9,10). Tasks probing dopamine function typically require an agent to make binary decisions between presented options, based either on learning the contingent relationship between stimuli and rewards or on integrating cost and reward information (8,9,11). However, animal models increasingly highlight that dopamine signals change during ongoing behaviour and carry information that is not exclusively tied to reward-predicting cues (1,12,13).

Moreover, in many real-life environments choices are not between binary options, but instead evolve over time, involving decisions of whether to stay at the current location or switch to an alternative one to maximise reward collection (14,15). Such dynamic decision-making requires continuous comparison of the current (foreground) reward rate with the alternative (background) reward rate available in an agent's environment (16-18). However, despite the clear ecological significance of such foreground vs background decision-making for normal motivated behaviour, the role of dopamine in modulating these processes, and particularly in deciding when to switch from a current activity to pursue greater rewards in the background environment, remains unclear.

Based on work examining the relationship between speed of movement (vigour) and dopamine in animal models, it has been proposed that tonic (slower-changing) dopamine signals encode information about environmental richness, and therefore background reward rate (19).
This theory is supported by recent voltammetry experiments linking slow (minute-by-minute) changes in dopamine levels to an experimental rodent's reward environment (1), and by evidence of changes in motor vigour as dopamine state varies in humans (3,8,20,21). However, this link has been questioned (22), and it remains unknown whether the proposed link between tonic dopamine and vigour of movements applies to more abstract, but ecologically crucial, decisions about when to switch location based on foreground and background reward rates. Nor is it clear whether these principles apply to how humans make such decisions.

All participants were administered a computer-based patch-leaving task in which they had to decide when to move on from a current patch. The task design specifically manipulated the background and foreground reward rates, in line with the predicted effects according to MVT (23,24).

The task was framed as a farming game in which participants had to collect as much milk (reward) as possible; this would be sold at a market at the end of the game, and their financial remuneration would therefore depend on the milk accrued. Participants spent a fixed time (10 minutes) in each of two farms, collecting milk from fields of cows and deciding whether to move on (leave the field for the next one) (Figure 1). Moving on to the next field incurred a time cost of 6 seconds, during which no milk could be collected.

To manipulate the foreground reward rate, there were three field-types, which returned milk at high, medium and low rates, each of which decayed exponentially over time in the field. The field-type was indicated by the rate at which the bucket on the screen filled. The distribution of these field-types within a "farm" determined the background reward rate.

Figure 1. (A) Participants had to decide how long to remain in their current patch (field), in which reward (milk) was returned at an exponentially decreasing rate (displayed on the screen by continuous filling (white bar) of the silver bucket), before moving on to the next patch, which incurred a fixed cost of 6 seconds during which they could collect no reward. Their goal was to maximise milk return across the whole experiment. The instantaneous rate of bucket filling indicated the foreground reward rate, whilst the coloured frame indicated the distribution of different patch types, and thus the background reward rate. Participants were aware they had approximately 10 minutes in each environment, but were not shown any cues to indicate how much total time had elapsed. Following a leave decision, a clock ticking down the 6 second travel time was presented. (B) Three foreground patch-types were used, differing in the scale of filling of the milk bucket (low, medium and high yield), which determined the foreground reward rate. Two different background environments (farms) were used, with the background reward rate determined by the relative proportions of these patch-types. The gold farm contained a higher proportion of high yield fields and a lower proportion of low yield ones, meaning it had a higher background reward rate than the green farm, which had a higher proportion of low yield fields. (C) According to MVT, participants should leave each patch when the instantaneous reward rate in that patch (grey lines) drops to the background environmental average (gold and green dotted lines). Therefore, people should leave all patches sooner in rich (gold dotted line) than in poor (green dotted line) environments, but leave high yield patches later than low yield patches. Crucially, these two effects are independent of each other.

On the "rich" farm (signalled by a gold border on the screen), 50% of encountered fields were high yield, 30% were medium yield and 20% were low yield. On the "poor" farm (signalled by a green border), 50% of encountered fields were low yield, 30% medium and just 20% high yield. Thus, the background reward rate was lower on the green farm than on the gold farm. Participants were aware that an unlimited number of fields were available to them, but only for a fixed amount of time. The influence of foreground and background reward rates (and, where relevant, dopamine and apathy) on patch leaving time was analysed using a linear mixed effects (LME) model; see Methods for further details.
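The environment manipulation amounts to drawing field-types with different probabilities on each farm. The Python sketch below illustrates this under a stated assumption: each field is drawn independently with the listed proportions. The text gives only the proportions, not whether fields were drawn at random or from a fixed shuffled list, and the names `ENVIRONMENTS` and `sample_fields` are hypothetical.

```python
import random

# Field-type proportions per farm, as described in the text.
ENVIRONMENTS = {
    "rich": {"high": 0.5, "medium": 0.3, "low": 0.2},  # gold border
    "poor": {"high": 0.2, "medium": 0.3, "low": 0.5},  # green border
}


def sample_fields(environment, n, seed=None):
    """Draw a sequence of n field-types for one farm.

    Assumption: each field is drawn independently with the stated
    probabilities; the source specifies only the proportions.
    """
    rng = random.Random(seed)
    weights = ENVIRONMENTS[environment]
    types = list(weights)
    return rng.choices(types, weights=[weights[t] for t in types], k=n)


sequence = sample_fields("rich", 10_000, seed=1)
# Empirical proportions should sit close to 0.5 / 0.3 / 0.2 on the rich farm.
print({t: sequence.count(t) / len(sequence) for t in ("high", "medium", "low")})
```

Because the background reward rate depends only on these proportions, swapping the two dictionaries is all that distinguishes the gold from the green farm in this sketch.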

Healthy human foragers are guided by MVT principles
Within MVT, the foreground and background reward rates should have independent effects on how long an individual remains in a patch. Participants should leave low yield patches sooner than high yield patches, and patches in rich environments sooner than patches in poor environments. In line with these hypotheses, we found a main effect of foreground reward and a main effect of background reward, but no interaction, on participants' (N = 39) decisions about when to leave their current patch (Foreground: F(1,74.6) = 528, p < 0.0001; Background: F(1,37.5) = 40, p < 0.0001; Foreground × Background: F(1,1929) = 1.6, p = 0.2; Supplementary Table 1A). Furthermore, participants' behaviour conformed to the predicted direction of these effects, with higher patch yield, and poor compared to rich background environments, both leading to later patch leaving times (Figure 2A & 2B).

Are healthy people optimal foragers?

Although participants showed effects in the directions predicted by MVT, we wanted to know whether the magnitude of these effects conformed to foraging theories, which stipulate exactly the optimal time to leave each patch (Supplemental Figure 1). All participants showed a significant bias to remain longer than optimal across all patch types (across both environments), on average leaving 8.0s later than MVT predictions (t(38) = 8.4, p < 0.001, Supplemental Figure 2A & B). However, it has been noted that non-human primates also show a bias to stay, but are close to optimal once this bias is controlled for, for example by analysing the relative changes across conditions (35). Therefore, for each participant we subtracted their own mean leaving time from each of their patch leaving decisions, and calculated the magnitude of the background (poor − rich) and foreground (high − low yield) reward rate effects (Figure 2B & 2D).
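The per-participant centring and difference scores described above can be sketched in Python. The function name and the `(subject, environment, patch, leave_time)` trial format are hypothetical, and the paper's own averaging may differ (for example, averaging per condition before differencing). Note that subtracting one constant from all of a subject's trials leaves within-subject differences unchanged; what centring removes is the overall stay bias from the per-condition values that are plotted.

```python
from collections import defaultdict
from statistics import mean


def relative_effects(trials):
    """Per-subject background (poor - rich) and foreground (high - low) effects.

    trials: iterable of (subject, environment, patch, leave_time) tuples.
    Leaving times are centred on each subject's own mean leaving time.
    """
    by_subject = defaultdict(list)
    for subject, env, patch, t in trials:
        by_subject[subject].append((env, patch, t))

    effects = {}
    for subject, rows in by_subject.items():
        centre = mean(t for _, _, t in rows)  # subject's overall mean leaving time
        rel = [(env, patch, t - centre) for env, patch, t in rows]

        def cond_mean(keep):
            # Mean centred leaving time over trials selected by `keep`.
            return mean(r for env, patch, r in rel if keep(env, patch))

        background = cond_mean(lambda e, p: e == "poor") - cond_mean(lambda e, p: e == "rich")
        foreground = cond_mean(lambda e, p: p == "high") - cond_mean(lambda e, p: p == "low")
        effects[subject] = (background, foreground)
    return effects


trials = [
    ("s1", "rich", "low", 10.0), ("s1", "rich", "high", 14.0),
    ("s1", "poor", "low", 13.0), ("s1", "poor", "high", 18.0),
]
print(relative_effects(trials))  # {'s1': (3.5, 4.5)}
```

Both scores are positive when a subject behaves in the MVT-predicted direction: later leaving in the poor farm and later leaving from high yield patches.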
Figure 2. … predicted direction, with participants leaving on average 4.7s later as patch-type varied, and 3.6s later in poor compared to rich environments. There was more variation between individuals in the effects of changing background, compared to foreground, reward rates. (C) The foreground (patch) reward rate at which participants chose to leave each patch varied as a function of background environmental richness (rich vs poor). (D) The magnitude of this background environment effect was close to optimal (as predicted by the marginal value theorem). Foreground reward rate at leaving did vary across patch-type (indicating a degree of suboptimal behaviour) (C), driven by participants leaving high yield patches at a lower reward rate compared to medium and low yield patches, which did not differ significantly. Error bars are ± SEM.

MVT makes two core predictions about behaviour as foreground and background reward rates change, which can be used to assess the optimality of foraging behaviour independently of any systematic bias to remain in patches longer (Supplemental Figure 1). Firstly, as the background environment varies (poor vs rich), the reward rate at leaving a given patch-type should differ by the same amount as the background reward rate. Secondly, foragers should adjust their leaving time as patch quality varies, such that the instantaneous reward rate at leaving is the same in each patch (for a given background environment). That is, within an environment, each patch should be left, regardless of its yield, when the rate at which milk is being accrued is the same. Strikingly, participants varied their leaving times as the background environment changed, such that the difference in reward rate at leaving between the two conditions was very close to the actual difference in background reward rates (mean difference in reward rate at leaving = 3.33; actual difference between environments if behaving optimally = 3.30; t(38) = 0.07, p = 0.95, …

Thus, participants' sensitivity to changes in foraging parameters was close to optimal predictions: they adjusted leaving times in response to changes in their background environment to closely match the actual changes in background reward rate. They also adjusted their leaving behaviour such that the reward rate at leaving did not differ between low and medium yield patches, although they tended to leave high yield patches later (i.e. after the patch reward rate had dropped further).
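The first optimality check (comparing how much the reward rate at leaving changes between farms with how much the background rate changes) can be sketched in Python. This is a hypothetical illustration: it assumes the exponential reward function of the Methods with an illustrative decay constant of 0.1 per second, and the demonstration data come from a simulated optimal forager with made-up background rates of 20 and 16, not from the study. Differences are taken within each patch-type before averaging, because the two farms contain different mixes of patch-types.

```python
import math
from statistics import mean

# Scaling factors S from the Methods; DECAY is a hypothetical value.
SCALE = {"low": 32.5, "medium": 45.0, "high": 57.5}
DECAY = 0.1  # assumption: decay constant of the reward function, per second


def rate_at_leaving(patch, leave_time, decay=DECAY):
    """Instantaneous foreground rate S * exp(-decay * T) at the moment of leaving."""
    return SCALE[patch] * math.exp(-decay * leave_time)


def background_effect_on_leaving_rate(trials, decay=DECAY):
    """Rich-minus-poor difference in reward rate at leaving, averaged over patch-types.

    trials: iterable of (environment, patch, leave_time) for one participant.
    Under MVT this should equal the difference in background reward rates.
    """
    rates = {}
    for env, patch, t in trials:
        rates.setdefault((env, patch), []).append(rate_at_leaving(patch, t, decay))
    diffs = [mean(rates[("rich", p)]) - mean(rates[("poor", p)])
             for p in SCALE if ("rich", p) in rates and ("poor", p) in rates]
    return mean(diffs)


# Hypothetical optimal forager: leaves when the patch rate falls to 20 (rich) or 16 (poor).
optimal = [(env, p, math.log(SCALE[p] / rho) / DECAY)
           for env, rho in (("rich", 20.0), ("poor", 16.0)) for p in SCALE]
print(round(background_effect_on_leaving_rate(optimal), 6))  # 4.0
```

For a real participant, the returned value would be compared with the difference between the two farms' optimal background rates, which is what the t-test above does across participants.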

Cabergoline alters the use of background reward information to guide patch leaving
Having demonstrated that healthy human patch leaving behaviour is aligned with the predictions of MVT, particularly in response to changes in background reward rate, we next examined whether dopamine modulates the effect of background reward rate (environment) on patch leaving behaviour. Using a within-subjects design, leaving times for 29 healthy elderly people on placebo or following administration of the D2 receptor agonist cabergoline (which stimulates post-synaptic D2 receptors (36)) were analysed using an LME model. There was a significant interaction between drug state and the effect of background reward rate on … Therefore, cabergoline had a specific rather than general effect on patch leaving behaviour, altering only the influence of background reward rate on leaving time. This suggests that manipulating dopamine levels in healthy people alters patch leaving decisions by modulating sensitivity to average reward rates.

Dopamine and apathy influence the effect of reward context in Parkinson's disease
Previous evidence implicates dysfunction of the mesolimbic dopaminergic system in PD apathy (29,30,37). We hypothesised that this observation might be underpinned by reduced dopamine levels leading apathetic patients to chronically underestimate the (background) reward environment, and therefore not to switch from their current behavioural states (even if these are minimal or effectively inertial). Consistent with this prediction, we found a … that low motivational states present in some PD patients (those with pathological apathy) mediate impairments in the use of background, but not foreground, reward information. Furthermore, these impairments can be recovered through dopamine interventions, suggesting a dopaminergic origin for the effects of background reward rate.

Parkinson's disease, but not dopamine or apathy, reduced sensitivity to foreground rewards
As in the healthy control population, foreground (patch) reward rate strongly predicted leaving times (F(1,68) …

When to move on and leave a specific rewarding activity or location is an essential decision problem for animals and humans alike. In this set of studies we elucidate a cognitive mechanism which underpins how people use reward information to decide when to move on, and the neurotransmitter system supporting such decisions. Specifically, dopamine is an important contextual signal for knowing when a location is sufficiently bad, or alternatives sufficiently good, to move on. Additionally, we demonstrate that in a disease that involves dopaminergic systems (PD), disabling motivational deficits are associated with problems in utilising background reward rate information to drive patch leaving, which can be recovered through dopamine interventions.

… information about background reward rate, and therefore the opportunity cost (alternatives that are foregone) of chosen actions (19,21), others have argued it encodes a more specific signal for the value of a current action, independent of environmental context (22). Here, we show that changing dopamine levels modulates the effect of background reward rates, not on actions per se, but rather on the more abstract decision of when to move on within an …

The results presented here also reveal a precise cognitive mechanism by which dysfunction of dopaminergic systems could lead to apathy (Figure 4). A role for dopamine as a contextual signal of background reward rate, thus influencing 'exploration' behaviour, has clear appeal as a mechanistic account of apathy (27). Simply put, the hallmark of apathetic behaviour, reduced goal-directed activity, may occur because of an impaired ability to estimate or utilise information about background reward rates, which impedes switching from a current activity (even if this activity is very minimal).
Apathy is a common and debilitating complication of many neurological and psychiatric conditions, and has been associated with disrupted reward systems across many disorders, including Parkinson's disease … However, evidence to date suggests that, in tasks where reward is treated as a single construct, dopamine exerts its influence on behaviour in a manner dissociable from apathy …

Here, the results demonstrate a specific interaction between apathy and the effect of background reward rate on patch leaving decisions, as a function of dopaminergic tone. In the OFF state, apathetic patients showed a reversal of the predicted effect of background reward rate (23), persisting in patches for longer when environmental reward rates were higher. This behaviour is not consistent with patients simply estimating environmental reward rate as lower, but rather suggests a failure to utilise available information about reward context to guide decisions appropriately. Consistent with the main hypotheses of this study, dopamine restored apathetic (but not non-apathetic) patients' behaviour to the predicted direction (leaving patches earlier in rich compared to poor environments). In contrast, neither dopamine state nor apathy altered the influence of foreground reward rate on patch leaving, which instead varied as a function of disease. Overall, this offers a new interpretation of the relationship between reward and apathy. Specifically, apathetic patients used background reward information maladaptively to guide decisions of when to move on, dependent on baseline dopamine levels. This result is consistent with the hypothesis that disrupted representation of background reward rate (or opportunity cost) contributes to apathy in PD, whilst suggesting a potential role for dopamine in ameliorating this deficit.
More generally, it demonstrates a novel component of cost-benefit decision-making which may be disrupted in apathy, further advancing understanding of this debilitating clinical syndrome (27).

Our results also highlight that human behaviour in an ecologically derived decision-making task is closely described by a normative model based on the principles of the marginal value theorem (Figure 2) (14,24). This accords with earlier field work in the behavioural ecology (14,23) and anthropology (46,47) literatures, and with more recent work beginning to explore the neural basis of such decisions (35). In the current study, the use of a foraging framework informed by MVT enabled us to dissociate the effects of reward rates on different timescales, in a way that is not possible in reinforcement-learning-based manipulations of average reward rates, where the receipt of an instrumental reward instantaneously increases the average reward rate (15,19). Here, participants utilised these dissociable aspects of their reward environment to adjust patch leaving behaviour in a close to optimal fashion, as both foreground and background reward rates varied. This provides evidence for a common decision principle guiding foraging-style behaviour in both humans and other animals, and allows further investigation of the specific neural mechanisms underlying it.

Although the effects of changing dopamine levels were specific to background reward rate across studies, a differing pattern of effect on how this information altered patch leaving decisions was observed between the cabergoline manipulation in healthy people and the dopamine medication manipulation in apathetic patients with PD. One explanation for such opposing effects is that whilst the ON state in PD patients is associated with increased dopaminergic tone (as demonstrated by reduced motor disability scores; Table 1 … appropriately process average reward rates when OFF their medication, but are restored (and closer to optimal) when on their medication (Fig 4). However, healthy individuals show reduced sensitivity to changing background reward rates when their tonic dopamine levels are (putatively) boosted on cabergoline compared to placebo. Thus, people who have typical dopaminergic function become poorer at utilising information about reward context when their dopamine levels are boosted, while conversely dopamine medications restore the performance of apathetic PD patients back towards normal.

Irrespective of the exact pharmacological mechanism underlying our observations, the experiments presented here demonstrate a robust, consistent effect of dopamine on responsiveness to background reward rate, modulated in the last study by apathy status. Importantly, variance in patch leaving times did not change as a function of dopamine or apathy state. This, along with the specific rather than general changes in behaviour we observed, makes it unlikely, in our opinion, that the observed results can be explained by a confounding factor such as reduced attention or motor disturbance in the OFF state.

We performed three experiments aimed at identifying whether (i) humans make patch leaving decisions in line with MVT, (ii) modulating dopaminergic systems with the D2 receptor agonist cabergoline specifically alters people's sensitivity to the background reward rate, and (iii) apathetic PD patients show a differential effect of dopaminergic medication on background reward sensitivity compared to non-apathetic PD patients.

… were recruited via a local database. Potential participants were screened for the presence of neurological, psychiatric or cardiovascular diseases, or for the use of medications that could interact with cabergoline, and excluded if any of these were present. One subject was subsequently excluded because a core metric of task performance (variance in leaving times per condition) fell outside three standard deviations of the mean variance, leaving 29 participants for analysis.

… Parkinson's disease (PD), confirmed independently by two neurologists, were recruited from local movement disorders clinics in the Oxfordshire area. Inclusion criteria included an absence of PD dementia or other major neurological or psychiatric conditions. Patients with clinical apathy were intentionally recruited, such that the study recruitment had an equal split of apathetic and non-apathetic patients. One patient was subsequently excluded due to failure to understand the task and decisions that fell outside 3 standard deviations of the group mean, leaving 35 patients. A separate cohort was also recruited from the local Oxfordshire region as a gender- and age-matched control group for the PD patients. This group was free from cognitive impairment or apathy. Two were subsequently excluded because of concerns about their task performance (not engaging with the task, identified at post-test debriefing), leaving a total of 29 participants.

Demographics of participants are presented in Table 1.
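The subject-level exclusion rule (a performance metric falling outside three standard deviations of the group mean) can be sketched in Python. How "variance in leaving times per condition" was reduced to one number per subject is not stated, so the hypothetical `variance_metric` below averages the per-condition variances, and the sample standard deviation across subjects is used; both are assumptions rather than the authors' procedure.

```python
from statistics import mean, stdev, variance


def variance_metric(times_by_condition):
    """Mean of the per-condition variances of one subject's leaving times (assumed metric)."""
    return mean(variance(times) for times in times_by_condition.values())


def flag_outlier_subjects(data, n_sd=3.0):
    """Return subjects whose metric lies more than n_sd SDs from the group mean.

    data: {subject: {condition: [leaving times]}}; each condition needs >= 2 trials.
    """
    metric = {subject: variance_metric(conds) for subject, conds in data.items()}
    values = list(metric.values())
    centre, spread = mean(values), stdev(values)  # sample SD across subjects
    if spread == 0:
        return []  # identical metrics: nobody can be an outlier
    return [s for s, v in metric.items() if abs(v - centre) > n_sd * spread]


typical = {"rich": [10, 12, 14], "poor": [10, 12, 14]}
data = {f"s{i}": typical for i in range(12)}
data["odd"] = {"rich": [0, 40, 80], "poor": [0, 40, 80]}
print(flag_outlier_subjects(data))
```

One property worth knowing when applying such a rule: with n subjects and the sample SD, no single subject's z-score can exceed (n − 1)/√n, so a 3-SD criterion can only flag anyone once the group has at least 11 members.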

Patch Leaving Paradigm
The aim of this design was to manipulate background and foreground reward rates independently, based on the principles of the marginal value theorem, a theory of optimal foraging in patch leaving (23,24); see Supplemental Figure 1. The experiment was designed as a patch leaving problem, with participants aiming to maximise their overall reward returns by deciding how long to spend in sequentially encountered patches. In each patch, participants obtained rewards at an exponentially decreasing rate. Moving to a new patch, which they were free to do at any point, incurred a fixed time delay of six seconds, during which no reward could be gathered. The experiment lasted a fixed amount of time (10 minutes per environment type); however, a potentially unlimited number of patches was available.
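The interplay of the fixed session length, unlimited patches and travel cost can be illustrated with a small simulation. This Python sketch is hypothetical: the decay constant (0.1 per second), the independent random draw of patch-types, and truncating the final stay at the 10-minute limit are assumptions, not details given in the text. A "policy" is simply a residence time per patch-type.

```python
import math
import random

SCALE = {"low": 32.5, "medium": 45.0, "high": 57.5}  # scaling factors from the Methods
DECAY = 0.1          # hypothetical decay constant (assumption), per second
TRAVEL_TIME = 6.0    # seconds between patches
SESSION = 600.0      # 10 minutes per environment


def milk_collected(scale, t, decay=DECAY):
    """Reward accumulated after t seconds: integral of scale * exp(-decay * s) from 0 to t."""
    return scale * (1.0 - math.exp(-decay * t)) / decay


def simulate_session(proportions, residence, seed=0):
    """Total reward and number of patches visited in one fixed-length farm.

    proportions: {patch-type: probability}; residence: {patch-type: seconds to stay}.
    Assumptions: patch-types are drawn independently, the session clock keeps
    running during travel, and a stay cut short by the end of the session is truncated.
    """
    rng = random.Random(seed)
    types = list(proportions)
    weights = [proportions[k] for k in types]
    clock, total, visited = 0.0, 0.0, 0
    while clock < SESSION:
        patch = rng.choices(types, weights=weights)[0]
        stay = min(residence[patch], SESSION - clock)
        total += milk_collected(SCALE[patch], stay)
        clock += stay + TRAVEL_TIME
        visited += 1
    return total, visited


rich = {"high": 0.5, "medium": 0.3, "low": 0.2}
print(simulate_session(rich, {"low": 5.0, "medium": 8.0, "high": 10.0}))
```

Leaving very early wastes time on travel, while staying very long means harvesting at a low rate; running the function over a range of residence times shows the trade-off that MVT resolves analytically.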

Foreground reward rate was determined by the patch reward function. Three patch-types were used, differing in the scaling factor of the reward function (S in equation 1 below), and corresponding to low (32.5), medium (45) and high (57.5) yield patches. The foreground reward rate, after T seconds in a patch, was determined by the equation:

$$r'(T) = S \cdot e^{-\lambda T} \tag{1}$$

where S is the scaling factor of the patch-type and λ is the exponential decay constant, common to all patch-types.

Background reward rate was manipulated by varying the proportions of low, medium and high yield patches. Two environments were used: a rich environment in which 50% of the patches were high yield, 30% medium and 20% low yield, and a poor environment in which 50% of the patches were low yield, 30% medium and 20% high yield. Therefore, the background reward rate was higher in the rich environment. MVT demonstrates that, to maximise reward gain, participants should leave each field when the instantaneous reward rate in the field (from equation 1) drops below the background average reward rate for the farm (determined by the environment type; Supplemental Figure 1). Simply put, for a given patch-type, participants should leave earlier in the rich environment than in the poor environment (Figure 1C).
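The MVT leaving rule can be made concrete for this design. The Python sketch below is a hypothetical illustration, not the authors' code: it assumes the exponential reward function with an illustrative decay constant λ = 0.1 per second (the study's value is not given here), the 6 s travel time and the stated patch proportions. For a candidate background rate ρ, each patch-type is left at T_i = ln(S_i/ρ)/λ, the moment when S_i·e^(−λT_i) = ρ. The optimal ρ is the one that equals the long-run rate Σp_i·g_i(T_i) / (τ + Σp_i·T_i), where g_i is the milk collected in one visit and τ the travel time. The helper names are made up for this example.

```python
import math

SCALE = {"low": 32.5, "medium": 45.0, "high": 57.5}  # scaling factors S from the Methods
RICH = {"high": 0.5, "medium": 0.3, "low": 0.2}
POOR = {"high": 0.2, "medium": 0.3, "low": 0.5}
DECAY = 0.1        # hypothetical decay constant (assumption, not from the paper)
TRAVEL_TIME = 6.0  # seconds


def gain(scale, t, decay=DECAY):
    """Milk collected after t seconds in a patch: integral of S * exp(-decay * s)."""
    return scale * (1.0 - math.exp(-decay * t)) / decay


def leave_time(scale, rate, decay=DECAY):
    """MVT rule: leave when S * exp(-decay * T) has fallen to the background rate."""
    return max(0.0, math.log(scale / rate) / decay)


def surplus(rate, proportions, decay=DECAY, travel=TRAVEL_TIME):
    """Expected reward minus rate * time per patch visit; strictly decreasing in `rate`."""
    total = -rate * travel
    for patch, p in proportions.items():
        t = leave_time(SCALE[patch], rate, decay)
        total += p * (gain(SCALE[patch], t, decay) - rate * t)
    return total


def optimal_background_rate(proportions, decay=DECAY, travel=TRAVEL_TIME, tol=1e-10):
    """Bisection for the rate where the surplus is zero, i.e. the MVT fixed point."""
    lo, hi = 1e-9, max(SCALE.values())  # surplus > 0 at lo, < 0 at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if surplus(mid, proportions, decay, travel) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for name, env in (("rich", RICH), ("poor", POOR)):
    rho = optimal_background_rate(env)
    times = {k: round(leave_time(SCALE[k], rho), 1) for k in SCALE}
    print(name, round(rho, 2), times)
```

A zero surplus is algebraically the same condition as ρ equalling the long-run rate, and because the surplus falls monotonically in ρ, bisection finds a unique solution. Under these assumptions the printed leaving times are shorter for every patch-type on the rich farm, and longer for high than for low yield patches within a farm, as sketched in Figure 1C.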

To improve engagement, the task was framed in a 'real-world' farmyard setting. Each patch was a field of cows returning milk, displayed on the monitor as a bucket that filled continuously during patch residency. The height of milk displayed in the bucket was proportional to the integral of equation (1) between time 0 and T, and was updated at a frequency of 20 Hz. The rate of filling declined according to equation (1); thus the rate of milk yield indicated the foreground reward rate. Participants were not explicitly told which patch-type they were currently in; rather, they inferred this by observing the rate of milk accumulation. The background reward rate was continuously cued by the coloured border on the screen, indicating either the rich (gold border) or poor (green border) farm. When participants chose to leave their current patch (by releasing the spacebar they had been holding down), they incurred a fixed time cost of 6 seconds, described as the time to walk to the next patch. During this time a counter was displayed which ticked down the seconds until the next patch was reached. On arriving at the next patch, participants were cued to "press and hold the spacebar", and after doing this the screen display changed to show the new patch.

Experiment ONE: Participants were tested in a single session following training as above.
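The bucket display can be sketched as sampling the integral of equation (1) at each 20 Hz refresh. In this hypothetical Python sketch, the decay constant (0.1 per second) is an assumed value, and the mapping from milk units to on-screen pixels is omitted, since the text says only that the height was proportional to the integral.

```python
import math


def bucket_levels(scale, duration, decay=0.1, hz=20):
    """Cumulative milk shown at each display refresh during one patch visit.

    The level is proportional to the integral of equation (1) from 0 to t,
    i.e. scale * (1 - exp(-decay * t)) / decay, sampled every 1/hz seconds.
    `decay` is a hypothetical value; the pixel scaling is left out.
    """
    n_frames = int(duration * hz)
    return [scale * (1.0 - math.exp(-decay * k / hz)) / decay for k in range(n_frames + 1)]


levels = bucket_levels(57.5, duration=2.0)                 # high yield field, first 2 s
increments = [b - a for a, b in zip(levels, levels[1:])]  # milk added per frame
print(round(levels[-1], 2), round(increments[0], 3), round(increments[-1], 3))
```

The per-frame increments shrink steadily, which is the only cue participants had to the foreground rate; a high yield field also starts with larger increments than a low yield one.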

Statistical analysis
We used a hierarchical linear mixed effects model (fitlme in MATLAB, MathWorks, USA; maximum likelihood estimation method) as our primary analysis method for all three experiments, to account for between- and within-subject effects. All fixed effects of interest (patch, environment and, where applicable, dopamine and apathy) and their interactions were included, and the random effects structure was determined by systematically adding components until the Akaike Information Criterion was minimised (60). Notably, the reported effects in all these models were also present in simpler models fitting only a random effect of subject.
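The random-effects selection step can be sketched as a forward search over AIC. This Python sketch covers only the selection logic, not model fitting (the paper used fitlme in MATLAB). The stopping rule (stop at the first added component that does not lower AIC), the model labels and the log-likelihoods are all hypothetical placeholders, not values from the study.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion, AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def forward_select(candidates):
    """Walk nested random-effects structures, stopping when AIC stops falling.

    candidates: [(label, log_likelihood, n_params), ...], simplest model first,
    each entry adding one random-effects component to the previous one.
    """
    best_label, best_ll, best_k = candidates[0]
    best_aic = aic(best_ll, best_k)
    for label, log_lik, k in candidates[1:]:
        current = aic(log_lik, k)
        if current >= best_aic:
            break  # adding this component did not improve AIC
        best_label, best_aic = label, current
    return best_label, best_aic


# Hypothetical log-likelihoods from fitted models; not values from the study.
models = [
    ("(1 | subject)", -5000.0, 6),
    ("(1 + patch | subject)", -4980.0, 8),
    ("(1 + patch + environment | subject)", -4979.5, 11),
]
print(forward_select(models))  # ('(1 + patch | subject)', 9976.0)
```

Here the second model gains enough likelihood to justify its two extra parameters, whereas the third does not, so the search stops there.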

Fixed effects are shown in blue, random effects in green.

To avoid the potentially biasing effects of outlying data points on the primary analysis, we excluded, subject by subject, any trials in which the leaving time was more than 3 standard deviations above that individual's mean leaving time. Of note, this approach did not change the significance (or otherwise) of any reported results compared to analysis of the full data …

We declare no conflicts of interest.