In Conversation With

Prof. Emad Eskandar on Learning and Addictive Behaviours


In this episode, Dr Elisabetta Burchi speaks with Professor Emad N. Eskandar, M.D., who is an expert in Neurology with multiple specialised interests relating to brain functions. Prof Eskandar talks to us about learning and addictive behaviours. 

In particular, he discusses his work on building predictive models to understand behavioural and neurophysiological aspects of learning and addictive behaviours. 


Professor Emad N. Eskandar, M.D.

Professor, The Leo M. Davidoff Department of Neurological Surgery

Professor, Department of Psychiatry and Behavioural Sciences

Professor, Dominick P. Purpura Department of Neuroscience

Chair, The Leo M. Davidoff Department of Neurological Surgery

Jeffrey P. Bergstein Chair in Neurological Surgery, The Leo M. Davidoff Department of Neurological Surgery

David B. Keidan Chair of Neurological Surgery, The Leo M. Davidoff Department of Neurological Surgery



Dr Elisabetta Burchi, M.D., MBA

Clinical Psychiatrist 




Dr Elisabetta Burchi 0:00

Hello everybody. It is a pleasure to have with us today Professor Emad Eskandar, a renowned doctor and professor of neurological surgery at Montefiore Medical Center, New York. 

I would say Professor Eskandar is a polymath and we could really discuss almost anything using the models he has developed.

But today our topic will be learning, together with the biological phenomena that underpin it. [It reflects] our capability to adapt and thrive in any changing environment, and it is also a core feature of humans, our brain and mind.

Learning is a feature that, when disrupted, can underpin many mental conditions. The models developed by Professor Eskandar recently could explain the connection between disrupted learning and addiction. 

As such, we would love to know more about that and also about the potential of neuromodulation for restoring disrupted learning.


Professor Emad Eskandar 1:40

Well, thank you very much for having me today. It is a real pleasure to be here and I look forward to our discussion.

As you say, learning is really critical to who we are. Human beings enter the world with some basic reflexive behaviours but, really, the vast majority of our repertoire of behaviours is based on things that we learned.

And that's true for most vertebrates, but it's especially true for humans since much of the brain is actually devoted to learning.

And one way to think about it is that there are really three kinds of circuits in the brain that do different things in learning. One circuit, which involves a part of the prefrontal cortex, is involved in salience - identifying things that are important, of interest or of value to the person.

Another big circuit is involved in the executive aspect of learning - how to connect a particular sequence of behaviour or movement to a certain outcome.

And then there is the last circuit that’s really involved in essentially refining those patterns of movement, making them as smooth and efficient as possible. 

So those are the three circuits. They all involve some part of the prefrontal cortex and a part of the basal ganglia called the striatum.

In a typical learning-type of experience, you might identify something that is of interest by trial and error, and try to find out what is the best way to do it. Once you have gotten it, you may repeat that action a few times until you really learn the optimal path.

The action will finally become a habitual performance. There is nothing negative about being habitual, it just means that the action becomes very well-grooved. You are very comfortable and efficient in doing it. 

So these are the three big circuits that we will be talking about. 


Dr Elisabetta Burchi 3:46

So Emad, in the beginning, we have goal-directed learning, and over time these goal-directed behaviours become so well learned that they become habits, right?


Professor Emad Eskandar 4:03

You are right. And again, sometimes habit has a negative connotation, but in this context there's nothing negative about it. It's just that, you know, there are all the steps I take to get into my car and then drive, and I do them hardly thinking about it. That's what I mean by habitual learning.

It gets to the point where [the actions are] very familiar and I'm not constantly thinking about it. A very important part of this whole system is the feedback mechanism - when we're exploring new environments and learning, there has to be some kind of indication, some feedback. “Is this the right step?” “Is this the right thing to do or not?”

The feedback [that] comes from outside can be many different things, but there's an internal feedback signal that is provided by neurons in the midbrain, which connects the cortex to the brainstem.

These dopaminergic neurons are in the midbrain and they convey a particular signal. It's called a reward prediction error, which basically means it signals the difference between the expected outcome and what actually happens.

So, if I encounter [an] unexpectedly good outcome - when I'm just exploring and I find something unexpectedly good - that causes a discharge of these dopaminergic neurons. That's a positive reward prediction error. 

If I encounter something negative, there's a reduction in the activity. And if I find what I expect, there's no change [in the activity, because the reward prediction error is zero].
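As an editorial aside, the reward prediction error described here can be written as a simple difference. The Python sketch below is a minimal illustration, not Professor Eskandar's model: the function names, the learning rate of 0.2 and the 30 trials are all assumptions chosen for the example.

```python
def reward_prediction_error(expected: float, actual: float) -> float:
    """Difference between what actually happened and what was expected."""
    return actual - expected


def learn(expected: float, actual: float,
          learning_rate: float = 0.2, trials: int = 30) -> float:
    """Nudge the expectation toward the outcome by a fraction of the error."""
    for _ in range(trials):
        expected += learning_rate * reward_prediction_error(expected, actual)
    return expected


# Unexpectedly good outcome: positive error.
print(reward_prediction_error(expected=0.0, actual=1.0))
# Worse than expected: negative error.
print(reward_prediction_error(expected=1.0, actual=0.0))
# Exactly as expected: zero error, so nothing changes.
print(reward_prediction_error(expected=1.0, actual=1.0))
# After repeated experience the expectation approaches the outcome.
print(round(learn(expected=0.0, actual=1.0), 3))
```

A positive error raises the expectation, a negative error lowers it, and once the expectation matches the outcome the error is zero and learning stops.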

This is a very finely grooved system that works really, really well and allows us to learn incredible amounts of things. You name it - whether it's [learning] a body of knowledge in a particular subject, learning how to play the piano, even potentially learning how to play a sport - all of those follow the same sequence.

You start not knowing exactly what to do, then you do some trial-and-error learning to explore different possibilities to find the best [method]. Eventually, it becomes very well-learned and habitual. The feedback for all of that learning is the internal dopaminergic signal.


Dr Elisabetta Burchi 6:24

I am sorry to interrupt you. This is a big topic - this connection between learning and reward as an internal and biological signal to consolidate learning. It makes sense from an evolutionary perspective, right? 

We tend to [learn] something that gives a reward. Generally, [learning] that gives a reward is something - from an evolutionary standpoint - advantageous. We can think about food, and we can think about sex. 


Professor Emad Eskandar 7:07

It can be anything. So there is the external, actual reward - the most simple example being food. For example, I am climbing a tree then I find a piece of fruit. That is an obvious external reward. 

There can be other rewards. It can be something positive like an environment to connect with someone. Or it can [be] just words - if I am a student and the teacher provides praise, that is also a kind of reward. So those are the many shades of external rewards.

Internally though, they are all signalled by the same thing - the dopaminergic neurons. So there is this convergence where all these potentially rewarding outcomes get consolidated into this one signal. 

That is attractive in some ways because [the consolidation] makes it easier for the system to deal with it: there are not 10 different internal reward signals [to handle].

But, it is also a vulnerability because now there is only one highly privileged signal that is potentially susceptible to disruption. That is a very important point. 

While studying learning, we have learned a lot of things. For example, [we have learned] which parts of the brain are involved: as I have already explained, there is the salience circuit, the learning circuit, and the habitual performance circuit.

We have also learned that you can potentially enhance learning by finding ways to mimic this episodic, or pulsatile release of dopamine. You can actually enhance learning beyond normal. 

You can even, in some cases, use that to potentially treat people who have had a brain injury - like stroke - to enhance their recovery. 

Those are many of the things that we have learned. What’s interesting is that the same three big circuits - cortical areas, dopaminergic neurons - have also been implicated in a variety of addictive behaviours. They are the same circuits.

It has always been very interesting to think about that but the full depth of that connection has not been fully understood. You know, how one affects the other. 

So the idea that we have been working on is trying to build, essentially, a computational mimic of this - a model or a system in a computer. It is code that learns, and it learns using the same rules that are used in biological learning.

The system has an agent which would be the organism or the person, and an environment [that] has many different states and possible choices, and different rewards. And it has a feedback signal - the internal feedback signal that works exactly [like the biological counterpart] that provides a reward prediction error just like what dopamine does in the real world. 

We specifically wanted to look at these kinds of facets of the learning system. In a simple first test, we had the agent try to find the shortest path between a starting point and the reward point, and it did so very easily. We then had the agent try to find its way through a maze to the reward point, and again it did so very easily. So we know that [the system] works.
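As an editorial aside, the shortest-path test can be sketched with a small learning agent. The interview does not say which algorithm the actual model uses, so the sketch below substitutes tabular Q-learning, a standard reinforcement-learning rule driven by a reward prediction error. The 4x4 grid, the reward of 1.0, the learning rate, the discount factor, the episode count and every name in the code are assumptions made for illustration.

```python
import random

SIZE = 4
START, GOAL = (0, 0), (3, 3)
ACTIONS = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # right, left, down, up


def step(state, action):
    """Move within the grid; bumping into a wall leaves the agent in place."""
    row = min(max(state[0] + action[0], 0), SIZE - 1)
    col = min(max(state[1] + action[1], 0), SIZE - 1)
    nxt = (row, col)
    return nxt, (1.0 if nxt == GOAL else 0.0)


def train(episodes=2000, alpha=0.5, gamma=0.9, seed=0):
    """Learn action values by trial and error from random starting points."""
    rng = random.Random(seed)
    states = [(r, c) for r in range(SIZE) for c in range(SIZE)]
    q = {(s, a): 0.0 for s in states for a in ACTIONS}
    for _ in range(episodes):
        s = rng.choice(states)
        for _ in range(50):
            if s == GOAL:
                break
            a = rng.choice(ACTIONS)  # purely exploratory behaviour
            nxt, reward = step(s, a)
            best_next = 0.0 if nxt == GOAL else max(q[(nxt, b)] for b in ACTIONS)
            # Reward prediction error: outcome minus current expectation.
            delta = reward + gamma * best_next - q[(s, a)]
            q[(s, a)] += alpha * delta
            s = nxt
    return q


def greedy_path(q, max_steps=50):
    """Follow the highest-valued action from START until GOAL is reached."""
    s, path = START, [START]
    while s != GOAL and len(path) <= max_steps:
        a = max(ACTIONS, key=lambda b: q[(s, b)])
        s, _ = step(s, a)
        path.append(s)
    return path


q_table = train()
path = greedy_path(q_table)
print("steps taken:", len(path) - 1)
print("path:", path)
```

The only rule given to the agent is the prediction-error update; the route itself emerges from learning, which is the property the interview emphasises.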

Okay, and then we wanted to use that [system] to simulate and give us ideas or predictions about what would happen if this system were disrupted by some substance of abuse. First, we started by just having one reward point, [but now] we had four of roughly equal internal reward value. Conceptually, you can think of these as food, water, shelter, and companionship. They were all important. They were all necessary. 

They had broadly equal internal reward value unless the agent was really deprived of one [of them], in which case it started to gain value. But [under] normal circumstances, they all had essentially normal reward values.

In that case, the agent visited all of [the points] with roughly equal frequency and developed a cycle or a path. [It went to point] number one, then [point] number two, then [point] number three, then [point] number four and kept going. 

Interestingly, a variety of addictive substances including things like psychostimulants - cocaine and methamphetamine, the whole class of opiates and narcotics, and ethanol - all of them evoke pulsatile dopamine release. But they do so in a way that is supraphysiologic.

Normally, a rewarding event might generate one [dopamine] pulse of a certain magnitude, but these agents will generate trains of pulses - five, ten pulses - each of which is potentially five times as big as a spontaneous or a normal pulse.

So these are driving the system very, very hard. It is a super potent internal feedback signal. Given that, we went back to our model and said, “Okay, so now we have our four reward states and they are roughly equal. What if we add one that has an internal reward value that is like five times as high as the other ones?”


Dr Elisabetta Burchi 13:28

Such as the addictive substances?


Professor Emad Eskandar 13:29 

[Nod]. How does the agent behave in this context? What does the agent do? What happens next is that, essentially, the agent starts to identify “this has a very high internal reward value” and starts to visit that site very frequently - much more frequently than the other [sites].

In fact, it visits the other [sites] - which we said might be things like food, shelter and water - much less frequently. It goes to this highly rewarding one at the expense of the other ones.
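As an editorial aside, the shift in visits can be illustrated with a deliberately simplified simulation. It is a sketch under stated assumptions, not the published model: sites are chosen by a softmax over learned values, values are updated by a reward prediction error, the extra site carries five times the reward, and the deprivation effect mentioned earlier is left out. The function name `simulate` and all parameters are hypothetical.

```python
import math
import random


def simulate(rewards, steps=5000, alpha=0.1, seed=0):
    """Return the share of visits each reward site receives."""
    rng = random.Random(seed)
    values = [0.0] * len(rewards)  # learned internal value of each site
    visits = [0] * len(rewards)
    for _ in range(steps):
        # Softmax choice: a higher learned value makes a visit more likely.
        weights = [math.exp(v) for v in values]
        site = rng.choices(range(len(rewards)), weights=weights)[0]
        visits[site] += 1
        # Reward prediction error updates the value of the visited site.
        delta = rewards[site] - values[site]
        values[site] += alpha * delta
    return [count / steps for count in visits]


# Four roughly equal rewards (for example food, water, shelter, companionship).
baseline = simulate([1.0, 1.0, 1.0, 1.0])
# The same four rewards plus one site worth five times as much.
hijacked = simulate([1.0, 1.0, 1.0, 1.0, 5.0])

print("baseline shares:", [round(s, 2) for s in baseline])
print("with 5x site:   ", [round(s, 2) for s in hijacked])
```

In the baseline run the four sites share the visits roughly evenly; once the high-value site is added, it takes most of the visits and the share of each ordinary site drops.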


Dr Elisabetta Burchi 13:58

Basically, it is hijacking the behaviour in a parallel way.


Professor Emad Eskandar 14:17

Yes, exactly. I think of it as a decisional landscape. There are many, many possible states that any organism or a person can be in, with many possible choices over time. What happens in that is the whole landscape becomes really distorted - or warped - so whatever state the agent is in, it points towards the next state that gets [the agent] closer to the object or state that has a high internal reward value. 

If you think of those states, in their various combinations of stimuli and behaviours, as the neural circuits’ actual representations of possible actions and choices, it’s as if a huge part of the circuitry across all three of these areas is now completely absorbed into this process.

We think that is very interesting. At a high level - a mental level - we can speculate about what that might mean for people. You can imagine an unfortunate person who has a severe addiction to some substance. We often find that [such people] don’t take care of themselves very well. They don’t eat very well. They are dishevelled. They might become homeless.

They effectively deprioritise these [daily living essentials] for the benefit of this [addiction]. So that is one direct reflection of what we had just talked about. 


Dr Elisabetta Burchi 16:00

To summarise this first result from the many experiments you and other scientists have undertaken in the last decades, it shows that the brain areas underpinning learning - the normal physiological phenomenon - and [the areas for] addictive behaviour overlap. This was the [theory] and something you have discovered.

Then, you saw that dopamine was involved in learning and also in addictive behaviour. Can you try to summarise, in a couple of sentences, what your model added to our understanding?


Professor Emad Eskandar 17:09 

Basically, it gives us some actual predictions. Some concrete predictions about what the consequences of these [behaviours] are.

I think people have recognised the overlap and the potential role of dopamine. But it is about going to that next step and saying, assuming both of those things are true, “What would that do?” “What are the predictions for the neurocircuits and what are the predictions for the outward behaviour?” That is the part that is missing.

Can you actually model it and then come up with a set of predictions? Obviously, having a model is very important as we can push against it, we can test it. If the predictions are validated, then we accept [the models] but if they are wholly inaccurate we can discard [them]. 

Or, if they are partially inaccurate, we can revise them. At least [the models] give us a path forward. 


Dr Elisabetta Burchi 18:00

You can understand better but the power of the model may also be [used] to craft some strategies to affect this phenomenon. 


Professor Emad Eskandar 18:19

Exactly. So the model indeed captures a lot of what’s happening both behaviourally and neurophysiologically. Then we can say “Well, in what ways can we intervene and modify this?” 

We can test that in the model. If it works in the model, then you know with some reasonable chance that it will work in the real world. It gives us a more tractable thing to deal with than [having] an actual person with all their complexities. 

So that is one set of predictions - abandoning or deprioritising essential things. Another feature that pops up is the propensity [of these habits], once they are formed, to become very deeply embedded, so [the person] becomes compulsive.

As I said before, habit itself does not have a negative connotation but when it becomes a compulsion - it means that the person keeps doing it even when it has negative consequences. 

This is not true for regular habits - the kind of habits I am talking about are easily unlearned or replaced - but compulsions are not. They persist despite having a negative outcome, or despite them not being helpful or just counterproductive. The tendency to have these very difficult-to-reverse habits also comes out from the model. 

The third thing that comes out of [the model], which is also relevant, is the full version of the model that includes the possibility of negative outcomes. The full model has an integration of the negative outcomes and the positive outcomes - that’s what leads to the choices made. 

When you run it, what you find is essentially a relative devaluation of negative outcomes. Put another way, the agent essentially has a much higher risk tolerance. It adopts choices that entail much more risk within that context.
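As an editorial aside, one simple way to read "relative devaluation of negative outcomes" is that a fixed expected cost weighs less against a much larger reward. The sketch below is only that reading, expressed as arithmetic; it is not the full model described in the interview, and the names `net_value` and `takes_risk`, the penalty of 4.0 and the probability of 0.5 are assumptions.

```python
def net_value(reward: float, penalty: float, p_penalty: float) -> float:
    """Integrate the positive outcome with the expected negative outcome."""
    return reward - p_penalty * penalty


def takes_risk(reward: float, penalty: float, p_penalty: float) -> bool:
    """The agent takes the risky option only if its net value is positive."""
    return net_value(reward, penalty, p_penalty) > 0


PENALTY = 4.0      # size of the negative outcome (assumed)
P_PENALTY = 0.5    # chance that the negative outcome occurs (assumed)

# Ordinary reward: 1.0 - 0.5 * 4.0 = -1.0, so the risk is declined.
print(takes_risk(1.0, PENALTY, P_PENALTY))   # False
# Reward five times as large: 5.0 - 0.5 * 4.0 = 3.0, so the risk is accepted.
print(takes_risk(5.0, PENALTY, P_PENALTY))   # True
```

The penalty itself is unchanged between the two cases; only its weight relative to the reward differs, which is why the same agent appears far more risk-tolerant in the second case.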

Again, it is speculative, but both of these things have the potential to reflect how people with a serious substance-use problem behave. [When] they are trying to obtain [the substance of abuse], they will often engage in seemingly very risky behaviour that can get them in trouble with the police or [with] legal problems. They become incarcerated or lose their job.

[There are] the high-risk behaviours of using dirty needles [for example], and other high-risk behaviours that, under normal circumstances, even these people [outside of that context] would not engage in.

The model will say, in that context, they perceive the risk to be lower than it really is. There is an actual devaluation of the risk, and it’s often these risky behaviours that are very problematic. So these are all the ideas that have come out of [our model]. 


Dr Elisabetta Burchi 21:49

[I can see the value and] potential application of your model [to reducing] the stigma that these disorders are still connected to.

We can see that, of course, in the field [where] we know there are biological underpinnings that explain [the phenomenon], but I think I like [the explanation] that addiction is a spectrum as you mentioned - from hardcore to [just] stimulants and they all have the common denominator of dopamine. 

And when there is a disruption of dopamine, as you have very well explained as a common denominator of all the addictive substances, there are now biological underpinnings shown to us through your model. 

Is it a kind of ML model? We didn’t explain that well, Emad, because I know you are also capable of coding and many other things. What kind of model is yours? Maybe just say a few words about that.


Professor Emad Eskandar 23:18

It’s a computational model but it’s a dynamic [one] because it is iterative. It is not like it’s hard-coded - we have a system that learns, and then we look at how the learning evolves under these different contexts. 

The rules in [the model] are very simple - find the reward site and use the reward prediction error et cetera and nothing else. Everything else has emerged from it - it learns these things. 

I like that because it is more representative of what actually happens. Admittedly, it is very simplified because I can’t possibly include all the degrees of freedom and all the neurons that are actually in the brain, but if it exhibits this behaviour even with those very reduced degrees of freedom, then it tells me that there may be some element of truth or some value in these predictions.

[However], they have to be tested. So the next step for us is to validate - to test these in a variety of experimental models, [such as] using experimental animals and so on, to confirm these specific elements. In fact, if the learning circuitry becomes very distorted, then that’s [what] contributes to the behaviour.

And that may be even more important than just the hedonic value of these substances - it is not just that these things are pursued because they make the person or the animal feel good; people will persist [in doing so] long after any hedonic value is gone. [The hedonic value] is probably infinitesimally small, and yet [the behaviour] persists.

So this is what we’re going to get at: how much of that has to do with just this underlying change in the circuitry and the weights of the circuit?

Then the next step, as you pointed out earlier, is with this understanding, how can we intervene through neuromodulatory techniques and try and restore a more balanced decisional landscape that is not so warped or distorted, but more flat if you will. More varied, that kind of thing.


Dr Elisabetta Burchi 25:45

So it’s nice [to see] the peak changes drastically in the landscape. Actually, I saw the images that you provided to me - your models produce such landscapes that mirror the path of resistance of different decisions that the brain can take in different biological situations. 

Now, as you have explained well, and to translate into layman’s language: when the addictive substances disrupt the normal functioning of these internal reward signals, affecting the dopamine signals, this landscape basically gets totally disrupted.

So to reshape it, what do we need to do, Emad? Let’s make a leap here: how do you envision the use of neuromodulation to readjust this landscape? 


Professor Emad Eskandar 27:05 

I mean, let’s start by thinking about - and this is now at a very abstract level - a decision landscape that is flat. There is a little turbulence on it - some things are a little higher, some things are a little lower. That’d be a situation where, essentially, you have free will, a maximum capacity to change.

It’s like “Okay, I will go to this destination” or “No, I am interested in this, so I go here, and I am not overtly biased,” right?

And now imagine that one of these [substances] is in play. Essentially, instead of having this flat landscape, now it’s like having a big hole in it. As soon as you get close to it, you start circling around and it is almost inevitable that you have to go down. You can’t avoid it.

You have to stay very, very far from it. And sometimes you can’t even reach the other points in the landscape because [the hole] has become so big.

So you want to restore that. You want to get rid of this huge disruption in it. There are ways to think about it from a neuromodulatory perspective. You want to restore normal weights to these circuits. 

One way to do that is to expose people or experimental animals to something [that is] provocative - a stimulus that would normally trigger a response leading to [the people or experimental animals] wanting to get that thing. Then find a way to blunt that pulsatile dopamine release and do that repeatedly.

Over time, the circuits don’t keep getting reinforced. They eventually return to something normal. That would be one way to do it.
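As an editorial aside, the first strategy can be sketched with the same prediction-error update used in learning. This is an illustration under assumptions, not a description of any actual neuromodulation protocol: the learned value of 5.0, the "normal" signal of 1.0, the learning rate and the 50 exposures are invented numbers, and `expose_repeatedly` is a hypothetical helper.

```python
def expose_repeatedly(value: float, delivered_signal: float,
                      exposures: int = 50, alpha: float = 0.1) -> float:
    """Update a learned value after repeated exposures to the same cue."""
    for _ in range(exposures):
        # Prediction error: delivered feedback minus what was expected.
        delta = delivered_signal - value
        value += alpha * delta
    return value


LEARNED_VALUE = 5.0   # distorted value attached to the substance (assumed)

# Without intervention the oversized signal keeps confirming the value.
unblunted = expose_repeatedly(LEARNED_VALUE, delivered_signal=5.0)
# With the pulsatile release blunted to a normal-sized signal, each exposure
# produces a negative prediction error and the value drifts back down.
blunted = expose_repeatedly(LEARNED_VALUE, delivered_signal=1.0)

print("value without blunting:", round(unblunted, 2))
print("value with blunting:   ", round(blunted, 2))
```

In this toy setting the value stays where it was when the signal is left intact and settles near the normal level when the signal is blunted on every exposure.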

Another way to do it is to find some other stimuli or patterns of behaviour that maybe are more adaptive or productive, and reinforce those selectively so they come to have the same or more value than the negative one. So one [way] or some combination of the two.

Essentially, it restores that balance but [we still] require some work. Obviously, [we need] some [better] precision in how we apply the neuromodulation et cetera, but potentially [it is] doable. 


Dr Elisabetta Burchi 29:21 

It seems to me that the combination of behavioural and neuromodulatory approaches - and also pharmacological approaches of course, but mainly I would say behavioural and neuromodulatory - can really be the way to accelerate new or healthier learning for individuals.


Professor Emad Eskandar 29:47

Exactly. And by doing that, in effect you’re giving the person back that freedom to make other choices. Because now not so much of the circuitry is ingrained in this. 

You are actually restoring this capability to make decisions fairly, and to make a wider range of decisions without constantly getting drawn into that [hole]. 

And in time, I suppose if you go long enough, I think behavioural therapy analytics would be very important. 

[If] you keep reinforcing that, [the patients] really can regain the agency and the ability to actually make appropriate decisions, and not just be warped by this problem. 


Dr Elisabetta Burchi 30:37

I think, Emad, that this is not just an interesting topic but really the core, as we said at the beginning [of this interview], because free will and the capability to be deliberate about our decisions are probably the most important features of being human. So [our efforts in] trying to solve these conditions and trying to help people who have this problem are really relevant.

[Of course] we would like to know more. This 30-minute [interview] is just to give us an idea of what is going on behind the scenes. We will follow you closely.


Professor Emad Eskandar 31:32

I really enjoyed speaking with you and sharing my thoughts. We will see when we check back in a couple of years what you have [achieved in learning]. 


Dr Elisabetta Burchi 31:41

I would say faster [since] you have a way to learn. Now that you have talked about learning, you have to accelerate the process. Thank you.


Professor Emad Eskandar 31:54

Thank you again for your time, really appreciate it. 
