Classical and operant conditioning

Classical and operant conditioning are two basic psychological processes that explain how humans and other animals learn. The fundamental concept that underlies both these modes of learning is association.

Simply put, our brains are associating machines. We associate things with each other so that we can learn about our world and make better decisions.

If we didn’t have this basic ability to associate, we couldn’t function normally in the world and survive. Association allows us to make quick decisions based on minimal information.

For example, when you accidentally touch a hot stove, you feel pain and pull your arm back quickly. When this happens, you learn that ‘touching a hot stove is dangerous’. Because you have this ability to learn, you associate the ‘hot stove’ with ‘pain’ and you try your best to avoid this behaviour in the future.

Had you not formed such an association (hot stove = pain), you most likely would’ve touched a hot stove again, putting yourself at a greater risk of burning your hand.

Hence, it is useful for us to connect things to be able to learn. Classical and operant conditioning are two ways in which we form such connections.

What is classical conditioning?

Classical conditioning was scientifically demonstrated in the famous experiments conducted by Ivan Pavlov involving salivating dogs. He noticed that his dogs salivated not only when food was presented to them but also when a bell rang just before the food was presented.

How could that be?

Salivation resulting from seeing or smelling food makes sense. We do it too. But why would the dogs salivate on hearing a bell ring?

It turns out the dogs had associated the sound of the ringing bell with food because, whenever they were given food, the bell rang at almost the same time. This had happened often enough for the dogs to connect ‘food’ with the ‘ringing bell’.

Pavlov, in his experiments, found that after he had presented food together with the ringing bell many times, the dogs salivated when the bell rang even if no food was presented.

In this way, the dogs had been ‘conditioned’ to salivate in response to hearing the bell. In other words, the dogs acquired a conditioned response.

Let’s start everything from the beginning so that you can familiarize yourself with the terms involved.

Before conditioning

Initially, the dogs salivated when the food was presented, a normal response that presenting food typically generates. Here, food is the unconditioned stimulus (US) and salivation is the unconditioned response (UR).

Of course, using the term ‘unconditioned’ indicates that no association/conditioning has yet taken place.

Since conditioning hasn’t occurred yet, the ringing bell is a neutral stimulus (NS) because, for now, it doesn’t produce any salivation in the dogs.

During conditioning

When the neutral stimulus (ringing bell) and the unconditioned stimulus (food) are repeatedly presented together to the dogs, they get paired in the dogs’ minds.

The pairing becomes so strong that the neutral stimulus (ringing bell) alone produces the same effect (salivation) as the unconditioned stimulus (food).

After conditioning happens, the ringing bell (previously NS) now becomes the conditioned stimulus (CS) and salivation (previously UR) now becomes the conditioned response (CR).

The initial stage during which the food (US) is paired with the ringing bell (NS) is called acquisition because the dog is in the process of acquiring a new response (CR).

After conditioning

After conditioning, the ringing bell alone induces salivation. Over time, this response tends to diminish because the ringing bell and food are no longer paired.

In other words, the pairing becomes weaker and weaker. This is called the extinction of the conditioned response.

Note that the ringing bell, in and of itself, is powerless in triggering salivation unless paired with food which naturally and automatically triggers salivation.

So when extinction happens, the conditioned stimulus goes back to being a neutral stimulus. In essence, pairing enables the neutral stimulus to temporarily ‘borrow’ the ability of an unconditioned stimulus to induce an unconditioned response.

After a conditioned response has become extinct, it may reappear after a pause. This is called spontaneous recovery.
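The acquisition and extinction stages described above can be sketched as a small simulation. This is only an illustration under stated assumptions: the update rule is a simple error-correction rule, loosely in the spirit of the Rescorla-Wagner model (which this article does not cover), and the names `update_strength`, `run_trials` and the `learning_rate` value are hypothetical. It does not model spontaneous recovery.

```python
def update_strength(strength, us_present, learning_rate=0.3):
    """Nudge the bell-food association toward 1.0 when food follows the bell,
    and toward 0.0 when the bell rings alone (assumed error-correction rule)."""
    target = 1.0 if us_present else 0.0
    return strength + learning_rate * (target - strength)


def run_trials(strength, n_trials, us_present, learning_rate=0.3):
    """Return the association strength after each of n_trials bell presentations."""
    history = []
    for _ in range(n_trials):
        strength = update_strength(strength, us_present, learning_rate)
        history.append(strength)
    return history


# Acquisition: the bell (NS) is repeatedly paired with food (US).
acquisition = run_trials(0.0, n_trials=10, us_present=True)

# Extinction: the bell (now CS) is presented alone, without food.
extinction = run_trials(acquisition[-1], n_trials=10, us_present=False)

print("strength after acquisition:", round(acquisition[-1], 3))
print("strength after extinction:", round(extinction[-1], 3))
```

In this sketch the association strength climbs with every pairing and then decays once the pairing stops, mirroring how the conditioned response is acquired and then extinguished.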

Generalization and discrimination

In classical conditioning, stimulus generalization is the tendency of organisms to produce the conditioned response when they’re exposed to stimuli that are similar to the conditioned stimulus.

Think of it this way: the mind tends to perceive similar things as being the same. So Pavlov’s dogs, even though they were conditioned to salivate on hearing a particular bell ring, may also salivate in response to other similar-sounding objects.

If, after conditioning, Pavlov’s dogs salivated on exposure to a ringing fire alarm, a bicycle bell or even the tapping of glass sheets, this would be an example of generalization.

All these stimuli, though different, sound similar to each other and to the conditioned stimulus (ringing bell). In short, the dog’s mind perceives these different stimuli as the same, generating the same conditioned response.

This explains why, for instance, you may feel uncomfortable around a stranger you’ve never met before. It may be that their facial features, gait, voice or manner of speaking reminds you of a person you hated in the past.

The ability of Pavlov’s dogs to tell the conditioned stimulus, and the stimuli similar enough to be generalized, apart from other irrelevant stimuli in the environment is called discrimination. Stimuli that aren’t generalized are discriminated: they don’t trigger the conditioned response.

Phobias and classical conditioning

If we consider fears and phobias as conditioned responses, we can apply classical conditioning principles to make these responses go extinct.

For example, a person who fears public speaking may have had a few bad experiences initially when they got up to speak in public.

The fear and discomfort they felt and the action of ‘getting up to speak’ got paired such that the idea of getting up to speak alone generates the fear response now.

If this person gets up to speak more often, despite the initial fear, then eventually the ‘speaking in public’ and the ‘fear response’ will get untangled. The fear response will become extinct.

Consequently, the person will get rid of the fear of public speaking. There are two ways this can be done.

First, expose the person to the feared situation continuously until the fear diminishes and eventually goes away. This is called flooding and is a one-time event.

Alternatively, the person can undergo what’s called systematic desensitization. The person is gradually exposed to situations that evoke increasing degrees of fear over an extended period of time, each new situation being more challenging than the previous one.

Limitations of classical conditioning

Classical conditioning may lead you to think that you can pair anything with anything. In fact, this was one of the early assumptions of the theorists working in the area. They called it equipotentiality. However, it later became known that certain stimuli are more readily paired with some stimuli than with others.1

In other words, you can’t just pair any stimulus with any other stimulus. We’re likely ‘biologically prepared’ to generate responses to certain kinds of stimuli over others.2

For instance, most of us fear spiders and this fear response may also get triggered when we see a bundle of thread, mistaking it for a spider (generalization).

This type of generalization rarely occurs for inanimate objects. The evolutionary explanation is that our ancestors had more reason to fear animate things (predators, spiders, snakes) than inanimate objects.

What this means is that you may sometimes mistake a piece of rope for a snake but you’ll hardly ever mistake a snake for a piece of rope.

Operant conditioning

While classical conditioning talks about how we associate events, operant conditioning talks about how we associate our behaviour with its consequences.

Operant conditioning tells us how likely we are to repeat a behaviour based purely on its consequences.

The consequence that makes your behaviour more likely to occur in the future is called reinforcement and the consequence that makes your behaviour less likely to occur in the future is called punishment.

For example, say a child gets good grades in school and his parents reward him by buying him his favourite gaming console.

Now, he’s more likely to perform well on future tests also. That’s because the gaming console is a reinforcement to encourage more future occurrences of a particular behaviour (getting good grades).

When something desirable is given to the doer of a behaviour to increase the likelihood of that behaviour in the future, it is called positive reinforcement.

So, in the above example, the gaming console is a positive reinforcer and giving it to the child is positive reinforcement.

However, positive reinforcement is not the only way in which the frequency of a particular behaviour can be increased in the future. There is another way in which the parents can reinforce the ‘getting good grades’ behaviour of the child.

If the kid promises to do well in future tests, his parents may become less strict and lift some restrictions that were previously imposed on him.

One of these undesirable rules could be ‘play video games once a week’. The parents may do away with this rule and tell the kid that he can play video games twice or maybe thrice a week.

The kid, in return, has to continue performing well in school and keep ‘getting good grades’.

This type of reinforcement, where something undesirable (strict rule) is taken away from the doer of a behaviour, is called negative reinforcement.

You can remember it this way: ‘positive’ always means something is given to the doer of a behaviour and ‘negative’ always means something is taken away from them.

Note that in both the above cases of positive and negative reinforcement, the end goal of reinforcement is the same i.e. increasing the future likelihood of a behaviour or strengthening the behaviour (getting good grades).

It’s just that we can provide the reinforcement either by giving something (+) or by taking something away (-). Of course, the doer of the behaviour wants to get something desirable and wants to get rid of something undesirable.

Doing one or both of these favours for them makes it more likely that they’ll comply with you and repeat the behaviour you want them to repeat in the future.

So far, we’ve discussed how reinforcement works. There’s another way to think about the consequences of behaviour.

Punishment

When the consequence of a behaviour makes the behaviour less likely to occur in the future, the consequence is called punishment. So reinforcement increases the likelihood of a behaviour in the future while punishment decreases it.

Continuing with the above example, say, after a year or so, the kid starts to perform badly on tests. He got carried away and devoted more time to video games than to studying.

Now, this behaviour (getting bad grades) is something that the parents want less of in the future. They want to decrease the frequency of this behaviour in the future. So they have to use punishment.

Again, the parents can use punishment in two ways, depending on whether they give something to the kid (+) or take something away from him (-) to make the behaviour (getting bad grades) less frequent.

This time, the parents are trying to discourage the child’s behaviour so they have to give him something undesirable or take away something that is desirable for the kid.

If the parents re-impose the strict rules on the kid, they are giving him something that he finds undesirable. So this will be positive punishment.

If the parents take away the child’s gaming console and lock it up in a cabinet, they are taking away something that the child finds desirable. This is negative punishment.

To remember what type of reinforcement or punishment is being carried out, always keep the doer of the behaviour in mind. It’s their behaviour that we want to increase or decrease using reinforcements or punishments respectively.

Also, keep in mind what the doer of a behaviour desires. This way, you can tell whether giving something or taking something away is a reinforcement or a punishment.
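The four combinations discussed above can be written down as a small lookup. This is just a memory aid in code form, not a formal model; the function name `classify_consequence` and the labels `"give"`, `"take_away"`, `"desirable"` and `"undesirable"` are made up for this illustration.

```python
def classify_consequence(action, stimulus):
    """Name the type of consequence in operant conditioning.

    action:   "give" (+) or "take_away" (-)
    stimulus: "desirable" or "undesirable", judged from the doer's point of view
    """
    table = {
        ("give", "desirable"): "positive reinforcement",
        ("take_away", "undesirable"): "negative reinforcement",
        ("give", "undesirable"): "positive punishment",
        ("take_away", "desirable"): "negative punishment",
    }
    if (action, stimulus) not in table:
        raise ValueError(f"unknown combination: {action!r}, {stimulus!r}")
    return table[(action, stimulus)]


# The examples from this article:
print(classify_consequence("give", "desirable"))         # buying the gaming console
print(classify_consequence("take_away", "undesirable"))  # lifting the strict rule
print(classify_consequence("give", "undesirable"))       # re-imposing the strict rules
print(classify_consequence("take_away", "desirable"))    # locking the console away
```

Notice that the sign (+ or -) only says whether something is given or taken away. Whether the result is a reinforcement or a punishment depends on what the doer finds desirable.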

[Figure: Types of reinforcement and punishment]

Successive approximation and shaping

Have you ever seen dogs and other animals perform complex tricks at the commands of their masters? Those animals are trained using operant conditioning.

You can make a dog jump over an obstacle if after jumping (behaviour), the dog gets a treat (positive reinforcement). This is a simple trick. The dog has learned how to jump at your command.

You can continue this process by successively giving the dog more rewards until the dog gets closer and closer to the desired complex behaviour. This is called successive approximation.

Say you want the dog to do a sprint right after it jumps. You have to reward the dog after it jumps and then after it sprints. Eventually, you can discard the initial reward (after the jump) and only reward the dog when it carries out the jump + sprint sequence of behaviour.

Repeating this process, you can train the dog to jump + sprint + run and so on in one go. This process is called shaping.3

[Video: shaping of a complex behaviour in a Siberian Husky]

Schedules of reinforcement

In operant conditioning, reinforcement increases the strength of a response (more likely to occur in the future). How the reinforcement is provided (reinforcement schedule) influences the strength of the response.4

You can either reinforce a behaviour every time it occurs (continuous reinforcement) or you can reinforce it some of the time (partial reinforcement).

Although partial reinforcement takes longer to establish a response, the response developed is quite resistant to extinction.

Giving a child candy every time he scores well in an exam would be continuous reinforcement. On the other hand, giving him candy some of the time but not every time the child scores well would constitute partial reinforcement.

There are different types of partial or intermittent reinforcement schedules depending on when we provide the reinforcement.

When we provide the reinforcement after a behaviour has been done a fixed number of times, it is called a fixed-ratio reinforcement schedule.

For example, giving candy to the child once he has scored well in three exams, then rewarding him again after he scores well in another three exams and so on (fixed number of times a behaviour is done = 3).

When reinforcement is provided after a fixed interval of time, it is called a fixed-interval reinforcement schedule.

For example, giving the child candy every Sunday would be a fixed-interval reinforcement schedule (fixed time interval = 7 days).

These were examples of fixed reinforcement schedules. A reinforcement schedule can also be variable.

When reinforcement is given after a behaviour is repeated an unpredictable number of times, it is called a variable-ratio reinforcement schedule.

For example, giving the child candy after he has scored well 2, 4, 7 and 9 times. Note that 2, 4, 7 and 9 are arbitrary numbers. The rewards don’t come after a fixed number of good scores as in the fixed-ratio reinforcement schedule (3, 3, 3 and so on).

When reinforcement is given after unpredictable intervals of time, it is called a variable-interval reinforcement schedule.

For example, giving the child candy after 2 days, then after 3 days, then after 1 day and so on. There isn’t a fixed time interval as in the case of the fixed-interval reinforcement schedule (7 days).
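The four partial schedules can be compared side by side in a short sketch. It rests on a few assumptions that go beyond the examples above: the helper names `ratio_schedule` and `interval_schedule` are hypothetical, the variable schedules draw their requirement at random from 1 to 5, and on the interval schedules the reward goes to the first response made once the waiting time has passed.

```python
import random


def ratio_schedule(n_responses, requirement):
    """Return the response numbers (1-based) that earn a reward.

    requirement() says how many responses the next reward costs.
    """
    rewarded = []
    needed = requirement()
    since_last_reward = 0
    for response in range(1, n_responses + 1):
        since_last_reward += 1
        if since_last_reward >= needed:
            rewarded.append(response)
            since_last_reward = 0
            needed = requirement()
    return rewarded


def interval_schedule(response_days, wait):
    """Return the days on which a response earns a reward.

    wait() says how many days must pass before the next reward is available;
    the first response on or after that day is rewarded (an assumption).
    """
    rewarded = []
    next_available = wait()
    for day in response_days:
        if day >= next_available:
            rewarded.append(day)
            next_available = day + wait()
    return rewarded


rng = random.Random(0)  # seeded so that repeated runs give the same result

fixed_ratio = ratio_schedule(12, lambda: 3)                    # every 3rd good score
variable_ratio = ratio_schedule(12, lambda: rng.randint(1, 5))
fixed_interval = interval_schedule(range(1, 29), lambda: 7)    # every 7th day
variable_interval = interval_schedule(range(1, 29), lambda: rng.randint(1, 5))

print("fixed-ratio:", fixed_ratio)
print("variable-ratio:", variable_ratio)
print("fixed-interval:", fixed_interval)
print("variable-interval:", variable_interval)
```

With a requirement that is always 1, `ratio_schedule` rewards every single response, which corresponds to continuous reinforcement.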

[Figure: Reinforcement schedules in operant conditioning]

In general, variable reinforcements generate a stronger response than fixed reinforcements. This may be because there are no fixed expectations about obtaining rewards which makes us think that we may get the reward at any time. This can be highly addictive.

Social media notifications are a good example of variable reinforcements. You don’t know when (variable-interval) and after how many checks (variable-ratio) you’re going to get a notification (reinforcement).

So you’re likely to keep checking your account (reinforced behaviour) in the expectation of getting a notification.

References:

  1. Öhman, A., Fredrikson, M., Hugdahl, K., & Rimmö, P. A. (1976). The premise of equipotentiality in human classical conditioning: conditioned electrodermal responses to potentially phobic stimuli. Journal of Experimental Psychology: General, 105(4), 313.
  2. McNally, R. J. (2016). The legacy of Seligman’s “Phobias and preparedness” (1971). Behavior Therapy, 47(5), 585-594.
  3. Peterson, G. B. (2004). A day of great illumination: B. F. Skinner’s discovery of shaping. Journal of the Experimental Analysis of Behavior, 82(3), 317-328.
  4. Ferster, C. B., & Skinner, B. F. (1957). Schedules of reinforcement.