What I learned from Dr. Dunbar at a weekend conference!

I spent three days sitting in a conference room listening to Dr. Ian Dunbar talk about dogs, training and learning theory, and somewhere during day three I had an epiphany!

You see, I have a degree in psychology and always thought I really understood learning theory. But during this seminar, I started looking at learning theory in a different way. I am not sure if this was Dr. Dunbar’s intention, but it radically changed the way I look at the subject.

In college I learned several different premises under the auspices of “learning theory,” but each was taught individually as a “stand-alone” idea. After hearing Dr. Dunbar speak, I realized that the three major cornerstones of learning are actually pieces of a much larger and more complete way of looking at learning.

I was originally looking at this based on how it would apply to dogs, but as I flesh this concept out, I think it will transcend dog training and help explain successful training for all! But before we get too far ahead of ourselves, let’s talk about the three cornerstones of learning theory and their “fathers.”

Thorndike addressed binary learning: do it right = reward, do it wrong = punishment. Skinner (operant conditioning) covered the four quadrants of punishment and reward. Pavlov (classical conditioning) championed learning through associations, either positive or negative!

Thorndike’s law of effect states: Responses closely followed by satisfaction will become firmly attached to the situation and are therefore more likely to recur when the situation is repeated. Conversely, if the response is followed by discomfort, the connections to the situation will become weaker and the behavior or response is less likely to occur when the situation is repeated. In essence, we are speaking of binary learning: learning that occurs through one of two choices and their eventual result…positive or negative, reward or punishment, black or white!

B.F. Skinner, whom many would call the father of operant conditioning, gave us the following: Operant conditioning (sometimes referred to as instrumental conditioning) is a method of learning that occurs through rewards and punishments for behavior. Through operant conditioning, an association is made between a behavior and a consequence for that behavior. This association is where positive dog training has taken its lead from – the four quadrants: positive reinforcement, positive punishment, negative reinforcement and negative punishment.

To save time, I have included a link for the definitions of each HERE. Suffice it to say, though, that many who claim to be positive reinforcement trainers rely on only one of the four quadrants. Those whom many consider to be “aversive” or positive punishment trainers likewise rely on only one of the four quadrants…albeit a different one!

The other two quadrants, even though proven in the laboratory, are much less used in dog training, and are best described as torture/nagging (negative reinforcement) and giving a time-out or grounding (negative punishment).
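For anyone who likes to see the four quadrants laid out, here is a minimal sketch of the standard textbook definitions: "positive" means something is added, "negative" means something is taken away, "reinforcement" means the behavior goes up and "punishment" means it goes down. The names `Quadrant` and `classify` are my own, made up purely for illustration.

```python
from enum import Enum


class Quadrant(Enum):
    POSITIVE_REINFORCEMENT = "positive reinforcement"
    NEGATIVE_REINFORCEMENT = "negative reinforcement"
    POSITIVE_PUNISHMENT = "positive punishment"
    NEGATIVE_PUNISHMENT = "negative punishment"


def classify(stimulus_added: bool, behavior_increases: bool) -> Quadrant:
    """Name the quadrant from two yes/no questions.

    stimulus_added: True if something is added after the behavior,
                    False if something is taken away.
    behavior_increases: True if the behavior becomes more frequent,
                        False if it becomes less frequent.
    """
    if behavior_increases:
        return (Quadrant.POSITIVE_REINFORCEMENT if stimulus_added
                else Quadrant.NEGATIVE_REINFORCEMENT)
    return (Quadrant.POSITIVE_PUNISHMENT if stimulus_added
            else Quadrant.NEGATIVE_PUNISHMENT)


# A treat is added and sitting happens more often.
print(classify(stimulus_added=True, behavior_increases=True).value)
# Attention is taken away (time out) and jumping up happens less often.
print(classify(stimulus_added=False, behavior_increases=False).value)
```

The two yes/no questions are the whole trick: a trainer who only ever answers "added" and "increases" is living in a single quadrant.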

Ivan Pavlov is credited by many with defining classical conditioning, a technique used in behavioral training and especially in dog training. A naturally occurring stimulus is paired with a response. Then, a previously neutral stimulus is paired with the naturally occurring stimulus. Eventually, the previously neutral stimulus comes to evoke the response without the presence of the naturally occurring stimulus. The two elements are then known as the conditioned stimulus and the conditioned response. From a dog training perspective, I have always thought of this as making associations, either positive (reinforcing) or negative (scary or threatening).

I do not want to get into which of these three is right or which is wrong. I would much rather look at them as a whole, realizing that each is simply a part of a larger and more complete theory of learning, where each existing and proven theory plays a vital role in the dog being able to learn and retain the knowledge!

In many cases, the existing learning theories have brought dog training light years into the future; but because we are human, they have allowed us as trainers to segment and alienate certain parts of training as right or wrong! We have even allowed ourselves to demonize certain aspects of learning as cruel. If I learned anything in college, it was that you can prove anything with statistics and logic.

So now we have a situation where dog training, which has made huge leaps in the last 25 years, has bottlenecked to a standstill. This is because we are more interested in who is right than in what is the most comprehensive and best way for the dogs to learn!

If you look at any hot-button argument in the world today there are always two sides, and the opposite ends of the spectrum are the loudest and most vocal for their respective sides! Remember that we were taught in statistics about something called the normal (or bell) curve? For a trait that follows that curve, about 12.5% of the population falls in each far tail (beyond roughly 1.15 standard deviations from the average), while about 75% falls somewhere in the middle, out of the extremes.
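If you want to check the 12.5% / 75% / 12.5% split for yourself, here is a small sketch using Python's built-in `statistics.NormalDist`. It assumes the thing being measured really does follow a bell curve, which is my assumption for illustration and not something any population is guaranteed to do; the helper name `middle_share` is made up.

```python
from statistics import NormalDist

# Standard bell curve: average 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)


def middle_share(cutoff_sd: float) -> float:
    """Share of the population within +/- cutoff_sd standard deviations."""
    return standard_normal.cdf(cutoff_sd) - standard_normal.cdf(-cutoff_sd)


# How far out do we have to cut so that 12.5% sits in each tail?
# 87.5% of the population lies below the upper cutoff.
cutoff_for_75 = standard_normal.inv_cdf(0.875)

print(f"cutoff: about {cutoff_for_75:.2f} standard deviations")
print(f"middle share at that cutoff: {middle_share(cutoff_for_75):.1%}")
print(f"middle share at 1 standard deviation: {middle_share(1.0):.1%}")
```

The cutoff is a choice, not a law of nature: cut at one standard deviation and the middle shrinks to about 68%, cut a little wider and you get the 75% used here.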

I really want to stress here that because of this right & wrong, black & white, positive & negative perspective that has been at the forefront of dog training of late, we are now creating dogs that have very little reliability and that, in the end, don’t really know what we humans want or expect from them! With that being said, I firmly believe that the reality, the truth, the answer to reliability and more successful dog training (whatever you want to call it) falls in that 75%, the biggest section of the bell curve!

Dr. Dunbar, in his seminar, had a really unique 1-2-3-4 process for teaching dogs that moves them through the Cue (1), Lure (2), Behavior (3) and Reward (4). The uniqueness of this process is that, over time, you eliminate steps 2 and 4 leaving you only a cue and the behavior, as well as a dog that does not require lure or reward because they are working for life rewards or the positive feeling created just by giving the behaviors.

The first thing you do is 1-2-3-4, which gets you the behavior. Then you will do 1-3-4 to teach and/or learn the behavior. Finally you will only have to do 1 and 3. Now you have a dog that is given the cue and responds with the behavior, all because of the process of learning! A real life example might be teaching a child to clean their room! Early on, the parent has to tell the child to clean the room (cue) and then show the child how to clean their room and offer some enticement (say an after-school snack) to get the ball rolling (lure)! The behavior is easy – cleaning their room, and the reward is – let’s say – an allowance.

As time goes on, the parent most likely still has to tell the child to clean their room, but does not have to entice them to do it, because the allowance is sufficient to get the behavior! Fast forward many years and the child owns his/her own home. Guess what? If the house is dirty (cue) they clean it (behavior)! Why? The pride of seeing their own home clean (life reward)!

As you can see in the above example, by the time the child grows up and buys his/her own home, the behavior of cleaning has become reliable, right? Well, this is the main reason for coming up with the 1-2-3-4 method. Over the last 25 years, Dr. Dunbar had noticed that training, while becoming much more kind and gentle, had come to depend so heavily on luring and rewarding that it was actually killing the reliability of behaviors. In the earlier history of dog training, before positive reinforcement training, reliability might initially have been stronger with the positive punishment methods, but many of those dogs fell into learned helplessness because only punishments were used.

Remember the previous discussions on the extremes? Hopefully some of you are thinking HMMMMMMMMMMM? Once again, the answer lies with the words punishment and reward! By definition, punishment means anything that decreases the frequency of a behavior! Likewise, rewards are nothing more than something that increases the frequency of a behavior! If you decide to follow only one or the other, then here’s a little baseball analogy to put it in perspective…try to hit a fastball with only one half of a baseball bat! To me that sounds kind of like using your forehead to drive a nail, very painful and not very productive!

This, in a manner of speaking, is the bottleneck I previously mentioned. Positive Reinforcement training relies too much on food (lures and rewards), creating dogs that will only work when they see or smell food. Positive Punishment training uses only punishment, ending up with dogs that just give up! There has to be a middle ground to direct us to the 75% or Promised Land, and there is!

As I sat there in the seminar listening to Dr. Dunbar explain the 1-2-3-4 method of teaching, it made me, as a psychology major, attempt to figure out where this idea would fit in learning theory. Then it dawned on me that not only did it not fit into my own mental constructs of learning theory, it actually redefined them! It was like a splash of cold water in my face when I realized that using only one of the learning theories was no longer an option.

Considering yourself purely classical, operant or a disciple of Thorndike was no longer really possible. But maybe, just maybe, combining them could be the answer. What if each of them was just a different way of describing the 1-2-3-4 method, and all of them should be taken into consideration when training?

Not to make light of Thorndike or the law of effect, but his theory is the broad base of learning that we all understand at some intrinsic level. Let’s consider it to be the bottom layer, or foundation, of a pyramid. We know that all decisions we make, we make out of some sense of survival…gaining pleasure or avoiding pain! In other words, things we like – we do more often and those we dislike – we do less often, if at all. But where do we go from there?

Next up we have Skinner and operant conditioning where we are using a reward to teach a behavior. Finally we come to Pavlov and Classical Conditioning to associate the behavior with something positive.

Well, according to the 1-2-3-4 method, we first ask for or name (cue) the behavior (let’s use “Sit”). Then we lure the behavior to teach the dog the “how-to” of the behavior (for a sit we take a treat and move it over the top of the dog toward the tail…till the butt hits the ground (behavior)!). We then say thank you and give a treat (reward).

Over time, the act of sitting on command becomes the reward itself because of the associations with the treat we have used in the past, as well as the “thank you”, other praise and pets. The key to this learning is to make sure not to pigeonhole yourself into one style, thought or aspect of learning theory. It’s best to incorporate them all into one simple idea that everyone can understand! 1-2-3-4 anyone! The goal of any training, whether of people, dogs, goats or monkeys, is to get the behavior you want, when you ask for it, without a lure or a reward. For those parents with kids getting an allowance, I am sure none of you expect to be paying out when they are 40, do you???
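To put the fading sequence on paper, here is a small sketch of the three stages described earlier (1-2-3-4, then 1-3-4, then 1-3). It is only my own bookkeeping of which steps are still in play at each stage, not anything Dr. Dunbar handed out; the stage labels and the helper names `describe` and `faded_steps` are made up for illustration.

```python
# The four numbered steps of the method.
STEPS = {1: "cue", 2: "lure", 3: "behavior", 4: "reward"}

# The three stages, each listing the steps still in use.
# The labels are my own shorthand, not official terms.
STAGES = [
    ("get the behavior", (1, 2, 3, 4)),
    ("teach the behavior", (1, 3, 4)),
    ("reliable behavior", (1, 3)),
]


def describe(stage_steps):
    """Spell out a stage such as (1, 3, 4) as 'cue -> behavior -> reward'."""
    return " -> ".join(STEPS[n] for n in stage_steps)


def faded_steps(stage_steps):
    """List the steps that have been faded out by this stage."""
    return [STEPS[n] for n in STEPS if n not in stage_steps]


for label, stage_steps in STAGES:
    faded = ", ".join(faded_steps(stage_steps)) or "nothing yet"
    print(f"{label}: {describe(stage_steps)} (faded: {faded})")
```

Notice that the cue and the behavior never leave the list; only the lure and the reward do.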

So while I sat and listened (at least most of the time, when I wasn’t frantically scribbling notes), I figured out how to make learning theory fit inside my own head. But once I got home and re-read my notes, I realized there was an even easier way to explain Dr. Dunbar’s 1-2-3-4 method and to convince people that by fading out steps 2 and 4 (lure and reward) we could once again revolutionize dog training.

The biggest difference between operant and classical conditioning is position in the world and perception! The order they follow is also incredibly important. We now know that to be reliable, you can’t use only one of these; so which came first – the chicken or the egg? As we discussed earlier, the emergence of positive reinforcement and positive punishment training both jump-started dog training and brought it to a screeching halt; but that alone is not the problem!

We have fallen into our pit of success! We found out 25 years ago that using rewards (food) was a much more productive way of getting behaviors quickly than the status quo of training with aversive methods that came home with soldiers after World War II. But what we did not realize was that we humans always ruin a good thing!

We got so caught up in food training and changing the mindset of training from negative to positive that we did not even see the writing on the wall. We have been creating treat addicts for the last 25 years! On top of that, we did not even notice that while we were busy changing the world, dogs were becoming less dependable, and in the end more interested in the lure than they were in the behaviors we were teaching!

Dr. Dunbar nailed it with the 1-2-3-4 method he introduced at the seminar I attended in Kansas City, and in this case it was not what he added, but what he subtracted! We must fade out the lure and the reward as quickly as possible so that we end up with a dog that receives a cue and gives a behavior of his/her own accord! That in itself sounds pretty positive, doesn’t it? I am honestly ashamed I did not make this connection myself! In the end, you still need both operant and classical conditioning! I am just of the opinion that our definitions have been slightly out of whack! Let me explain…

The keys to the 1-2-3-4 method are steps 2 and 4, the lure and reward! If we continue as we always have, we will never fade either out, and we will have dogs that just wait to see the treat before giving the behavior that we are asking for (cue)! All we have to do is change our perception of lures/bribes vs. rewards/motivation! Do we work harder (over time) for a boss that gives us more money or for the boss that gives credence to our work?

Do people or dogs for that matter find money or praise more important? I say it is how life and those around us have raised us. If you ask me, step 2, the lure, is 100% operant conditioning and step 4, the reward, is one of two things, depending on whether you did it wrong or right! If you have been bribing your dog, the lure becomes the reward and then they become the same thing!

These are the dogs we have all seen, that refuse to work unless a treat is available! Not fading the lure makes the reward a bribe! If, on the other hand, the lure goes away and the behavior is insisted upon before the reward, the integrity of the reward is still valid! Don’t get too excited: just because the lure is gone, you still run the risk of the reward not living up to the demands of the behavior expected!

Have you, or anyone you know, ever quit a job out of dissatisfaction or frustration, only to be offered more money to stay? In the end it was not the wage (reward) that made you decide to quit the job; it was realizing that no amount of reward (money) was worth staying for.

This brings us back to the idea of reliability! Reliability is not something that is bought with money or reward; it is something you choose to do internally. Some call it pride, others self-worth, but in the end it comes from within! What an epiphany! Thorndike gives us the parameters in which to learn, operant conditioning gives us the lure to keep us interested, but classical conditioning is what builds the associations that create rock-solid reliability and make us choose to do what we are asked!

Put in the simplest terms I can think of… operant conditioning relates to the lures and rewards that come from an external source and can get the ball rolling, but classical conditioning refers to the associations that come from within and make us choose to do requested behaviors! In the end, life rewards will and should always beat out lures and food! Learning theory models going forward must delve into how they work as a unit for success, not how they work individually to fail! Who would have thought learning theory would be as simple as 1-2-3-4?
