Puppy Class Instructors Course
VICKI AUSTIN CANINE BEHAVIOUR AND TRAINING

Operant Conditioning

NB. These notes support the PowerPoint presentation.

Operant conditioning is also known as:
Instrumental Conditioning
Skinnerian Conditioning
Consequence of Behaviour

Thorndike's Law of Effect: The Experiment

Thorndike's best-known work was with cats. He would place a cat in a puzzle box and leave food in clear view but out of the cat's reach. The various puzzle boxes had doors that could be opened by a simple act on the cat's part, such as stepping on a tread-plate or pulling a loop attached to a string.

Thorndike reported that the cats typically:
tried to squeeze through any opening;
clawed and bit at the bars or wire;
thrust their paws out through any opening and clawed at everything they could reach;
continued their efforts when they struck anything loose and shaky;
might claw at things within the box.

Thorndike found that the cats showed minimal interest in the food and seemed more interested in getting out of the box. In experiments with dogs, he found the opposite. Hardly surprising for cat and dog owners!

B F Skinner

Burrhus Frederic Skinner (1904-1990)

Skinner gave us operant conditioning, continuing the work of Watson and Thorndike. He invented the Skinner box, with which he conducted his work on operant conditioning. Although his students, Keller and Marian Breland, achieved great success in the operant conditioning of animals on a commercial basis for almost 50 years (see Animal Behaviour Enterprises), operant conditioning was not well received by the general dog training community for several decades.

PCI Operant Conditioning Notes Page 1 of 12

Skinner Boxes

Figure: A typical Skinner box set-up.

Skinner conducted conditioning experiments on rats and pigeons: the animals were put into Skinner boxes and trained to perform tasks to either gain a reward or avoid a punishment.

Phases of Training

Teaching
The puppy does not yet connect the cue to the behaviour or the result. He does not yet know what is expected.
Reward every correct response during the teaching phase, even the sloppy, imperfect ones.
Punishment for incorrect responses is not appropriate during the teaching phase.

Training
The puppy understands the cue or command for a particular behaviour and is complying most of the time.
A variable/intermittent schedule of reinforcement can be introduced.
Distance, duration and complexity can be introduced.
Proofing
The dog can recognise the cue and respond correctly in any environment or situation, under high levels of distraction.

The consequence of a behaviour can be:
Reinforcing
Punishing
Neutral

Consequence of Behaviour: Reinforcement
Strengthens the behaviour.
The dog wants it to happen.
Increases the likelihood of the behaviour in future.
NB. The behaviour is reinforced. The dog is rewarded.

Consequence of Behaviour: Punishment
Weakens the behaviour.
The dog does not want it to happen.
Reduces the likelihood of that behaviour in the future.

Negative or Positive
Positive: something is added. (+)
Negative: something is taken away. (-)

Punishment and Reinforcement
Positive reinforcement: something the dog likes is given.
Positive punishment: something the dog does not like is given.
Negative reinforcement: something the dog does not like is taken away.
Negative punishment: something the dog likes is taken away.

Examples:
Positive reinforcement (something the dog likes is given): food treat, praise, toy thrown, allowed into the house, tickle around the ear, access to something desirable.
Positive punishment (something the dog does not like is given): smack, yelling, jerk on lead, electric shock, citronella spray, splash of water.
Negative reinforcement (something the dog does not like is taken away): the scary thing goes away, the pressure on the collar is released, the dog escapes from something scary.
Negative punishment (something the dog likes is taken away): withdrawal of the food treat, withholding of the toy, time-out, omission.

Positive reinforcement techniques are sometimes called reward training, but Skinner objected to the term. He said: The strengthening effect is missed when reinforcers are called rewards. People (or dogs/animals) are rewarded, but behaviour is reinforced.

Operant behaviour is voluntary and goal directed.

Three-Term Contingency
A: Antecedent
B: Behaviour
C: Consequence

Antecedent stimulus = cue, command, trigger
Behaviour = response
Consequence = reinforcement, punishment or neutral

Food Rewards as R+
Dogs will work harder for larger or tastier rewards.
Given a choice, dogs prefer numerous small pieces of food over one large piece, even if the large piece is the same amount as all the small pieces combined.
Reinforcer sampling serves to energise and motivate the dog.
Jackpots are used to reinforce excellence (differential reinforcement of excellence, DRE).

Sampling examples:
Give an obedience trialling dog a quick game of tug or a food treat immediately before entering the trial ring, to energise and motivate.
Give a scent detector dog a food treat prior to starting the search.
A mere smell of the food treats or a visual flapping of the toy can also be considered reinforcer sampling.

Nothing succeeds like success! (French proverb)
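The four consequence types defined under Negative or Positive and Punishment and Reinforcement come down to two questions: was something added or taken away, and was the behaviour strengthened or weakened? The following is only a minimal sketch of that lookup. The function name classify_consequence and its string arguments are made up for illustration and are not part of the course material.

```python
def classify_consequence(stimulus_change: str, effect_on_behaviour: str) -> str:
    """Name the quadrant from two observations.

    stimulus_change:     "added" (+) or "removed" (-)
    effect_on_behaviour: "strengthened" or "weakened"
    (Both argument vocabularies are assumptions made for this sketch.)
    """
    sign = {"added": "Positive", "removed": "Negative"}[stimulus_change]
    kind = {"strengthened": "reinforcement", "weakened": "punishment"}[effect_on_behaviour]
    return f"{sign} {kind}"


# Examples drawn from the notes
print(classify_consequence("added", "strengthened"))    # food treat given for a sit
print(classify_consequence("added", "weakened"))        # jerk on the lead
print(classify_consequence("removed", "strengthened"))  # pressure on the collar released
print(classify_consequence("removed", "weakened"))      # toy withheld
```

Note that the classification depends on the effect on the behaviour, not on the handler's intention: a consequence only counts as reinforcement if the behaviour is strengthened.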
Timing of Consequence

For dogs, the optimal association timeframe is considered to be when the reward or punishment is delivered during the behaviour or within three seconds of it.
When delivered more than ten seconds after the behaviour, it will not have an effect on the behaviour; this is called the disassociation time.
The interval from three seconds up to ten seconds after the behaviour is considered to be a grey area: reinforcement or punishment delivered during this timeframe may or may not affect the behaviour.

Reinforcement Schedules
Continuous Reinforcement Schedule (CRF)
Fixed Interval Reinforcement Schedule (FI)
Variable Interval Reinforcement Schedule (VI)
Fixed Ratio Reinforcement Schedule (FR)
Variable Ratio Reinforcement Schedule (VR)

CRF is used during the teaching phase of an exercise. When the dog is learning, he is reinforced for every correct response to ensure clarity in the learning experience.

FI is rarely used in animal training.

VI is useful in producing desired behaviours at higher rates in order to reduce undesirable behaviour. Example: to discourage a dog from randomly digging up the garden, we allocate a digging spot in the yard. To encourage the dog to dig in the digging spot, we bury toys, bones and other treats in the spot from time to time. The dog is never sure when digging in the spot will be reinforced with treats, but he knows that the reinforcement of buried treats requires him to dig in that location.

FR is used when a set number of repetitions of the same behaviour is required. Example: a dolphin jumps three times.

VR is used when a behaviour is required to be repeated or continued until cued to cease. Example: barking the answers to sums, heeling, stays.

Rates of Responding Under Reinforcement Schedules

Fixed Ratio: On FR schedules, animals perform at a high rate, but reinforcement is often followed by a pause in responses.
The pause becomes longer as the FR becomes higher, i.e. there will be longer post-reinforcement pauses on an FR100 than on an FR15. It is as if the animal is having a break before returning to work.

Variable Ratio: Because VR schedules typically produce fewer and shorter post-reinforcement pauses than FR schedules, they produce more behaviour over a given time period than an FR schedule, even though the pay-off is the same.

Fixed Interval: Similar to FR, FI produces post-reinforcement pauses. It has limited application in animal training.

Variable Interval: VI produces high and steady rates of responding; higher than FI but not as high as FR and VR.

Duration Schedules

In animal training we commonly use another schedule of reinforcement, known as a duration schedule.
FD (fixed duration): reinforcement is dependent on the continuous performance of a behaviour for a set length of time.
VD (variable duration): reinforcement is dependent on the continuous performance of a behaviour for varying lengths of time.

Duration exercises include heeling, stays, searching, etc. Most commonly, the teaching of these exercises is commenced on a very short FD schedule and moved to a VD schedule as training progresses.

Duration schedules are a technique used by animal trainers, but will not be found in scientific texts.

Differential Schedules of Reinforcement

Differential Rate Schedules
Differential reinforcement of high rates of behaviour (DRH)
Differential reinforcement of low rates of behaviour (DRL)

Differential Type Schedules
Differential reinforcement of other behaviour (DRO)
Differential reinforcement of incompatible behaviour (DRI)
Differential reinforcement of excellent behaviour (DRE)

Differential Rate Schedules

DRL (differential reinforcement of low rates of behaviour) is not generally used in dog training. It requires the animal to wait a set amount of time after the last response before responding again. It can produce superstitious behaviours.
DRH (differential reinforcement of high rates of behaviour): the animal has to perform a behaviour a minimum number of times in a given period or it will receive nothing. DRH schedules can produce extremely high rates of behaviour, higher than any other schedule.

Differential Type Schedules of Reinforcement

These are commonly utilised in animal training and behaviour.

DRO (differential reinforcement of other behaviour): to curb unwanted biting and pulling on clothing, the handler reinforces any behaviour that is not biting or pulling on clothing, such as tug-o-war, retrieving, seeking, sitting, standing, walking close, focused attention, barking, chewing on toys, etc. Consequently, the dog offers other behaviours that achieve reinforcement.
DRO is a term used by animal trainers. It is similar to the scientific DR0 (differential reinforcement of zero behaviour), which reinforces the absence of a particular behaviour after a set period of time.

DRI (differential reinforcement of incompatible behaviour): to curb unwanted jumping up, the handler trains the dog to sit in order to gain reinforcement. Sitting is strengthened and offered more often. Sitting is incompatible with jumping up; the dog cannot do both at the same time. This technique is highly successful in treating problem behaviour in dogs.

DRE (differential reinforcement of excellent behaviour) is used to improve the animal's performance. Responses are not equal; instead of randomly rewarding some responses, the trainer chooses to reward only the best. The animal will offer fewer mediocre responses and more of its best responses. The trainer will then look for a further improvement to become the new benchmark for reinforcement, until a perfect level has been achieved and only that level is reinforced.
DRE is possibly the most important schedule of reinforcement for animal trainers to comprehend. However, it is a technique used by animal trainers and will not be found in scientific texts.
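To make the difference between the ratio schedules listed earlier under Reinforcement Schedules more concrete, the sketch below decides, response by response, whether a correct response is reinforced under CRF, FR and VR. It is an illustration only: the function names are hypothetical, and drawing the VR requirement uniformly between 1 and 2 x mean - 1 responses is an assumption made for this example, not something specified in these notes.

```python
import random


def fixed_ratio(n):
    """FR n: reinforce every n-th correct response. FR 1 is the same as CRF."""
    count = 0

    def on_correct_response():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True   # reinforce this response
        return False      # no reinforcement this time

    return on_correct_response


def variable_ratio(mean_n, rng):
    """VR: reinforce after a varying number of responses averaging mean_n.

    Assumption for this sketch: each requirement is drawn uniformly
    from 1 .. 2*mean_n - 1, which averages mean_n.
    """
    count = 0
    target = rng.randint(1, 2 * mean_n - 1)

    def on_correct_response():
        nonlocal count, target
        count += 1
        if count >= target:
            count = 0
            target = rng.randint(1, 2 * mean_n - 1)
            return True
        return False

    return on_correct_response


crf = fixed_ratio(1)                       # teaching phase: every response pays
fr3 = fixed_ratio(3)                       # e.g. the dolphin that jumps three times
vr3 = variable_ratio(3, random.Random(0))  # the dog cannot predict which response pays

print("CRF:", [crf() for _ in range(6)])
print("FR3:", [fr3() for _ in range(6)])
print("VR3:", [vr3() for _ in range(12)])
```

On FR3 the pattern is fully predictable (every third response), whereas on VR3 the pay-off is about the same on average but the dog cannot tell which response will be reinforced.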
Variable Schedules of Reinforcement

Behaviour that has been reinforced on a variable/intermittent schedule becomes more resistant to extinction.
This is an important point to consider when designing a behaviour modification program to reduce or eliminate undesirable behaviour.
An extinction program will not be highly successful where the undesired behaviour has a history of variable/intermittent reinforcement. Extinction may eventually occur, but the process will be arduous for both dog and owner.

Stimulus Control

Also known as discriminative difference.
Placing the behaviour on cue or command.
A is for Antecedent stimulus (ABC).
Learning Stimulus Control

To place a behaviour under stimulus control, the dog has to learn:
1. Which behaviour produces the reward; and
2. Which stimulus (cue, command, signal) predicts when the behaviour will be rewarded.

The dog has to learn that lying down will bring a reward, and then he has to learn that it will only bring a reward when offered after the command DROP.
The quarantine detector dog has to learn that indicating on plant material will only bring a reward in the airport, when wearing his jacket.

Discrimination and Generalisation of the Cue

The dog learns to discriminate that the behaviour will be rewarded in the presence of the cue, but not otherwise.
The dog learns to generalise the cue to different environments, different tones or loudness of the cue, different people giving the cue, etc.

Discrimination and generalisation are involved in scent detection training. The target odour becomes the antecedent stimulus. For a quarantine detector dog, sitting at a bag will only be reinforced when the target odour is within. The dog has had to generalise his experience of being rewarded during training for sitting at the odour source of carrots, celery, apples, oranges and bananas to all fruit and vegetables. He also has to learn to discriminate that mango-scented shampoo, fruit-flavoured lollies and dried fruit do not signal reinforcement for sitting.

Salience of the Cue

Various stimuli are more or less noticeable, or salient, to the dog.
Body movement, especially a hand holding a lure, will be more noticeable than a verbal cue spoken in a regular voice.
When timing is poor and the command and the luring hand are presented simultaneously, the food treats and the movement of the hand holding them are most likely to overshadow the spoken command.

Selective Attention Test by Daniel Simons and Christopher Chabris (1999)
YouTube: https://www.youtube.com/watch?v=vjg698u2mvo&t=9s
Instructions: Count how many times the players wearing white pass the basketball.
Overshadowing

Example: The dog is being lured into a drop position with a food treat in hand. During the lure, the trainer says DROP. Chances are the word DROP will be overshadowed by the food lure hand, but an empty hand imitating the lure will readily become conditioned.

For the dog to learn the verbal cue in this scenario, the verbal cue must be given prior to the hand movement of the lure. If the hand even begins its movement before or at the same time as the verbal command, it will overshadow the verbal command. The verbal command is then redundant; it provides no new information that the start of the hand signal has not already provided.

Three-term contingency: A, then B, then C.

Negative Reinforcement (R-)

Strengthens the behaviour.
Something the dog does not like is taken away.
The behaviour is more likely to occur in future.

Examples of R-
Barking at the postman results in the postman going away (distance-increasing behaviour); the dog wanted the postman to go away.
A show dog growls at the judge and, sensibly, the judge retreats. The dog wanted the judge to go away.
A rat in a Skinner box presses the lever to stop the electric current.
A person takes Panadol and the headache goes away.
A fearful dog out walking on lead with his owner barks at every dog he encounters, to keep them away.
A tight lead is loosened when the dog performs the sit. The tightness or tension in the lead is removed as a consequence of sitting.
A parrot screeches in alarm at the sight of a large plastic disc; the disc is removed from sight. The parrot is being trained to screech on cue.

Positive Punishment

Positive punishment procedures are also known as: Aversive Control of Behaviour
WARNING! With the use of positive punishment:
1. Emotional side effects in the dog can detrimentally influence the learning process.
2. Changes in procedure can dramatically affect the rate and extent of learning.
3. Inappropriate application can traumatise the dog.

The use of positive punishment may be appropriate in certain circumstances and may even be the most humane course of action. However, the risk is high that P+ procedures will produce unwanted and dangerous side effects.

Many, many years ago at my local dog training club, where traditional check and release techniques with check chains were utilised, we introduced positive reinforcement training using food treats. We found that people who had been poor trainers using the check chain were also poor trainers with food treats. It seems their timing and ability to motivate their dogs was going to be poor no matter what technique they utilised. However, it was recommended that the club adopt positive reinforcement techniques as its philosophy on dog training, because there was a reduced rate of aggression in dogs and increased safety for both dogs and people.

I do not recommend positive punishment techniques to pet owners; it's just too dangerous for them, their children and the wider community.

Non-Contingent Punishment

Definition: the behaviour and the punishment are not related; they occur independently of one another.
Example: scolding the dog several hours later for soiling in the house.
Example: the owner scolding the dog on arrival home for destructive behaviour performed whilst the owner was away. Doh!
Punishment must be immediate!

Non-contingent punishment will create a nervous dog and will be generally detrimental to the human/dog bond and relationship. It can also result in learned helplessness.
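The timing windows given earlier under Timing of Consequence (within three seconds, three to ten seconds, more than ten seconds) explain why scolding a dog hours after the event is non-contingent. As a rough sketch only, they can be written as a simple check. The function name association_window and the returned labels are assumptions made for illustration; the thresholds are the ones quoted in these notes.

```python
def association_window(delay_seconds: float) -> str:
    """Classify how a consequence delivered after a delay is likely to be associated.

    Thresholds follow the notes: up to 3 s is optimal, 3 to 10 s is a grey area,
    and beyond 10 s the consequence is not linked to the behaviour.
    A delay of 0 means the consequence was delivered during the behaviour.
    """
    if delay_seconds < 0:
        raise ValueError("delay cannot be negative")
    if delay_seconds <= 3:
        return "optimal"
    if delay_seconds <= 10:
        return "grey area"
    return "disassociated"


print(association_window(1))            # treat delivered one second after the sit
print(association_window(6))            # may or may not affect the behaviour
print(association_window(3 * 60 * 60))  # scolding three hours after soiling in the house
```

The third call illustrates the non-contingent punishment examples: by the time the owner arrives home, the delay is far beyond the ten-second disassociation time.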
Hierarchy of Effective Procedures (Friedman, 2008)
Medical, Nutritional, Physical
Antecedent Arrangements
Positive Reinforcement
Differential Reinforcement of Alternative Behaviours
Extinction, Negative Reinforcement, Negative Punishment
Positive Punishment

Negative Punishment (P-)

The dog does not want this to happen.
Something the dog wants is taken away.
Will reduce the likelihood of the behaviour recurring in the future.

Examples of P-
The food lure is withdrawn as a consequence of the dog lifting his front feet out of the sitting position.
Heeling (progression towards the reward) ceases when the dog leaves the perfect heel position.
The seal trainer leaves the training area as a consequence of the seal's non-compliance.
The child stops playing with the puppy when he bites.
As a consequence of indiscriminate barking in the house, the dog is placed in his crate or in the laundry.
A dog barks at his owner for attention; the owner leaves the room.
A dog is barking outside the glass doors to be allowed inside; the curtains are drawn.
The trainer will not allow a game of tug to happen when the dog grabs the toy independently of the cue to do so.
The dog jumps up for attention and the person then looks skyward, folds their arms and turns their body so that the dog falls off; attention has been withheld in response to jumping up.

Negative punishment is also known as omission: withdrawing or withholding a desired result.
Negative punishment is combined with positive reinforcement in inducive training techniques such as luring, free-shaping (clicker) and capturing.

Conditioned Negative Punisher (Pavlovian/Classical Conditioning)

"Ah-ah" or "oops" tells the dog that the anticipated food reward will now be withdrawn or withheld:
It is not shouted or growled at the dog; it is not meant to threaten or intimidate, just communicate.
Ensure that the dog is not able to gain success or reward via another means.
It is conditioned on the go during training, as a consequence of incorrect responses or undesired behaviour.

A conditioned negative punisher is also referred to as a non-reward marker.

This presentation of Operant Conditioning is designed to be undertaken in conjunction with the following texts:
Learning and Behaviour by Paul Chance
Excel-erated Learning by Pamela J Reid Ph.D.