Rewritten from the article, "Rewards, Reinforcers and Voluntary Behavior" published in Ethics Vol. 84, No. 1, October 1973

© 1999 Edward G. Rozycki


  edited 10/3/17

It is a common error to suppose that any statement that one person rewarded or punished another can be more precisely put or tied into that body of knowledge we have about operant conditioning by appropriately rewording it. Thus "T rewarded (punished) S" is recast as "T positively (negatively) reinforced S." This confusion is widespread and bolsters the illusion that our everyday experience with reward and punishment attests to the adequacy of an operant conditioning model to explain this experience.


1. T positively (negatively) reinforces S's behavior, B, with (by) R.

2. T rewards (punishes) S for his act, B, with (by) R.

The first part of our argument will be to the point that neither does 1 imply 2, nor 2, 1. This will be shown to be so by virtue of the fact that the crucial concepts are, at best, haphazardly related. The identification of 1 with 2 is enabled by that same profound confusion which informs the argument of Michael Schleifer in his article, "Instrumental Conditioning and the Concept of the Voluntary" (Ethics 82, 2. 163-70). Our second interest will be to examine that argument.

From premises of the sort,

U: Only behavior which can be affected by reward or punishment is voluntary behavior

P: Behavior is voluntary if and only if it can be operantly conditioned,

F: Behavior heretofore thought involuntary has been found to be operantly conditionable,

Schleifer attempts to arrive at conclusions such as
M: It would seem as much (or as little) justified to blame or punish a person for illness as for some wrong-doing or crime
O: All behavior is voluntary.

Premise U is what Schleifer calls the Utilitarian's premise, and he claims that his argument admits of rational rejection only if U is rejected. He is mistaken.

Premise P is the psychologist's premise. Schleifer assumes that 'voluntary' means the same in both U and P because many psychologists who hold P hold to a variant of U, call it U'.

U': Only Behavior which can be affected by Reward or Punishment is Voluntary Behavior.

'Reward' (read 'reward-prime') and 'reward' are homonyms; likewise, 'Punishment' and 'punishment.' Other similarities are only apparent. It will be argued that U is not U'. Thus is Schleifer's assumption incorrect.

The crux of his argument is what he claims to be the relationship of premise F to premise P. If experimental findings are to have relevance to our concept of the voluntary, P must stand firm. It can be easily dismissed, however, for F does not -- through P -- bring pressure to bear on our notion of the voluntary, but undermines the factuality of P itself.

Rewards, Punishments, and Reinforcers

Is every positive reinforcer a reward? Is every reward a positive reinforcer? We will see that the answer to both questions is no. To begin we might notice that what is reinforced is neither persons nor organisms but behavior (operants). By way of contrast, one rewards or punishes persons for acts, not acts per se. Colloquially persons or organisms are said to be reinforced. This slovenliness prepares the ground for the merging of the two concepts. (A similar distinction can be made to contrast 'punishment' with 'negative reinforcer,' that is, one punishes persons for acts, not acts per se. One negatively reinforces not persons but operants.)

To have made out these distinctions, however, wins us no debating points. Whereas we might want to claim that they indicate a profound difference between the concept of reward and reinforcer, the counterclaim might be made that these very distinctions obscure the importance of the characteristics shared by the two. But there is more of a difference to be made out than this. Theoretically, the identity of the reinforcer as reinforcer is independent of that of the agent, whereas a reward cannot be identified as such without certain knowledge about the agent. For 'T positively reinforces S's behavior, B, with R,' R's identity as reinforcer is determined by its effects, for example, increasing the rate of B-type responses. That it was T who used R as a reinforcer for B is irrelevant to R's identity as positive reinforcer. Such is not the case with 'reward' (or 'punishment'). It is important both with respect to ethical and motivational considerations to distinguish 'reward' from 'gift' and 'stroke of luck'; also, 'punishment' from 'infliction' and 'mishap.' These distinctions are not made in terms of -- nor need we be able to distinguish among -- their effects, but in terms of the beliefs of T and S vis-a-vis each other and R. Such distinctions, however, are irrelevant to R's being a reinforcer. A gift, a stroke of luck, infliction of pain may all increase the frequency with which certain behavior is exhibited. Thus we may conclude that 'R is a positive reinforcer' does not imply 'R is a reward'; also 'R is a negative reinforcer' does not imply 'R is a punishment.'

If R is a reward, need it be a positive reinforcer? No, despite the fact that it is 'reward' rather than 'gift' or 'stroke of luck' that is conceptually linked to 'motivation.' What moves some to translate from 'reward' to 'positive reinforcer' is that the presentation of a reinforcer is contingent upon an appropriate response seemingly as a reward is contingent upon some performance. A reward is always for something; a gift, by definition, cannot be -- 'stroke of luck' is conceptually irrelevant to 'motivation' entirely. Another seeming parallel between 'reward' and 'positive reinforcer' is that the latter -- by definition -- increases the likelihood of a response of the reinforced type; it is commonly thought that reward increases the likelihood of behavior of the rewarded kind.

However, it is not the reward per se that increases the likelihood that S's previously rewarded behavior will be repeated, but S's belief that the reward is contingent upon such repetition, that is, S's expectation of reward. It is clear that if we were to reward him with the understanding that he was to be henceforth disqualified from reward for the same behavior, his continued repetition of that behavior could not be accounted for by his having been rewarded. We might account for it as compulsive behavior, or as the effect of reinforcement -- in which case we would be normally disinclined to think it voluntary. Rational, reward-seeking individuals would not persist in the absence of expectation of reward. It is expectation which mediates between reward and its effects on behavior; by way of contrast, there is no theoretical mediator between reinforcer and operant.

But it is misleading to speak of expectation as a mediator, for it is sufficient unto itself -- given certain assumptions about the "nature" of the organism -- to produce, that is, account for, repetition of the selected behavior, even in the absence of reward. Indeed, one wonders why there is not as much insistence that expectation of reward be identified with the reinforcer as there is that the reward be. A plausible explanation can be given if we consider the measurement fetish which dominates research in motivation and learning. Expectation seems unsuitable for measurement, but if we (mis-)take as the reward that measurable object which is given in reward, the incongruence of concepts from reinforcement theory with those of our traditional moral and educational discourse can be more easily overlooked. Rewards are not brute facts; they are institutions, which is to say, they require for their identification such things as recognizing contexts, intents, traditions, etc.

The identity of R as reinforcer is dependent on its having certain effects. Thus to identify some R as a reinforcer is either to (a) claim that it has had these effects previously, or (b) hypothesize that it will have these effects in the future. But to identify R as a reward requires no knowledge either of the history of R or hypotheses about its future. We can thus conclude that 'R is a reward' does not imply 'R is a positive reinforcer.'

Behavior and 'behavior'

The reinforcement theorist must maintain the conceptual independence of reinforcer and operant. But we will see that the reward cannot be identified independently of the behavior for which it is the reward.

Johnny straightens his bed and picks up the toys in his room. His mother gives him a dime, saying that it is his reward for helping her. Johnny goes back to his room, musses his bed, and strews his toys around. He then restraightens the bed and picks up the toys. This time, however, his mother will not give him a dime. She is not being inconsistent, and it is true when she says to Johnny, "That's not what I rewarded you for the last time!"

The dime is clearly not a reward unless it is intended to be for something. Its identity as reward is a matter of what it is a reward for. There is an ambiguity here in common usage: we can say, "An eight-dollar reward is the same as a 100-peso reward," where what is given in reward is spoken of as the reward. But we do -- because we must -- distinguish what is given in reward from its identity as reward. The thief who steals the reward money has not gotten a reward. A dime given in reward for straightening one's bed is not the same reward qua reward as a dime given in reward for helping one's mother.

It is a commonplace that human bodies in -- or out of -- motion may admit simultaneously of several incompatible descriptions, and that the correctness of any given description is not so much a matter of what we, "camera-like," observe, but of the assumptions we bring to the observation. We, rationally choosing rewards to offer to prudent and desirous individuals, are doing something quite different from that of the reinforcement theorist looking to find out what classes of stimuli will produce behavior in an organism which he can count as a repetition at a preferred level of description. What was Johnny doing that his mother rewarded him for? We cannot tell without knowing the description of the reward qua reward, that is, what it was for. Only then will we know what to count as a repetition. Consider how different the repetitions might be if what Johnny did were described as straightening the bed and picking up the toys as contrasted with helping his mother or filling in for the maid on her day off or even showing his mother he can assume responsibility for his own things.

The reinforcement theorist chooses a particular type of description and works with the hope that other types will "reduce" to it, thus allowing generalization beyond the particular context for which it was chosen. He does this not necessarily out of fanatical adherence to a particular philosophical viewpoint, but because what he conceives his enterprise to be demands it. His data must admit of certain mathematical manipulation; more specifically, the categories of behavior he uses must (a) classificationally exhaust all possible behavior, and (b) be mutually exclusive; otherwise the axioms of the probability calculus cannot apply to them. Let us call any system of behavior categories which meets conditions a and b above a Behavior Partition (BP). If the category by which any given behavior is classified is not a member of a BP, then any statement about its having occurred a certain percentage of times in the past, or about the probability of its future occurrence, is statistical nonsense. If B is not a member of a BP, then "B has been reinforced" is gibberish.

It is easy enough to show that the categories of human action of Standard English do not constitute a BP. Take any two categories -- call them here X and Y -- for which the following formulations may hold for some agent, S:

w: S is X-ing but not Y-ing, and

x: S is Y-ing but not X-ing.

If we can now formulate

y: S is Y-ing by X-ing, or

z: S is X-ing by Y-ing,

X and Y do not belong to a BP, for they are not mutually exclusive. If there is such a pair in Standard English -- there are many such -- then Standard English is not a BP.
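
The test just described can be put as a rough sketch. Everything named below is an assumption made for illustration and is not part of the original argument: the episodes, the category labels, and the helper functions are hypothetical, and each episode is represented simply as the set of descriptions that are true of it. Exhaustiveness is checked only against the episodes listed, not against all possible behavior.

```python
def partition_failures(categories, episodes):
    """Return (episode, matching_categories) pairs that violate condition a or b.

    An episode matching no category breaks exhaustiveness (condition a);
    an episode matching more than one breaks mutual exclusivity (condition b).
    """
    failures = []
    for episode in episodes:
        matches = sorted(c for c in categories if c in episode)
        if len(matches) != 1:
            failures.append((episode, matches))
    return failures


def is_behavior_partition(categories, episodes):
    """True only if every listed episode falls under exactly one category."""
    return not partition_failures(categories, episodes)


# Hypothetical everyday-language categories X and Y (illustrative only).
X = "straightening the bed"
Y = "helping his mother"
everyday_categories = {X, Y}
everyday_episodes = [
    frozenset({X}),     # formulation w: S is X-ing but not Y-ing
    frozenset({Y}),     # formulation x: S is Y-ing but not X-ing
    frozenset({X, Y}),  # formulation y: S is Y-ing by X-ing
]

# A hypothetical laboratory-style scheme, chosen so that it does partition.
laboratory_categories = {"lever press", "no lever press"}
laboratory_episodes = [
    frozenset({"lever press"}),
    frozenset({"no lever press"}),
]

everyday_is_bp = is_behavior_partition(everyday_categories, everyday_episodes)
laboratory_is_bp = is_behavior_partition(laboratory_categories, laboratory_episodes)
overlapping = partition_failures(everyday_categories, everyday_episodes)
```

On these assumed inputs the everyday scheme fails because the third episode falls under both X and Y, which is the situation formulation y describes; the laboratory scheme passes only because its categories were stipulated so as to exclude one another.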

To the extent that a system of behavioral categories excludes the beliefs of the agent as a criterion for distinguishing among types of behavior, so must it fail to identify voluntary behavior. Reinforcement theory excludes such criteria. We will use two criteria of voluntariness:

V1: If 'S is X-ing' describes S as performing X, then X is voluntary if and only if the following description is also available: 'S is refraining from X-ing.'

The description must be available; it need never be applicable.

V2: X is voluntary if and only if the following description is available: 'S is trying to X' (corrected 3/2/14 -- EGR)

The criteria V1 and V2 are truth-functionally equivalent but manifest different aspects of voluntariness. Criterion V1 says that S's behavior is voluntary if and only if he can prevent it; V2, that S's behavior cannot be spontaneous lest we lose the distinction between 'refrain from' and 'suppress.' The aspects of voluntariness mentioned above manifest themselves in the asymmetry between 'refrain from' and 'try to.' One can refrain from only that which is known to be possible, but one can try whatever one believes is possible. One can try -- albeit futilely -- to do the impossible, but one cannot knowledgeably so try. Depending on S's beliefs and knowledge, the formula

m: S is trying to X by Y-ing

need never be false for any values of X and Y. "What is S trying to do?" is a question about S's beliefs (or intentions) and seldom indicates an inability on the part of the questioner to find some suitable way of describing S's behavior.

'Trying to X' is not a category of the same rank and order as 'X'; the two can never be mutually exclusive -- otherwise we would get "S is X-ing, therefore he is not trying to X."

Our argument is straightforward: no system of categories is adequate to identify voluntary behavior, unless for every category X, it contains also Xr, 'refrain from X,' and Xt, 'try to X' -- any X for which Xr and Xt are not available is by definition not voluntary. But X and Xt are never mutually exclusive. Therefore no system of categories of voluntary behavior is a BP. A similar argument gets us the converse.

In summary: We have argued to establish that

1. T positively (negatively) reinforces S's behavior, B, with (by) R,

does not imply -- nor is it implied by --

2. T rewards (punishes) S for his act, B, with (by) R.

But a stronger conclusion is warranted: 1 cannot imply 2; 2 cannot imply 1.

Research and Moral Concepts

In light of our previous discussion we consider now the argument of Schleifer; we will examine the premises only and let the conclusions fend for themselves. The utilitarian premise is a quote from DuCasse -- call it U -- "that a person is now morally responsible for his voluntary acts . . . means simply that to praise or blame him or otherwise reward or punish him for something he now does or did will tend to cause him to act, or tend to inhibit him from acting in a similar manner on similar occasions." Rewards, etc. are said to tend to cause persons to act. This is not significantly a weaker statement than to say merely that they cause persons to act. There is no tendency for which there is no conceivable circumstance in which it manifests itself. Therefore, let us imagine an occasion -- call it O -- on which this tendency is manifested. We can say of the agent -- according to DuCasse -- that on O his act was caused by the prior reward or punishment. It is notoriously difficult to unpack the notion 'cause,' but we can make the following minimal demand, that is, that

3. 'A causes B' implies 'B is accounted for by A'

for intuitively one would not want it that A caused B but still B was not to be accounted for by A. Let us reformulate with R, a reward, S, the agent, and B, the act. We want to say that on occasion O, R caused S to B. Now, the description of O either (c) contains a statement of S's belief that R is contingent on B, or (d) it does not. If (d), then R is not the cause of B because in the absence of such expectation R cannot account for B -- the utilitarian premise is false. If (c), then at the very least the formulation that reward or punishment has the tendency to cause S to act is misleading to the extent that the confusion of 'reward' with 'positive reinforcer' is easily made. DuCasse's definition does not unequivocally commit him to the conflation of these concepts, but the suppression of the important role played by S's expectation goes a long way toward muddling the distinction between them.

Schleifer claims that DuCasse's characterization of 'voluntary act' "is accepted by virtually all psychologists." What this claim amounts to is unclear. Does he mean (e) the characterization is accepted as formulated in ordinary English, or (f) the characterization is accepted as formulated in that degenerate form of English in which 'reward,' 'gift,' and 'stroke of luck' are lumped together and distinguished from the conglomerate 'punishment-infliction-mishap' by its consequences only? I suspect that he means (f), for only then could he begin to think that there was an argument to be made out along these lines.

The psychologist's premise -- quoted from J. Cohen -- is this: "Operant conditioned responses . . . are defined as changes in responses after they have been followed, on prior occasions, by the presentation of reward and punishment. . . . These are the 'voluntary responses.'" Schleifer renders this

P: "Behavior is voluntary if and only if it is instrumentally conditionable."


F: "It has been satisfactorily demonstrated that all the paradigms of involuntary behavior can be instrumentally conditioned: heart rate, . . . [etc.], . . . respond to reward and punishment."

Cohen's degenerate usage has become Schleifer's. If P were unassailable, it would allow us: 'If B is conditionable, then it is voluntary.' F allows B to range over involuntary behavior. But P is questionable apart from its resting on the reward-punishment-reinforcer confusion. P is not an empirical finding. It is conceivable that it might be factually true. But according to F, it is factually false.

English and English'

It would seem that whenever someone constructs a Behavior Partition, his language must of necessity be a degenerate form of ordinary English. We might go so far as to say that psychologists using a BP talk not about learning but about learning' -- read this "learning-prime," meaning 'quantifiable learning' -- not about behavior but about behavior', etc. The psychologist claims to be talking about, say, voluntary behavior. We want to say that he is really talking about voluntary' behavior'. But there is no prima facie distinction to be made between what someone claims to be talking about and what he is in fact talking about -- we would not want to deprive him of the possibility of making false statements. It is usually only after analysis that we can say that -- for example -- psychologists' statements about voluntary behavior are false. But the psychologist rejects this evaluation and counters that within the framework of his definitions, his statements are true. The alternative that seems to be open to us now is to concede him his truths but to qualify them by saying he is speaking a special language. The psychologist is unhappy with this too, because it deprives him of a certain polemic advantage, that is, the use of this language in suggesting that the objects of his research are, after all, the objects of a more general concern. But then, too, he may concede our point, insisting, however, that his language -- our criticisms to the contrary notwithstanding -- is ultimately more "objective" and less "philosophical" than that in ordinary use.

I would -- in conclusion -- like to address a brief argument to the point that in fact ordinary language categories are more objective and less "philosophical," that is, committed to acknowledgedly untenable philosophical assumptions, than those of the psychologist using a Behavior Partition.

Let us imagine we are engaged in constructing a test for anxiety, that is, some instrument by which we hope to justify judgments that the subject being tested is more or less anxious at some time t1 -- perhaps even to a given degree -- than he was at some reference time t0. In test construction there are two general theoretical concerns, that is, about the internal and external validity of the test. The internal validity of the test -- commonly called the reliability -- is an assessment of the degree to which the outcomes of the test will vary only randomly given repeated application under conditions assumed to be the same in all relevant respects. Reliability is an assessment of test consistency. The external validity is an assessment of the degree to which the test actually measures what it is purported to measure. For example, our test will be externally valid if and only if it measures anxiety.

It is important here to notice that two sources of information external to our instrument must be available to make assessments of validity: (1) we must know, or have good reason to assume, that for assessing reliability, retesting conditions are the same in all relevant respects; (2) for external validity we must have some way of identifying what we are testing for. As regards our anxiety test, we must have some test-independent means of identifying anxiety and making rough assessments of degree.

Let us consider as test instruments ordinary English, E, and quantifiable English (BP English), E'. We will evaluate E and E' for internal and external validity. A criterion of reliability is degree of agreement among users of E and E' and individual and group consistency through a sequence of retests. We find them both to be highly reliable although differing in range of applicability. But when it comes to assessing external validity, it strikes us that there is no external source of information for making this assessment that is independent of E. Looking back to our reliability assessments, we find that sameness of test conditions is formulated in E for both E and E'. The psychologist wants to claim that E' is more objective than E. If by 'objective' he means 'independent of the observer qua individual,' then E is no less objective for concepts of a public character; if he means by 'objective' 'statable in public criteria,' then there is no reason to believe that where the criteria are indeed public, concepts in E cannot be stated thus. If by 'objective' he means 'having only public criteria of identification,' the burden of proof rests on him to show how, failing this, E fails other conditions of objectivity.

Thus for E' it yet remains to be shown that it is superior to E -- comparing its general range of use with that of E, it fares badly indeed. There is yet one criticism of E to be considered: E is not so objective as E' because its use commits one to untenable philosophical theories; thus B. F. Skinner in Science and Human Behavior (1953, 40): "Historical and comparative facts about particular governments, religions, economic systems, and so on have led to certain traditional conceptions of the behaving individual, but each of these conceptions has been appropriate only to the particular set of facts from which it derived." But actually the user of E' does presume the truth of certain philosophical theories, that is, that the world presents itself to the observer prelinguistically partitioned, thus warranting the quantificational model used, that is, the BP. The preferred level of description is not preferred for aesthetic reasons but for covert philosophical ones, for example, that such things are real. Such theories are demonstrably untenable, but it is through E, not E', that one disabuses oneself of them.

And what is the relevance of E' -- and more specifically, reinforcement theory -- to our moral, educational, and other concerns? I quote from Skinner again, saying that the conception of the behaving individual expressible in it "is appropriate only to the particular set of facts from which it derived."