Thursday, December 10, 2009

EVALUATION

'Tis with our judgments as our watches, none
Go just alike, yet each believes his own.
-ALEXANDER POPE
Evaluation, the fourth component of the curriculum, is probably the most narrowly viewed aspect of the educational enterprise. In most curriculum books that deal with the topic, it is almost always treated exclusively in terms of the evaluation of student achievement, often in connection with assigning "grades" or "marks." Even in a comprehensive text, which accords curriculum evaluation far broader scope than most treatments of curriculum, the focus of evaluation is principally on "the degree to which pupils attain . . . objectives" (Taba 1962, p. 312).


Of course, there is substantial truth in the claim that "the proof of the pudding is in the product." But while the evaluation of student achievement (product evaluation) certainly constitutes an important part of curriculum evaluation, it by no means approaches what may generally be conceived of as a comprehensive curriculum evaluation. (A comprehensive evaluation, for example, would also emphasize such considerations as the correspondence between stated objectives and curriculum content and even an evaluation of the objectives themselves.) Preponderant reliance on product evaluation is appropriate mainly in situations which involve training and the technical model of curriculum development.1
Product Evaluation: The Technical Model
What passes for curriculum evaluation today is almost always "product evaluation" centered on the student and based on the technical model of curriculum development. It will be recalled that in the technical model, learners, viewed as "raw material," are subjected to certain curricular and instructional treatments in order to produce a "finished product" that meets predetermined objectives. Since judgments of curriculum effectiveness are based almost entirely upon an assessment of the degree to which curriculum objectives are attained by learners, "curriculum evaluation" based on this model in fact turns out to be little more than an estimation of goal achievement. The fallacy inherent in this narrow concept of evaluation is pointed up in the following example: "An American History curriculum, K-14, which consisted in the memorization of names and dates would be absurd; it could not possibly be said to be a good curriculum, no matter how well it attained its goals" (Scriven 1967, p. 52).
1. See the section "Training: the Technical Model" in Chapter 13.
Narrow and inadequate though it may be as the sole basis for comprehensive curriculum evaluation, product evaluation nevertheless provides important data for that evaluation. Clearly, one criterion by which curriculum effectiveness is legitimately judged is its "payoff," that is, the quality of the "product" that it turns out. For this reason, the process, which is in fact a highly complex one, merits careful study by curriculum planners. In the following sections we shall discuss some of the more prominent aspects of product evaluation.
MEASUREMENT AND EVALUATION DISTINGUISHED
Judgments regarding the degree to which learners have achieved curriculum objectives will be most valid if they are based on empirical evidence. This empirical evidence often takes the form of educational measurement, defined as "the process that attempts to obtain a quantified representation of the degree to which a pupil reflects a trait" (Ahmann and Glock 1967, p. 11). Measurement data are basically descriptive in nature and usually are expressed in numerical terms in order to avoid the value connotations that are connected with words. Thus, an individual's height and weight, recorded at seventy-two inches and ninety-seven pounds, simply provide measurement data without implying that the individual is short or tall, light or heavy.
Evaluation, in contrast to measurement, constitutes a value judgment. For example, we may offer the evaluation (i.e., the judgment) that an individual is "underweight for his height" and support the evaluation with the measurement data reported in the previous paragraph. Of course, in order to make the evaluation "stick," it will be necessary to demonstrate that most people who are seventy-two inches tall weigh more than ninety-seven pounds (or that most ninety-seven-pound people are shorter than seventy-two inches). Even the simple example illustrated above suggests that good evaluations, generally, are based on a great deal of information derived from many sources.
It should be clear from this discussion that while "measurement" and "evaluation" are distinct in meaning, they are decidedly related terms. Measurement comprises a substantial part of the more inclusive process of evaluation.
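The distinction between measurement and evaluation can be made concrete with a small sketch. The height and weight below are the figures used in the text; the reference weights, the names Measurement and evaluate_weight, and the "more than half" reading of "most people" are assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """Purely descriptive data: numbers with no value judgment attached."""
    height_in: float
    weight_lb: float


def evaluate_weight(m, reference_weights_lb):
    """Turn a measurement into a judgment by comparing it with a reference group.

    reference_weights_lb is assumed to hold weights of people of the same height.
    The judgment "underweight" is offered only if most of them (more than half)
    weigh more than the person measured.
    """
    heavier = sum(1 for w in reference_weights_lb if w > m.weight_lb)
    if heavier / len(reference_weights_lb) > 0.5:
        return "underweight for height"
    return "not underweight for height"


# Measurement: the data from the text, with no judgment implied.
person = Measurement(height_in=72, weight_lb=97)

# Hypothetical weights of other 72-inch individuals (illustrative values only).
reference = [150, 165, 172, 180, 190, 205]

# Evaluation: a value judgment supported by the measurement and the reference data.
verdict = evaluate_weight(person, reference)
print(verdict)
```

The measurement stands on its own; the evaluation cannot be produced without the additional reference data, which is the point of the passage.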
TESTS
The most common resource for measurement data in schools is paper-and-pencil tests. Thus, tests provide the bulk of the data on which product evaluations usually are made. A test is ordinarily defined as a group of questions or tasks to which learners are asked to respond: orally, in writing, or sometimes even in pantomime. It is presumed to consist of "a representative sample of all possible questions and tasks related to the trait measured by the test . . ." (Ahmann and Glock 1967, p. 14). But measurement need not always involve testing (e.g., teachers' responses on checklists and rating scales related to learner achievement can also constitute measurement data); and evaluations need not even be based on tests at all. Tests constitute a particular kind of measurement that can provide useful data for curriculum and learner evaluation, but when they are overemphasized, they can distort curriculum evaluation and even unintentionally influence curriculum goals and outcomes (e.g., "test anxiety" might inhibit the attainment of a "creativity" goal).
INDIRECT NATURE OF MEASUREMENT
The discussion of behavioral objectives in Chapter 13 pointed up the significance of operationalism in the process of empirically determining the degree to which curriculum objectives had been achieved. Many of the same issues of operationalism underlie the process of measurement.
When we measure the length of a piece of paper or a board to be cut, the element being measured is clear, observable, "obviously present and measurable" (Ahmann and Glock 1967, p. 21). The process is one of direct measurement. But when we utilize a test to measure some psychological trait (such as anxiety, self-concept, creativity, intelligence, or history achievement), we are inferring the presence of the trait from observable responses to the measuring instrument. In other words, we are inferring the presence of the trait from what we assume to be the effects of the trait.
The rationale used in constructing measuring instruments is quite nicely illustrated by the procedures to which we all subscribe in judging the "intelligence" of friends and acquaintances. For example, if asked to justify our opinion that a friend is "highly intelligent," we may point out that he holds a master's degree, that his income is above average, that he "catches on quickly" to difficult riddles, that he is a fascinating conversationalist, and that he reads profound philosophical treatises. In principle, we have "measured" our friend's intelligence by counting up a significant number of observable behaviors and achievements which we consider to represent the effects of intelligence. On a more formal basis, this is the very same procedure employed by psychologists wishing to construct a measuring instrument for general intelligence. Starting with an analysis of the concept "general intelligence" they would go on to identify a number of second-order constructs such as are represented in Figure 16-1. These second-order constructs would then be translated into observable behaviors. Thus, as we do, the psychologists infer the presence of intelligence from the manifestation of behaviors that are taken to represent the effects of intelligence. Of course, the example presented in Figure 16-1 is grossly oversimplified; but it places into sharp profile one of the basic questions raised by the indirect nature of measurement: Are the specified behaviors, and only the specified behaviors, the effects of the hypothesized construct? The answer, of course, must depend on one's definition of the construct; and the only possible conclusion is that the construct being measured is defined by the test that measures it. This notion that, in effect, "IQ is whatever the IQ test measures" is borne out by research with intelligence tests. It has been found that two of the most highly developed and reliable individual IQ tests (the Stanford-Binet and the Wechsler Intelligence Scale for Children [WISC]) correlate only .60 to .80 (Ahmann and Glock 1967, p. 380). These statistics indicate that the two tests indeed define "intelligence" in significantly different ways and so are measuring different (although apparently overlapping) constructs.
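One way to see what a correlation of .60 to .80 implies is to square it. The sketch below rests on a standard statistical convention that is not stated in the text: the square of a correlation coefficient is commonly read as the proportion of variance two measures share. The function name shared_variance is introduced here for illustration.

```python
def shared_variance(r):
    """Proportion of variance two measures have in common, taken as r squared."""
    return r ** 2


# The two correlation values reported in the text for the two IQ tests.
for r in (0.60, 0.80):
    common = shared_variance(r)
    print(f"r = {r:.2f}: shared variance = {common:.2f}, unshared = {1 - common:.2f}")
```

On this reading, somewhere between roughly a third and two thirds of the variation in scores on one test is associated with variation on the other, which is consistent with the text's description of the constructs as different although overlapping.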
FIGURE 16-1
Measuring the intangible construct "intelligence."
PRIMARY CONSTRUCT: Intelligence
SECOND-ORDER CONSTRUCT: Good Immediate Memory. BEHAVIORAL EFFECT: Repeats Six-Digit Numbers
SECOND-ORDER CONSTRUCT: Good Concept Formation. BEHAVIORAL EFFECT: Solves Analogy Problems (boy is to girl as _ is to hen)
SECOND-ORDER CONSTRUCT: Highly Verbal. BEHAVIORAL EFFECT: Defines College Level Vocabulary Words (simian, invective)

The implications of this problem of indirect measurement are significant for curriculum evaluators. In the first place, it seems clearly imperative that curriculum workers know not only what construct a test is measuring, but what effects are being taken as evidence of the construct's presence. Clearly the behaviors represented in Figure 16-1 are an inadequate index of any sophisticated conception of intelligence. Correspondingly, an inspection of the behaviors demanded by most achievement tests (both standardized and informal) raises serious questions with respect to the inferences that can legitimately be drawn about student "achievement" in the areas being tested. Additionally, the fact that all psychological traits are inferred from behavior should place anyone using this measurement data on warning that the process is so highly complex that measurement data can rarely be relied on to provide definitive answers to the questions posed by evaluation. "All too frequently, the crudeness of the information produced by measuring procedures prevents us from even ranking pupils" (Ahmann and Glock 1967, p. 26). This does not mean that psychological measurement ought to be abandoned; it does mean that extreme caution and prudence need to be exercised in drawing inferences from measurement data.
STANDARDS FOR PRODUCT EVALUATION
Given a sufficient quantity and wide variety of data on which to base student evaluations, what standards should be used to judge relative success or failure? Basically, there exist four standards for evaluation:2 the absolute maximum standard, the absolute minimum standard, the relative standard, and the multiple standard.
THE ABSOLUTE MAXIMUM STANDARD. An absolute standard is an arbitrarily set level of achievement against which all students are evaluated. It may be set either at a maximum or minimum level. At the maximum level, the standard is out of reach of all but the most able students. The percentage system of evaluation prevalent in most public schools is an example of the absolute maximum standard: 90 to 100 percent "correct" responses to curriculum tasks represents an "excellent" level of achievement. All students theoretically can achieve this fixed level, but rarely do more than a very few reach it. On the other hand, all may theoretically "fail" by achieving less than the 70 percent correct responses that represents the customary fixed level for "passing." Sometimes the majority of the class actually does fall below this level. If the absolute maximum standard is taken seriously, actual student achievement does not call into question the legitimacy of the standard. If all fail, the student raw material is judged to be inferior; if many achieve high evaluations, the students are viewed as high quality raw material. (In actuality, however, the "absolute" standard is often "adjusted" to bring about a "reasonable" distribution of achievement levels.) The maximum absolute standard is most appropriate to product evaluation within the framework of the technical model of curriculum.
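The fixed character of the absolute maximum standard can be sketched in a few lines. The 90 and 70 percent cutoffs come from the text; the function name, the rating labels, and the sample scores are assumptions made for illustration.

```python
def absolute_maximum_rating(percent_correct, excellent=90.0, passing=70.0):
    """Rate one student against fixed cutoffs that ignore how the group performed."""
    if not 0 <= percent_correct <= 100:
        raise ValueError("percent_correct must be between 0 and 100")
    if percent_correct >= excellent:
        return "excellent"
    if percent_correct >= passing:
        return "pass"
    return "fail"


# Hypothetical class scores (percent correct), for illustration only.
class_scores = [95, 88, 72, 69, 54]
ratings = [absolute_maximum_rating(s) for s in class_scores]
print(ratings)

# Because the cutoffs never move, an entire class can fall below the passing level.
weak_class = [65, 58, 40]
all_fail = all(absolute_maximum_rating(s) == "fail" for s in weak_class)
print(all_fail)
```

Nothing in the function looks at the other students, so a class in which everyone fails leaves the standard itself unquestioned, as the passage observes.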
THE ABSOLUTE MINIMUM STANDARD. The minimum level of the absolute standard usually is set at a point that ensures success for virtually all students in the program. Students who do not achieve mastery of curriculum objectives at the minimum level are retaught until the standard is met. A curriculum is not judged to be effective unless all students achieve all the objectives that have been prescribed for them. When the minimum standard is exclusively employed as the evaluation criterion, the problem of "grading" or sorting of students is eliminated. As the exclusive criterion of evaluation, the minimum absolute standard is most appropriate in the training paradigm. Indeed, the minimum absolute standard is advocated very strongly by many proponents of behavioral objectives under the nomenclature of "performance standards" or "learning for mastery" (e.g., see Bloom, Hastings, and Madaus 1971, pp. 5-57).
2. I am indebted to Kenneth H. Hoover (1968, pp. 553-555) for the basic schema used to classify evaluation standards.
Sometimes the minimum standard is used as a mastery base which ensures a "passing" grade for students, but which many students may go beyond in order to achieve "higher grades." Depending upon the standards used to determine the "higher grades," the total product evaluation may be useful in either the technical or humanistic model.
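The reteach-until-mastery procedure of the minimum standard amounts to a simple loop, sketched below. The text gives no numerical mastery level, so the threshold of 80, the cap on rounds, and the reteach function (which simply adds a fixed gain) are hypothetical stand-ins.

```python
def teach_to_mastery(score, reteach, minimum=80, max_rounds=10):
    """Reteach until the minimum standard is met (or the round limit is reached).

    score      -- the student's initial score on the objective
    reteach    -- hypothetical callable returning the score after one more round
    minimum    -- assumed mastery level; the source does not specify one
    max_rounds -- safety cap so the loop always terminates
    """
    rounds = 0
    while score < minimum and rounds < max_rounds:
        score = reteach(score)
        rounds += 1
    return score, rounds


def ten_point_gain(score):
    """Illustrative model of reteaching: each round adds ten points."""
    return score + 10


final_score, rounds_needed = teach_to_mastery(55, ten_point_gain)
print(final_score, rounds_needed)
```

Every student ends at or above the minimum, so the result records only how much reteaching was needed, not a rank among students; this is why sorting or "grading" drops out when the standard is used exclusively.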
THE RELATIVE STANDARD. The relative standard is most familiar in connection with "scaling grades" on the "normal curve." This standard of product evaluation judges each student against the relative performance of the group. Thus, the group's mean performance (in conjunction with the standard deviation) operates as a kind of sliding scale against which individual achievement is judged. Unlike the absolute standard, it is highly competitive, since high achievement in this relative situation consistently demands achievement higher than that of most others in the class. Often, competitive pressure builds to the point of getting in the way of learning or, more significantly, of producing learning outcomes that are unintended and undesirable (e.g., the attitude that any measures that enable one to "beat the competition" are acceptable).
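The "sliding scale" formed by the group mean and standard deviation can be sketched with standard scores (z-scores). The letter cutoffs and the two sets of class scores below are hypothetical; the text specifies only that the mean and standard deviation serve as the yardstick.

```python
from statistics import mean, pstdev


def z_score(score, group):
    """Express a score in standard-deviation units relative to its own group."""
    sigma = pstdev(group)
    if sigma == 0:
        return 0.0  # everyone scored the same, so there is no relative spread
    return (score - mean(group)) / sigma


def curve_grade(z):
    """Map a z-score to a letter using assumed, purely illustrative cutoffs."""
    if z >= 1.5:
        return "A"
    if z >= 0.5:
        return "B"
    if z >= -0.5:
        return "C"
    if z >= -1.5:
        return "D"
    return "F"


# Two hypothetical classes; a raw score of 80 appears in both.
weaker_class = [50, 60, 70, 80, 90]
stronger_class = [80, 85, 90, 95, 100]

grade_in_weaker = curve_grade(z_score(80, weaker_class))
grade_in_stronger = curve_grade(z_score(80, stronger_class))
print(grade_in_weaker, grade_in_stronger)
```

The same raw score earns a different grade depending on who else is in the room, which is what makes the standard both relative and competitive.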
Another problem associated with the relative standard of evaluation is that it assumes that all competitors are essentially equal in the ability to succeed, i.e., that every person in the class has the potential for being top scorer in the competition. The assumption, of course, is false. Some individuals have more ability in math, some in language subjects. Some students have reading difficulties; others lack prerequisite skills. Thus, the contest is "stacked," so to speak, so that it is reduced to competition only among the most able few. The result for the less able usually is discouraging and demoralizing. Indeed, many just quit, so that the evaluation of these students represents a rejection of achievement rather than lack of it. (Ironically, their rejection of the contest makes the bona fide competitors look even better than they would if the less able in fact competed.) The conclusion, of course, is simply that a competitive situation is good (and valid) only for those who really believe they have a chance to win.
A relative standard for product evaluation, however, has certain advantages in terms of feedback for guidance in curriculum revision. For example, the relative standard provides us with a normative baseline that can serve as a guide with respect to reasonable expectations for student achievement. To illustrate: when the mean measurements taken for a science curriculum project consistently fall below the 30 percent level for correct answers, the greater probability is that one or more components of the curriculum (or instruction) is drastically dysfunctional, rather than that the students are very weak. Perhaps the objectives are far too ambitious; perhaps the content is inappropriate; perhaps the learning activities are badly sequenced; or perhaps the measurement instruments are faulty. In any case, the relative standard has provided the kind of data that direct us to "check up" on the curriculum, that is, to reexamine and revise the curriculum plan. But for the time being, at least, students are not penalized with the low grades they might have received under an absolute standard. Individual levels of achievement are adjusted to the relative performance of the group.
THE MULTIPLE STANDARD. The fourth standard for product evaluation, the multiple standard, consists of the growth that each student undergoes from the inception of instruction to the point of evaluation. Figure 16-2 shows graphically how this evaluation standard operates. While student A's final achievement after instruction is 2 units more advanced than student B's (9 compared with 7), A began instruction at a point that was 4 units in advance of B (6 compared with 2). Thus, the bars representing each student show that B's actual growth is 2 units greater than A's (a growth of 5 units for B compared with only 3 units for A). Strictly speaking, then, under the multiple standard of evaluation, B's achievement would be rated higher than A's.
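The arithmetic of the multiple standard can be checked directly. The starting and final values for students A and B are those given in the text; the variable names are introduced here only for illustration.

```python
# (starting point, final achievement) in the text's units of achievement
students = {"A": (6, 9), "B": (2, 7)}

# Growth is final achievement minus the starting point.
growth = {name: final - start for name, (start, final) in students.items()}

# Ranking by final achievement alone versus ranking by growth.
by_final = sorted(students, key=lambda name: students[name][1], reverse=True)
by_growth = sorted(growth, key=growth.get, reverse=True)

print(growth)
print(by_final, by_growth)
```

Judged by final achievement, A ranks first; judged by growth, B does. The sketch treats each unit as equal in size, an assumption that cannot always be made with real units of achievement.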
Although the multiple standard is the most "individualized" and therefore the "fairest" standard to use, there are many problems associated with its use. The measurement of clearly defined, operationally stated objectives after instruction is a difficult task. To attempt to measure a variety of traits at the inception of instruction for the purpose of determining each student's general stage of development is virtually impossible. Then, too, "units of achievement" rarely proceed at equivalent intervals; i.e., the degree of growth between units 3 and 4, for example, cannot be assumed to be equivalent to the degree of growth between units 6 and 7. Finally, the extensive use of the multiple standard is highly impractical when one thinks about the large numbers of students to be evaluated. For a teacher having responsibility for five classes of thirty-five to forty students there is little opportunity to implement the procedure.
Implementation of the multiple standard of product evaluation, however, is feasible in a highly restricted training situation, but it is usually not well suited in principle to the purposes of the technical model. For example, the preassessment of a learner's stage of development in reading a foreign language can be done in a fairly precise manner; but for training purposes, a postinstructional evaluation based on a minimum absolute standard or a relative standard would be more appropriate. We should note, however, that a foreign language curriculum built on the humanistic model might very well employ the multiple standard.
FIGURE 16-2
The multiple standard of evaluation. (Bar chart on a scale of 1 to 10: Student A's bar runs from 6 to 9; Student B's bar runs from 2 to 7.)

One final note on the human dimension in product evaluation is appropriate. To the extent that human relationships are taken into account in the development and implementation of curriculum, they supplement, enrich, and sometimes even transcend evaluations based on precise, recordable data. Broad new insights; a revised Weltansicht; or feelings of warmth or excitement about a teacher, another student, or a new idea may represent a far more significant learning outcome than all those represented by the data amassed through conventional instruments.
The point is illustrated in the story of Andy and Bill, two sixth-grade boys. They had completed a current events lesson in which one of the facts learned was that seventy-six American soldiers had been killed in Viet Nam the previous week. The evaluation of the lesson was a quiz, and one of the questions was: "How many American soldiers were killed in Viet Nam last week?" Both boys answered "seventy-six" and were given full credit for the "correct answer."
The information that seventy-six American soldiers had been killed was school data for Bill. He might forget it the next day, or he might always remember it; but at least for now it was only a statistic. But eight weeks before this school lesson Andy and his family had received word that his older brother Ken had been killed in Viet Nam, and only four weeks before, Ken's body had arrived home. The news of his brother's death and the funeral constituted an emotionally significant experience for Andy. Because of it, the information that seventy-six young men like Ken were now dead had a terrific impact on him. Recalling Ken and the relationship he had had with him (playing catch, going for rides, building shelves for his room, talking about sports and school and even politics), Andy was overwhelmed by the enormity of seventy-six dead Kens and of the holocaustic meaning of mass killing and war. Questions began to flood into his brain: What happens to a person when he dies? What happens to all those dead people? What does it mean to be alive? Why are we here, living in this world? Why do wars start?
The evaluation of the kind of powerful incipient learning suggested by the above questions does not ordinarily occur through the vehicle of conventional evaluation instruments because the kinds of data it requires are not available from these sources. The information necessary for assessing many of the most important curriculum outcomes of liberal education resides in the human relationships developed during the course of curriculum implementation. Given a conducive relationship between teacher and learners, unexpected questions, feelings, and ideas are channeled back to the teacher as a consequence of interactions. Such data serve as an invaluable guide in the revision and evolution of the curriculum. It is in informal situations involving sensitivity to the messages inhering in human interaction that the multiple standard of evaluation is by far the most appropriate.
Which of the four standards of evaluation is best? The question is probably an inappropriate one. While the absolute maximum standard is probably not defensible in any situation, conditions usually call for some combination of the other three. Evaluations which utilize a variety of standards tend to reflect most accurately the multidimensional richness of human learning.
EVALUATION VERSUS GRADING
Before concluding this section on product evaluation, a few words regarding the distinction between evaluation and "grading" are in order. Grading, whether by letter, number, or other symbolic representation, is a kind of shorthand system for recording and reporting the evaluation of individual student achievement. Grading (the shorthand record-keeping system) is convenient to the degree that mass education involves keeping achievement records and periodically communicating educational progress for large numbers of students. But while certain inferences may be drawn from grades, grades do not constitute, and should not be confused with, evaluation. Product evaluation (the evaluation of student learning) is far too complex an enterprise to be reduced to a single symbol. An effective evaluation that would constitute a comprehensive representation of a student's educational progress would include, among other factors, measurement and other relevant data; an analysis of the student's interests, capabilities, and achievement; and conclusions based explicitly on appropriate combinations of minimum, relative, and multiple standards.
But while a system of grading does not constitute an evaluation, it nevertheless influences (sometimes significantly) curriculum outcomes. For example, the "ABCDF" system has been criticized for increasing student anxiety because of its built-in threat of failure. In addition, it has been said to be punitive and to have discouraging effects because it continues to reduce the student's grade point average long after he has "caught on" and is doing creditable academic work. One alternative to this traditional way of grading is the ABC no-entry system, which removes the threatening and punitive aspects of failure by simply "not counting" course work that is not satisfactorily completed. The proponents of this system claim interest only in degrees of successful performance (ABC), not in degrees of inadequate performance (DF).
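The lingering effect of early failures on a grade point average, and the way a no-entry system removes it, can be sketched as follows. The 4-point values assigned to each letter are the conventional ones and are an assumption here, as are the function names and the sample record; the text does not specify how averages are computed.

```python
# Conventional 4-point values (assumed; not given in the text).
POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}


def gpa_abcdf(grades):
    """Average over every course attempted, failures included."""
    return sum(POINTS[g] for g in grades) / len(grades)


def gpa_abc_no_entry(grades):
    """Average over satisfactorily completed work only; D and F are not recorded."""
    counted = [g for g in grades if g in ("A", "B", "C")]
    if not counted:
        return None  # nothing has been entered on the record yet
    return sum(POINTS[g] for g in counted) / len(counted)


# Hypothetical record: two early failures, then creditable work.
record = ["F", "F", "B", "A", "A"]

traditional = gpa_abcdf(record)
no_entry = gpa_abc_no_entry(record)
print(round(traditional, 2), round(no_entry, 2))
```

Under the traditional scale the two early failures continue to pull the average down after the student has "caught on"; under the no-entry scale they never enter the calculation.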
Another "innovation" in grading is the "pass-fail" system, which simply reduces the five-point (ABCDF) scale to a two-point (P-F) scale. It is said of the P-F system, however, that it is just as anxiety producing and punitive as the ABCDF system, but it is worse because it encourages mediocrity by not recognizing and rewarding excellent performance.
The problems of grading have been responded to in a variety of ways, including the call to "abolish grades." But recent calls by school reformers to abolish grades have not seemed to take into account the distinction between grading and evaluation made in the previous paragraphs. To abolish evaluation would be unthinkable, if not impossible. Even if we could operate without making judgments about the value of what we were doing in curriculum, it is doubtful that intelligence would permit such a course. With respect to our present systems of grading, however, abolition might be a real possibility. The reason is that the systems not only fail to communicate student evaluations reasonably clearly but their side effects are punitive, threatening, discouraging, and in a general sense antithetical to much of what we are trying to achieve in education. (See "Goals and Roles of Evaluation" in the following section for an extended treatment of this issue.) In view of these conditions, it would seem that we ought to be able (1) to devise better record-keeping systems than we have, and (2) to interpret these shorthand systems far more intelligently than we do. Thus, curriculum planners should be urged to experiment on a broad scale with shorthand systems that would serve the recording and reporting requirements of mass education and at the same time avoid the adverse effects on curriculum outcomes that are so prevalent in present procedures.
Comprehensive Curriculum Evaluation
Comprehensive curriculum evaluation is an enormously complex undertaking that defies attempts to codify the process either in terms of sequence or components. The reason for this distressing state of affairs is that comprehensive curriculum evaluation involves not only the assessment of a written document (the "inert curriculum" or curriculum plan) but more important, of the implemented curriculum as a functional corpus of phenomena involving the interaction of students, teachers, materials, and environments. To make matters worse, most of the significant aspects of the implemented curriculum have to do with intangibles, such as thought processes, attitudes, meanings, relationships, feelings, etc., which can only be inferred from tangible behaviors that we assume (sometimes mistakenly) to be the effects of the constructs in which we are most interested. Furthermore, the implemented curriculum can only be assessed in terms to a large degree controlled by the instructional medium through which it is executed or made operative. This condition injects into the evaluation process a whole new series of variables which must be taken into account.
Other difficulties arise when we consider that the "inert curriculum," i.e., the document which constitutes the total curriculum plan, often does not even exist. Most often, an established curriculum is already operative in the school setting and the curriculum staff is challenged to change the old for some new plan. But the new curriculum plan as a finished document usually exists only hazily in the minds of the curriculum staff, and pilot implementation begins (as it should) with just a portion of each of the four curriculum components formulated and intact. Evaluation of this embryonic plan and its preliminary implementation, then, proceeds in tandem fashion, the feedback provided by each aspect contributing data useful in further development of both the inert and functional curricula. Thus, while it is sometimes convenient to think of the curriculum plan as a full-blown document ready to be implemented afresh in a virgin school setting, reality rarely permits such an ideal situation. Even where the "old" curriculum might exist complete in document form, it usually is obsolescent and therefore largely irrelevant to much of the curriculum operative in the school.
The above paragraphs suggest just some of the many practical operational difficulties associated with comprehensive curriculum evaluation. But as a matter of principle, it should also be clear that, because curriculum evaluation is a component of the total curriculum, its design and procedures will be significantly affected (as is the total curriculum design) by such foundational factors as philosophy, cultural analysis, conceptions of the nature of man, and other values. Hence, the design for a comprehensive curriculum evaluation cannot be legislated abstractly and no single "method of evaluation" can be proposed as an appropriate instrument for the evaluation of all curricula. In short, the nature of the curriculum evaluation will be substantially determined by the intent and design of the curriculum to be evaluated.
In spite of this limitation on the preplanning of a standard procedure for curriculum evaluation, certain recurring principles and issues in evaluation can be cited that will provide guidance to planners as they design the evaluation component for a specific curriculum plan. The following sections briefly explore some of the most significant issues connected with comprehensive curriculum evaluation.
GOALS AND ROLES OF EVALUATION
Clarity about the function of evaluation in curriculum is essential if the evaluation is to contribute what it should to the implemented curriculum. Scriven (1967, pp. 40-43) draws the distinction between the goals of evaluation and the roles of evaluation. The principal goal of evaluation is the determination of how well a curriculum performs when measured against certain criteria or when compared with another curriculum. Arriving at this overall determination, of course, implies a number of more specific subgoals. But the ultimate purpose of evaluation is essentially the same, whether we are trying to evaluate "coffee machines or teaching machines, plans for a house or plans for a curriculum" (Scriven 1967, p. 40).
The roles of evaluation as it operates in a particular sociological or curricular context, however, can (and probably should) vary enormously. Depending upon how the evaluation is designed and executed, it can perform differentially (play a variety of roles) in the curriculum development process, in the execution and implementation of curriculum, or even in the political/economic arena, where many important curriculum decisions are ultimately made. The particular role played by evaluation, of course, will have important effects on the curricular end product. It is the wide variety of roles that curriculum evaluation can play that makes it difficult to prescribe in advance a generalized sequence of procedures for curriculum evaluation.3
3. Scriven (1967, p. 40) proposes a generalized evaluation methodology: "The evaluation activity consists simply in the gathering and combining of performance data with a weighted set of goal scales to yield either comparative or numerical ratings, and in the justification of (a) the data-gathering instruments, (b) the weightings, and (c) the selection of goals." The latitude allowed for by the conceptual breadth of this generalized methodological sequence provides little specific guidance in particular curriculum evaluation situations.
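The arithmetic implied by Scriven's phrase "combining of performance data with a weighted set of goal scales" can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration only: the goal names, the weights, the 0-to-1 performance scores, and the choice of a simple weighted average as the combining rule are hypothetical and do not come from Scriven. The sketch also says nothing about the harder part of his methodology, namely the justification of the instruments, the weightings, and the goals themselves.

```python
from typing import Dict


def weighted_rating(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-goal performance scores with goal weights into one numerical rating.

    Assumption: a plain weighted average; other combining rules are possible.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same goals")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[goal] * weights[goal] for goal in weights) / total_weight


# Hypothetical goal scales and weightings (not taken from the source).
weights = {"critical_thinking": 3.0, "factual_recall": 1.0}

# Hypothetical performance data for two curricula, each scored from 0 to 1.
curriculum_a = {"critical_thinking": 0.5, "factual_recall": 1.0}
curriculum_b = {"critical_thinking": 0.75, "factual_recall": 0.5}

rating_a = weighted_rating(curriculum_a, weights)  # numerical rating for A
rating_b = weighted_rating(curriculum_b, weights)  # numerical rating for B

# A comparative rating simply orders the two numerical ratings.
preferred = "B" if rating_b > rating_a else "A"
```

With these invented figures the curriculum that is stronger on the heavily weighted goal is preferred even though it is weaker on recall, which shows how much of the verdict rests on the weightings and why they require justification.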

The variety of roles that evaluation can play is illustrated simply and concretely by observing its operation in any school situation. Evaluation plays a motivational role for some students; for others it plays a threatening and coercive role. From the teacher's point of view, evaluation often operates as a lever or a control device; and from the school's point of view, it performs as an instrument for sorting and classifying students into "homogeneous" groups. It should be noted that none of these roles is inherent in the goals of evaluation, nor is any of them necessarily a consequence of evaluation: the roles are dependent upon how evaluation is executed and used in a particular curriculum setting.
We can now see that arguments for the abolition of evaluation (or grades) are based mainly on what turn out to be inappropriate roles assigned to evaluation. Inappropriate roles, however, are not always (or even usually) a matter of conscious intent. They may emerge as a result of accident or as a matter of unconscious value orientation. As a result, planners need to be aware of the roles that their evaluation procedures assume as the development of curriculum proceeds.
Of course, curriculum evaluation can and should play a variety of appropriate and productive roles even as it moves toward its principal goal of assessment of curriculum quality. One such role (connected with the process of curriculum development) might be ongoing improvement of the curriculum (and instruction). But if the evaluation component, as designed and administered, for some reason operates as an anxiety-producing agent among the teachers who are implementing the curriculum, the actual role of evaluation may turn out to be exactly the reverse of what was intended. Again, sensitivity to the roles of evaluation has to be a continuing concern of curriculum workers.
One final note is necessary with respect to the relative emphasis that should be accorded evaluation goals as opposed to evaluation roles. Excessive concern about evaluation roles has often resulted in the dilution of evaluation "to the point where it can no longer serve as a basis for . . . the estimation of merit, worth, value, etc. . . ." (Scriven 1967, pp. 1, 42). For example, when the goals of content evaluation in a sex education program (i.e., the scrupulous determination of the merit of the content) are blunted because a favorable verdict on highly controversial content would play an antagonistic role in the school's community public relations program, then evaluation has failed to function properly in the curriculum development process. While it is certainly prudent to take political considerations into account in the implementation of curriculum, educational criteria ought to predominate as the basis for essentially curriculum decisions, such as those concerning the relative merit of content. Certainly, the realities of the school setting often make it necessary to be content with "half a loaf," but politics, economics, and other factors by no means require the dilution of evaluation goals to the extent that we delude ourselves into thinking that half a loaf is a whole one. Such predetermined closure in the curriculum development process is self-defeating because it often turns out to be nothing more than a self-fulfilling prophecy.
The goals of evaluation, then, need to be kept clearly in view, but in the Deweyan "ends-in-view" sense. The roles of curriculum evaluation, because they constitute other consequences of evaluation design and execution, are important factors whose impact should be influential in the development of the evaluation component of the curriculum.
SUMMATIVE AND FORMATIVE EVALUATION
The principal distinctions between summative and formative evaluation have to do mainly with (1) purposes, (2) time, and (3) level of generalization (Bloom, Hastings, and Madaus 1971, p. 61). Since these characteristics are relative rather than absolute, the definitions of summative and formative evaluation should be taken in a relative sense.
Summative evaluation, as its name implies, is conducted in order to obtain a comprehensive assessment of the quality of a completed curriculum. Thus, summative evaluation ordinarily takes place at the completion of the curriculum development process and provides a terminal judgment on the completed product in overall, general terms.
Formative evaluation, by contrast, while providing assessments of curriculum quality, is conducted during the curriculum development process for the additional purpose of providing data that can be used to "form" a better finished product. Thus, formative evaluation takes place at a number of intermediate points during the development of a curriculum and in connection with relatively more specific aspects of it. We might say of both summative and formative evaluation that their roles in curriculum development are a major consideration in their use.
From the definitions of these two terms, it is obvious that they do not represent radically new concepts in evaluation. What constitutes summative evaluation has appeared under the labels "terminal," "outcome," and "product" evaluation; and the principles of formative evaluation have been discussed under such labels as "continuous" or "ongoing" evaluation. Nevertheless, the distinction is an important one and one that is especially valuable in curriculum construction.
It seems clear that for curriculum development purposes, formative evaluation is a far more useful tool than is summative evaluation, although both types are necessary. The problem with summative evaluation is that once a curriculum has been established in relatively completed form, everyone connected with it resists anything that suggests the necessity for major changes. On the other hand, because curriculum is an evolutionary phenomenon, formative evaluation is a uniquely well-suited instrument in the guidance of its evolution. Its particular strength is that it encourages a Deweyan ends-means position with respect to goal reassessment and the examination of unintended outcomes. In short, formative evaluation, as feedback and guide, operates to keep the curriculum development process "open."
One final note on the utilization of summative evaluation is in order. Summative evaluation should not be perceived exclusively as a one-time-only procedure which always occurs "at the end." Comprehensive summative evaluations can (and probably should) occur at certain infrequent but strategic points during the curriculum development process. Such intercessions provide an opportunity to step away from the flow of curriculum development activity and assess in toto the emerging curriculum product. It is sometimes valuable at these times to bring in outside evaluators in order to gain a fresh perspective on the entire project.
GOAL EVALUATION
Curricula that provide for evaluation of the degree to which stated aims, goals, and objectives are attained are abundant; those, however, that also include procedures for the evaluation of the goals themselves are conspicuous by their rarity. This condition is astonishing since it seems very clear that those responsible for school curricula should certainly be held accountable for the outcomes that they say their curricula should produce. The conclusion seems unavoidable, then, that evaluation of the merit of curriculum aims, goals, and objectives themselves should constitute a significant part of the evaluation component.
A number of issues connected with the evaluation of aims, goals, and objectives were discussed in Chapter 13. For example, the section on "The Problem of Ends and Means" dealt with the philosophical framework in which purposes should be considered and the section on "Sources of Aims, Goals, and Objectives" discussed certain criteria against which the value of curriculum purposes might be assessed. These considerations will be very helpful in reevaluating curriculum purposes as the process of curriculum development evolves. Other considerations that should come into play as the curriculum plan is tested in school situations can be briefly noted here.
PHILOSOPHICAL ASSUMPTIONS. Questions such as the following should recurringly arise in the evaluation of curriculum purposes: What assumptions are being made when a particular purpose is singled out as a desirable curriculum outcome? Are the reasons given the "real" reasons, or simply "good" reasons? Curriculum purposes should be continually reassessed in the light of an ongoing inquiry into the basic philosophical commitments of the people responsible for curriculum construction.
SOCIAL/CULTURAL ANALYSIS. Inquiry in this foundational area is basic to both goal formulation and goal assessment. Because cultures are evolving continually and because perceptions of the culture's value orientations are themselves in an evolutionary state, an active dialectic between curriculum aims, goals, and objectives and cultural analysis is necessary throughout the development of the curriculum. To avoid the limiting effects of cultural encapsulation, sociological and cultural analysis should be conducted in the light of an ongoing inquiry into philosophical assumptions.
THE EDUCATED PERSON. The central purpose of the curriculum is, in the last analysis, the development of the educated person. Thus, whether we are conscious of it or not, stated aims, goals, and objectives in a very real sense constitute a composite definition of this ideal type. Certainly, it seems desirable (if not imperative) that curriculum purposes be continually reassessed in the light of maximally conscious reflection on our best and most noble conceptions of what man can become.
VALUES. Closely connected with the three considerations of goal evaluation discussed above is values. In assessing particular goals and objectives, such questions as the following should receive careful and honest thought: What values are we reaffirming when we place a priority on a particular goal or objective? Are these values consonant with our best conceptions of the educated person, of the ideal society, and of the good life? To what extent does this goal or objective represent acquiescence to tradition, some level of government, a business or labor group, a religious organization, a political party, an influential patriotic society, an "aroused" taxpayers' association, or some other special-interest force? It may be that awareness and honesty with respect to values will reveal with embarrassing transparency the degree to which our aims, goals, and objectives have been influenced by special-interest forces to the detriment of a reasoned and principled determination of curriculum purposes.
The four centers of reflection discussed above represent basic considerations in goal evaluation. Of course, a large number of other considerations, such as material well-being, the freedom-responsibility continuum, and learning theory, might be utilized. Many factors such as these would be identified as a result of feedback acquired from pilot implementation of the curriculum plan. The crucial consideration, however, is awareness that statements of aims, goals, and objectives are never finished products, but require frequent reevaluation, not only for the obvious reason of keeping curriculum direction contemporary and up to date, but for the more important reason of correcting for the prior biases to which all human beings are subject. For if curriculum construction is the dynamic life process that it should be, planners will themselves move closer to the ideal of the educated person as they engage in the process of curriculum building.
EVALUATING THE COHERENCE OF CURRICULUM
The problem to be considered in this section is the same one touched on in the section "Relationship of Aims, Goals, and Objectives" in Chapter 13. There, we noted that curriculum planners had an obligation to demonstrate that specific objectives were reasonably consistent with stated goals, and that these goals were in turn congruent with the ultimate curriculum aims that students were to attain. As an example of inconsistency between levels of purpose, take a curriculum goal like "Students will be competent writers of expository prose" and one of the curriculum objectives that is commonly subsumed under it: "Students will write out from memory (1) the definition of a preposition and (2) the 10 most commonly used prepositions." Tradition notwithstanding, the behavior required by the subsumed objective both logically and in terms of reported research seems to bear little relationship to the behavior sought in the longer-range curriculum goal.
The lack of coherence between levels of purpose discussed above represents one of the most prevalent causes of curriculum dysfunction. In the following sections we will discuss some other of the more common points of inconsistency; but planners need to be aware that inconsistency can occur within and between any of the dozens of elements and operations involved in curriculum and instruction. Because a functioning curriculum is a dynamic and organic whole, its effectiveness depends to a large degree on the coherence of its interrelated components.
CONSISTENCY OF PURPOSES AND EVALUATION. Discrepancies between evaluation procedures and stated purposes are perhaps the most visible area of dysfunction in curriculum coherence. Pace (1958, pp. 78, 79) describes one such discrepancy that serves as a classic example of the kind of inconsistency that ordinarily, though unintentionally, occurs in this area. He reports that the stated purposes of a certain freshman college course in "Responsible Citizenship" included "critical thinking" and "analysis of complex ideas and relationships." These purposes were quite clearly communicated, and the teaching procedures and student activities both in and out of class were fully congruent with them. For example, in class discussions students were "encouraged, rewarded, and given frequent opportunity for the exercise of critical thinking and the analysis of complex ideas and relationships . . . ," traits clearly associated with "responsible citizenship." But he goes on to say that a "major portion of the final examination typically consisted of true-false and multiple-choice questions requiring the recall of historical information, definition of terms, and similar factual material contained in required readings." Clearly, the evaluation was dysfunctional because it assessed behavior that was essentially quite different from that which the course was intended to develop. Furthermore, student word-of-mouth concerning the evaluation procedures in the course would very likely be responsible in future offerings for diverting energies from the kind of activity that fulfilled stated objectives to memorization and other behaviors that resulted in "payoff," i.e., good grades. In so altering their behavior, the students would, in effect, be demonstrating that they had learned what many experienced curriculum evaluators know: If you want to find out what the purposes of a curriculum really are, do not read the statement of objectives; look at the final exams.
It should be pointed out that the evaluation described above was not entirely dysfunctional since the learners were probably expected (quite properly) to have a reasonable command of the facts and information important for critical thinking and analysis. But knowledge of facts constituted only a fraction of the total range of objectives; not only were most of the objectives not evaluated, but the most important ones, involving higher mental and attitudinal outcomes, were not considered. This condition of inconsistency between stated purposes and evaluation occurred because the evaluation program lacked scope or comprehensiveness.
Lack of scope or comprehensiveness has been described as "the most flagrant deficiency of current evaluation programs" (Taba, 1962, p. 317). This condition usually occurs because most of our evaluating devices tend to be inadequate for assessing the higher-level complex areas of human psychological functioning: e.g., such areas as reflective and intuitive thinking, creativity, social attitudes, aesthetic valuing, and moral development. The result is that evaluation becomes centered on those objectives that are most easily evaluated. It is no accident, for example, that in the language arts curriculum objectives in spelling receive intensive evaluative attention, while such critically important outcomes as development of aesthetic taste in literature (which are given heavy emphasis in statements of purpose) are virtually omitted from evaluation programs. Clearly, insistence on precision in evaluation can result, not only in a deceptive assessment of curriculum outcomes, but in the suppression of significant intended learning outcomes.
Precision in curriculum evaluation certainly is a characteristic that should be sought. But lack of a precise measuring instrument is never a good reason for narrowing the scope of an evaluation. Indeed, the learning outcomes that represent the highest levels of human development are least amenable to precise measurement, but to exclude them from the evaluation is effectively to suppress their attainment and thereby to subvert the best intents of the curriculum. A comprehensive evaluation program, one that is consistent with the full range of curriculum purposes, will be precise where it is possible to be, but whenever necessary, will accept as valid the best rough approximations of goal attainment it can get in the interests of balance and consistency.
CONSISTENCY OF ALL CURRICULUM ELEMENTS. Although not as conspicuous as inconsistencies between goals and evaluation, lack of correspondence among all of the other curriculum components is common and contributes significantly to curriculum incoherence and dysfunction. Inconsistencies can occur between goals and content, goals and learning activities, content and learning activities, learning activities and evaluation, etc. Indeed, inconsistencies can occur at so many junctures in the curriculum plan that it would be virtually impossible to provide a complete account of them here. The following few examples, however, will furnish some insight into the nature of the more common discrepancies among curriculum elements and show how these discrepancies operate to undermine curriculum effectiveness.
Inconsistencies often occur between curriculum objectives and learning activities. For example, "Students will understand the use of the scientific method of inquiry" is an objective often found in science curricula. An activity that would seem congruent with this stated objective might be a rather extended project in which each student formulates a problem and then uses the groping, but nevertheless structured, model of inquiry to reach some tentative solution. It is not unusual, however, to find that the activity connected with the above objective is merely learning by rote "the five steps of the scientific method." Such an activity, clearly, is not functional in helping students to reach the objective, but rather promotes the "ingestion of information," or as Dewey has put it, "verbal learning."
Another discrepancy that is common to many curricula occurs between learning activities and evaluation. An example of this inconsistency was noted in the previous section. College students who had engaged in critical thinking and analysis of ideas in classroom activities were evaluated on the basis of recall of information. While the activity appeared to be valuable and productive, no real evidence of its value was available because the evaluation was inconsistent with the activity. Of greater import is the possibility that the dysfunctional evaluation might, in the future, be responsible for student rejection of apparently productive activities.
The final example of curriculum incoherence that we shall present occurs between goals and content. A curriculum goal found in many statements of purpose is that "Students will develop an appreciation of (visual) art." Although this goal is essentially affective, the content of many art appreciation curricula is heavily cognitive. It usually is specified in terms of an historical survey of art and includes information relating to cultural background material, biographical data about the artists, and analyses of selected works. The assumption, of course, is that "knowledge about" automatically transfers to "appreciation of." Certainly appreciation is enhanced by knowledge, but experience clearly indicates that "knowledge about" is insufficient either to initiate or support desired affective dispositions. Other visual art content, organized differently, clearly is demanded by this objective.
The dysfunctional effects of incongruous content are nowhere more striking than in the poetry appreciation sections of most high school English courses. Here, students become proficient in identifying all of the poetic meters, from iambic pentameter to dactylic hexameter; they can define a host of poetry terms: ballad, imagery, couplet, blank verse, lyric, etc.; and they can adeptly recite the rhyme schemes of the Italian and Elizabethan sonnets. Yet they leave school despising poetry! Surely, other factors (e.g., the adverse disposition to poetry in American culture) contribute to this lack of appreciation for poetry; but the mechanical content of poetry courses, so alien to the nature and function of poetic values, undoubtedly is counterproductive in achieving stated goals.
These few examples demonstrate the need for curriculum planners to build into the evaluation component provisions for assessing the internal consistency of the curriculum plan. Unfortunately, no shortcuts exist for avoiding the tedious task of continually checking and cross-checking curriculum components to ensure congruence among all the elements of the plan.
CONSISTENCY OF CURRICULUM COMPONENTS AND FOUNDATIONAL COMMITMENTS.
In evaluating the curriculum, provision should be made for frequent references to commitments in the foundational areas: the culture, the individual, learning theory, and epistemology. For example, we might ask: Do the objectives reflect movement toward the kinds of society and individuals for which we hope? Does the content and its organization reflect our beliefs about the nature of knowledge? Are the proposed learning activities consistent with our notions of how human beings learn? Perhaps most important, because it can affect the very design of the curriculum, is the consideration: Is the evaluation itself congruent with our foundational and theoretical commitments?
Enough has been said about the derivation of aims, goals, and objectives in Chapter 13 for us to recognize their especially close ties to philosophical and theoretical commitments. But a continual reevaluation of purposes in terms of these commitments is made necessary for two related reasons: first, because of the tentative nature of the curriculum purposes themselves (Dewey's concept of purposes as "ends-in-view"); and second, because of the evolutionary character of cultures and individuals. We are never at the same place (psychologically as well as physically) as we were a little while ago. Added experience affords us more information, new insights, and a generally broader perspective on man and culture. Given these two considerations, it is unthinkable that periodic reassessment of purposes would not take place.
Content, too, as we noted in Chapter 14, is dependent on foundational commitments. But it is all too easy, when immersed in the day-to-day particulars of selecting and organizing content, to lose sight of larger theoretical concerns and to fall back (unconsciously) on the more familiar criteria of custom and tradition. The temptation of a traditional body of well-defined and organized content ready for automatic transfer into the curriculum is hard to resist; and a hard critical assessment of the degree to which this content really matches foundational commitments all too often gives way to rationalization. The development of content based on a novel epistemology or organizational pattern is a highly demanding task because it requires broad knowledge of the traditional disciplines coupled with epistemological imagination. To avoid the ubiquitous influence of unexamined (or invisible) custom and tradition, curriculum planners need constantly to examine the congruence between the content they propose for inclusion in the curriculum and the foundational positions they claim to have assumed.
A few comments with respect to consistency between curriculum evaluation and theoretical foundations will round out this section, although we will by no means have provided a complete discussion of curriculum coherence. Evaluations that are not consistent with foundational commitments not only provide misleading assessments of curriculum effectiveness, but, because of unintended roles, can produce outcomes that are antithetical to the foundational beliefs of the producers of curriculum. For example, take a situation in which students have demonstrated in an evaluation that they have an excellent cognitive grasp of democratic concepts and values. The curriculum then is judged to be effective by the planners, whose social and individual commitment has been based on the democratic ideal. The evaluation may be misleading, however, because while the students have demonstrated knowledge about democratic functioning, they were not evaluated in terms of their dispositions to behave democratically. Indeed, where emphasis is placed exclusively on intellectual performance, and evaluations rigidly focus on individual achievement and competition, the outcomes, in attitudinal terms, may in fact be extremely antidemocratic. It is all too true that many students learn authoritarianism in "Problems of Democracy" classes.
Coherence of curriculum elements is a central concern of evaluation. Because of the complexity of the curriculum enterprise, however, its achievement is elusive, and even in optimal situations only partial. Increased attention to evaluation in this area, however, can help to produce far more functional curricula than those that have customarily resulted from emphasis on product evaluation.
COLLECTING EVALUATION DATA
Most established procedures geared to collecting information for curriculum evaluation have to do with product (i.e., student) evaluation. Taba (1962, p. 329) notes three sources for such evidence: "standardized tests, nonstandardized or teacher-made paper and pencil tests, and informal devices." Paper-and-pencil tests, both standardized and informal, are the predominant source of evaluation data in most schools. There are many reasons for this: paper-and-pencil tests are purported to be objective, economical, easy to administer, and they provide a "norm" against which individual achievement can be judged. But there are a number of limitations inherent in paper-and-pencil tests, also. Some of these were discussed in the first section of this chapter, "Product Evaluation: The Technical Model," and have to do with such matters as the indirect nature of measurement and the inferences that can legitimately be drawn from test performance. Other significant limitations, however, have to do with the limited range of objectives that paper-and-pencil tests measure and the fact that complex and novel forms of mental functioning are generally beyond their capability to measure. Within the framework of limitations that are conceded, however, paper-and-pencil tests do provide data for curriculum evaluation, and deserve serious consideration, though not the emphasis ad absurdum that they have customarily received.
Informal evaluation devices, a third source of product evaluation data, are useful in assessing complex objectives, novel or unique objectives, student interests, and other outcomes of curriculum. For example, Taba (1962, p. 330) suggests as sources of evidence, "records of all sorts, classroom observations, student products, diaries, essays and simple classroom exercises. . . . [W]hen students describe what they saw on a trip or react to a story they read, these reactions . . . can be analyzed for the levels of awareness or social attitudes displayed. . . ."
The above constitute sources of data for product evaluation. What are some sources of evidence that can be used in evaluating the curriculum as a whole? The topics of this section of the chapter, of course, suggest a major source: the curriculum document itself, including the statement of purposes, content, learning activities, and evaluation. Such questions as the following should be generated as the curriculum is brought under the scrutiny of evaluators: What is the theoretical rationale for the document? Is the document a coherent whole? How were curriculum purposes derived? Is the content (learning activities, evaluation) consistent with purposes? Since the curriculum document represents the plan for learning, judging its merit is a critical first step in the evaluation process.
But the less tangible functional curriculum, as it is field tested with students and teachers in an instructional setting, is also a source of data crucial for effective curriculum evaluation. Unlike the curriculum document, however, the functional curriculum cannot be directly studied; it has to be observed as it operates through people, materials, and environments. In such a setting, therefore, teachers become a valuable source of information for curriculum evaluators. Information ordinarily is secured from teachers by means of interviews, questionnaires, and a wide variety of other structured and unstructured devices, both written and oral. (Perhaps the most valid information is acquired when evaluators have developed sound interpersonal relationships with teachers and teachers' feelings and perceptions are freely and honestly expressed in informal discussions.) Teachers' perceptions of content, instructional materials, learning activities, relevance, student enthusiasm, and the like can yield valuable insights when compared (or contrasted) with the evaluators' perceptions of the functioning curriculum in these areas.
Students, too, are an important source of evaluation data. As with teachers, information is secured from both structured and unstructured instruments. Particularly if they are sure that their responses to evaluation instruments will not be used to determine individual grades, students will respond with a candor that can quite accurately reflect the flavor of the experience they are having as a result of their interaction with the curriculum. Again, comparing and contrasting students' perceptions with those of teachers and curriculum planners can provide entirely new (and even startling) appreciations of the dynamics of the functioning curriculum.
A third source of information for evaluating the functioning curriculum is the curriculum material utilized in instruction, including texts, paperbacks, films, slides, periodicals, and the like. Evaluators, of course, need to determine the appropriateness of these in terms of the intent and character of the curriculum plan; but shared observations of how they are used in the functional curriculum may very well provide the more important assessment of their value. Curriculum materials do not constitute a curriculum, and it is common knowledge that teachers using identical materials can, in effect, produce radically different operating curricula in terms of learning activities, students' experiences, and eventual outcomes. This relatedness of curriculum materials and the way they are used demonstrates once again the complexity of the curriculum enterprise and the difficulties encountered in specifying procedures for comprehensive curriculum evaluation.
Space does not permit elaboration on all possible sources of curriculum evaluation data. For example, follow-up studies of graduates of the curriculum and/or their associates are very important if a summative evaluation is to have any real validity. The seven sources discussed, however, would appear to be minimal if a reasonably comprehensive evaluation is desired. In addition, it should be noted that, as a matter of policy, unsolicited evidence from whatever source should be given serious attention: letters, phone calls, or visits from parents; testimony from school personnel, students, or lay people; or even "letters to the editor" in the local paper often constitute important data.
VALIDITY OF EVALUATIONS
An evaluation will be valid only to the extent that the evidence it employs accurately describes what it claims to describe. For example, if a measuring instrument purports to yield a score reflecting the degree to which students can interpret historical data, but requires in its questions only the recall of data, it is not describing what it claims to describe. As a consequence, an evaluation of curriculum effectiveness in the area of historical interpretation would not be valid in this instance since there is no basis for such an evaluation. Clearly, curriculum evaluators need to ensure the validity of the data they use in judging curriculum quality.
A second aspect of the problem of validity was touched on in a previous section, "Consistency of Purposes and Evaluation." In that section, it will be recalled, an example was given of an evaluation that was distorted because it was based only on some (rather than all) of the curriculum objectives to be achieved. Validity was impaired in this case because the evaluation lacked scope or comprehensiveness.
A third factor that needs to be considered in order to ensure the validity of curriculum evaluation is the incidence and nature of unintended outcomes. The issue of unintended outcomes was touched on briefly in Chapter 13. There, it was noted that a whole range of consequences, in addition to those outcomes we intend to reach, are ushered in as students interact with the functional curriculum. But because evaluators have a (quite natural) tendency to focus on the extent to which stated objectives are achieved, and because a high degree of awareness and sophistication is required for the identification of all outcomes, this aspect of comprehensive evaluation (the identification and evaluation of unintended outcomes) is perhaps the most difficult to achieve.
Taba reports an incident that points up this need to be aware of the "total pattern of educational outcomes":
a school which was greatly concerned with the development of scientific objectivity and critical thinking had stressed the use of reliable and dependable materials of unquestioned objectivity. After administering a battery of tests on thinking, the staff discovered to its amazement that the students were highly gullible. They had a tendency to accept as true almost anything in print because they had no opportunity to compare poor and good sources. An exclusive diet of excellent and dependable ideas cultivated an unquestioning attitude (Taba 1962, pp. 314, 315).
Taba's example demonstrates that validity in comprehensive curriculum evaluation involves not only assessing the attainment of all stated goals and objectives, but also identifying and judging the merit of the full range of consequences issuing from the implemented curriculum. Clearly, unless the evaluation takes into account all curriculum outcomes, it will not provide an accurate picture of curriculum quality.
A fourth and final factor in evaluation validity has to do with the proportional weighting of product evaluation and other areas of curriculum assessment in the total evaluation design. We have noted previously, for example, that curriculum evaluation has traditionally consisted almost exclusively of product (or "payoff") evaluation. But we have argued in this section on comprehensive curriculum evaluation that while product evaluation is a necessary part of curriculum evaluation, it is not in itself sufficient. Thus, we have stressed the need to give attention to the direct evaluation of such curriculum elements and characteristics as the objectives themselves, content, learning activities, component consistency, and even evaluation. Scriven (1967, p. 53 ff.) has classified these procedures as "intrinsic evaluation" to distinguish them as a class from "product evaluation." The validity question to be dealt with, then, becomes: How much weight in the evaluation design is to be accorded product evaluation as opposed to intrinsic evaluation?
Although it is obviously impossible to prescribe a desirable mix of product and intrinsic evaluation that will prove optimally valid for all curricula, it seems clear that something approaching a 50:50 proportion might be a good place to start in terms of developing an evaluation rationale for a particular curriculum. Clearly, product evaluation is essential if we are at all interested in what the curriculum actually does. But product evaluation, because of its essentially summative nature, provides little help with the questions that get raised at formative stages of development. For example: Is there a good match between content and goals? Is the (product) evaluation consistent with the content? Are students responding negatively to the content because it is not contemporary or because it is poorly organized for learning efficiency? What is the reason for a particular (undesirable) unintended outcome? If goal attainment turns out to be relatively good, how adequate are goals in terms of the possibilities for optimal student development?
Such questions require that curriculum evaluators go beyond the relatively precise, empirical procedures of product evaluation and assess curriculum effectiveness in the far more value-loaded and nebulous areas of intrinsic evaluation. Of course, we should expect the reliability (i.e., the accuracy) of a comprehensive evaluation (as compared with a purely product evaluation) to suffer somewhat from the inclusion of considerations that are not ordinarily susceptible to operationally defined criteria; but it seems clearly more desirable for the evaluation to reflect a balanced, if somewhat opaque, view of the total curriculum than to insist on clarity and precision to the extent that we get an accurate picture of only segments of it. An analogue of this point would be the argument that a rough pen-and-ink sketch of a man provides a more valid impression of his physical nature than an extremely clear color photograph of a hand, a nose, and an ear. By taking the sketch and photograph together, however (as we would combine an intrinsic and a product evaluation), validity and reliability are improved.
The four aspects of curriculum evaluation validity that we have discussed above suggest that validity is a quality essential for an evaluation to possess. To the degree that an evaluation lacks validity, it is of no use whatever.
References
Ahmann, J. Stanley, and Marvin D. Glock. 1967. Evaluating Pupil Growth. 3d ed. Boston: Allyn & Bacon.
Bloom, Benjamin S., J. Thomas Hastings, and George F. Madaus. 1971. Handbook on Formative and Summative Evaluation of Student Learning. New York: McGraw-Hill Book Company.
Hoover, Kenneth H. 1968. Learning and Teaching in the Secondary School. 2d ed. Boston: Allyn & Bacon.
Pace, C. Robert. 1958. "Educational Objectives." In The Integration of Educational Experiences, edited by Nelson B. Henry. Fifty-seventh Yearbook of the National Society for the Study of Education. Chicago: University of Chicago Press, chap. IV.
Scriven, Michael. 1967. "The Methodology of Evaluation." In Ralph W. Tyler, Robert M. Gagné, and Michael Scriven, Perspectives of Curriculum Evaluation. AERA Monograph Series on Curriculum Evaluation, No. 1. Chicago: Rand McNally & Co.
Taba, Hilda. 1962. Curriculum Development: Theory and Practice. New York: Harcourt Brace Jovanovich.
