Friday, November 22, 2024

Wiggins, Grant. (1998). Educative Assessment: Designing Assessments to Inform and Improve Student Performance. Reviewed by Susan M. Brookhart

 


Wiggins, Grant. (1998). Educative Assessment: Designing Assessments to Inform and Improve Student Performance. San Francisco: Jossey-Bass.

$34.95           ISBN 0-7879-0848-7.

Pp. xxi + 361.

Reviewed by Susan M. Brookhart
Duquesne University

February 17, 1999

          Educative Assessment adds to the classroom assessment literature a sound, readable and passionate argument for designing assessments that both inform and improve student learning and performance. The information function of assessment is part of its definition, and readers will be familiar with this purpose for assessment. Some readers may want to argue about the improvement function of assessment (hence the term "educative assessment"). I agree with Wiggins and with others (Black and Wiliam, 1998; National Forum on Assessment, 1995) that in classroom assessment, improvement is a legitimate and important purpose for assessments. Further, I think that educative assessment leads naturally to some validity issues that make this book, intended primarily for practitioners, also one in which measurement researchers might be interested.
          The book's intended audience is educators who work with classroom assessment, primarily school teachers and administrators. The author writes with a clear sense of the classroom context; he has not only "been there" himself, but he has obviously spent time listening to teachers, as well. The author's literate but readable style will help make the material accessible to teachers who are relative novices in assessment and enjoyable for those who are quite familiar with it. I would urge those familiar with assessment who, like me (Brookhart, 1995), were unhappy with Wiggins's previous book, Assessing Student Performance, to give the author another chance. Wiggins's impassioned and crusading tone remains, but his measurement theory errors are (in my opinion) gone. One of the strengths of this book is its clear treatment of the classroom assessment applications of validity and reliability issues. This was a surprise to me given the weaknesses of the first book, and one of my motivations for writing this review is to alert readers to a good resource that they might have been inclined to ignore.
          In the space of this review, I hope to accomplish two things. First, I will describe the scope and contents of the book so that readers will be able to decide whether they want to read it. Second, I will discuss some of what I feel are important issues implied by the concept of "educative" assessment. This discussion will necessarily be brief but is meant to show readers that the book is worth reading in part because of its potential as a catalyst for stimulating thinking in the very worthy field of classroom assessment.

Scope of the Book

          Wiggins states his purpose in the first sentence of his preface (p. xi): "This book presents a rationale for learning-centered assessment in our schools and an overview of the tools, techniques, and issues that educators should consider as they design and use assessments focused on learner needs." Thus the book is in part an argument for a certain view of assessment, in part a "how-to" presentation of performance assessment methods, and in part suggestions for systemic reforms that would support this kind of assessment better than current education systems do.
          The book begins with a chapter titled, "Educative Assessment: A Vision," in which Wiggins outlines his argument for educative assessment and describes a vision for the future. In this vision all students understand what they are expected to learn and possess the criteria and application skills necessary to assess their own performances, and all teachers talk in terms of students' levels of performance and talk more about whether performance gains are appropriate than whether they are good or bad.
          Part One, "Essential Elements of Assessment," contains Chapters 2 through 4: "Ensuring Authentic Performance," "Providing Ongoing Feedback," and "Promoting Student Understanding." In this section are compelling descriptions of validity couched in terms of examples challenging readers to think about what student performance would produce in the way of evidence with which to judge understanding. In this section, too, is a description of effective feedback as information that students can use to adjust and improve their own performances and its implications for assessment design.
          Part Two, "Designing Assessments," contains Chapters 5 through 7: "Standards and Criteria," "Individual Performance Tasks," and "Scoring Rubrics." This section contains lots of good examples as well as good solid "how-to-do-it" information. The author's familiarity with the classroom context shows in his understanding of the kinds of explanations and examples that would interest teachers. His selection of examples and his strongly worded explanations continue the argument for "educative assessment," persuading by example and piquing interest.
          Part Three, "Applications and Implications," contains Chapters 8 through 11: "Portfolio as Evidence," "Curriculum and Instruction," "Grading and Reporting," and "Teaching and Accountability." The intent of this section is to describe for the reader how Wiggins would work out his vision of educative assessment, clear goals, and students' development of self-assessment capabilities in a context of education larger than the design of single assessments. The implications Wiggins suggests would be difficult but not impossible to accomplish in the current public school scene. As I read this section, I remembered the old adage, "Anything worth doing is worth doing well."
          Perhaps my favorite suggestion from Part Three comes from the chapter on grading and reporting. Wiggins's argument about self-assessment has been based on several premises, including the fact that without clear feedback and information about the quality of one's performance, no student can improve. He therefore proposes that report cards carry at least two different kinds of information, which he terms "scores" and "grades." "Scores" would carry criterion-referenced information about students' performances on important learning goals. "Grades" would carry teacher judgments about those performances in context, taking into account to what extent students met reasonable expectations given their prior work, whether they worked to capacity, and what is normal progress for students at their developmental levels. This feels to me much more like adding criterion-referenced information to grades than like adding effort-based information to grades, a strategy sometimes suggested that I feel is sure to fail (Brookhart, 1994).
          Part Four, "Changing the System," contains Chapters 12 and 13, "Feasibility: Real and Imagined," and "Next Steps." The "Feasibility" chapter stresses that assessment reform requires new ways of thinking about the use of time. The "Next Steps" chapter is cookbook in format, built around a list of 16 strategies, but it is substantively grounded. Some of the suggestions are quite practical, for example Strategy 2: "Begin to develop a few authentic assessment tasks, where there is most evidence or agreement of need" (p. 330). But the substance of validity and reliability is solidly built into these strategies as well. For example, Strategy 7 reads, "Redefine passing to ensure that (at least some portion of) a grade is standard-based" (p. 331). Strategy 9 reads, "Go for scoring consistency" (p. 333).

The Classroom Assessment Conversation

          I submit that this book serves two important purposes. First, as the outline above shows, this very readable and persuasive book has the potential to influence classroom teachers' assessment practices in positive ways. Wiggins not only argues persuasively for that, he also gives practitioners clear directions that they can follow to design assessments and feedback mechanisms that educate their students. Second, I think that the measurement community involved in a more theoretical conversation about classroom assessment will find that some of the issues underlying Wiggins's concept of "educative" assessment will help advance the scholarly consideration of classroom assessment. In this section, I will consider two of these admittedly related issues: the application of validity theory to classroom assessment and the difference between formative and summative assessment. In this discussion I will be speaking beyond the boundaries in Wiggins's book, although I will, I think, be supported in my discussion by what is written in or implied by the arguments in Educative Assessment. At any rate, readers should know that the responsibility for what follows is my own and not author Wiggins's.

Validity Evidence

          A recent article in Educational Researcher (Terwilliger, 1997, see also 1998) suggested that "authentic" assessments relied on the face validity of an assessment, a practice that measurement theorists would caution against. In an exchange of views, Newmann, Brandt, and Wiggins (1998) countered with descriptions of how "authentic" assessment accomplishes the purpose of demonstrating to students what work in a discipline looks like, thereby imputing some purpose for their learning. I would add that "authenticity" is a relative term--authentic to what? From a student's perspective, academic work that is authentic to the kind of work they will have to do in future study is very important. For example, many a high school student justifies even the most abstract study with "I'll need this for college."
          In the exchange, the question of validity was raised. Terwilliger made the point that empirical evidence for validity of authentic assessments and other performance assessments (these terms, while related, are not synonymous) is often lacking, detracting from the arguments of their supporters. Newmann (1998) countered by citing some evidence, and there is some other work in the literature about the validity of performance assessment, often with mixed results. One place to look for consequential evidence of the validity of performance assessments is in the effects of their use on student learning, their "educative" results. Shepard and her colleagues concluded a study of the effects of using classroom performance assessments (Shepard, Flexer, Hiebert, Marion, Mayfield, & Weston, 1996) with some discussion about related elements like professional development opportunities for teachers, familiarity with and time for assessment innovation, and so on.
          So what evidence does it take to show that a classroom performance assessment is valid? Here, I think, is where the work needs to be done. In my opinion, much of the trouble with the validity of performance assessment starts with construct definition. Terwilliger (1997) used analytical reasoning ability as an example of a construct. This probably is a construct more amenable to a test than a performance assessment. But complex, performance-based assessment (Linn, Baker & Dunbar, 1991) is usually meant to tap more task-related constructs. One of the points of disagreement in the Terwilliger-Wiggins exchange was the apparent importance each ascribed to basic knowledge. Basic knowledge, often well assessed with paper and pencil tests, is necessary but not usually sufficient for good performance on complex performance tasks. Application skills in the context of working with the knowledge usually comprise the kind of constructs performance tasks are designed to elicit. Thus the first evidence for the validity of performance assessment might include a thoughtful reflection on what students' work on a particular task might be expected to show.
          I think Wiggins gives us all a clear reminder of that in this advice aimed at practitioners. In this example, Wiggins is describing for teachers why putting the Socrates of Plato's Apology on trial in the classroom would make an illuminating instructional activity but not a very good assessment of students' understanding of Socrates and Plato's Apology (p. 31):
Although the desired achievement involves the text and its implications, the activity can be done engagingly and effectively by each student with only limited insight into the entire text and its context. If a student merely has to play an aggrieved aristocrat or playwright, he or she can study for that role with only a limited grasp of the text. Also, the student's trial performance need not have much to do with Greek life and philosophy. The question of assessment validity (Does it measure what we want it to measure?) works differently, requiring us to consider whether success or failure at the proposed task depends on the targeted knowledge (as opposed to fortunate personal talents): the performance of the student playing, say, one of the lawyers may be better or worse relative to his or her debating and lawyering skills rather than relative to his or her knowledge of the text....It is highly unlikely that we will derive apt and sufficient evidence of understanding of the text from each individual student through this activity, even if we can hear an understanding of the text in some comments by some students. In fact, in the heat of a debate or mock trial, students may forget or not be able to use what they understood about the text, depriving themselves and us of needed assessment evidence.

          It seems that one important first step in assuring the validity of classroom performance assessments is reasoning like this at the planning stage, and this quote from Wiggins's book gives us a good example of how teachers might think about this. It also seems that documentation of thoughtful reflection of this sort might be the first item of validity evidence for classroom performance assessments.
          If classroom assessment is supposed to be educative, then it follows that it has served its purpose if students do learn from their participation in it. Thus another piece of validity evidence is suggested if one follows Wiggins's concept to its logical end: there should be evidence of further student learning. This kind of validity evidence is consequential evidence for validity (Messick, 1989) and, as Messick's framework reminds readers, positive evidence of intended consequences is only part of the picture. There should also be evidence of an absence of negative, unintended consequences. Working out how to collect good evidence of future learning, of the "educative" nature of classroom performance assessments, is an area for development, I think.

Formative and Summative

          Another of the issues that has not been handled completely yet in the classroom assessment field is the difference between formative and summative assessment. As the saw goes, "When the cook tastes the soup, that's formative assessment; when the customer tastes the soup, that's summative assessment." Black (1998; Black & Wiliam, 1998) emphasizes that in formative assessment for classroom learning, the key is that the learner perceives a gap between a desired learning goal and the state of his or her knowledge or performance and then acts to close that gap. Black uses formative assessment to mean assessment in classrooms, and summative assessment to mean large-scale standardized measures. This theme about the need for information that students understand and can use, preferably generated by self-assessment as well as teacher assessment of performance, runs through Wiggins's book and is central to the concept of "educative" assessment.
          But the distinction between formative and summative assessment is not nearly as clear in practice as it is in the literature -- at least not yet. In classrooms, students' assessments are certainly used in a formative manner, but most also "count" in a grade or summative judgment of some kind. The student is expected to be both the cook and the customer here. Some classroom assessments, for example tests at the end of units of instruction, are more summative than formative, but teachers and students alike expect that they will provide evidence of effective study and highlight areas of strength and weakness to file away for self-understanding. In schooling as currently practiced, the formative/summative distinction is a blur.
          Thus there is room for some work on the distinctions, purposes, designs, and (full circle!) validity issues between formative and summative in classroom assessment. I think Wiggins begins that work by suggesting that both criterion-referenced scores and expectation-referenced grading judgments be given to students. Each score serves different formative and summative purposes.
          Another issue hiding in the fuzziness between formative and summative assessment is whether the validity evidence necessary to show that a classroom assessment is "educative" is necessary or even desirable for a large-scale performance assessment. Should large-scale assessments for selection and placement decisions be "educative" in the same way as classroom assessment? If so, then the doing of them should accrue to the education of the students, but defining how and where that should show itself is not straightforward logic. To be "educative" is by definition to be formative, but there are some large-scale assessments whose purpose can be construed as purely summative. If not, should performance assessment be used at all in a large-scale assessment? That is, if a large-scale, purely summative assessment is not supposed to be "educative" or formative in any way, then would it not be sufficient to use a well-constructed test for checking an individual's status on some construct(s)? Then conventional evidence for test validity should suffice. Even if consequential evidence for validity were collected, it would be evidence of the social and programmatic results of test use, not evidence of further learning for the test takers.
          I feel only slightly guilty for raising these issues without working them through thoroughly for readers. These issues have been on my mind, and the minds of at least some others, already. What Wiggins's "educative assessment" does is act as a catalyst. It becomes easier to demonstrate and discuss these issues with other educators because he has presented such a clear and solid treatment of what classroom assessment should be.

Conclusion

          Wiggins's book Educative Assessment describes a vision for classroom assessment that contributes to student learning. It accomplishes its purpose admirably, and I commend it to readers. Wiggins has provided practitioners with a useful, readable tool.
          The book also serves another purpose, for which I commend it to the measurement research community, and that is inspiring thinking about the nature of validity and of the formative-summative distinction in classroom assessment. Our field would benefit from some more solid theoretical thinking and empirical study in these areas. While such work is already happening to some degree, Wiggins's book does the whole educational community the service of raising these issues in terms that will inspire thoughtfulness among both practitioners and researchers, to the ultimate benefit of the students who are central to all of our work.

References

Black, P. (1998). Testing: Friend or foe? Theory and practice of assessment and testing. London: Falmer Press.

Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5, 7-74.

Brookhart, S. M. (1994). Teachers' grading: Practice and theory. Applied Measurement in Education, 7, 279-301.

Brookhart, S. M. (1995). Book review: Assessing student performance: Exploring the purpose and limits of testing by Grant P. Wiggins. Educational Measurement: Issues and Practice, 14(2), 29-30.

Linn, R. L., Baker, E. L., & Dunbar, S. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, 20(8), 15-21.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). New York: Macmillan.

National Forum on Assessment. (1995). Principles and indicators for student assessment systems. Cambridge, MA: FairTest.

Newmann, F., Brandt, R., & Wiggins, G. (1998). An exchange of views on "Semantics, psychometrics, and assessment reform: A close look at 'authentic' assessments." Educational Researcher, 27(6), 19-22.

Shepard, L. A., Flexer, R. J., Hiebert, E. H., Marion, S. F., Mayfield, V., & Weston, T. J. (1996). Effects of introducing classroom performance assessments on student learning. Educational Measurement: Issues & Practice, 15(3), 7-18.

Terwilliger, J. (1997). Semantics, psychometrics, and assessment reform: A close look at "authentic" assessments. Educational Researcher, 26(8), 24-27.

Terwilliger, J. (1998). Rejoinder: Response to Wiggins and Newmann. Educational Researcher, 27(6), 22-23.

About the Reviewer

Susan M. Brookhart

Susan Brookhart is an Associate Professor in the School of Education at Duquesne University, Pittsburgh, PA 15282. She holds a Ph.D. in Educational Research and Evaluation from The Ohio State University. Her research specialty is classroom assessment, and she is a past chair of the AERA Special Interest Group on Classroom Assessment. She is the author or co-author of over 40 articles on classroom assessment and teacher education and the author of a forthcoming monograph, "The Art and Science of Student Assessment," for the ERIC Clearinghouse on Higher Education. She serves on the editorial boards of Applied Measurement in Education and Teachers College Record and is a current columnist on Education and Academics for National Forum.
