Wiggins, Grant. (1998). Educative Assessment: Designing Assessments to Inform and Improve Student Performance. San Francisco: Jossey-Bass. $34.95. ISBN 0-7879-0848-7. Pp. xxi + 361.
Reviewed by Susan M. Brookhart, Duquesne University
February 17, 1999
Educative
Assessment adds to the classroom assessment literature a
sound, readable, and passionate argument for designing assessments
that both inform and improve student learning and performance.
The information function of assessment is part of its definition,
and readers will be familiar with this purpose for assessment.
Some readers may want to argue about the improvement function of
assessment (hence the term "educative assessment"). I agree with
Wiggins and with others (Black and Wiliam, 1998; National Forum
on Assessment, 1995) that in classroom assessment, improvement is
a legitimate and important purpose for assessments. Further, I
think that educative assessment leads naturally to some validity
issues that make this book, though intended primarily for
practitioners, one in which measurement researchers might also be
interested.
The book's intended
audience is educators who work with classroom assessment,
primarily school teachers and administrators. The author writes
with a clear sense of the classroom context; he has not only
"been there" himself, but he has obviously spent time listening
to teachers, as well. The author's literate but readable style
will help make the material accessible to teachers who are
relative novices in assessment and enjoyable for those who are
quite familiar with it. I would urge those familiar with
assessment who, like me (Brookhart, 1995), were unhappy with
Wiggins's previous book, Assessing Student Performance, to
give the author another chance. Wiggins's impassioned and
crusading tone remains, but his measurement theory errors are (in
my opinion) gone. One of the strengths of this book is its clear
treatment of the classroom assessment applications of validity
and reliability issues. This was a surprise to me given the
weaknesses of the first book, and one of my motivations for
writing this review is to alert readers to a good resource that
they might have been inclined to ignore.
In the space of this
review, I hope to accomplish two things. First, I will describe
the scope and contents of the book so that readers will be able
to decide whether they want to read it. Second, I will discuss
some of what I feel are important issues implied by the concept
of "educative" assessment. This discussion will necessarily be
brief but is meant to show readers that the book is worth reading
in part because of its potential as a catalyst for stimulating
thinking in the very worthy field of classroom assessment.
Scope of the Book
Wiggins states his
purpose in the first sentence of his preface (p. xi): "This book
presents a rationale for learning-centered assessment in our
schools and an overview of the tools, techniques, and issues that
educators should consider as they design and use assessments
focused on learner needs." Thus the book is in part an argument
for a certain view of assessment, in part a "how-to" presentation
of performance assessment methods, and in part suggestions for
systemic reforms that would support this kind of assessment
better than current education systems do.
The book begins with
a chapter titled, "Educative Assessment: A Vision," in which
Wiggins outlines his argument for educative assessment and
describes a vision for the future. In this vision all students
understand what they are expected to learn and possess the
criteria and application skills necessary to assess their own
performances, and all teachers talk in terms of students' levels
of performance, discussing whether performance gains are
appropriate rather than whether they are good or bad.
Part One, "Essential
Elements of Assessment," contains Chapters 2 through 4:
"Ensuring Authentic Performance," "Providing Ongoing Feedback,"
and "Promoting Student Understanding." In this section are
compelling descriptions of validity couched in terms of examples
challenging readers to think about what student performance would
produce in the way of evidence with which to judge understanding.
In this section, too, is a description of effective feedback as
information that students can use to adjust and improve their own
performances and its implications for assessment design.
Part Two, "Designing
Assessments," contains Chapters 5 through 7: "Standards and
Criteria," "Individual Performance Tasks," and "Scoring Rubrics."
This section contains many good examples as well as solid
"how-to-do-it" information. The author's familiarity with the
classroom context shows in his understanding of the kinds of
explanations and examples that would interest teachers. His
selection of examples and his strongly worded explanations
continue the argument for "educative assessment," persuading by
example and piquing interest.
Part Three,
"Applications and Implications," contains Chapters 8 through 11,
"Portfolio as Evidence," "Curriculum and Instruction," "Grading
and Reporting," and "Teaching and Accountability." The intent of
this section is to describe for the reader how Wiggins would work
out his vision of educative assessment, clear goals, and
students' development of self-assessment capabilities in the
larger context of education than the designs of single
assessments. The implications Wiggins suggests would be
difficult but not impossible to accomplish in the current public
school scene. As I read this section, I remembered the old
adage, "Anything worth doing is worth doing well."
Perhaps my favorite
suggestion from Part Three comes from the chapter on grading and
reporting. Wiggins's argument about self-assessment rests on
several premises, one of which is that no student can improve
without clear feedback and information about the quality of his
or her performance. He therefore proposes that report cards
carry at least two different kinds of information, which he terms
"scores" and "grades." "Scores" would carry criterion-
referenced information about students' performances on important
learning goals. "Grades" would carry teacher judgments about
those performances in context, taking into account to what extent
students met reasonable expectations given their prior work,
whether they worked to capacity, and what is normal progress for
students at their developmental levels. This feels to me much
more like adding criterion-referenced information to grades than
like adding effort-based information to grades, a strategy
sometimes suggested that I feel is sure to fail (Brookhart,
1994).
Part Four, "Changing
the System," contains Chapters 12 and 13, "Feasibility: Real and
Imagined," and "Next Steps." The "Feasibility" chapter stresses
that assessment reform requires new ways of thinking about the
use of time. The "Next Steps" chapter is cookbook in format,
built around a list of 16 strategies, but it is substantively
grounded. Some of the suggestions are quite practical, for
example Strategy 2: "Begin to develop a few authentic assessment
tasks, where there is most evidence or agreement of need" (p.
330). But the substance of validity and reliability is solidly
built into these strategies as well. For example, Strategy 7
reads, "Redefine passing to ensure that (at least some portion
of) a grade is standard-based" (p. 331). Strategy 9 reads, "Go
for scoring consistency" (p. 333).
The Classroom Assessment Conversation
I submit that this book
serves two important purposes. First, as the outline above
shows, this very readable and persuasive book has the potential
to influence classroom teachers' assessment practices in positive
ways. Wiggins not only argues for such change, he also gives
practitioners clear directions that they can follow to design
assessments and feedback mechanisms that educate their students.
Second, I think that the measurement community engaged in a more
theoretical conversation about classroom assessment will find
that some of the issues underlying Wiggins's concept of
"educative" assessment will help advance that scholarly
conversation. In this section, I
will consider two of these admittedly related issues: the
application of validity theory to classroom assessment and the
difference between formative and summative assessment. In this
discussion I will be speaking beyond the boundaries of Wiggins's
book, although I think my discussion will be supported by what is
written in, or implied by, the arguments in Educative Assessment.
At any rate, readers should know that the responsibility for what
follows is my own and not Wiggins's.
Validity Evidence
A recent article in
Educational Researcher (Terwilliger, 1997, see also 1998)
suggested that "authentic" assessments rely on face validity, a
practice that measurement theorists would caution against. In an
exchange of views, Newmann, Brandt, and Wiggins (1998) countered
with descriptions of how "authentic" assessment demonstrates to
students what work in a discipline looks like, thereby lending
some purpose to their learning. I would add that "authenticity"
is a
relative term--authentic to what? From students' perspective,
academic work that is authentic to the kind of work they will
have to do in future study is very important. For example, many
a high school student justifies even the most abstract study with
"I'll need this for college."
In the exchange, the
question of validity was raised. Terwilliger made the point that
empirical evidence for the validity of authentic assessments and
other performance assessments (the terms, while related, are not
synonymous) is often lacking, which detracts from the arguments
of their supporters. Newmann (1998) countered by citing some
evidence, and there is some other work in the literature about
the validity of performance assessment, often with mixed results.
One place to look for consequential evidence of the validity of
performance assessments is in the effects of their use on student
learning, their "educative" results. Shepard and her colleagues
concluded a study of the effects of using classroom performance
assessments (Shepard, Flexer, Hiebert, Marion, Mayfield, &
Weston, 1996) with some discussion about related elements like
professional development opportunities for teachers, familiarity
with and time for assessment innovation, and so on.
So what evidence
does it take to show that a classroom performance assessment is
valid? Here, I think, is where the work needs to be done. In my
opinion, much of the trouble with the validity of performance
assessment starts with construct definition. Terwilliger (1997)
used analytical reasoning ability as an example of a construct.
This probably is a construct more amenable to a test than a
performance assessment. But complex, performance-based
assessment (Linn, Baker & Dunbar, 1991) is usually meant to tap
more task-related constructs. One of the points of disagreement
in the Terwilliger-Wiggins exchange was the apparent importance
each ascribed to basic knowledge. Basic knowledge, often well
assessed with paper-and-pencil tests, is necessary but not
usually sufficient for good performance on complex performance
tasks. Skill in applying that knowledge is usually the kind of
construct performance tasks are designed to elicit. Thus the
first evidence for the
validity of performance assessment might include a thoughtful
reflection on what students' work on a particular task might be
expected to show.
I think Wiggins
gives us all a clear reminder of that in this advice aimed at
practitioners. In this example, Wiggins is describing for
teachers why putting the Socrates of Plato's Apology on
trial in the classroom would make an illuminating instructional
activity but not a very good assessment of students'
understanding of Socrates and Plato's Apology (p. 31):
Although the desired achievement involves the text and
its implications, the activity can be done engagingly
and effectively by each student with only limited
insight into the entire text and its context. If a
student merely has to play an aggrieved aristocrat or
playwright, he or she can study for that role with only
a limited grasp of the text. Also, the student's trial
performance need not have much to do with Greek life
and philosophy. The question of assessment validity
(Does it measure what we want it to measure?) works
differently, requiring us to consider whether success
or failure at the proposed task depends on the targeted
knowledge (as opposed to fortunate personal talents):
the performance of the student playing, say, one of the
lawyers may be better or worse relative to his or her
debating and lawyering skills rather than relative to
his or her knowledge of the text....It is highly
unlikely that we will derive apt and sufficient
evidence of understanding of the text from each
individual student through this activity, even if we
can hear an understanding of the text in some comments
by some students. In fact, in the heat of a debate or
mock trial, students may forget or not be able to use
what they understood about the text, depriving
themselves and us of needed assessment evidence.
It seems that one
important first step in assuring the validity of classroom
performance assessments is reasoning like this at the planning
stage, and this quote from Wiggins's book gives us a good example
of how teachers might think about this. It also seems that
documentation of thoughtful reflection of this sort might be the
first item of validity evidence for classroom performance
assessments.
If classroom
assessment is supposed to be educative, then it follows that it
has served its purpose if students do learn from their
participation in it. Thus another piece of validity evidence is
suggested if one follows Wiggins's concept to its logical end:
there should be evidence of further student learning. This kind
of validity evidence is consequential evidence for validity
(Messick, 1989) and, as Messick's framework reminds readers,
positive evidence of intended consequences is only part of the
picture. There should also be evidence of an absence of
negative, unintended consequences. Working out how to collect
good evidence of future learning, of the "educative" nature of
classroom performance assessments, is an area for development, I
think.
Formative and Summative
Another issue that has not yet been fully worked out in the
classroom assessment field is the difference between formative
and summative assessment. As the saw goes, "When the cook tastes
the soup, that's formative assessment; when the customer tastes
the soup, that's summative assessment." Black (1998; Black &
Wiliam, 1998) emphasizes that in formative assessment for
classroom learning, the key is that the learner perceives a gap
between a desired learning goal and the state of his or her
knowledge or performance and then acts to close that gap. Black
uses formative assessment to mean assessment in classrooms, and
summative assessment to mean large-scale standardized measures.
This theme about the need for information that students
understand and can use, preferably generated by self-assessment
as well as teacher assessment of performance, runs through
Wiggins's book and is central to the concept of "educative"
assessment.
But the distinction
between formative and summative assessment is not nearly as clear
in practice as it is in the literature -- at least not yet. In
classrooms, students' assessments are certainly used in a
formative manner, but most also "count" in a grade or summative
judgment of some kind. The student is expected to be both the
cook and the customer here. Some classroom assessments, for
example tests at the end of units of instruction, are more
summative than formative, but teachers and students alike expect
that they will provide evidence of effective study and highlight
areas of strength and weakness to file away for self-
understanding. In schooling as currently practiced, the
formative/summative distinction is a blur.
Thus there is room for some work on the distinctions, purposes,
designs, and (full circle!) validity issues that separate
formative from summative classroom assessment. I think Wiggins
begins that work by suggesting that students be given both
criterion-referenced scores and expectation-referenced grading
judgments. Each serves different formative and summative
purposes.
Another issue hiding in the fuzziness between formative and
summative assessment is whether the kind of validity evidence
that shows a classroom assessment is "educative" is necessary or
even desirable for a large-scale performance assessment. Should
large-scale
assessments for selection and placement decisions be "educative"
in the same way as classroom assessment? If so, then taking them
should contribute to the education of the students, but defining
how and where that contribution should show itself is not
straightforward. To be "educative" is by definition to be
formative, but there are some large-scale assessments whose
purpose can be construed as purely summative. If not, should
performance assessment be used at all in a large-scale
assessment? That is, if a large-scale, purely summative
assessment is not supposed to be "educative" or formative in any
way, then would it not be sufficient to use a well-constructed
test for checking an individual's status on some construct(s)?
Then conventional evidence for test validity should suffice.
Even if consequential evidence for validity were collected, it
would be evidence of the social and programmatic results of test
use, not evidence of further learning for the test takers.
I feel only slightly
guilty for raising these issues without working them through
thoroughly for readers. These issues have been on my mind, and
the minds of at least some others, already. What Wiggins's
"educative assessment" does is act as a catalyst. It becomes
easier to demonstrate and discuss these issues with other
educators because he has presented such a clear and solid
treatment of what classroom assessment should be.
Conclusion
Wiggins's book
Educative Assessment describes a vision for classroom
assessment that contributes to student learning. It accomplishes
its purpose admirably, and I commend it to readers. Wiggins has
provided practitioners with a useful, readable tool.
The book also serves
another purpose, for which I commend it to the measurement
research community, and that is inspiring thinking about the
nature of validity and of the formative-summative distinction
in classroom assessment. Our field would benefit from some more
solid theoretical thinking and empirical study in these areas.
While such work is already happening to some degree, Wiggins's
book does the whole educational community the service of raising
these issues in terms that will inspire thoughtfulness among both
practitioners and researchers, to the ultimate benefit of the
students who are central to all of our work.
References

Black, P. (1998). Testing: Friend or foe? Theory and practice of assessment and testing. London: Falmer Press.

Black, P., & Wiliam, D. (1998). Assessment and classroom learning. Assessment in Education, 5, 7-74.

Brookhart, S. M. (1994). Teachers' grading: Practice and theory. Applied Measurement in Education, 7, 279-301.

Brookhart, S. M. (1995). Book review: Assessing student performance: Exploring the purpose and limits of testing by Grant P. Wiggins. Educational Measurement: Issues and Practice, 14(2), 29-30.

Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, 20(8), 15-21.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). New York: Macmillan.

National Forum on Assessment. (1995). Principles and indicators for student assessment systems. Cambridge, MA: FairTest.

Newmann, F., Brandt, R., & Wiggins, G. (1998). An exchange of views on "Semantics, psychometrics, and assessment reform: A close look at 'authentic' assessments." Educational Researcher, 27(6), 19-22.

Shepard, L. A., Flexer, R. J., Hiebert, E. H., Marion, S. F., Mayfield, V., & Weston, T. J. (1996). Effects of introducing classroom performance assessments on student learning. Educational Measurement: Issues and Practice, 15(3), 7-18.

Terwilliger, J. (1997). Semantics, psychometrics, and assessment reform: A close look at "authentic" assessments. Educational Researcher, 26(8), 24-27.

Terwilliger, J. (1998). Rejoinder: Response to Wiggins and Newmann. Educational Researcher, 27(6), 22-23.
About the Reviewer

Susan M. Brookhart is an Associate Professor in the School of Education at Duquesne University, Pittsburgh, PA 15282. She holds a Ph.D. in Educational Research and Evaluation from The Ohio State University. Her research specialty is classroom assessment, and she is a past chair of the AERA Special Interest Group on Classroom Assessment. She is the author or co-author of over 40 articles on classroom assessment and teacher education and the author of a forthcoming monograph, "The Art and Science of Student Assessment," for the ERIC Clearinghouse on Higher Education. She serves on the editorial boards of Applied Measurement in Education and Teachers College Record and is a current columnist on Education and Academics for National Forum.