Millman, Jason (Ed.). (1997). Grading Teachers, Grading
Schools: Is Student Achievement a Valid Evaluation Measure?
Thousand Oaks, CA: Corwin Press.
Pp. xviii + 283
$69 (Cloth) ISBN 0-8039-6401-3.
Reviewed by Marie Miller-Whitehead
Tennessee Valley Educators for Excellence
November 6, 2001
Grading Teachers, Grading Schools has been in publication
for four years now, and it is perhaps an appropriate time for a
retrospective examination of the information and perspectives shared
by its various contributors, all of whom have many years of
experience in school evaluation at the national, state, or district
level. Millman has done a creditable job of gathering together some
of the foremost authorities on instructional effectiveness and the
use of student achievement and performance as components of teacher
and school evaluation, several of whom are widely published and
regularly consulted by legislative policymakers. Contributors include
Peter Airasian, Anthony Bryk, Ronald Hambleton, Linda
Darling-Hammond, James Popham, William Sanders, Daniel Stufflebeam,
and William Webster, to name only a few. Other contributors provide
interpretation or examples of the application of statistical models,
as this book addresses both theoretical and practical issues related
to the measurement of student achievement in teacher and school
evaluation. Some of the viewpoints and techniques have proven to be
controversial with teachers and administrators alike, but the book is
well-balanced and the chapters are thoughtfully and carefully written
with commentary designed to place the whole in its proper
perspective. Additionally, the book is readable: several
statistical models are presented, ranging from simple correlation
to complex regression models, but one need not be a statistician
to find this book both interesting and useful as well as a
valuable resource. However, a knowledge of multiple regression
and mixed-model equations will prove most helpful in
understanding the Dallas and Tennessee models. This is indeed
not a book that one will read with a vague sense of detachment
and mild interest: one will either strongly agree or strongly
disagree with most of what is contained within.
Each of the
teacher and school effectiveness models presented in this book has
been developed at least partially in response to legislative mandate.
Several are high stakes systems, meaning that results may be tied to
either incentives or sanctions for schools (or both). The
accountability and evaluation initiatives of Kentucky and Tennessee,
discussed at length, are at least in part the tangible results of
state educational equity litigation aimed at assuring that all
children within those states be provided with at least a minimally
adequate education. This is one of the reasons that debate over the
development and implementation of the fairest and most
methodologically sound measure of student achievement and progress
has often been contentious, not to say outright heated, as failure to
satisfy the demands of the various constituencies involved has
generally resulted in further intervention by the courts of those
respective states. In Tennessee, teachers in less wealthy school
districts where salaries are substantially below those of more
wealthy districts have continued to seek means to redress salary
inequity, which also means demonstrating accountability for
instructional effectiveness and student progress and
achievement.
Although Patricia Ceperly and Kip Reel mention the legal
background briefly in their TVAAS discussion, the purpose of
Millman's book is not to investigate the legal intricacies that have
led to the development of some quite sophisticated evaluation models,
but rather to provide a forum for their developers and advocates to
share empirical evidence of the merits of their particular model as a
valid evaluation measure. A major bone of contention for educators
has been, and remains, the issue of whether teachers and schools
should be accountable for holding all students to the same standards
regardless of ability or background, or whether they should be
accountable for assuring that all students achieve at least a year's
growth in learning each year they are in school. This book provides
the proponents of each of these various viewpoints an opportunity to
present their case, and they do it well. Each of the
accountability systems and evaluation models presented describes
the philosophy and methodology used by its developers to assure
equity and fairness in school and teacher evaluation, taking into
account the diversity that exists in communities that are served
by public schools.
Specifically, Grading Teachers, Grading Schools is
organized into a preface and six parts, with the preface providing a
brief biographical sketch of each of the thirty-two authors. Part
one is an introduction by Jason Millman and H. D. Schalock. Part two
is devoted to Oregon's teacher work sample model (TWSM), part three
to the Dallas Public Schools value-added accountability system, part
four to Tennessee's statewide value-added assessment system (TVAAS),
part five to the Kentucky instructional results information system
(KIRIS), and part six provides a synthesis with perspectives by
Millman, Linda Darling-Hammond, and James Popham. With the combined
experience and insights of scholars such as these, this is a book
that deserves a place on the bookshelf of anyone who has an interest
in high stakes testing, educational accountability, or the evaluation
of instructional effectiveness. Although the book is limited to the
models of Oregon, Dallas, Tennessee, and Kentucky, the discussion and
rationale that underlie these diverse accountability models have
fundamental commonalities with many of those in use or being
considered by states and school districts involved in reform of their
systems of educational accountability.
The Oregon
TWSM is essentially a qualitative method of authentic performance
assessment consisting of complex scoring rubrics for the
classification of community, school, teacher, and student
characteristics and outcomes. The five chapters devoted to the
Oregon model present a rationale, descriptors of the measures,
methodology for computing the IPG (Index of Pupil Growth), and
linkages of measures to desired outcomes. The Oregon initiative,
used primarily in teacher certification programs, was developed
in an effort to address the concerns of the education community
that the use of multiple choice standardized tests as measures of
school and teacher effectiveness fails to assess a large part (or
even the greatest part) of what is actually taught in the
classroom. Thus, part of the model evolved from a project-based
learning assessment developed at Western Oregon State College
that identified a variety of classroom tasks ranging from low
complexity, low demand to high complexity, high demand. Because
the Oregon TWSM does not rely on standardized test results it
receives high marks on providing a mechanism for local autonomy
in making decisions about student achievement and progress.
However, a methodology for assuring the portability of the model
from district to district was still under development at the time
this book was printed and the model had not been widely used for
making decisions about teacher tenure or merit pay.
William
Webster and his research and evaluation team at the Dallas Public
Schools have developed a complex hierarchical linear model (HLM) to
identify effective schools, a model they have used successfully
since 1984. It controls for many school, community, and student
characteristics that affect student learning. The system uses student scores on
standardized tests (such as ITBS) as the primary dependent
variable although schools are also accountable for attendance,
dropout, promotion rates, enrollment in advanced placement and
honors courses, and graduation rates. Although both measure
continuous improvement and use statistical modeling to compute
value-added school effectiveness indices, there are fundamental
and important differences between the Tennessee TVAAS and the
Dallas models. The basic formulas for both models are provided
in the various chapters that discuss each. However, briefly, the
Dallas model provides a priori controls for what it terms
"fairness variables," such as ethnicity, native
language, and socioeconomic status in addition to ability (as
identified by prior achievement). At the school level, the model
accounts for school size and crowdedness, student mobility,
percent of minority students, and SES, with school level outcome
variables based on the two prior years of data for each of the
variables. The model developed by William L. Sanders for
Tennessee does not control for exogenous variables, with a
value-added gain score for each student computed on three years
of test score data. Thus, each student serves as his own
control, with three years of data on each child in a school
serving as a blocking factor or within subjects design that
controls for ability, SES, ethnicity, and other confounding
variables that affect student achievement. However, both of these
school effectiveness models have faced difficulties due to
shrinkage, particularly in communities with high mobility rates,
where there may not be sufficient data to compute a value-added
index for many students. For a variety of reasons, an
effectiveness index is not computed for approximately 30 to 40
percent of Dallas teachers. There must be at least six students in
a class for whom value-added scores can be computed for a teacher
index to be computed. A similar issue has been raised in
Tennessee, where three years of student scores on the TCAP tests
(the CTBS or its replacement) are necessary for a student's
value-added score to be computed. However, Walberg and
Paik's review of the TVAAS system against the Joint Committee
standards for personnel evaluation found that 19 of 21 standards for
propriety, utility, feasibility, and accuracy were addressed
adequately (Stufflebeam, 1988).
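The data requirements described above can be illustrated with a minimal sketch. It is not the Dallas or the Tennessee model: the plain average below is a hypothetical stand-in for the regression and mixed-model estimates the book describes, and the names has_enough_history and teacher_index are assumptions introduced only for illustration. The sketch shows nothing more than how the three-year and six-student thresholds can leave students and teachers without an index.

```python
from statistics import mean
from typing import Optional, Sequence

MIN_YEARS = 3     # Tennessee: years of test scores needed per student (per the review)
MIN_STUDENTS = 6  # Dallas: students with value-added scores needed per class (per the review)


def has_enough_history(yearly_scores: Sequence[Optional[float]]) -> bool:
    """True when a student has at least MIN_YEARS non-missing yearly scores."""
    return sum(score is not None for score in yearly_scores) >= MIN_YEARS


def teacher_index(value_added: Sequence[Optional[float]]) -> Optional[float]:
    """Average the available student value-added scores for one class.

    Returns None when fewer than MIN_STUDENTS scores are available.
    The plain mean is an illustrative assumption, not the actual model.
    """
    available = [v for v in value_added if v is not None]
    if len(available) < MIN_STUDENTS:
        return None
    return mean(available)


# Made-up classes: None marks a student with no computable value-added score.
stable_class = [1.5, -0.5, 2.0, 0.0, 1.0, 2.0]
mobile_class = [1.5, None, 2.0, None, None, 0.5]

print(teacher_index(stable_class))  # six usable scores, so an index is returned
print(teacher_index(mobile_class))  # only three usable scores, so no index
```

In a class with high mobility, half of the students may lack a usable score, and the teacher then falls into the group for whom no index is reported.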
Kentucky has
a high stakes accountability system that went the extra mile and
actually implemented financial incentives to schools that
demonstrated high levels of improvement in student achievement
while issuing sanctions for schools that failed to meet
standards. The Kentucky accountability system is rather complex,
with schools measured against progress on their own baseline data
for percent of students proficient in all measured areas. Schools
are expected to close the gap between that baseline and full
proficiency by 10 percent at the end of each
accountability cycle with the ultimate goal of having 100 percent
of each school's students proficient in all measured areas of
achievement or performance. Initially, the Kentucky accountability
system was intended to be based exclusively on portfolio assessment
and authentic assessments of student performance rather than on
standardized test scores. However, performance-based assessments are
expensive to implement and labor intensive compared to multiple choice
standardized tests. The KIRIS system was subsequently redesigned so
that while many of the measures of student achievement remain
performance-based assessments in the humanities and arts,
standardized test score data is also used as an accountability
indicator. The chapter by Ronald Hambleton provides a comprehensive
review of the KIRIS.
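The gap-closing expectation described above lends itself to a short worked example. The sketch below assumes one particular reading of that rule, namely that the gap is measured once, from a school's baseline to 100 percent proficient, and that 10 percent of that original gap is to be closed in each accountability cycle. The function name proficiency_target and the baseline of 40 percent are hypothetical and do not come from the book.

```python
def proficiency_target(baseline_pct: float, cycles: int, share: float = 0.10) -> float:
    """Percent proficient a school is expected to reach after `cycles` cycles.

    Assumes the gap is measured once, from the baseline to 100 percent,
    and that `share` of that original gap is to be closed each cycle.
    """
    if not 0.0 <= baseline_pct <= 100.0:
        raise ValueError("baseline_pct must be between 0 and 100")
    gap = 100.0 - baseline_pct
    # Cap at 100 so the target never exceeds full proficiency.
    return min(100.0, baseline_pct + cycles * share * gap)


# Hypothetical school starting at 40 percent proficient.
for cycle in range(4):
    print(cycle, round(proficiency_target(40.0, cycle), 1))
```

Under this reading a school with a lower baseline must gain more percentage points per cycle than a school that starts closer to the goal, since each is measured against its own gap.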
The
concluding chapters of this book by Linda Darling-Hammond and
James Popham are little gems of insight based on years of
experience by each in the field of teacher evaluation.
Darling-Hammond asks two key questions that should be answered by
any assessment system that purports to measure school or teacher
effectiveness and then applies each question to the four systems
described in this book.
Popham, an authority on criterion-referenced testing and evaluation,
describes himself as "in recovery" from teacher evaluation, but
the request to comment on the school and teacher evaluation models
of Oregon, Dallas, Tennessee, and Kentucky tempted him to backslide
beyond his strength to refuse. He has strong
opinions on several of these models and shares them with his
usual wit and candor.
Assessment
and evaluation of schools and teachers is a fact of life.
Whether one is a proponent of accountability via standardized
tests and statistical models or of scoring rubrics and
performance assessments, Grading Teachers, Grading
Schools provides a brief glimpse of legislative mandates that
often drive education reform efforts and a rather close
examination of the process of developing an accountability model
as well as the hurdles that will of necessity be faced by those
who do so. It should be emphasized that each of these
accountability models mandates school effectiveness outcomes in
addition to student achievement, such as graduation rates,
promotion, and attendance; however, the book does not address
those issues, but rather seeks to answer the question, "Is
student achievement a valid evaluation measure?" The
consensus answer of the contributors seems to be, "It is if
you do it right."
References
Stufflebeam, D. L. (1988). Personnel evaluation standards: How to
assess systems for evaluating educators. Newbury Park, CA: SAGE.
About the Reviewer
Marie Miller-Whitehead, Ph.D.
Director, Tennessee Valley Educators for Excellence
TVEE.ORG
PO Box 2882
Muscle Shoals, AL 35662
Research interests:
program evaluation and research, school district accountability
indicators, computer assisted learning, educational politics and
policy, educational equity for minorities and underserved
populations.