Wednesday, December 4, 2024

Millman, Jason (Ed.). (1997). Grading Teachers, Grading Schools: Is Student Achievement a Valid Evaluation Measure? Reviewed by Marie Miller-Whitehead, Tennessee Valley Educators for Excellence


Millman, Jason (Ed.). (1997). Grading Teachers, Grading Schools: Is Student Achievement a Valid Evaluation Measure? Thousand Oaks, CA: Corwin Press.

Pp. xviii + 283

$69 (Cloth), ISBN 0-8039-6401-3.

Reviewed by Marie Miller-Whitehead
Tennessee Valley Educators for Excellence

November 6, 2001

Grading Teachers, Grading Schools has been in print for four years now, and it is perhaps an appropriate time for a retrospective examination of the information and perspectives shared by its contributors, all of whom have many years of experience in school evaluation at the national, state, or district level. Millman has done a creditable job of gathering together some of the foremost authorities on instructional effectiveness and on the use of student achievement and performance as components of teacher and school evaluation, several of whom are widely published and regularly consulted by legislative policymakers. Contributors include Peter Airasian, Anthony Bryk, Ronald Hambleton, Linda Darling-Hammond, James Popham, William Sanders, Daniel Stufflebeam, and William Webster, to name only a few. Other contributors interpret or illustrate the application of statistical models, as the book addresses both theoretical and practical issues related to the measurement of student achievement in teacher and school evaluation. Some of the viewpoints and techniques have proven controversial with teachers and administrators alike, but the book is well balanced, and the chapters are thoughtfully and carefully written, with commentary designed to place the whole in its proper perspective. The book is also readable: several statistical models are presented, ranging from simple correlation to complex regression models, but one need not be a statistician to find it interesting, useful, and a valuable resource. However, a knowledge of multiple regression and mixed-model equations will be most helpful in understanding the Dallas and Tennessee models. This is not a book that one will read with a vague sense of detachment and mild interest: one will either strongly agree or strongly disagree with most of what it contains.

Each of the teacher and school effectiveness models presented in this book was developed at least partially in response to legislative mandate. Several are high-stakes systems, meaning that results may be tied to incentives or sanctions for schools (or both). The accountability and evaluation initiatives of Kentucky and Tennessee, discussed at length, are at least in part the tangible results of state educational equity litigation aimed at ensuring that all children in those states are provided with at least a minimally adequate education. This is one reason that debate over developing and implementing the fairest and most methodologically sound measure of student achievement and progress has often been contentious, not to say outright heated: failure to satisfy the demands of the various constituencies involved has generally resulted in further intervention by the courts of those states. In Tennessee, teachers in less wealthy school districts, where salaries are substantially below those in wealthier districts, have continued to seek ways to redress salary inequity, an effort that also entails demonstrating accountability for instructional effectiveness and for student progress and achievement.

Although Patricia Ceperly and Kip Reel touch briefly on the legal background in their discussion of TVAAS, the purpose of Millman's book is not to investigate the legal intricacies that have led to the development of some quite sophisticated evaluation models, but rather to provide a forum in which their developers and advocates can share empirical evidence of the merits of their particular model as a valid evaluation measure. A major bone of contention for educators has been, and remains, whether teachers and schools should be accountable for holding all students to the same standards regardless of ability or background, or for ensuring that all students achieve at least a year's growth in learning for each year they are in school. This book gives the proponents of each viewpoint an opportunity to present their case, and they do it well. The chapters on each accountability system and evaluation model describe the philosophy and methodology its developers used to ensure equity and fairness in school and teacher evaluation, taking into account the diversity of the communities served by public schools.

Specifically, Grading Teachers, Grading Schools is organized into a preface and six parts, with the preface providing a brief biographical sketch of each of the thirty-two authors. Part one is an introduction by Jason Millman and H. D. Schalock. Part two is devoted to Oregon's teacher work sample model (TWSM), part three to the Dallas Public Schools value-added accountability system, part four to the Tennessee Value-Added Assessment System (TVAAS), part five to the Kentucky Instructional Results Information System (KIRIS), and part six to a synthesis with perspectives by Millman, Linda Darling-Hammond, and James Popham. With the combined experience and insights of scholars such as these, this book deserves a place on the bookshelf of anyone interested in high-stakes testing, educational accountability, or the evaluation of instructional effectiveness. Although the book is limited to the models of Oregon, Dallas, Tennessee, and Kentucky, the discussion and rationale underlying these diverse accountability models have fundamental commonalities with many of those in use or under consideration by states and school districts that are reforming their systems of educational accountability.

The Oregon TWSM is essentially a qualitative method of authentic performance assessment that relies on complex scoring rubrics to classify community, school, teacher, and student characteristics and outcomes. The five chapters devoted to the Oregon model present its rationale, descriptions of the measures, the methodology for computing the IPG (Index of Pupil Growth), and the linkages between measures and desired outcomes. The Oregon initiative, used primarily in teacher certification programs, was developed to address the education community's concern that multiple-choice standardized tests, when used as measures of school and teacher effectiveness, fail to assess a large part (or even the greatest part) of what is actually taught in the classroom. Accordingly, part of the model evolved from a project-based learning assessment developed at Western Oregon State College that identified a variety of classroom tasks ranging from low complexity and low demand to high complexity and high demand. Because the Oregon TWSM does not rely on standardized test results, it receives high marks for providing a mechanism for local autonomy in decisions about student achievement and progress. However, a methodology for ensuring the portability of the model from district to district was still under development when this book was printed, and the model had not been widely used for decisions about teacher tenure or merit pay.

William Webster and his research and evaluation team at the Dallas Public Schools have developed a complex hierarchical linear model (HLM) to identify effective schools, which they have used successfully since 1984, controlling for many school, community, and student characteristics that affect student learning. The system uses student scores on standardized tests (such as the ITBS) as the primary dependent variable, although schools are also accountable for attendance, dropout and promotion rates, enrollment in advanced placement and honors courses, and graduation rates. Although both the Tennessee TVAAS and the Dallas model measure continuous improvement and use statistical modeling to compute value-added school effectiveness indices, there are fundamental and important differences between them. The basic formulas for both models are provided in the chapters that discuss each. Briefly, the Dallas model provides a priori controls for what it terms "fairness variables," such as ethnicity, native language, and socioeconomic status (SES), in addition to ability (as identified by prior achievement). At the school level, the model accounts for school size and crowdedness, student mobility, percent of minority students, and SES, with school-level outcome variables based on the two prior years of data for each variable. The model developed by William L. Sanders for Tennessee does not control for exogenous variables; instead, a value-added gain score is computed for each student from three years of test score data. Thus, each student serves as his or her own control: the three years of data on each child in a school act as a blocking factor, or within-subjects design, that controls for ability, SES, ethnicity, and other confounding variables that affect student achievement.

However, both of these school effectiveness models have faced difficulties due to shrinkage, particularly in communities with high mobility rates, where there may not be sufficient data to compute a value-added index for many students. For a variety of reasons, an effectiveness index is not computed for approximately 30 to 40 percent of Dallas teachers; a teacher index can be computed only if the class contains at least six students for whom value-added scores can be computed. A similar issue has been raised in Tennessee, where three years of student scores on the TCAP tests (the CTBS or its replacement) are necessary to compute a student's value-added score. Nevertheless, Walberg and Paik's review of TVAAS against the Joint Committee's personnel evaluation standards (Stufflebeam, 1988) found that 19 of the 21 standards for propriety, utility, feasibility, and accuracy were adequately addressed.

Kentucky has a high-stakes accountability system that went the extra mile and actually implemented financial incentives for schools that demonstrated high levels of improvement in student achievement, while imposing sanctions on schools that failed to meet standards. The Kentucky accountability system is rather complex: each school is measured against its own baseline data for the percent of students proficient in all measured areas. Schools are expected to close the gap between that baseline and full proficiency by 10 percent at the end of each accountability cycle, with the ultimate goal of having 100 percent of each school's students proficient in all measured areas of achievement or performance. Initially, the Kentucky accountability system was intended to be based exclusively on portfolio assessment and other authentic assessments of student performance rather than on standardized test scores. Performance-based assessments, however, are more expensive and labor-intensive to implement than multiple-choice standardized tests. KIRIS was subsequently redesigned so that, while many of the measures of student achievement remain performance-based assessments in the humanities and arts, standardized test score data are also used as an accountability indicator. The chapter by Ronald Hambleton provides a comprehensive review of KIRIS.

The concluding chapters by Linda Darling-Hammond and James Popham are little gems of insight, each drawing on years of experience in the field of teacher evaluation. Darling-Hammond poses two key questions that any assessment system purporting to measure school or teacher effectiveness should answer, and then applies each question to the four systems described in the book. Popham, an authority on criterion-referenced testing and evaluation, describes himself as "in recovery" from teacher evaluation, but confesses that the request to comment on the school and teacher evaluation models of Oregon, Dallas, Tennessee, and Kentucky tempted him to backslide beyond his strength to refuse. He has strong opinions on several of these models and shares them with his usual wit and candor.

Assessment and evaluation of schools and teachers is a fact of life. Whether one favors accountability via standardized tests and statistical models or via scoring rubrics and performance assessments, Grading Teachers, Grading Schools provides a brief glimpse of the legislative mandates that often drive education reform efforts and a rather close examination of the process of developing an accountability model, as well as the hurdles that those who do so will inevitably face. It should be emphasized that each of these accountability models mandates school effectiveness outcomes in addition to student achievement, such as graduation, promotion, and attendance rates; however, the book does not address those issues, but rather seeks to answer the question, "Is student achievement a valid evaluation measure?" The consensus answer of the contributors seems to be, "It is if you do it right."

References

Stufflebeam, D. L. (1988). Personnel evaluation standards: How to assess systems for evaluating educators. Newbury Park, CA: SAGE.

About the Reviewer

Marie Miller-Whitehead, Ph.D.
Director, Tennessee Valley Educators for Excellence
TVEE.ORG
PO Box 2882
Muscle Shoals, AL 35662

Research interests: program evaluation and research, school district accountability indicators, computer assisted learning, educational politics and policy, educational equity for minorities and underserved populations.
