A Cascade of Errors in Interim “Summative” Assessments

I have been working on a paper investigating how teachers interpret student performance data. The data come from their district’s interim summative assessments, tests given every 6 to 8 weeks to help teachers predict how students will perform on the high-stakes end-of-year tests. These interim assessments have taken on a central role in these schools, which face the threat of sanctions for failing to make Adequate Yearly Progress (AYP).

The teachers are all working hard to do right by their kids, but the system as a whole contains a cascade of errors.

First, the assessments were constructed in-house. Although they align with the state standards and have been thoughtfully designed, they have not been psychometrically validated. That means that when an assessment is used to measure, say, a student’s understanding of adding fractions, no one has established, through repeated rounds of testing and revision, that this is what the items actually measure.

Second, the proficiency cut points are arbitrary, yet NCLB has everybody worried about the percentage of students scoring above proficiency. This is a national problem, as Andrew Ho laid out so eloquently in his 2008 article in Educational Researcher.

In the end, we are sacrificing validity for precision. We think these data reports tell us, with great accuracy, who is learning what and to what degree. But there is reason to believe that this cascade of errors is just another sorting and labeling mechanism that interferes with real teaching and learning.


My Opinion on Common Core State Standards in Mathematics

I have had the luxury of taking time to form my opinion on the new Common Core Standards.

There are three distinct issues to consider, and all of them tend to come up whenever we talk about the standards.

1. The content of the standards themselves.

2. The nature of the assessments used to hold schools accountable for them.

3. Their implementation, including curricular support, professional development, and accountability processes.

My take on Issue 1 is that the standards are a strong first draft. The practice standards are the boldest and most important innovation, since they press on higher-order thinking. Nonetheless, the standards have some flaws. For instance, a teacher friend told me that one grade asks students to learn to make box-and-whisker plots, while the following grade asks students to compare such plots to look at differences in measures of central tendency. Making those plots without comparing them is a silly exercise, since the whole point is that they make measures of central tendency and spread visible. Goofs like this could have been fixed in field testing, but the authors did not have that chance.

As for Issue 2, I had some hope that the “second-generation” assessments developed for CCSS-M would be a step up from much of what we have seen. The released items I have seen so far have not fulfilled that promise.

As for Issue 3, the biggest problem, in my mind, is the rushed implementation and the lack of resources to make this ambitious goal feasible. Perhaps the most damaging aspect of implementation is that CCSS-M is being placed within the deeply flawed infrastructure of NCLB and Race to the Top (RTTT). On the ground, it ends up feeling like one more turn of the screw on the already problematic accountability pressures that schools and teachers face.

What does research on teaching have to say to teachers?

Image from the Washington Post's Answer Sheet blog

Recently, I got into a Twitter conversation with a practicing teacher who expressed skepticism that research on teaching could have anything to say about his work in the classroom.

I explained that research on teaching takes many forms beyond the randomized controlled experiment he had in mind. Some of it aims to be prescriptive about the nature of “best practice,” but it has other aims as well.

Here is an incomplete list of genres of research on teaching and how they might inform practicing teachers. My examples naturally reflect my own experience as a mathematics education researcher, but I hope you can imagine other examples as well.

International comparison studies.

International comparison studies give us a range of images of what can happen in the name of “schooling” and “education.” Because teaching often demands imagination on the part of practitioners, these studies can be generative by providing well-drawn images of possibility.

The culturally simplistic interpretation of international comparison studies is, “Country X does Y and succeeds on international comparisons, so we should do Y too!” But there are often a whole host of conditions that make Y possible in Country X.

For instance, lesson study is so fundamental to the teaching profession in Japan that it is nearly impossible for Japanese teachers to imagine not doing it. When we try to use this format with U.S. teachers, who have often lived through many professional development fads, it can be difficult to overcome their doubts about the process.

Historical analysis.

Some fundamental questions of teaching have been asked for as long as there have been teachers and students, from Meno’s paradox to issues of who should be educated in a democratic society to the very nature and conditions of teachers’ work.

Seeing connections between current educational debates and their analogs in earlier times helps teachers sort through rhetoric and imagine new answers to age-old questions.

Sociological analysis.

School is one of the major socializing institutions in any society. Understanding how schools and school systems act as machines for social reproduction can help teachers be more deliberate about working around those forces.

For instance, recognizing the bias enacted when students self-select into courses might press a teacher to counter this source of social reproduction. Perhaps the teacher will make sure that students from social groups underrepresented in a subject or course are invited and feel valued once they enroll.

Observational analysis.

Some of the most complex moments of teaching happen in the midst of instruction. One of the oldest observational studies of teaching, Philip Jackson’s Life in Classrooms, pointed out such useful things as that classrooms are the most crowded institutional settings we have (measured in persons per square foot) and that crowds, praise, and power are central themes of classroom life. The book also gave us the useful construct of the hidden curriculum: the unintended teaching that goes on as students learn how to be successful in school.

Other seminal observational work includes the codifying of patterns of classroom discourse, especially the default IRE pattern. IRE stands for initiation, response, evaluation: the teacher initiates, usually with a question, a student responds, and the teacher evaluates the response. This pattern describes the dominant form of interaction between teachers and students. This insight helped us uncover some of the limitations of such instruction for deeper learning, as well as some cultural mismatches between instructional discourse and students’ home cultures.

Teaching experiments.

Teaching experiments put forward a conception of how teaching and learning might unfold, drawing on innovative frameworks, and then try it out in practice. The most famous of these in my field come from Magdalene Lampert and Deborah Ball, who asked what elementary instruction might look like if it took mathematics content very seriously. The power of these studies lies not in their generalizability per se, but in their role as extremely well-considered existence proofs of what is possible. Like the international comparisons, these studies can help teachers reimagine their classrooms in ways that might give more students access to powerful learning.