A Cascade of Errors in Interim “Summative” Assessments

I have been working on a paper investigating how teachers interpret student performance data. The data come from their district's interim summative assessments: tests given every six to eight weeks to help teachers predict how students will perform on the high-stakes end-of-year tests. These interim assessments have taken on a very important place in these schools, which are under the threat of AYP sanctions.

The teachers are all working so hard to do right by their kids, but there is a cascade of errors in the whole system.

First, the assessments were constructed internally. Although they match the state standards and have been thoughtfully designed, they have not been psychometrically validated. That means that when an item is used to measure, say, a student's understanding of fraction addition, nobody has established through piloting and repeated revision that this is what the item actually tests.

Second, the proficiency cut points are arbitrary, yet NCLB has everybody worried about the percentage of students scoring above proficiency. This is a national problem, as Andrew Ho laid out so eloquently in his 2008 article in Educational Researcher.
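To make the cut-point problem concrete, here is a minimal sketch in Python (all numbers are invented for illustration; nothing here comes from the district's data). It gives every simulated student the same real gain and shows how the apparent "percent proficient" gain swings depending on where the cut score happens to sit:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy cohort: 10,000 scale scores, roughly bell-shaped.
year1 = rng.normal(loc=200, scale=20, size=10_000)

# Every student improves by exactly 5 scale-score points.
year2 = year1 + 5

# Two equally arbitrary proficiency cut scores.
for cut in (200, 240):
    pct1 = (year1 >= cut).mean() * 100
    pct2 = (year2 >= cut).mean() * 100
    print(f"cut={cut}: {pct1:.1f}% -> {pct2:.1f}% proficient "
          f"(apparent gain: {pct2 - pct1:+.1f} percentage points)")
```

With a cut near the middle of the distribution, the same uniform improvement shows up as a gain of roughly ten percentage points; with a cut out in the tail, it barely registers. Neither number says anything more about learning than the other, which is the heart of Ho's argument.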

In the end, we are sacrificing validity for the appearance of precision. We act as if these data reports tell us, with great accuracy, who is learning what and to what degree. But there is reason to believe that this cascade of errors makes them just another sorting and labeling mechanism interfering with real teaching and learning.

My Opinion on Common Core State Standards in Mathematics

I have had the luxury of taking time to form my opinion on the new Common Core Standards.

There are three distinct issues to consider, all of which come up whenever we talk about the standards:

1. The content of the standards themselves.

2. The nature of the assessments used to hold schools accountable to them.

3. The implementation of the standards, from curricular support and professional development to accountability processes.

My take on Issue 1 is that the standards are a strong first draft. The practice standards are the boldest and most important innovation, since they press on higher-order thinking. Nonetheless, there are some flaws. For instance, a teacher friend told me that one grade asks students to learn to make box-and-whisker plots, while the subsequent grade asks them to compare those plots to look at differences in measures of central tendency. Making the plots without comparing them is a silly exercise, since the whole point of the representation is to make measures of central tendency and spread visible across distributions. Goofs like this could have been caught in field testing, but the authors did not have that chance.

On Issue 2, I had some hope that the "second generation" assessments being developed for CCSS-M would be a step up from much of what we have seen. The released items I have seen so far have not lived up to that promise.

On Issue 3, the biggest problem, in my mind, is the rushed implementation and the lack of resources to make this ambitious goal feasible. Perhaps the most fatal aspect of implementation is that CCSS-M is being dropped into the deeply flawed infrastructure of NCLB/RTTT. On the ground, it ends up feeling like a tightening of the screws on the already problematic accountability pressures schools and teachers face.