Public schools in DC have gained ground on national tests over the past 15 years, but much of that gain is due to the changing demographic composition of DC’s student body. Parents’ income and education are primary determinants of student performance, and as the income and education of DC residents have improved, so have incoming students’ scores. Comparing NAEP scores over time without accounting for incoming class scores is mere jackaNAEPery, and comparing proficiency levels is no better.

**The best measure of school quality is how much students learn**: their improvement from incoming to outgoing scores, not how well they test at a single point in time. Measuring how much students learn over time is difficult, and measures of teacher impact and school-specific test score growth compare only how well DC public school students do relative to other DC public school students. Like most other measures of the value added by schools, these yield only relative rankings, not measures of how DC public schools as a whole are improving.

So how can we gauge how much a given school improves students’ scores relative to other schools? Using the median growth percentile (MGP) may be the best bet. Each student’s growth percentile is the percentage of comparable students districtwide who performed worse on a later math or reading exam (math and reading are scored separately). The school’s score is then the median student’s growth percentile.

For example, imagine Nicole earns a 45 on a math exam at the end of 4th grade. Nicole’s comparison group consists of students who also scored around 45 on that exam. Now imagine that at the end of 5th grade Nicole scores a 52. If that 52 is better than the scores of 70 percent of her comparison-group peers, then Nicole’s growth percentile is 70, which is a measurement of her relative growth. From there we can figure out each school’s MGP, the median of the growth percentiles of all of that school’s students.
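The Nicole example can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the official calculation: here the comparison group is every other student whose earlier score falls within a hypothetical `band` of 2 points, whereas official growth percentiles are typically estimated with quantile regression on prior scores. All student names, schools, and scores are invented.

```python
from statistics import median

# Hypothetical records: (student, school, earlier score, later score).
# These numbers are invented for illustration, not DC data.
STUDENTS = [
    ("Nicole", "A", 45, 52),
    ("Ana", "A", 44, 49),
    ("Dee", "A", 47, 51),
    ("Ben", "B", 46, 55),
    ("Cruz", "B", 45, 50),
    ("Eli", "B", 43, 56),
]


def growth_percentile(name, records, band=2):
    """Percentage of comparable students whose later score is lower.

    Assumption for illustration: "comparable" means any *other* student
    whose earlier score is within `band` points of this student's.
    """
    _, _, prior, later = next(r for r in records if r[0] == name)
    peers = [r[3] for r in records
             if r[0] != name and abs(r[2] - prior) <= band]
    if not peers:
        return None  # no comparison group, so no percentile
    worse = sum(1 for score in peers if score < later)
    return 100 * worse / len(peers)


def school_mgp(school, records, band=2):
    """Median growth percentile across one school's students."""
    gps = [growth_percentile(r[0], records, band)
           for r in records if r[1] == school]
    gps = [gp for gp in gps if gp is not None]
    return median(gps) if gps else None


for school in ("A", "B"):
    print(school, school_mgp(school, STUDENTS))
```

With only three students per school, a single student moves the median a long way; small numbers of students are one reason real MGPs carry wide margins of error.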

A typical school will have a median growth percentile of 50, while 70 means a school is doing substantially better than a typical school and 30 means a school is doing substantially worse. So we can say one school is better than another if its MGP is higher, meaning that growth for the typical student is higher at that school (it could still be the case that the lower-ranked school serves a specific subgroup much better than the higher-ranked school).

**The main problem is that these scores are fraught with measurement error**, so rarely can we say with confidence that one school is better than another school. For example, Bancroft was in the top third on math MGP for 2011-12 and H.D. Cooke was in the bottom third, but the two math MGP scores are statistically indistinguishable (their scores’ margins of error overlap).

The graph below shows each school’s math versus reading MGP scores for 2011 and 2012. A box around each dot indicates the range of statistically indistinguishable values for both math and reading in that year (2013 data were released without these ranges). This lets us compare each score both to other schools and to the typical school with a score of 50 (whose dot would sit where the math and reading lines cross at 50). Each school’s box overlaps the boxes of many other schools, and any two schools whose boxes touch are statistically indistinguishable. Often a single school’s range contains the dots (best estimates) of many other schools; Barnard ES (Lincoln Hill) in the graphic below is one example. Whenever we compare schools, we should remain appropriately skeptical about the relative strengths of signal and noise in these data.
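The comparison rules in the paragraph above can be written down directly. In this sketch each school is a pair of ranges (math, reading) with invented endpoints for hypothetical schools X, Y, and Z, not actual DC data. Following the post’s rule, two schools count as statistically indistinguishable when their boxes touch, meaning both their math and their reading ranges overlap, and a school differs from the typical school only when its range excludes 50.

```python
# Hypothetical schools: name -> ((math_lo, math_hi), (reading_lo, reading_hi)).
# The endpoints are invented for illustration, not DC data.
SCHOOLS = {
    "School X": ((55.0, 68.0), (48.0, 60.0)),
    "School Y": ((44.0, 57.0), (50.0, 63.0)),
    "School Z": ((30.0, 41.0), (35.0, 47.0)),
}

SUBJECT_INDEX = {"math": 0, "reading": 1}


def overlaps(a, b):
    """True if closed intervals a=(lo, hi) and b=(lo, hi) share a point."""
    return a[0] <= b[1] and b[0] <= a[1]


def indistinguishable(s1, s2, schools=SCHOOLS):
    """Boxes touch when both the math and the reading ranges overlap."""
    (m1, r1), (m2, r2) = schools[s1], schools[s2]
    return overlaps(m1, m2) and overlaps(r1, r2)


def vs_typical(school, subject, schools=SCHOOLS, typical=50.0):
    """Return 'above', 'below', or 'indistinguishable' relative to MGP 50."""
    lo, hi = schools[school][SUBJECT_INDEX[subject]]
    if lo > typical:
        return "above"
    if hi < typical:
        return "below"
    return "indistinguishable"


print(indistinguishable("School X", "School Y"))
print(vs_typical("School X", "math"), vs_typical("School X", "reading"))
```

Here School X is above average in math but indistinguishable from average in reading, and it cannot be told apart from School Y even though its math estimate may be higher.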

So what *can* we say? **Charter schools (in blue) tend to have higher growth scores**, but traditional public schools (in dark gray) are overrepresented at both the highest and lowest ranks. The main difference between the sectors is that charters tend not to be observed among the lowest-performing schools, suggesting that the worst charters are better than the worst traditional public schools.

**But there are surprises in both sectors**. For example, what is the best high school in town? Ranked by the MGP for math, it is Thurgood Marshall Academy (a charter), east of the Anacostia, with McKinley Technology High School (DCPS) in Eckington running a close second. Only a handful of other high schools are statistically better than average on math in both years, but the sought-after high schools Wilson and School Without Walls are not on that list.

Among elementary schools, the top performers are scattered around the city, for example Ross in Dupont, Stanton in far Southeast DC, Tubman in Columbia Heights, Watkins on Capitol Hill, and Stoddert in Glover Park. The top middle schools are the KIPP DC AIM and KEY academies, with Cesar Chavez Prep and DC Prep’s Edgewood Middle charters not far behind. The highly regarded Deal Middle School in Ward 3 barely squeaks out a statistical advantage over the typical school.

*Interested in learning more about how education in DC has evolved? Explore the schools chapter of Our Changing City, an interactive web feature that uses data to tell the story of change in the District of Columbia.*

From a policy standpoint, charter schools’ push-out/counsel-out privilege must always be mentioned before any comparison is attempted. DC’s Thurgood Marshall Charter HS is a classic example. Between the October 9th-grade enrollment count and the 10th-grade DC CAS test, an average of 40% of the cohort has been transferred out, and that is at a school that may not have repeat 9th graders.

See WP letter: http://tinyurl.com/9wck4y5

For comparison purposes, where should one draw the line on the percentage of students transferred, beyond which comparisons become statistically invalid?

True enough, and offloading students not making good progress is only one kind of selection that can be employed. High mobility, in both charters and DCPS, makes it hard to assign changes in test scores to only one school. OSSE should report the median GP scores of students not enrolled for the full academic year as well, and enough detail to construct “selection corrections” for schools with many students leaving midyear.

An interesting measure, but perhaps not a reliable one. Checking Watkins, I see that it appears well above average in 2011 but only slightly above average in 2012. Moreover, suppose that a school has an outstanding early education program that brings most of its children to their full potential. After that, the additional value added may be only average. Another school that gets mostly children with poor early childhood education may show substantial progress after a year or two in an only average program.

Ask parents what they believe constitutes a good school. Parents will vote with their feet. If we want to keep middle-class parents with middle-school children in the District, we will have to provide schools with the kinds of classrooms those parents believe provide a good education.

This type of analysis of school performance is needed and helpful. Over time, I expect this analysis will become more fine-grained. For example, at the level of the school, it seems likely that some schools will be out-performers for kids at proficient vs. advanced levels, or for kids with different learning styles. Do you think teachers should be evaluated using the MGP metric? We all should worry about needless testing, but I’ve always thought that ‘teaching to the test’ is okay if the test is well-designed. Love to hear your thoughts.

Expanding on Cliff’s point, some children will thrive in a tightly disciplined environment. Others, not so much. Thirty years ago here in DC, some parents in my cohort chose two or more different schools for their children, matching each child to a school. The MGP metric is a valid measure of teacher performance only to the extent that the results are independent of the interaction between teaching and the skills and backgrounds of the students.

On reliability, some year-to-year movement is due to changing students and teachers, and the productivity of matches thereof, and some due to random error (the huge boxes around estimates are supposed to capture the random error part but not compositional changes), so it is always a good idea to look at scores drawn from several years.

I agree with Cliff Kellogg that some teachers and schools tend to serve students of a particular starting level better than others, and there are problems with the tests currently used (lack of “vertical integration” and sophisticated computer adaptive testing to correctly identify fine gradations in achievement at all levels, without making the tests needlessly long).

It’s also important to bear in mind that reading and math test score gains capture only two of the dimensions that matter, and many parents would be happy to sacrifice a modicum of test score gain for more arts, foreign language, sports, or other programs.

The blog’s point was a simpler one: we should never compare schools based on proficiency levels, because a point-in-time measure ignores the starting points of incoming students. A school can have low test scores but be doing more to raise test scores than another school with high test scores, if the second school simply attracts students with high test scores and does little to raise them.