“It’s time we changed – converting effect sizes to months of learning is seriously flawed”

Anyone with even a passing interest in evidence-informed practice in schools will be aware that effect sizes are often used to report the effects of educational interventions, programmes and policies. These results are then summarised in meta-analyses and meta-meta-analyses, and are often translated into more “understandable” units, such as years or months of learning. Accordingly, John Hattie writes about an effect size of 0.4 SD being equivalent to a year’s worth of learning. Elsewhere, the Education Endowment Foundation, in their Teaching and Learning Toolkit, have developed a table which converts effect sizes into months of additional progress made by pupils. For example, an effect size of 0.44 SD is deemed to be worth an additional five months of learning, and an effect size of 0.96 SD to represent 12 months of additional learning.

However, this approach of converting effect sizes into periods of learning time appears to be seriously flawed. In an article recently published in Educational Researcher, Matthew Baird and John Pane conclude:

Although converting standardized effect sizes in education to years (or months, weeks, or days) of learning has a potential advantage of easy interpretation, it comes with many serious limitations that can lead to unreasonable results, misinterpretations, or even cherry picking from among implementation variants that can produce substantially inconsistent results. We recommend avoiding this translation in all cases, and that consumers of research results look with scepticism towards research translated into units of time. (Baird and Pane 2019, p. 227)

Instead, Baird and Pane argue that because standardised effect sizes are, by their very nature, measured on an abstract scale, the best way to judge whether a programme or intervention effect is meaningful is to look at what the impact would have been on the median student in the control group, had they received the treatment or intervention. For example, assuming a normal distribution in both the intervention and control groups, take the median pupil in the control group – say, the 13th-ranked pupil in a group of 25. Under a normal distribution, a shift of 0.4 SD moves the 50th percentile to roughly the 66th percentile of the control distribution, so had that pupil received a treatment with a standardised effect size of 0.4 SD they would now be ranked roughly 9th in the control group.
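As a rough illustration of this percentile-based reading of an effect size – a minimal sketch, not code from the paper, assuming normally distributed outcomes with equal variance in both groups, and with a function name and rank convention of my own choosing – the arithmetic can be done in a few lines of Python:

```python
# A minimal sketch of the "median student" interpretation recommended by
# Baird and Pane, assuming normally distributed outcomes in both groups.
from scipy.stats import norm

def rank_after_treatment(effect_size_sd: float, group_size: int) -> int:
    """Approximate new rank (1 = top) of the control group's median pupil,
    had that pupil received a treatment of the given effect size."""
    # An effect of d SD moves the 50th-percentile pupil to the Phi(d)
    # percentile of the untreated control distribution.
    new_percentile = norm.cdf(effect_size_sd)
    # Convert a percentile back to a rank from the top of the group,
    # using the usual (n - rank + 0.5) / n percentile convention.
    return round(group_size + 0.5 - group_size * new_percentile)

print(norm.cdf(0.4))                  # ~0.655: the median moves to ~66th percentile
print(rank_after_treatment(0.4, 25))  # 9: the 13th-ranked pupil of 25 moves to ~9th
```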

So what are the implications of this for anyone working with and in schools who is interested in evidence-informed school improvement?

• Baird and Pane’s analysis does not mean that the work of Hattie or the Education Endowment Foundation is invalid and no longer helpful. Rather, it means we should be extremely careful about any claims that interventions provide benefits in terms of months or years of additional progress.

• There are additional problems with the “converting effect sizes to months of learning” approach. For example, the rate of progress in pupils’ achievement varies throughout school and across subjects (see https://onlinelibrary.wiley.com/doi/full/10.1111/j.1750-8606.2008.00061.x; the sketch after this list illustrates the point), and the translation doesn’t make sense for non-cognitive measures (e.g., pupils’ well-being or motivation).

• There’s an interesting balancing act to be struck. On the one hand, given their knowledge and understanding of research, teachers and school leaders are going to have to rely on trusted sources to help them make the most of research evidence in bringing about school improvement. On the other hand, no matter how ‘big’ the name, those sources may well have got something wrong, so some form of professional scepticism is required at all times.

• Effect sizes, and whether they can be reliably converted into some more interpretable metric, may be neither here nor there. What matters is whether there is a causal relationship between intervention X and outcome Y, and what support factors are necessary for that causal relationship to work (Kvernbekk 2015).

• Given the importance that teachers and school leaders attach to sources of evidence other than research – say, from colleagues and other schools – when making decisions, we probably need to spend more time helping them engage in critical yet constructive appraisal of the practical reasoning of colleagues.

• Any of us involved in trying to support the use of evidence in bringing about school improvement may need to be a little more honest with our colleagues. Or if not a little more honest, maybe we need to show them a little more professional respect. Let’s stop trying to turn the complex process of education into overly simplistic measures of learning just because those measures are easy to communicate and interpret. Let’s be upfront with colleagues and say: this stuff is not simple, it is not easy, and there are no off-the-shelf answers. When using research, it’s going to take extremely hard work to make a real difference to pupils’ learning – and, you know what, it probably won’t be that easy to measure.
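To make the first of those additional problems concrete, here is a rough sketch of why the “months of learning” translation is so unstable. The translation here is done in the most naive way – dividing the effect size by a year’s worth of growth in SD units – which is a sketch of the general approach rather than the EEF’s actual banding, and the growth figures are illustrative round numbers of the kind reported in the paper linked in the bullets above (annual gains fall from around 1.5 SD in early primary to around 0.2 SD in upper secondary):

```python
# Illustrative only: annual test-score growth in SD units varies sharply
# with age. These round figures are assumptions loosely in the range
# reported in the paper linked above, not measurements.
ANNUAL_GROWTH_SD = {
    "early primary": 1.5,
    "middle years": 0.4,
    "upper secondary": 0.2,
}

def months_of_learning(effect_size_sd: float, annual_growth_sd: float) -> float:
    """Naive translation: scale the effect against one year's typical growth."""
    return 12 * effect_size_sd / annual_growth_sd

# The same 0.4 SD effect becomes very different "months of learning"
# depending on which year group's growth rate you divide by.
for stage, growth in ANNUAL_GROWTH_SD.items():
    print(f"{stage}: {months_of_learning(0.4, growth):.1f} months")
# early primary:    3.2 months
# middle years:    12.0 months
# upper secondary: 24.0 months
```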

And finally

It’s worth remembering that, no matter what precautions you take when trying to convert an effect size into something more understandable, this does not take away any of the problems associated with effect sizes themselves. See Simpson (2018) for an extended discussion of these issues.

References

Baird, Matthew D., and John F. Pane. 2019. “Translating Standardized Effects of Education Programs Into More Interpretable Metrics.” Educational Researcher 48(4): 217–28. https://doi.org/10.3102/0013189X19848729.

Hattie, John. 2008. Visible Learning: A Synthesis of Over 800 Meta-Analyses Relating to Achievement. London: Routledge.

Higgins, S., M. Katsipataki, R. Coleman, P. Henderson, L. Major, and R. Coe. 2015. The Sutton Trust–Education Endowment Foundation Teaching and Learning Toolkit. London: Education Endowment Foundation.

Kvernbekk, Tone. 2015. Evidence-Based Practice in Education: Functions of Evidence and Causal Presuppositions. Routledge.

Simpson, Adrian. 2018. “Princesses Are Bigger than Elephants: Effect Size as a Category Error in Evidence-Based Education.” British Educational Research Journal 44(5): 897–913.