TLDR:
Six years of NAEP reading scores suggest that, on average, Science of Reading (SOR) laws have had little to no impact on student test scores. However, state-level data alone does not constitute scientific evidence. Instruction varies widely across districts, schools, and classrooms, and legislation does not always translate into practice. Curriculum policies often exist only on paper, with inconsistent implementation at the classroom level.
All states in this analysis that did not pass SOR laws had lower reading scores in 2024 than before the COVID-19 pandemic, indicating that they are still experiencing pandemic-related learning loss. However, some states that did pass SOR laws have recovered from these losses. The states with the highest reading score improvements tended to have more comprehensive literacy legislation, requiring:
-Explicit instruction in all five pillars of literacy,
-Systematic phonics instruction,
-Increased intensive support for struggling readers, and
-Ongoing coaching for teachers.
Do Science of Reading Laws Work?
In 2023, I co-authored an article with my colleagues, Dr. Rachel Schechter and Joshua King, on the impact of “Science of Reading” (SOR) laws. We analyzed National Assessment of Educational Progress (NAEP) scores in states identified as having SOR laws, based on Sarah Schwartz’s 2022 article, Which States Have Passed ‘Science of Reading’ Laws? What’s in Them? To assess their impact, we matched these states with states that had not yet passed SOR laws but had equivalent NAEP scores.
Our analysis found no overall statistically significant benefit of SOR laws. However, certain types of SOR policies showed more promising effects than others. Specifically, states that legislated all five pillars of reading instruction, required systematic phonics, and provided ongoing coaching saw the greatest improvements in reading achievement. In contrast, states whose laws focused on funding specific programs or assessments, or on banning three-cueing, showed less positive growth.
We submitted this paper to a practitioner-focused, peer-reviewed journal, but it was rejected, primarily for being too technical for their audience. Around the same time, the 2024 NAEP scores were released, making our analysis outdated. While I initially considered revising the paper for resubmission, balancing peer-reviewed research, full-time teaching, and being a dad has been challenging. With several other papers in the pipeline that I believe are more relevant for teachers, I’ve decided to simply share an updated analysis here.
Methods
Originally, my co-authors and I examined the 12 states identified by Sarah Schwartz. We categorized these states based on four instructional policies: systematic phonics, the five pillars of literacy, increased Tier 3 intervention, and banning three-cueing. I theorized that these policies would be the most widely adopted and potentially the most effective.
The National Reading Panel (2000) meta-analysis found that instruction in each of the five pillars of literacy (phonemic awareness, phonics, vocabulary, fluency, and comprehension) improves literacy outcomes. Additionally, the panel concluded that systematic phonics (structured literacy) is more effective than unsystematic phonics (whole language/balanced literacy). These findings have been replicated by numerous subsequent meta-analyses and are widely considered foundational scientific evidence in reading instruction. Literature reviews by Torgesen (2009) and Mathes & Denton (2002) further suggest that extensive Tier 3 intervention (85+ hours) is required to bring 95% or more of students to grade-level proficiency. Although little scientific research supports banning three-cueing, SOR law advocates frequently cite three-cueing as a key contributor to the literacy crisis, so I hypothesized that states would likely legislate against it.
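To make the 85-hour threshold more concrete, the arithmetic below converts it into weeks of intervention. The 45-minute, five-day-a-week schedule is a hypothetical example chosen for illustration, not a figure taken from Torgesen (2009) or Mathes & Denton (2002).

```python
import math


def weeks_to_reach(target_hours: float, minutes_per_session: float, sessions_per_week: int) -> float:
    """Weeks of intervention needed to accumulate target_hours of instruction."""
    minutes_per_week = minutes_per_session * sessions_per_week
    return target_hours * 60 / minutes_per_week


# Hypothetical schedule: 45-minute sessions, 5 days a week.
weeks = weeks_to_reach(85, 45, 5)
print(f"{weeks:.1f} weeks ({math.ceil(weeks)} full weeks)")
```

Under that assumed schedule, 85 hours works out to roughly 23 school weeks of daily intervention, well beyond a typical short intervention cycle.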
In our original analysis, we controlled for whether funding was directed toward programs, assessments, or coaching. In this updated analysis, I controlled only for whether coaching was provided. Notably, only one state did not allocate funding for coaching, and in the 2023 analysis it was by far the lowest performer. Table 1 presents each state and its legislative components.
Table 1. SOR States and their Legislative Components

Each of these states was then matched to a control state: the state with the closest NAEP scores that had no SOR law as of 2019. Table 2 shows the 2019 raw NAEP scores for both the treatment states and the matched control states, as well as the percentage of students at grade level (defined as scoring at or above the NAEP Basic achievement level).
Table 2. SOR States Compared to Control States in 2019

The matching required reusing two control states, Hawaii and the District of Columbia. Even so, the resulting group means were almost identical: a mean NAEP reading score of 216.87 for the treatment group and 217.45 for the control group. Table 3 shows how these states compared as of 2022.
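The matching rule described above can be sketched as nearest-neighbor matching with replacement, which is why one control state can appear more than once. This is a minimal sketch assuming a single 2019 score per state; the state names and scores are hypothetical placeholders (not the values in Table 2), and since the post does not say how ties were broken, the sketch simply keeps the first closest control.

```python
def match_controls(treatment: dict[str, float], controls: dict[str, float]) -> dict[str, str]:
    """Pair each treatment state with the control whose 2019 score is closest.

    Matching is done with replacement, so one control can serve several
    treatment states. Ties go to whichever control appears first.
    """
    pairs = {}
    for state, score in treatment.items():
        pairs[state] = min(controls, key=lambda c: abs(controls[c] - score))
    return pairs


# Hypothetical 2019 NAEP reading scores (not the real Table 2 values).
treatment_2019 = {"State A": 219.0, "State B": 212.0, "State C": 218.0}
control_2019 = {"State X": 218.5, "State Y": 211.0, "State Z": 225.0}

print(match_controls(treatment_2019, control_2019))
```

Here "State X" is matched twice, mirroring how Hawaii and the District of Columbia were reused in the real analysis.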
Results
Table 3. Comparing the 2022 Performance of SOR States Against Control States

The average raw NAEP score of treatment states in 2022 was 213.03 (SD = 5.89), while control states had an average score of 213.73 (SD = 4.71). This resulted in an effect size of -0.13, indicating a negative but statistically insignificant difference. On average, 58.83% of students in treatment states were at grade level (SD = 5.57), compared to 58.33% in control states (SD = 5.46), yielding an effect size of 0.03, a positive but statistically insignificant effect of SOR laws. Table 4 presents the same analysis for the 2024 NAEP results. Note, however, that seven of the original twelve control states have since enacted SOR laws: Louisiana, Kansas, New York, California, Arizona, Georgia, and the District of Columbia (D.C.).
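The -0.13 effect size for 2022 raw scores is consistent with Cohen's d computed with a pooled standard deviation. The post does not state which effect-size formula it used, so the sketch below is an assumption (equal group sizes, pooled SD taken as the root of the average of the two variances) rather than the authors' exact method.

```python
import math


def cohens_d(mean_t: float, sd_t: float, mean_c: float, sd_c: float) -> float:
    """Standardized mean difference, assuming equal-sized groups and a pooled SD."""
    pooled_sd = math.sqrt((sd_t ** 2 + sd_c ** 2) / 2)
    return (mean_t - mean_c) / pooled_sd


# 2022 raw NAEP scores reported above: treatment vs. control states.
d_2022 = cohens_d(213.03, 5.89, 213.73, 4.71)
print(round(d_2022, 2))  # -0.13
```

A negative value means the treatment (SOR) states scored below their matched controls, expressed in pooled-standard-deviation units.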
Table 4. Comparing the 2024 Performance of SOR States Against Control States

On average, treatment states had a raw mean NAEP score of 212.66 (SD = 5.92), while control states had a mean score of 213.58 (SD = 3.70), resulting in an effect size of -0.03. The percentage of students at grade level was 58.41% in both groups, indicating no observed effect (effect size = 0). This suggests that states that passed SOR laws had no meaningful advantage over their matched control states five years later, although most of those control states had passed SOR laws of their own by 2024.
When the seven comparisons in which the control state later adopted an SOR law are removed, treatment states had a mean NAEP score of 213.25 (SD = 9.03), compared to 216.75 (SD = 2.98) in control states, yielding an effect size of -0.07. This suggests a small, statistically insignificant negative impact.
In these five remaining comparisons, the percentage of students at grade level was 58.75% (SD = 8.5) in treatment states and 62.5% (SD = 3.00) in control states, resulting in an effect size of -0.58. These results suggest a substantial negative effect on grade-level proficiency in states that passed SOR laws once control states that later adopted their own SOR laws are excluded, although five comparisons is a very small sample.
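The sensitivity check above, dropping comparisons whose control state later passed its own SOR law, amounts to filtering the matched pairs before averaging each group. The `Pair` fields and all values below are hypothetical; the real figures are the ones reported for the 2024 comparisons.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Pair:
    treatment: str
    control: str
    treatment_pct: float       # % of students at grade level in 2024 (hypothetical)
    control_pct: float
    control_adopted_sor: bool  # True if the control state passed an SOR law after 2019


def clean_means(pairs: list[Pair]) -> tuple[float, float, int]:
    """Group means and pair count, keeping only pairs whose control never adopted SOR."""
    kept = [p for p in pairs if not p.control_adopted_sor]
    return mean(p.treatment_pct for p in kept), mean(p.control_pct for p in kept), len(kept)


# Hypothetical matched pairs, not the real data.
pairs = [
    Pair("State A", "State X", 60.0, 62.0, False),
    Pair("State B", "State Y", 55.0, 57.0, True),
    Pair("State C", "State Z", 58.0, 63.0, False),
]
print(clean_means(pairs))  # (59.0, 62.5, 2)
```

Because each exclusion removes a whole pair, the treatment group shrinks along with the control group, which is why the adjusted estimate rests on far fewer comparisons.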
Of the treatment states with valid control states in 2024, only Michigan substantially outperformed its matched control state, with 4% more students at grade level. Moreover, Michigan was one of two treatment states that fully made up for their COVID learning loss, with 0.76% more students reading at grade level than in 2019. Few states have fully recovered from pandemic-related learning loss: on average, control states still have 4.50% fewer students at grade level than in 2019, while treatment states have 4.20% fewer. Michigan may have been more successful because it passed one of the most comprehensive SOR bills, requiring:
-Explicit instruction in all five pillars of literacy,
-Systematic phonics instruction,
-Increased Tier 3 intervention, and
-Ongoing coaching for teachers.
This model appears to be the most effective approach.
In contrast, New Mexico had the worst performance, with 14% fewer students at grade level than its matched control state in 2024. Additionally, New Mexico had 6.05% fewer students reading at grade level in 2024 than before the pandemic. While the state mandated systematic phonics, Tier 3 intervention, and coaching, it did not require explicit instruction in all five pillars of literacy. This may suggest that its less comprehensive approach contributed to its weaker performance.
Among treatment states, only Michigan and Alabama fully recovered from COVID learning loss between 2019 and 2024. Alabama outperformed its pre-COVID test scores, with 0.27% more students reading at grade level than in 2019, whereas the average state had over 4% fewer students at grade level. Like Michigan, Alabama mandated instruction in all five literacy pillars, required systematic phonics, increased Tier 3 intervention time, and provided ongoing coaching for teachers. By contrast, none of the control states that did not adopt an SOR law recovered from COVID learning loss.
The treatment state with the lowest overall gains was Nebraska, where 10.89% fewer students were reading at grade level in 2024 than in 2019. Curiously, Nebraska had the same comprehensive literacy mandates as Alabama and Michigan, the two best-performing states: all five literacy pillars, systematic phonics, increased Tier 3 intervention, and ongoing teacher coaching.
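The "fully recovered" criterion used in this section can be written as a sign check on each state's change in the share of students at grade level from 2019 to 2024. The sketch uses only the three changes quoted above for Michigan, Alabama, and Nebraska; treating a change of zero or more as full recovery is an assumption about how the threshold was defined.

```python
# Change in % of students at grade level, 2019 to 2024, as quoted in this section.
change_2019_2024 = {
    "Michigan": 0.76,
    "Alabama": 0.27,
    "Nebraska": -10.89,
}


def recovered(change: float) -> bool:
    """Assumed rule: a state has fully recovered if its 2024 share is at least its 2019 share."""
    return change >= 0


recovered_states = sorted(s for s, c in change_2019_2024.items() if recovered(c))
print(recovered_states)  # ['Alabama', 'Michigan']
```

Note that Nebraska fails this check despite having the same legislative components as the two states that pass it.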
Only one state, West Virginia, did not mandate ongoing coaching. Interestingly, New Mexico was the second-lowest performing state, with 7.14% fewer students reading at grade level than in 2019.
To better model the impact of each type of policy, I compared states with and without each of the five key policies (see Table 5). As a reminder, all control states that did not pass an SOR law, and all treatment states except Alabama and Michigan, failed to recover from COVID learning loss.
Table 5. Types of Policy Moderator Analysis

The findings from this research and Table 5 suggest that legislating systematic phonics, increased Tier 3 instruction, ongoing coaching, and comprehensive literacy instruction may have a positive impact on reading outcomes. However, the effect sizes were small, and the findings were not statistically robust.
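The with/without comparison behind Table 5 can be sketched as a difference in group means for each policy flag. The states, flags, and outcome values below are hypothetical, and the post may have reported standardized effect sizes rather than the raw mean gaps this sketch prints.

```python
from statistics import mean

# Hypothetical states: (set of legislated policies, change in % at grade level 2019 to 2024).
states = {
    "State A": ({"phonics", "five_pillars", "tier3", "coaching"}, 0.5),
    "State B": ({"phonics", "tier3", "coaching", "bans_three_cueing"}, -6.0),
    "State C": ({"five_pillars", "coaching"}, -4.0),
    "State D": ({"phonics", "five_pillars", "tier3"}, -8.0),
}

POLICIES = ("phonics", "five_pillars", "tier3", "bans_three_cueing", "coaching")


def moderator_gap(policy: str) -> float:
    """Mean change for states with the policy minus mean change for states without it."""
    with_p = [chg for flags, chg in states.values() if policy in flags]
    without_p = [chg for flags, chg in states.values() if policy not in flags]
    return mean(with_p) - mean(without_p)


for policy in POLICIES:
    print(policy, round(moderator_gap(policy), 2))
```

With only a handful of states per group, a single outlier (such as Nebraska in the real data) can flip the sign of a gap, which is one reason these moderator results should be read cautiously.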
Discussion
This analysis suggests that SOR legislation has had a minimal impact on student learning outcomes. However, the most successful SOR laws were also the most comprehensive. They mandated all five pillars of literacy instruction, required systematic phonics, increased intensive reading instruction for struggling readers, and provided ongoing teacher coaching. Notably, the only states that fully recovered from COVID learning loss—Michigan and Alabama—had legislation that included all of these components. Conversely, no states in this analysis recovered from COVID learning loss without SOR legislation.
I recognize that Whole Language advocates might interpret these findings as justification for rejecting evidence-based teaching practices, but that would be a mistake. This type of analysis is compelling because it relies on large sample sizes, standardized tests, and direct comparisons of different reading laws. However, the data in this paper are correlational, not causal.
In rigorous scientific educational research, we compare matched treatment and control groups under controlled conditions. Ideally, students in both groups would receive identical instruction except for one key variable, such as systematic phonics versus balanced literacy. This allows researchers to attribute learning differences to the treatment. However, this paper does not control for instructional implementation, so causation cannot be established. Thus, these findings should not be used to dismiss the decades of research and meta-analyses supporting systematic phonics (NRP, 2000; Stuebing et al., 2008; Torgerson et al., 2018). Instead, they highlight that legislative efforts to implement the science of reading have not always been effective. The key question is: why?
One likely reason is implementation failure. Governments frequently pass education bills, but without enforcement mechanisms, many of those bills go unread and are never implemented in classrooms. Unlike in a controlled study, where instructional fidelity is monitored, no one can precisely describe how literacy instruction looks across every state, district, school, or classroom. Having taught in three countries, two provinces, six school boards, and over a dozen schools, I have observed that instructional practices vary significantly, even within the same school district. Given these differences, it is unlikely that SOR legislation has been uniformly implemented across the 12 states in this analysis.
Moreover, do all SOR states interpret the science of reading consistently? The SOR movement largely arose as a response to Whole Language, driven by parents of dyslexic children. As a long-time advocate of the science of reading, I find it surprising that some SOR laws did not explicitly mandate systematic phonics. This raises concerns about how well these laws align with scientific consensus on reading instruction.
I must also acknowledge that the way the "science of reading" has been marketed by some influencers and curriculum designers has not always aligned with actual research. For example, when the movement first gained traction on social media, there was a heavy emphasis on "advanced" phonemic awareness drills (e.g., deletion and manipulation). However, multiple meta-analyses (NRP, 2000; Rehfeld et al., 2022; Erbeli et al., 2024) indicate that these drills are among the least effective phonemic awareness strategies for teaching reading. Similarly, many online influencers promoted Orton-Gillingham (OG) approaches as the gold standard for phonics instruction, despite two separate meta-analyses (NRP, 2000; Stevens et al., 2021) finding low effect sizes for OG-based instruction. This discrepancy between scientific findings and popular discourse raises concerns about how some SOR policies are being framed and implemented.
I also hesitate to say this, but some SOR advocates may be overemphasizing phonics instruction. To be clear, this is not an endorsement of Whole Language: moving away from Whole Language was necessary, and systematic phonics is a crucial component of reading instruction. However, I have on many occasions seen teachers who advocate for SOR recommend phonics instruction in grades K–8, which is not supported by research. The National Reading Panel (2000) meta-analysis found no evidence for phonics being effective beyond grade 2, and I am unaware of any major studies that contradict this finding.
None of this is to say that the Science of Reading movement is a bad thing or that legislation cannot help improve student literacy scores. However, we must ensure that these laws align with established scientific research and include enforcement mechanisms. Moving forward, I hope to see schools implementing structured phonics scopes and sequences, teaching all five pillars of literacy, increasing Tier 3 intervention, and providing ongoing teacher coaching.
Original article written by Nathaniel Hansford, Dr. Rachel Schechter, and Joshua King.
Article updated by Nathaniel Hansford, 2025-03-10.
References:
Erbeli, F., Rice, M., Xu, Y., Bishop, M. E., & Goodrich, J. M. (2024). A meta-analysis on the optimal cumulative dosage of early phonemic awareness instruction. Scientific Studies of Reading. https://doi.org/10.1080/10888438.2024.2309386
Mathes, P. G., & Denton, C. A. (2002). The prevention and identification of reading disability. Seminars in Pediatric Neurology, 9(3), 185–191. https://doi.org/10.1053/spen.2002.35498
National Reading Panel. (2000). Teaching children to read: An evidence-based assessment of the scientific literature on reading instruction. United States Government. https://www.nichd.nih.gov/sites/default/files/publications/pubs/nrp/Documents/report.pdf
Rehfeld, D. M., Kirkpatrick, M., O'Guinn, N., & Renbarger, R. (2022). A meta-analysis of phonemic awareness instruction provided to children suspected of having a reading disability. Language, Speech, and Hearing Services in Schools, 53(4), 1177–1201. https://doi.org/10.1044/2022_LSHSS-21-00160
Stevens, E. A., Austin, C., Moore, C., Scammacca, N., Boucher, A. N., & Vaughn, S. (2021). Current state of the evidence: Examining the effects of Orton-Gillingham reading interventions for students with or at risk for word-level reading disabilities. Exceptional Children, 87(4), 397–417. https://doi.org/10.1177/0014402921993406
Stuebing, K. K., Barth, A. E., Cirino, P. T., Francis, D. J., & Fletcher, J. M. (2008). A response to recent reanalyses of the National Reading Panel report: Effects of systematic phonics instruction are practically significant. Journal of Educational Psychology, 100(1), 123–134. https://doi.org/10.1037/0022-0663.100.1.123
Torgesen, J. (2009). Preventing early reading failure and its devastating downward spiral. National Center for Learning Disabilities. http://www.bharathiyartamilpalli.org/training/images/downwardspiral.pdf
Torgerson, C., Brooks, G., Gascoine, L., & Higgins, S. (2018). Phonics: Reading policy and the evidence of effectiveness from a systematic ‘tertiary’ review. Research Papers in Education, 34(2), 208–238. https://doi.org/10.1080/02671522.2017.1420816