Training to Failure vs Reps in Reserve: What Research Shows

Summary: Refalo et al. (2023) in Sports Medicine meta-analyzed 15 studies on proximity to failure and reported a small hypertrophy edge for sets taken closer to failure (effect size 0.15 to 0.21). Refalo et al. (2024) in the Journal of Sports Sciences had 26 resistance-trained adults train one leg to failure and the other with 1 to 2 reps in reserve (RIR) for 8 weeks. Quadriceps hypertrophy was similar in both legs. Robinson et al. (2024) in Sports Medicine meta-regressed 55 hypertrophy and 67 strength studies. Closer proximity helped growth modestly. It did not help strength. Davies et al. (2016) in Sports Medicine pooled 8 trials and found training to failure did not improve strength gains. The practical rule: most hypertrophy sets at 1 to 3 RIR, most strength sets at 3 to 5 RIR. Save true failure for the last set of the last exercise, if at all.

Figure: Most hypertrophy gains come from being close to failure, not at failure. The trial evidence shows the gap between 1 RIR and 0 RIR is smaller than most lifters think.

Walk into any gym and you'll hear the same advice. "Push it. Last rep should be ugly. No pain, no gain." That advice is older than the research, and the research has caught up. Stopping a few reps short of failure builds almost as much muscle as grinding to failure, and for pure strength, leaving reps in the tank actually wins. The cost of grinding shows up in fatigue, recovery, and the next session's quality. The benefit of grinding, when it exists, is small.

This isn't a "failure is bad" article. Failure has its place. But the default mode for most working sets, in most weeks, in most programs, should be a calibrated stop short of failure. The data on this is unusually well-aligned across meta-analyses, randomized trials, and meta-regressions. So let's walk through what the studies actually measured.

The same body of research underpins our piece on whether light weights build muscle and the practical takeaways in our minimum effective dose piece. Effort matters, and so does where you spend it.

The Research: What Studies Show

Refalo 2023: The Meta-Analysis That Reframed the Debate

The cleanest synthesis of the evidence is Refalo, Helms, Trexler, Hamilton, and Fyfe (2023) in Sports Medicine. The team pooled 15 studies that compared training at different proximities to failure for muscle hypertrophy. The pooled effect favored closer-to-failure training, but the magnitude was small. Effect sizes landed between 0.15 and 0.21, depending on how the comparison was structured. To put that in context, going from 2 sets to 3 sets per exercise often yields a larger hypertrophy gain than going from 3 RIR to 0 RIR.
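To get a feel for what an effect size in that range means, here is a minimal Python sketch of a standardized mean difference (Cohen's d with a pooled standard deviation). It assumes the reported effect sizes are of this general type, which is typical for this kind of meta-analysis. The function name `cohens_d` and every number below are made up for illustration and are not data from Refalo et al.

```python
import math

def cohens_d(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

# Hypothetical numbers: percent growth in muscle thickness over a training block.
# Group A trains closer to failure, group B stops further away.
d = cohens_d(mean_a=6.0, mean_b=5.1, sd_a=5.0, sd_b=5.0, n_a=20, n_b=20)
print(round(d, 2))  # 0.18
```

In this invented example, an average difference of 0.9 percentage points against a spread of 5 points gives an effect size of about 0.18. The gap between groups is much smaller than the variation between individual lifters.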

The authors framed this carefully. Closer to failure helps. The slope is not steep. And the relationship was not strictly linear. Some of the trials at very close proximity (0 to 1 RIR) showed diminishing returns, suggesting hypertrophy peaks somewhere in the 0 to 3 RIR band rather than improving monotonically as effort climbs.

Important caveat. Most of the included studies used trained males in their 20s and 30s. Effects in beginners, women, and older lifters could plausibly differ. The general direction of the finding has held up in subsequent work, but the precise magnitude is still being mapped.

Refalo 2024: The 8-Week Trial in Trained Adults

The most direct test came from Refalo, Nuckols, Galpin, Gallagher, Hamilton, and Fyfe (2024) in the Journal of Sports Sciences. The team randomized 26 resistance-trained adults to two unilateral leg conditions, so each participant trained one leg to momentary muscular failure and the other leg with 1 to 2 reps in reserve. Volume and load were equated across conditions. Training ran for 8 weeks.

The result. Quadriceps thickness increased similarly in both legs. The failure leg showed slightly higher reported fatigue and rate of perceived exertion, but no measurable hypertrophy advantage. For lifters already past the beginner stage, this is the cleanest within-subject evidence yet that 1 to 2 RIR is not leaving meaningful muscle on the table.

The within-subject design is what makes this study unusually strong. Genetics, recovery, sleep, nutrition, and motivation are all held constant when one person trains both legs, so differences across legs are most plausibly explained by the training variable. That's a much stricter test than a between-subjects design, and the failure-vs-RIR contrast still didn't move.

Robinson 2024: The Meta-Regression That Split Strength From Size

The most important paper in this entire literature might be Robinson, Pelland, Remmert, Refalo, Jukic, Steele, and Zourdos (2024) in Sports Medicine. The team ran a series of meta-regressions on 55 hypertrophy studies and 67 strength studies, modeling proximity to failure as a continuous predictor rather than a binary "failure vs not failure" comparison. This is methodologically the right way to ask the question.

The hypertrophy regression showed a positive but flattening slope. Closer proximity helped growth, but the slope tapered off once sets were already within about 2 RIR, consistent with Refalo's 2023 finding that the relationship is non-linear and that the high end of effort offers diminishing returns. Average gain across the proximity range was small.

The strength regression told a different story. Closer proximity to failure was associated with slightly smaller strength gains. In some sub-analyses the slope was statistically distinguishable from zero, and its direction was slightly negative. The authors interpreted this as fatigue interference: failure-rich training compromises the velocity and quality of subsequent reps and sessions, and strength is more velocity-sensitive than size.

Practical implication. Hypertrophy and strength have separate optimal proximity zones. Hypertrophy lives at 0 to 3 RIR with a slight preference for the closer end. Strength lives at 3 to 5 RIR with a slight preference for the further end. The reason most "blanket" advice gets this wrong is that the two adaptations are different physiology. Same exercise, different stimulus, different optimal effort.

Davies 2016: The Earlier Strength Meta-Analysis

The Robinson finding was foreshadowed by Davies, Orr, Halaki, and Hackett (2016) in Sports Medicine. The team pooled 8 randomized trials comparing failure to non-failure training for muscular strength. The pooled effect favored neither. Some trials showed a tiny advantage for failure. Some showed an advantage for non-failure. The aggregate effect was statistically indistinguishable from zero.

This was the first major meta-analytic signal that the bro-science default was wrong on strength. Eight years later, with more data and better methods, the picture sharpened: failure isn't neutral for strength, it's slightly worse. But Davies got there first, and the conclusion has only firmed up since.

Sampson and Groeller 2016: One Underrated Trial

An interesting hypertrophy-specific trial: Sampson and Groeller (2016) in the Scandinavian Journal of Medicine and Science in Sports randomized 28 untrained men to elbow flexion training in three conditions. Failure with controlled tempo. Non-failure with controlled tempo. Failure with fast tempo. After 12 weeks, all three groups added similar elbow flexor cross-sectional area. The non-failure group showed equal gains to the failure group, despite stopping each set short of failure.

The implication for newer lifters. You don't have to grind. You don't even have to chase the burn on every set. The structure of the program (load, sets, reps, frequency) does most of the work, and effort within a wide band gets similar outcomes.

Figure: The RIR scale gauges effort by how many reps you could still do at the moment you stopped. The trial evidence converges around 1 to 3 RIR for hypertrophy and 3 to 5 RIR for strength.

Why This Matters for Your Training

Three practical implications fall out of the literature.

First, hypertrophy and strength need different proximity prescriptions. If you're chasing size, 1 to 3 RIR is the working zone. If you're chasing strength on big compound lifts, 3 to 5 RIR keeps bar speed and quality high enough that the next set isn't compromised. Lumping both adaptations into one effort prescription leaves gains on the table for whichever goal you actually have.

Second, the marginal benefit of going from 1 RIR to 0 RIR is small, and the marginal cost (fatigue, longer recovery, worse next session) is real. So the question is not "can I push to failure" but "is failure cheap enough to be worth it on this set." On a final set of curls, sure. On the third set of squats with two more exercises to come, no. Save your failure budget for movements where the recovery cost is low.

Third, RIR estimation is a skill you build. Steele and colleagues (2017) found new lifters routinely underestimated reps left by 4 to 5 when they thought they were close to failure. Translation: when a beginner says "1 RIR," they're often actually at 5 or 6 RIR. The fix is not to stop using RIR. It's to occasionally take an isolation set to true failure (low risk, high signal), then recalibrate. Repeat that check once every 4 to 6 weeks and, after a few rounds, RIR estimates tighten up. We covered the broader skill of training intuition in our piece on consistency over intensity: the lifters who keep showing up are the ones learning to read their own effort.
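The calibration check is simple arithmetic, and a short Python sketch makes it concrete. This is one assumed way to log it, not a protocol from any of the cited studies: call out your RIR mid-set, keep going to true failure, then compare. The function name `rir_error` and the example numbers are hypothetical.

```python
def rir_error(predicted_rir, reps_at_prediction, reps_at_failure):
    """Return (actual RIR, estimation error) for one calibration set.

    predicted_rir: the RIR the lifter called out mid-set
    reps_at_prediction: reps completed when the call was made
    reps_at_failure: total reps completed when the set ended at true failure
    """
    actual_rir = reps_at_failure - reps_at_prediction
    # Positive error means the lifter had more reps left than they thought.
    error = actual_rir - predicted_rir
    return actual_rir, error

# Hypothetical set: lifter calls "1 RIR" at rep 10, then reaches failure at rep 14.
actual, error = rir_error(predicted_rir=1, reps_at_prediction=10, reps_at_failure=14)
print(actual, error)  # 4 3
```

Here the lifter called 1 RIR but actually had 4 reps left, so the estimate was off by 3. Logging that error across a few calibration sets shows whether the gap is shrinking.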

How Effort Drives Adaptation in Practice

The body adapts to a stimulus, not to suffering. The stimulus for hypertrophy is mechanical tension across enough effective reps. The stimulus for strength is high-quality, high-velocity practice with heavy loads. Failure is one way to reach those stimuli. It's not the only way, and it's not always the best way.

The Effective Reps Concept

Most hypertrophy researchers think of "effective reps" as the reps performed at high motor unit recruitment, which begins to engage as a set approaches failure. The last 5 to 6 reps of a set taken to failure are largely effective. Stopping at 1 RIR captures most of those reps. Stopping at 3 RIR captures fewer. Stopping at 6 RIR captures very few. The Refalo and Robinson data are consistent with this model: returns flatten once a set is within 2 to 3 RIR because most effective reps are already happening.
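As a rough illustration of that model, the Python sketch below counts how many reps of a set land inside the final stretch before failure. It is a deliberate simplification: it assumes a hard cutoff of 5 effective reps (the `window` parameter, a hypothetical default taken from the low end of the 5 to 6 range above), whereas real recruitment ramps up gradually. The function name `effective_reps` is also just an illustrative choice.

```python
def effective_reps(reps_performed, rir, window=5):
    """Count reps that fall inside the last `window` reps before failure.

    window is an assumed hard cutoff; actual recruitment rises gradually.
    """
    if reps_performed < 0 or rir < 0:
        raise ValueError("reps_performed and rir must be non-negative")
    return min(reps_performed, max(0, window - rir))

# A set that would reach failure at 12 reps, stopped `rir` reps early.
for rir in (0, 1, 3, 6):
    print(rir, effective_reps(reps_performed=12 - rir, rir=rir))
# 0 5
# 1 4
# 3 2
# 6 0
```

Under these assumptions, stopping at 1 RIR still captures 4 of the 5 effective reps, while stopping at 6 RIR captures none. The hard cutoff overstates how sharply the count falls, but the ordering matches the argument above.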

For strength, the model is different. Strength gains are velocity-dependent, and bar speed drops sharply in the last 1 to 3 reps before failure. The "effective" reps for strength are the ones still moving fast under heavy load. Going to failure means doing slow, ugly reps, which is almost the opposite of what strength training rewards.

Recovery Cost Compounds

One failure-rich session is not the issue. A program full of failure sessions is. Every failure set extends the recovery curve. Multiply across a 4-day-a-week program over months, and the cumulative fatigue starts compromising volume, frequency, and progression. We mapped the same dynamic in our piece on sleep and muscle growth: training stimulus is one input, recovery is the equal partner, and unsustainable effort levels collapse the whole system.

Practical RIR Targets

Hypertrophy work (8 to 15 rep range): 1 to 3 RIR. Last set of the last exercise can flirt with 0 if you feel like it.
Strength work (3 to 6 rep range): 3 to 5 RIR on most sets. Heavier singles and doubles can run closer, but rarely to failure.
Power and explosive work: well away from failure. Velocity is the variable, and failure kills velocity.
Isolation work for smaller muscles (calves, biceps, lateral raises): can run closer to failure (0 to 2 RIR) without much recovery cost.
Compound lifts that wreck you (squats, deadlifts, rows): stay further from failure, especially mid-program. Save aggressive efforts for testing weeks.
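For lifters who already log their sets, the numeric targets above can be written as a small lookup. This Python sketch is only one assumed way to encode them. The names `RIR_TARGETS` and `within_target` are hypothetical, and the power and heavy-compound guidance is left out because the list gives no numeric range for those.

```python
# RIR ranges copied from the list above: (lowest RIR, highest RIR).
RIR_TARGETS = {
    "hypertrophy": (1, 3),
    "strength": (3, 5),
    "isolation": (0, 2),
}

def within_target(work_type, reported_rir):
    """Return True if a logged RIR falls inside the target range for that work."""
    low, high = RIR_TARGETS[work_type]
    return low <= reported_rir <= high

print(within_target("hypertrophy", 2))  # True
print(within_target("strength", 1))     # False
```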

None of this requires a tracker. The discipline is mental. Stop the set when you have 1 to 3 clean reps left. Resist the gym-rat instinct to grind. The next session will be better, the cumulative volume will be higher, and the muscle will get built either way.

Common Misconceptions

Misconception: "If you're not training to failure, you're not really training"

This one comes from old-school bodybuilding culture, and the data hasn't supported it for a decade. Refalo's 2023 meta-analysis put the failure-vs-non-failure hypertrophy edge at an effect size of 0.15 to 0.21. That's small. Sampson and Groeller's 2016 trial showed equal arm hypertrophy with non-failure work over 12 weeks. You're training. You're just not paying the recovery tax.

Misconception: "Failure is required to recruit all motor units"

Maximum motor unit recruitment happens before failure, not at failure. The last 5 to 6 reps of a set taken close to failure already recruit nearly all available motor units. Going from 1 RIR to 0 RIR adds maybe one more rep at full recruitment, and even that rep is degraded by accumulated fatigue. The diminishing returns Refalo and Robinson saw in the data make sense at the physiological level.

Misconception: "RIR is just guessing"

RIR estimation is a calibrated skill, not a guess. Trained lifters with experience using the scale tend to estimate within 1 to 2 reps of actual failure on big compound lifts and within 0 to 1 reps on isolation work. Helms and colleagues (2016) formalized the scale precisely because experienced coaches were already using it intuitively with good agreement. The fix for poor RIR estimation is more practice, not abandoning the framework.

What the Research Suggests Going Forward

A few honest caveats worth flagging.

First, almost all of the cited trials use young, predominantly male, resistance-trained subjects. The effect of proximity-to-failure on muscle gain in beginners, women, older adults, or people training under physical or psychological stress could plausibly differ. Most working hypotheses say the direction holds but the magnitude varies.

Second, "failure" itself is variably defined across studies. Some studies use volitional failure (subject decides). Some use technical failure (form breakdown). Some use velocity-loss thresholds. These aren't identical, and the differences matter at the margins. Robinson's 2024 meta-regression handled this by treating proximity as continuous, which is methodologically cleaner than the binary failure-vs-not-failure comparisons in older work.

Third, the literature is heavier on hypertrophy and strength than on other adaptations. Endurance, power, and motor-skill outcomes have not been mapped at the same density. Most of what's been published on power suggests staying well away from failure, but the sample of studies is smaller.

Fourth, individual variation in fatigue tolerance is real. A few lifters genuinely recover well from failure-rich training. Most do not. The way to know is to track strength and recovery over a multi-week training block at 0 RIR vs 2 RIR and see which one your physiology prefers. The data will tell you.
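A minimal Python sketch of that self-comparison follows, assuming you record a starting and ending load for the same lifts in each block. All names and numbers are hypothetical. A single lifter running two blocks back to back cannot rule out order effects, so treat the result as a hint rather than proof.

```python
from statistics import mean

def percent_change(start, end):
    """Percent change in load from the start to the end of a block."""
    return (end - start) / start * 100

def block_progress(block):
    """Average percent change across all lifts logged in one block."""
    return mean(percent_change(start, end) for start, end in block.values())

# Hypothetical logs: (starting load, ending load) in kg for the same lifts.
block_0_rir = {"squat": (100, 104), "bench": (80, 82)}
block_2_rir = {"squat": (104, 110), "bench": (82, 85)}

print(round(block_progress(block_0_rir), 2))  # 3.25
print(round(block_progress(block_2_rir), 2))  # 4.71
```

In this invented log, the 2 RIR block progressed faster. Pairing the numbers with a simple recovery note per session (sleep, soreness, motivation) would cover the recovery half of the comparison.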

References

Refalo MC, Helms ER, Trexler ET, Hamilton DL, Fyfe JJ. "Influence of Resistance Training Proximity-to-Failure on Skeletal Muscle Hypertrophy: A Systematic Review with Meta-analysis." Sports Med. 2023;53(3):649-665. doi:10.1007/s40279-022-01784-y
Refalo MC, Nuckols G, Galpin AJ, Gallagher IJ, Hamilton DL, Fyfe JJ. "Similar muscle hypertrophy following eight weeks of resistance training to momentary muscular failure or with repetitions-in-reserve in resistance-trained individuals." J Sports Sci. 2024;42(10):908-919. doi:10.1080/02640414.2024.2321021
Robinson ZP, Pelland JC, Remmert JF, Refalo MC, Jukic I, Steele J, Zourdos MC. "Exploring the Dose-Response Relationship Between Estimated Resistance Training Proximity to Failure, Strength Gain, and Muscle Hypertrophy: A Series of Meta-Regressions." Sports Med. 2024;54(9):2209-2231. doi:10.1007/s40279-024-02047-8
Davies T, Orr R, Halaki M, Hackett D. "Effect of Training Leading to Repetition Failure on Muscular Strength: A Systematic Review and Meta-Analysis." Sports Med. 2016;46(4):487-502. doi:10.1007/s40279-015-0451-3
Sampson JA, Groeller H. "Is repetition failure critical for the development of muscle hypertrophy and strength?" Scand J Med Sci Sports. 2016;26(4):375-383. doi:10.1111/sms.12445
Helms ER, Cronin J, Storey A, Zourdos MC. "Application of the Repetitions in Reserve-Based Rating of Perceived Exertion Scale for Resistance Training." Strength Cond J. 2016;38(4):42-49. doi:10.1519/SSC.0000000000000218

Medical disclaimer: Resistance training to or near failure imposes real mechanical and metabolic stress on joints, tendons, and the cardiovascular system. The effect estimates discussed here come from healthy, predominantly young adult populations. This article is for informational purposes only and does not constitute medical advice. Always consult a qualified healthcare provider before starting any new exercise program, particularly heavy resistance training, and especially if you have cardiovascular disease, uncontrolled hypertension, recent cardiac symptoms, joint or tendon injuries, prior fragility fractures, are recovering from surgery, are pregnant or postpartum, or have been sedentary for an extended period. Form quality matters more than effort proximity, and a qualified coach is the right resource if you are unsure how a given lift should look.

Frequently Asked Questions

Do you have to train to failure to build muscle?

No. Refalo and colleagues (2024) had 26 resistance-trained adults train one leg to failure and the other with 1 to 2 reps in reserve for 8 weeks and found similar quadriceps hypertrophy in both legs. The 2023 meta-analysis by the same group across 15 studies showed only a small edge for closer-to-failure training (effect size 0.15 to 0.21). Stopping 1 to 3 reps short of failure produces nearly all of the muscle growth without the recovery cost.

What does reps in reserve (RIR) actually mean?

RIR is the number of reps you could still do if you kept going. RIR 0 means failure. RIR 1 means you could have done one more clean rep. RIR 3 means you stopped with three good reps left. The scale was formalized by Helms, Cronin, Storey, and Zourdos (2016) in the Strength and Conditioning Journal. It works as a self-rated effort gauge that calibrates over time. Most lifters underestimate how many reps they have left when they first start using it.

Does training to failure make you stronger?

Not really. Davies, Orr, Halaki, and Hackett (2016) pooled 8 randomized trials and found no significant strength advantage for failure training. Robinson and colleagues (2024) meta-regressed 67 strength studies and found a slightly negative slope: closer proximity to failure was associated with smaller strength gains. The reason is fatigue. Failure-rich sets compromise the bar speed and quality of subsequent sets, which is what actually drives strength adaptations.

How close to failure should hypertrophy training be?

1 to 3 reps in reserve is the practical sweet spot. Robinson et al. (2024) found a positive but flattening slope: closer to failure helps growth, but the extra benefit is small once you are within about 2 RIR. Refalo's 2023 meta-analysis put the closer-to-failure edge at 0.15 to 0.21 effect-size units, smaller than the gain you get from adding a set per session. Most hypertrophy work belongs at 1 to 3 RIR. The last set of the last exercise can drift to 0 RIR if you want, but it does not have to.

Should beginners train to failure?

Rarely, and mainly to calibrate. Beginners are usually worse at estimating RIR than they think. Steele and colleagues (2017) found novice trainees underestimated reps left by an average of 4 to 5 when they thought they were close to failure. The practical workaround is to take a few isolation sets to true failure occasionally, just to calibrate. Then keep most working sets at 1 to 3 RIR and rely on the load and reps progressing over time as the signal that the program is working.

Figure: Robinson 2024: hypertrophy benefits modestly from closer proximity to failure. Strength does not. The two adaptations live in different effort zones.