Beyond Streaks and Goals

How product narrative shapes user identity and behavior

Why product narrative outlasts goals in driving behavior change, the four conditions the research keeps pointing to, and where the evidence pushed back on what we walked in believing.

Behavioral Design · May 2026 · 24 min read

Your user downloads a budgeting app on a Sunday night. They make it through onboarding, connect a bank account, and feel a small wave of resolve looking at the first screen. They have a plan. Two weeks later, they have not opened the app once. The app is not broken. They still want to save money. They quit because the product never made them feel like a saver, and without that feeling, the app was just another thing on their phone asking for their attention.

You have watched this happen in your data. You have probably already shipped a re-engagement notification to bring them back. You have A/B tested the onboarding flow three times. None of it worked, because you have been treating the wrong problem. The user did not quit because of friction. They quit because your product asked them to spend money differently and never asked them to become someone different. That might sound like marketing-speak, but it is not. It is a measurable design failure with a research-backed mechanism, and you have been competing against friction this whole time when the real problem was identity.

This article is about what you should be competing against instead. The research keeps pointing at four conditions, and the rest of the article walks through each one.

For sustained behavior change

01

Identity congruence.

02

Context-cued automaticity.

03

Autonomous motivation.

04

Identity transformation.

All four. Or it doesn’t hold.

Why willpower-driven products fail

Before we walk through the four conditions, it is worth being clear about what they are an alternative to. Most behavior change products are running on willpower, and willpower is the wrong engine for the job.

Self-control is not a personality trait. It is a resource, and it depletes through the day. By the time your user has finished work, fed their kids, and sat down on the couch, the self-control they would need to log a meal or hit a meditation streak is mostly used up.

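This is not just opinion. The resource account of self-control is one of the most studied ideas in behavioral psychology, and although lab replications of the depletion effect have been mixed, the study that matters most to product teams is field data from Wilhelm Hofmann and Roy Baumeister.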

They studied

205

adults across 7,827 real-time self-reports of desire and resistance

They found

depletes

willpower wanes hour by hour; by evening, resistance fails far more often

Source

Hofmann, Baumeister, Förster & Vohs (2012)

Journal of Personality and Social Psychology, 102(6)

The headline finding is the one that should keep you up at night: by evening, your user’s ability to resist any kind of urge is sharply lower than it was that morning. The urges people fought most often were not the ones you might expect. They were urges for sleep and leisure, not for tobacco or alcohol. Tobacco and alcohol were actually among the urges people experienced least often. The popular image of an addict fighting one big craving has it backwards. The hardest urges to resist are the ones that feel most ordinary, and the moment of weakness is usually 9 p.m., not 9 a.m.

Now look at your product. When does the engagement notification fire? When does the streak threaten to break? When does the user have to log the meal, hit the meditation goal, complete the lesson? If your answer is “evening,” you are running on willpower at the exact moment your user has the least of it. That is not a notification timing problem you can solve by moving the prompt from 9 p.m. to 7 p.m. That is a structural problem: you are asking the user for willpower, and the only fix is to stop asking for willpower altogether.

If your product runs on willpower, the user has already lost the fight by the time they open it.

So if not willpower, then what? The four conditions named at the top of the article are what the research keeps converging on. Each one is a separate mechanism. Each one is doing different work. And the durability of any behavior change your product produces is bounded by the weakest of the four.

01

Identity congruence

The behavior you are asking for has to fit who the user is becoming inside your product. People act in line with who they think they are. If the action feels out of character for the person they are becoming, they will not do it.

Oyserman, identity-based motivation

02

Context-cued automaticity

The behavior gets tied to a stable cue: a time of day, a place, or a ritual that comes before it. Once the cue runs the behavior, willpower is no longer in the loop.

Wood, habits in everyday life

03

Autonomous motivation

The user has to feel like they are choosing the behavior themselves, not being pushed into it. The version of the behavior that runs on autonomy outlasts the one that runs on guilt by years.

Deci & Ryan, self-determination theory

04

Identity transformation

The user’s sense of who they are has to actually shift, not just their stated intentions. Once it shifts, the cognitive cost of the behavior collapses.

Caldwell et al., Maintain IT model

Identity congruence: the user must see themselves becoming the kind of person who does the behavior

The same person who could not summon willpower for your budgeting app at 10 p.m. will, at 6 a.m. on a rainy Saturday, put on a pair of shoes and run three miles. They are not magically more disciplined at dawn. Something else is doing the work.

What is doing the work is the way they see themselves. The runner runs because they are a runner. That is not a tagline; it is the actual mechanism. They are not deciding to run any more than you are deciding to brush your teeth. The behavior is downstream of the identity, and the identity does not require willpower to maintain.

Daphna Oyserman’s identity-based motivation research is the source most product teams have not read but should. Her finding, across two decades of studies, is that people do what feels consistent with who they think they are, and find it almost impossibly hard to do anything that does not. The implication for your product is sharper than it sounds. Your user does not yet see themselves as a saver. So every act of saving feels expensive: the user has to spend real effort now in the hope of becoming someone different later. That gap between who they are now and who they hope to become is the design problem. Until your product helps the user feel like a saver while they are using it, every save costs willpower.

The trap most product teams fall into is treating identity as marketing copy. Slogans that name the identity do not produce identity congruence. The word “saver” on a hero image is doing nothing. The product has to give the user moments where being the saver is easier than not being the saver. That moment might be a default that fits the saver-self, a confirmation message after a small action, or a small win that the system reflects back as evidence that someone like them is the kind of person who saves. That is what identity congruence looks like in product. It is rarely a single feature. It is almost always a thousand small details across the product that quietly agree about who the user is.

And the product narrative is doing this work whether you intended it or not. The avatars, the default settings, the empty states, the milestone language, the way the system addresses the user: all of it is telling the user who they are while they use the product. Some teams shape this deliberately. Most do it by accident. Either way, your product is making a proposal to the user about who they will become if they keep using it.

This is where most products quietly fail. The identity your product is shaping, whether you meant to or not, has to match the identity the user is actually willing to step into. When those two are aligned, the behavior feels natural and the product earns durability. When they are misaligned, every screen becomes a small friction the user cannot quite name. They do not say “this product is asking me to be someone I am not.” They just stop opening the app. That misalignment is one of the most common causes of long-term churn, and it does not show up in any of the metrics most product teams trust. Onboarding completion looks fine. Week-one engagement looks fine. The damage shows up much later, as a retention curve that flatlines for no reason your funnel can explain.

So the choice is not whether your product does identity work. The choice is whether you are doing it on purpose, and whether the identity you are handing the user is one they are actually willing to accept.

Your product hands the user an identity whether you meant to or not. If they will not accept it, no amount of engagement work makes up for the mismatch.

Context-cued automaticity: the cue does the work, not willpower

You do not decide to brush your teeth in the morning. You walk into the bathroom and the brush is in your hand. The cue is the bathroom, the time of day, and the act that came right before, all of it triggering the behavior at once. Willpower never enters the loop, and the brushing happens on its own.

Wendy Wood’s research at USC put a number on this. About 43% of daily behavior runs this way: cued by the environment, executed without conscious decision. That figure is a rough measure of how much of daily life already runs on autopilot rather than willpower, and it is the territory your product should be aiming for. Every behavior you can move from goal-driven to context-driven is a behavior that survives a bad day, a depleted evening, and an abandoned New Year’s resolution.

Automaticity, in plain English

Behavior that happens without conscious decision because a cue triggers it. Wendy Wood’s 2002 research found that about 43% of daily behavior runs this way. The two giveaways are: it runs from a recurring cue (location, time, the act before it), and it does not care what your goals say in the moment. Brushing your teeth is automatic. Choosing to floss is not. The first one happens on its own. The second one stays a New Year’s resolution.

Context cue, in plain English

A repeating signal in the environment that triggers the behavior without you thinking about it. The cue might be the moment after your morning coffee, the sight of running shoes by the door, or the act of opening your laptop. The cue is what lets the habit run on its own, instead of running on willpower. If you cannot point to the cue your product is training, you do not have a habit-formation feature. You have a notification.

The reason most product teams get this wrong is that they have absorbed a piece of folk wisdom that turned out to be invented. You have heard the “it takes 21 days to form a habit” line approximately 4,000 times. Productivity blogs say it. Onboarding decks say it. Maybe one of your meetings has said it this quarter. It is not true.

The 21-day claim traces to a single anecdote in a 1960 self-help book by a plastic surgeon who noticed that his patients took roughly that long to adjust to their new face. It was not a study or a sample. It was a single observation that got copied into a different field and treated as a fact for sixty years. The actual research on habit formation came from Phillippa Lally and her colleagues at University College London in 2010, and the numbers are very different from what you have been told.

Days to reach automaticity: fastest 18, average 66, slowest 254

96 volunteers tracked daily for 12 weeks. The fastest hit automaticity in 18 days. The slowest was projected to take 254, well past the end of the study. The average was 66. And missing one day along the way did not materially affect the outcome.

Source: Lally, van Jaarsveld, Potts & Wardle (2010), European Journal of Social Psychology, n=96

If your product punishes a missed day, the data is not on your side

Lally’s data is explicit on this. Missing a single opportunity did not affect habit formation. The user who broke a streak on day 27 had not lost the habit. Streaks that catastrophize one missed day are working against the research, not with it.
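One way to see the difference in product terms is to compare a strict streak counter with a rolling consistency score. The sketch below is illustrative only: the function names, the 28-day window, and the sample dates are assumptions made for this example, not something taken from Lally's study or from any particular app.

```python
from datetime import date, timedelta

def strict_streak(done_days: set[date], today: date) -> int:
    """Consecutive completed days ending today; one miss resets it to zero."""
    count = 0
    day = today
    while day in done_days:
        count += 1
        day -= timedelta(days=1)
    return count

def rolling_consistency(done_days: set[date], today: date, window: int = 28) -> float:
    """Share of the last `window` days (including today) with the behavior done."""
    recent = [today - timedelta(days=i) for i in range(window)]
    return sum(d in done_days for d in recent) / window

# Hypothetical user: 27 days in a row, then a miss on day 28 (today).
start = date(2026, 5, 1)
done = {start + timedelta(days=i) for i in range(27)}
today = start + timedelta(days=27)

print(strict_streak(done, today))                  # 0
print(round(rolling_consistency(done, today), 2))  # 0.96
```

The strict counter tells this user they are back at zero. The rolling score tells them they showed up on 27 of the last 28 days, which is closer to what the habit data says actually happened.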

The implication for your product is concrete. If you cannot name the cue you are training, you do not have a habit-formation feature. The cue is the load-bearing piece of automaticity. It might be a time of day, a physical location, or the act that comes immediately before. Once the cue is doing the work, the behavior survives a bad week. When the cue is missing, the behavior dies the moment willpower runs out.

In practical product terms, this means designing for the moment, not the goal. Most products ask the user to commit to a behavior in the abstract, like “log every meal” or “meditate daily,” which is exactly the kind of commitment that runs on willpower. The product that wants automaticity picks a specific cue and designs around it. That might mean asking the user to log meals only after dinner, not “sometime today.” It might mean designing the home screen so that opening the app is itself the cue that kicks off the next action. It might mean a notification timed to a real ritual the user already has, like sitting down with their morning coffee. None of these are clever features. They are decisions to anchor the behavior to something stable in the user’s life, and they are the difference between a product that survives the user’s bad weeks and a product that gets uninstalled the first time the user is too tired to remember it.
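Anchoring a prompt to a ritual rather than to "sometime today" can be sketched in a few lines. Everything here is a hypothetical model for illustration: the `Cue` type, its fields, and the 15-minute delay are assumptions, and a real product would learn the ritual time from the user rather than hard-code it.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta

@dataclass(frozen=True)
class Cue:
    """A ritual the user already has, as reported by the user (hypothetical model)."""
    label: str
    usual_time: time   # when the ritual usually happens
    delay: timedelta   # how long after the ritual to prompt

def next_prompt(cue: Cue, now: datetime) -> datetime:
    """Next prompt moment: ritual time plus delay, today if still ahead, else tomorrow."""
    candidate = datetime.combine(now.date(), cue.usual_time) + cue.delay
    if candidate <= now:
        candidate += timedelta(days=1)
    return candidate

after_dinner = Cue("after dinner", time(19, 0), timedelta(minutes=15))

print(next_prompt(after_dinner, datetime(2026, 5, 4, 12, 0)))  # 2026-05-04 19:15:00
print(next_prompt(after_dinner, datetime(2026, 5, 4, 21, 0)))  # 2026-05-05 19:15:00
```

The design choice is that the prompt is defined relative to something stable in the user's day. If the user says dinner has moved, the prompt moves with it, and the product can always name the cue it is training.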

Autonomous motivation: the user has to choose the behavior themselves

Watch what happens to a user who keeps a 90-day streak going on a habit app, and then misses one day. Many of them do not return. The behavior was never theirs. It belonged to the streak. The moment the streak broke, the engine that had been driving them broke with it.

This is the canonical failure mode of broken autonomy. The user was not running the behavior; the pressure was running the user. Edward Deci and Richard Ryan’s self-determination theory has been one of the most heavily studied motivation frameworks in psychology for forty years, and the core finding is uncomfortable for most product teams to read: behavior driven by external pressure dies the moment the pressure lets up.

Autonomous motivation, in plain English

Doing the thing because you want to, not because you feel pushed into it. From self-determination theory: when the user is acting out of their own values, the behavior lasts. When they are acting out of pressure, guilt, or external reward, the behavior dies the moment the pressure lets up. This is why streak-loss anxiety can lift week-one metrics and tank year-one retention at the same time.

This is why the engagement metrics that look the best in week one are often the metrics that rot first in year one. Streak anxiety, loss-aversion notifications, daily-quota guilt: all of them work in the short term, all of them produce visible behavior change for a quarter, and almost all of them collapse the moment the user has a reasonable excuse to stop. The behavior was never theirs. It belonged to your product’s pressure.

Autonomy in product is harder to design for than streaks because it does not look like a feature. It looks like the absence of pressure at the moments where most apps would lean in. It looks like a reset that happens silently instead of with a guilt notification. It looks like a milestone the system reflects back without asking the user to renew their commitment. It looks like default options that fit who the user has decided they want to become, instead of nudges trying to talk them into it.

The product that gets this right is, paradoxically, easier to leave and easier to come back to. Autonomy is the condition that makes the user’s return feel like a choice rather than an obligation. You can measure the difference. Pressure-driven engagement looks like a flat line of usage that drops off a cliff. Autonomous engagement looks like a wave with troughs, where each return is the user choosing the behavior again.
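A rough way to tell the two shapes apart in your own data is to count how often a user comes back after a real gap. The sketch below assumes a simple list of daily active flags per user. The function name, the three-day gap threshold, and the two sample histories are made up for illustration.

```python
def returns_after_gap(active: list[bool], min_gap: int = 3) -> int:
    """Count the times a user comes back after at least `min_gap` inactive days."""
    returns = 0
    gap = 0
    seen_activity = False
    for day_active in active:
        if day_active:
            # A return only counts if the user was active at some point before the gap.
            if seen_activity and gap >= min_gap:
                returns += 1
            seen_activity = True
            gap = 0
        else:
            gap += 1
    return returns

# Hypothetical 60-day histories.
cliff = [True] * 30 + [False] * 30           # flat line, then gone
wave = ([True] * 10 + [False] * 5) * 4       # troughs followed by returns

print(returns_after_gap(cliff))  # 0
print(returns_after_gap(wave))   # 3
```

Both users were active on a similar number of days early on. Only one of them ever chose to come back, and that is the signal an unbroken streak count cannot show you.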

Pressure-driven engagement looks great in week one. Autonomy looks like a user who chose to come back.

Identity transformation: when the user’s sense of who they are actually changes

The first three conditions are what set up the fourth. Identity congruence makes the behavior feel possible for the user. Context-cued automaticity makes it routine. Autonomous motivation keeps the user choosing it. But the durability of the behavior, the part that lets your product’s job get easier over time instead of harder, only kicks in once the user’s sense of who they are has actually shifted. That shift is identity transformation. It is the rare and structurally important outcome the research keeps describing as the goal.

Caldwell and colleagues’ Maintain IT model, drawing on long-term behavior change research, points at a specific window. The first 60 to 90 days are when identity work is doing the heaviest lifting. After that, if it has worked, the user’s self-concept has shifted. The cue has stabilized. The behavior runs itself. From that point on, your product’s job is to do less, not more. The user who has become the runner does not need your motivational notification. They have running shoes by the door.

The grim part of this for most behavior-change products is the math. AppsFlyer’s 2025 report on 1.3 billion installs found that 46.1% of installed mobile apps are uninstalled inside the first 30 days. Almost half of any cohort is gone before day 30, well short of the 60 days at the low end of that window, let alone Lally’s 66-day average. Your 60-to-90-day window is fighting against an uninstall curve that is at its steepest before you have done any of the heavy identity work.
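The arithmetic is worth doing once with your own numbers. The sketch below takes the 46.1% figure quoted above and applies it to a hypothetical cohort of 10,000 installs. The cohort size is an assumption, and the result is only a ceiling, since uninstalls continue after day 30.

```python
UNINSTALLED_BY_DAY_30 = 0.461  # share quoted above from the AppsFlyer report
COHORT_SIZE = 10_000           # hypothetical install cohort

def max_still_installed(cohort: int, uninstall_share: float) -> int:
    """Upper bound on users who can still be around after the 30-day mark."""
    return round(cohort * (1 - uninstall_share))

remaining = max_still_installed(COHORT_SIZE, UNINSTALLED_BY_DAY_30)
print(remaining)  # 5390
```

At most 5,390 of those 10,000 users are still installed at day 30, so that is the largest group that can even enter the second half of the window where the identity work is supposed to pay off.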

Two implications follow. First, the design problem you are actually solving is not how to keep the user engaged for 21 days. It is how to keep them engaged for 60 to 90, with most of your cohort sitting on the longer half of the curve. The streak mechanic that you built and that probably feels essential is, on the research, doing harm in this window. The user who misses Tuesday is not less likely to form the habit. They are more likely to feel ashamed and uninstall.

Second, once the identity has taken hold, the appropriate move is the opposite of what most engagement playbooks recommend. Stop pushing. Keep showing up, keep reflecting back the identity the user has stepped into, but trust the cue you helped build. Doubling down on engagement features after identity transformation is asking a runner to please remember to put on their shoes.

Once the identity takes hold, your product’s job is to back off, not double down.

All four have to be there. A product that names the identity in copy but never builds the cue, never gives the user any sense of choosing the behavior on their own, and never produces a real shift in self-concept will get a polite tap and an uninstall. So will a product that builds context cues beautifully but hands the user an identity they do not actually want to step into. The four conditions are not a menu. They are a stack, and the stack is only as durable as its weakest layer.

Identity is necessary but not sufficient. The whole stack has to be coherent.

What products look like when all four conditions are present

The cleanest evidence on identity-driven product design comes from a small group of products that have actually published the numbers. They are not all perfect. The causal story is not always airtight. But the pattern across them is the same, and the pattern is not what most product teams would guess. Each of the four cases below meets more than one of the four conditions at the same time, which is what makes them work. Read them with the four conditions in mind: which ones can you spot in each?

Apple Watch + Vitality

+34%

sustained increase in physical activity vs. matched controls

The rings are a daily identity unit. The user becomes “someone who closes their rings.” RAND Europe study, n=422,643 across UK, US, South Africa.

RAND / Vitality 2018

Apple Heart & Movement Study

140k+

participants in the activity-rings analysis

Frequent ring-closers were 48% less likely to have poor sleep quality and 73% less likely to have elevated resting heart rates than infrequent closers. Correlational, not causal. Still, the pattern is hard to ignore.

Apple, Brigham & Women’s Hospital, AHA, 2025

Strava kudos network

caused

runners to run more often (longitudinal network study)

Getting kudos on a recorded run caused the runner to run more often, and athletes drifted toward the behavior of their kudos-friends. This is identity reinforcement at the peer-recognition layer, with measurable downstream behavior.

Social Networks (2022)

Walton & Cohen, belonging intervention

4 yrs

of GPA gains from one well-timed identity moment

A brief identity-reframing intervention raised African-American students’ GPAs across the entire four-year college career. n=92. Translation for product: a single well-placed moment can run for years.

Science (2011)

The thing all four of these have in common is not a feature. It is that each one hands the user a clear identity to step into. Apple gave the user a ring to close. Strava gave the user an athlete identity, confirmed every time another athlete tapped a kudos. Walton and Cohen gave their students a self-concept of belonging that the system reinforced rather than fought. The features are scaffolding, and the identity is what those features are scaffolding for. The Apple rings carry identity congruence and a daily context cue at the same time. The Strava kudos network adds autonomous motivation, because each kudos is the user’s peers reflecting an identity the user already chose. Walton and Cohen’s belonging intervention is identity transformation at its purest: a single moment that runs for four years.

The thing most of them lack, and this is the part most product teams will not want to hear, is a streak. Three of the four cases on this list do not punish missed days. The Apple rings reset every morning without penalty. Walton and Cohen’s intervention is a single moment, not a maintained chain. Strava’s kudos system rewards what you did, not what you committed to do. The streak is the thing the industry has agreed is the answer. The research is pointing somewhere else.

A ring is an identity unit. A streak is a feature. The research backs the first one.

Where the research pushes back

This article has been written as if the four conditions, taken together, produce something close to effortless behavior. The phrases we have used to describe what happens after identity transformation, like the behavior “running on its own,” the cognitive cost “collapsing,” and your product’s job “getting easier over time,” all imply the same promise. The research is more cautious than that, and pushes back in three ways. The article would be dishonest without naming each one.

The most repeated identity-language study did not replicate

In 2011, Bryan, Walton, Rogers and Dweck published a PNAS study showing that asking voters “how important is it to you to be a voter” instead of “to vote” meaningfully raised turnout. It got cited everywhere. In 2016, a much larger field replication by Gerber and colleagues did not find the effect. The original authors argued the replication used low-salience elections where the effect could not plausibly emerge.

What this means for your product: swapping “track your spending” for “become a saver” in your copy is not the magic move. Identity language is a real tool. It just does not work in isolation, and the rest of the system has to support it.

Replication, in plain English

When other researchers run the same study and check whether the original finding still holds. The last decade of psychology has been brutal on findings that everyone thought were settled. If a study has not been replicated independently, it is a hypothesis. If it has been replicated and failed, it is a hypothesis that needs more context. Either way, you should not bet your product on it.

The first thing the research pushed back on was the implied promise of effortlessness itself. Once identity has shifted, the cost of maintaining the behavior drops sharply, but it does not drop to zero. The mechanism moves the cost from effortful self-control, which depletes, to context-cued automaticity, which does not. That is a real and meaningful shift, but it is not the same as effortless. The behavior still has a cost, just a much smaller and more predictable one, and your product copy should not promise the user something the research will not back up.

The second thing it pushed back on was the universality of identity framing. Identity-narrative interventions are context-dependent in a way that is easy to miss. The Bryan-to-Gerber replication is the clearest example, but the same pattern shows up across the literature. Naming the identity is the easiest part of the work, and the part most likely to fail in isolation. Everything around it is the hard part.

The third thing it pushed back on was the “fresh start” mechanism. The 2014 Dai, Milkman and Riis study made “temporal landmarks” (a new year, a birthday, a new month) into a popular product-design move. A 2025 replication by Milyavskaya and colleagues found no significant effect on most goal characteristics when the analysis included goals of all ages, not just brand new ones. Translation for your product: the new-year energy you see in your January cohort is real, but the durability of that spike is much weaker than the original framing implied.

The honest synthesis is this. Identity is necessary but not sufficient. The product narrative is doing identity work whether you intended it or not. When the four conditions are all in place, the product earns durability. When only some of them are in place, willpower is still partially load-bearing, and the durability of the behavior matches whichever of the four is weakest. Your product is only as durable as its weakest condition.

The question worth sitting with

What identity is your product handing the user, whether you designed it that way or not?

The narrative inside your product is doing identity work either way. The avatar you give them, the language you use, the milestones you celebrate, and the things you call “success” in the empty state are all telling the user who they are while they use the product, and that self-concept follows them out of the session. The choice is not whether you do this work. The choice is whether you do it on purpose, with all four conditions in place.

The user who quit the budgeting app at the start of this article did not fail. Your product never gave them an identity worth stepping into, never trained the cue that would have let the behavior run on its own, never gave them the sense that any of this was their choice, and never produced the self-concept shift that would have made saving feel like the natural thing for someone like them to do. That is the design problem worth solving, and it is upstream of every retention metric you have on a dashboard.