July 3, 2025

Addressing flaws in standard testing methodologies with AI-powered playtesting 

Even with careful planning, a solid setup, and thorough analysis, many mobile game playtests deliver results that are inconclusive or, worse, misleading. There are no breakthrough insights, no retention lift, and no clarity. Instead, studios are left with more questions than answers.

This all-too-common story isn’t always about execution. The problem often lies in the systemic biases baked into traditional playtesting. These hidden constraints quietly distort and narrow what you can actually learn.

In this article, we’ll discuss why traditional playtesting so often falls short and what you can do to overcome its limitations.

Traditional playtesting often falls short

Basing game updates on surveys alone is risky. Test players often rush through the session for rewards, and even sincere feedback can fail to capture the deeper emotional experience. Psychophysiology shows why.

#1 Declarative feedback is fragmented, as people tend to forget even important details

People’s tendency to forget, known as recall bias, results in patchy feedback. Playtesters may simply forget to mention critical parts of the gameplay that helped them build confidence and progress further. Because of that, developers risk removing or altering those key elements. Such changes can unintentionally make the game confusing and kill retention.

#2 What players remember most isn’t always what matters most 

Due to a psychological phenomenon called recency bias, playtesters often highlight certain gameplay elements not because they’re the best, but simply because they’re the latest new thing they got their hands on. As a result, developers may go all in on strengthening those while sidelining the main features that make the game enjoyable.

“Players are so busy playing the game, they don’t always form memories about things that are not immediately relevant to the goal of the game.”

— Jesse Schell, The Art of Game Design: A Book of Lenses, 1st ed.

#3 With answers shaped to sound “acceptable,” it’s easy to chase the player people want to be, not the one actually playing your game

And there’s another feedback distorter: social desirability bias. Just because players say they care about rankings doesn’t mean, say, a new meta layer like a competitive leaderboard will actually lift live players’ engagement (especially in hyper-casual games, where most players are there for escapism, not competition). People naturally want to sound more goal-driven than they really are.

Why emotion-driven AI playtesting changes everything

As people forget, sugar-coat, or say what sounds right, self-reported feedback is a shaky foundation for game decisions, even when analyzing a large respondent group. Traditional testing distorts genuine feedback by looking at it through the filters of self-awareness and verbal processing. That’s why reading the raw emotional data players can’t manipulate is key to measuring the true sentiment behind their in-game experiences. 

Sensemitter’s emotion-driven AI playtesting uses the science of facial expression identification and the Facial Action Coding System (FACS) to capture and analyze players’ authentic emotional responses in real time.


We apply a stack of pre-trained neural networks (NNs), each handling a specific task like face detection, gaze estimation, or emotion classification across seven emotions: joy, surprise, anger, fear, sadness, disgust, and a neutral state. 

The distinct value of our approach unfolds after NNs gather data. We’ve built and continuously refine proprietary post-processing algorithms designed to:

  • Filter out noise and false readings
  • Correct inconsistencies 
  • Aggregate raw data into metrics
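Conceptually, that post-processing can be sketched in a few lines of Python. Everything below (the frame format, confidence threshold, and smoothing window) is an illustrative assumption, not Sensemitter’s actual pipeline:

```python
# Hypothetical per-frame output of an emotion-classification NN:
# (timestamp_s, emotion_label, confidence). Values are for illustration.
frames = [
    (0.0, "joy", 0.91), (0.5, "joy", 0.88), (1.0, "fear", 0.31),
    (1.5, "joy", 0.85), (2.0, "surprise", 0.79), (2.5, "surprise", 0.82),
]

def postprocess(frames, min_conf=0.5, window=3):
    # 1. Filter out noise and false readings: drop low-confidence frames.
    clean = [f for f in frames if f[2] >= min_conf]
    # 2. Correct inconsistencies: a sliding-window majority vote
    #    suppresses one-frame flickers between emotion labels.
    smoothed = []
    for i, (t, _, c) in enumerate(clean):
        nbrs = clean[max(0, i - window // 2): i + window // 2 + 1]
        label = max({f[1] for f in nbrs},
                    key=lambda l: sum(f[1] == l for f in nbrs))
        smoothed.append((t, label, c))
    # 3. Aggregate raw data into a metric: share of the session per emotion.
    totals = {}
    for _, label, _ in smoothed:
        totals[label] = totals.get(label, 0) + 1
    return {l: n / len(smoothed) for l, n in totals.items()}
```

Here the single low-confidence “fear” frame is filtered out, and the remaining frames aggregate into per-emotion session shares.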

Over time, we’ve accumulated many rules based on thousands of real research cases to make the results as precise as possible. In contrast to most available playtesting solutions, where AI stops at summarizing manually extracted insights, ours goes further, laying a solid foundation for game-refinement decisions.

During pre-production, our AI game testing solutions are used to test settings and styles, choose the most appealing characters, and refine the storyline and narrative. For launched titles, the possibilities range from analyzing new mechanics and tutorials to evaluating monetization.

The extracted insights are very easy to grasp and actually put to work (more on that in a bit). You can playtest your own game, see how you stack up against direct competitors, or even deconstruct the top-performing mechanics from the app store leaders.

How to conduct automated AI playtesting

To make the difference from traditional playtesting even clearer, we’ve laid out the full process behind our AI-powered game testing.

Getting AI-powered playtesting right 

Gameplay is full of moving parts, too many to rely on surface-level checks. To uncover what really drives engagement or causes drop-offs, you need a focused, detailed analysis.

Our sample size in qualitative tests is up to 12 participants, which allows us to get reliable results quickly while keeping costs reasonable for the client. We carefully select the audience based on the criteria that matter most to the client, such as location, age, gender, game genre preferences, and more.

Here’s what a playtesting session looks like in practice:

  1. Players go through the game on their own: exploring, reading dialogues, and progressing through levels. Moderators don’t interfere. This way, we capture authentic behavior without influencing it. 
  2. A custom-built plugin records screen activity and captures the player’s face, tracking facial expressions, head pose, gaze, emotions, and more.
  3. We then thoroughly examine each screen of interest, from character control to shop navigation, to identify what players understood during their experience and where they encountered difficulties.
  4. Using facial recognition, we map player emotions across the session and convert them into three key metrics: arousal, valence, and focus.
  5. The playtesting results are presented on clear dashboards, backed by practical recommendations on what to fix.
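Step 4, converting detected emotions into session metrics, can be illustrated with a minimal sketch. The valence/arousal coordinates below loosely follow the circumplex model of affect; they are illustrative assumptions, not Sensemitter’s calibration:

```python
# Assumed mapping of the seven detected emotions onto valence
# (positive vs. negative) and arousal (intensity), both in [-1, 1].
EMOTION_VA = {
    "joy":      ( 0.8, 0.6),
    "surprise": ( 0.2, 0.9),
    "anger":    (-0.6, 0.8),
    "fear":     (-0.7, 0.7),
    "sadness":  (-0.7, 0.3),
    "disgust":  (-0.6, 0.5),
    "neutral":  ( 0.0, 0.1),
}

def session_metrics(emotions):
    """Average valence and arousal over a sequence of per-frame labels."""
    pairs = [EMOTION_VA[e] for e in emotions]
    n = len(pairs)
    valence = sum(v for v, _ in pairs) / n
    arousal = sum(a for _, a in pairs) / n
    return {"valence": round(valence, 3), "arousal": round(arousal, 3)}
```

Computed per time slice instead of over the whole session, the same averaging yields the curves plotted on the dashboard.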

Understanding and applying the results

The dashboard visualizes player engagement over time through what we call the Interest Curve. It’s built on two neurophysiological markers:

  • Arousal, measuring how intense the emotion is
  • Valence, showing whether the player’s engagement is positive or negative

We recently introduced a third metric called Focus. It serves as an “attention meter” that shows how concentrated a player is during key interactions, layering more context onto emotional analytics.

Now, an important note: we never look at these metrics in isolation. The power lies in how they interact.

Take Valence. It’s tempting to think a positive Valence is good and a negative one is bad. But not always. You need to see it alongside Arousal. A player might show high arousal but negative valence, as often happens in satisfyingly gross games like pimple poppers. The emotion is disgust (negative valence), but curiosity keeps players hooked (high arousal).

The Focus metric is also analyzed alongside Arousal to reveal how *intensely* players concentrate during specific gameplay moments.
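As a rough illustration of reading the metrics together rather than in isolation, here is a hypothetical rule-of-thumb classifier; the thresholds and labels are assumptions for demonstration, not values from our dashboards:

```python
def read_engagement(arousal, valence):
    """Hypothetical reading of the Arousal x Valence plane.

    The 0.5 arousal threshold and the quadrant labels are illustrative.
    """
    if arousal >= 0.5:
        # High intensity: positive valence means delight; negative valence
        # can still be a hook (the pimple-popper case) or real frustration.
        return "delight" if valence > 0 else "gross-out hook or frustration"
    # Low intensity: pleasant but calm, or simply disengaged.
    return "calm enjoyment" if valence > 0 else "boredom or disengagement"
```

A pimple-popper moment with high arousal and negative valence lands in the “gross-out hook or frustration” quadrant, which is exactly why the two metrics must be read together.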

If it all sounds a bit advanced, don’t worry. You don’t need to be a neuroscientist to act on it. The dashboard is accompanied by clear, actionable recommendations.

Our client's success story 

Does it actually work? Absolutely. Here’s our client’s story.

After testing the first play session in a turn-based strategy game, we discovered that players were experiencing cognitive overload. Too much visual noise made it hard to focus on the gameplay itself.

Based on the test results, our recommendation was to strip away unnecessary elements, namely decorative bushes, which boosted the game’s appeal and led to a 5% increase in player retention.
