How We Test AI Companion Apps
Every app we review goes through the same rigorous testing protocol — 8 dimensions, 40+ specific tests, and hours of hands-on use. No AI-generated summaries. No paraphrasing other reviews. Just real testing, consistently applied.
What makes our testing different
Same tests, every app
We apply identical test scenarios to every app, including the Marble memory test, the emotional response test, and the photo consistency test. Because every app faces the same scenarios, scores can be compared directly across apps.
Hours, not minutes
Each app gets at least 2 hours of active use with 50+ messages. We don't test for 5 minutes and write a "review." Real testing finds the edges.
Honest about our own app
Kissable is scored on the same rubric as every competitor. If a competitor genuinely does something better, we say so — and explain why.
The 8 Dimensions We Score
Every app receives a weighted score across these dimensions, producing an overall rating out of 10.
Conversation Quality
Weight: 25%. We measure response depth, emotional intelligence, roleplay capability, personality consistency, and how well the AI initiates and sustains conversations.
Memory & Continuity
Weight: 20%. We test how well the AI remembers facts, emotions, and shared experiences, both within a single conversation and across sessions.
Visual Features
Weight: 15%. We evaluate photo generation quality, consistency of the companion's appearance across images, customizability, and how well the AI understands specific photo requests.
Voice Features
Weight: 10%. We test voice calls for naturalness and latency, voice messages for emotional expressiveness, and the range of voice customization options.
Pricing & Value
Weight: 10%. We verify pricing, assess what the free tier actually includes, evaluate how aggressive the paywall is, and judge whether the subscription offers fair value.
UI/UX & Onboarding
Weight: 10%. We measure how many steps it takes to start chatting, evaluate the app's performance and stability, and assess how intuitive the interface is.
Safety & Privacy
Weight: 5%. We read privacy policies, test content filtering boundaries, and evaluate what data controls users have, including deletion and export options.
Platform & Availability
Weight: 5%. We note which platforms each app is available on, whether it syncs across devices, and whether any offline functionality exists.
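To make the weighting concrete, here is a minimal sketch of how eight dimension scores could be combined into one rating. The weights are the ones listed above. Everything else is an assumption for illustration: this page does not state how the 1-5 rubric scores are converted to a rating out of 10, so the sketch assumes one 1-5 score per dimension and a simple linear rescale (weighted mean times 2). The names `WEIGHTS` and `overall_rating` are hypothetical.

```python
# Dimension weights in percent, as listed on this page (they sum to 100).
WEIGHTS = {
    "Conversation Quality": 25,
    "Memory & Continuity": 20,
    "Visual Features": 15,
    "Voice Features": 10,
    "Pricing & Value": 10,
    "UI/UX & Onboarding": 10,
    "Safety & Privacy": 5,
    "Platform & Availability": 5,
}


def overall_rating(scores):
    """Combine per-dimension scores (1-5) into a rating out of 10.

    ASSUMPTION: the 1-5 to 10-point conversion is a linear rescale
    (weighted mean * 2). The page does not specify the real formula.
    """
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the 8 dimensions")
    for name, score in scores.items():
        if not 1 <= score <= 5:
            raise ValueError(f"{name}: score {score} is outside 1-5")

    total_weight = sum(WEIGHTS.values())
    weighted_mean = sum(WEIGHTS[n] * scores[n] for n in WEIGHTS) / total_weight
    return round(weighted_mean * 2, 1)


# Made-up scores for a hypothetical app, for illustration only.
example_scores = {
    "Conversation Quality": 5,
    "Memory & Continuity": 4,
    "Visual Features": 3,
    "Voice Features": 4,
    "Pricing & Value": 3,
    "UI/UX & Onboarding": 4,
    "Safety & Privacy": 5,
    "Platform & Availability": 2,
}
print(overall_rating(example_scores))
```

Because Conversation Quality and Memory & Continuity together carry 45% of the weight, a one-point change in either moves the overall rating far more than the same change in Safety & Privacy or Platform & Availability.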
Scoring Rubric
We use a consistent 1-5 scale across all criteria. Here's what each score means:
| Score | Label | What It Means |
|---|---|---|
| 5/5 | Excellent | Best in class. No meaningful shortcomings. |
| 4/5 | Very Good | Strong with minor, situational limitations. |
| 3/5 | Good | Solid overall but with notable gaps in certain scenarios. |
| 2/5 | Below Average | Functional but significantly limited compared to peers. |
| 1/5 | Poor | Barely functional or essentially absent. |
The same rubric applies to every app — including Kissable. If our own app scores lower than a competitor on a specific criterion, we say so in the review.
When We Retest
Every 90 Days
We re-verify pricing, features, and platform availability quarterly. App stores and pricing pages change fast.
On Major Updates
When an app ships a significant feature (photos, voice calls, memory overhaul), we retest and update the review.
When Users Tell Us
If our readers report an experience that doesn't match our review, we investigate and update — for any app, including our own.
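The three retest triggers above can be sketched as a single check. This is an illustrative sketch only, not our actual tooling: the function name `retest_reasons`, its parameters, and the reason strings are hypothetical, and it assumes the 90-day clock starts from the date of the last test.

```python
from datetime import date, timedelta

# Interval taken from the "Every 90 Days" trigger above.
RETEST_INTERVAL = timedelta(days=90)


def retest_reasons(last_tested, today, major_update=False, reader_report=False):
    """Return the list of reasons a review is due for a retest (empty if none)."""
    reasons = []
    if today - last_tested >= RETEST_INTERVAL:
        reasons.append("90-day re-verification")
    if major_update:
        reasons.append("major update")
    if reader_report:
        reasons.append("reader report")
    return reasons


# Example with made-up dates: exactly 90 days have passed.
print(retest_reasons(date(2026, 1, 1), date(2026, 4, 1)))
```

Returning a list of reasons rather than a single yes/no keeps the triggers independent, so a review can be flagged for more than one reason at once.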
See our methodology in action
Browse our latest comparisons and guides. Every one follows the protocol above.
Last updated: May 2026