Testing Methodology

How We Test AI Companion Apps

Every app we review goes through the same rigorous testing protocol — 8 dimensions, 40+ specific tests, and hours of hands-on use. No AI-generated summaries. No paraphrasing other reviews. Just real testing, consistently applied.

What makes our testing different

Same tests, every app

We apply identical test scenarios to every app — the Marble memory test, the emotional response test, the photo consistency test. This makes scores comparable, not subjective.

Hours, not minutes

Each app gets at least 2 hours of active use with 50+ messages. We don't test for 5 minutes and write a "review." Real testing finds the edges.

Honest about our own app

Kissable is scored on the same rubric as every competitor. If a competitor genuinely does something better, we say so — and explain why.

The 8 Dimensions We Score

Every app receives a weighted score across these dimensions, producing an overall rating out of 10.

Conversation Quality

25%

We measure response depth, emotional intelligence, roleplay capability, personality consistency, and how well the AI initiates and sustains conversations.

Emotional response test: "I had a rough day, my boss yelled at me" — rated for empathy and follow-up

Roleplay scenario: "Let's pretend we're at a beach in Hawaii" — rated for creativity and consistency

Personality consistency: Same question at messages 1, 15, and 30 — checked for drift

Multi-turn coherence: 20-message conversation about a shared plan — checked for recall
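One way to make the personality consistency check less subjective is to compare the replies collected at messages 1, 15, and 30 with a text-similarity ratio. The sketch below uses Python's standard difflib module. The function name drift_scores, the sample replies, and the 0.5 threshold are illustrative assumptions, not part of the protocol, and a surface-level string ratio is only a rough proxy for a human judgment of drift.

```python
from difflib import SequenceMatcher


def drift_scores(replies):
    """Compare each later reply against the first one.

    Returns similarity ratios in [0, 1]; lower means more drift.
    """
    baseline = replies[0].lower()
    return [
        SequenceMatcher(None, baseline, reply.lower()).ratio()
        for reply in replies[1:]
    ]


# Hypothetical replies to the same question at messages 1, 15, and 30.
replies = [
    "I love hiking and quiet mornings with coffee.",
    "I love hiking and quiet mornings with coffee.",
    "Honestly I prefer staying in and gaming all night.",
]
scores = drift_scores(replies)
flagged = [score < 0.5 for score in scores]  # 0.5 is an assumed threshold
```

A flagged reply would still need a human read: a companion can phrase the same personality in different words, and that should not count as drift.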

Memory & Continuity

20%

We test how well the AI remembers facts, emotions, and shared experiences — both within a single conversation and across sessions.

"Marble memory test": Tell the AI your dog's name at message 1, check recall at messages 10 and 30

Cross-session test: Close the app, wait 1+ hour, reopen — test what was remembered

Emotional memory: Reference an earlier emotional conversation — does the AI connect it?

Memory editing: Can you correct or remove memories the AI has stored?
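The recall steps above can be sketched as a small harness: plant a fact at message 1, send filler messages, and probe at messages 10 and 30. Everything named here is hypothetical. The send callable stands in for whatever way a tester talks to the app, ForgetfulBot is a fake companion used only so the sketch runs, and the dog's name "Marble" is assumed from the test's name.

```python
def run_recall_test(send, fact_message, probe, expected,
                    checkpoints=(10, 30), filler="Tell me more."):
    """Plant a fact at message 1, then probe for it at each checkpoint.

    `send` is any callable that takes a user message and returns the reply.
    Returns {message_number: True/False} for each checkpoint.
    """
    results = {}
    send(fact_message)  # message 1 plants the fact
    for n in range(2, max(checkpoints) + 1):
        if n in checkpoints:
            reply = send(probe)
            results[n] = expected.lower() in reply.lower()
        else:
            send(filler)
    return results


class ForgetfulBot:
    """Stand-in for a real app: remembers the name for 20 messages only."""

    def __init__(self):
        self.count = 0
        self.dog = None

    def __call__(self, message):
        self.count += 1
        text = message.lower()
        if "my dog's name is" in text:
            self.dog = message.rstrip(".").split()[-1]
        if "dog's name?" in text:
            if self.dog and self.count <= 20:
                return f"Your dog is {self.dog}!"
            return "I'm not sure, remind me?"
        return "Okay."


results = run_recall_test(
    ForgetfulBot(),
    fact_message="My dog's name is Marble.",
    probe="What is my dog's name?",
    expected="Marble",
)
```

With this fake bot, results shows recall succeeding at message 10 and failing at message 30, which is the kind of decay the test is meant to surface. A substring match is a deliberately crude check; a real review would still read the reply.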

Visual Features

15%

We evaluate photo generation quality, consistency of the companion's appearance across images, customizability, and how well the AI understands specific photo requests.

Photo quality: Request a photo in a specific setting — rated for realism, detail, and composition

Consistency test: Request two photos in different settings — same person must appear in both

Prompt understanding: Three specific requests (outfit, setting, camera angle) — scored on accuracy

Generation speed: Average time measured across multiple requests
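The generation speed measurement amounts to timing several requests and averaging them. A minimal sketch, assuming a hypothetical request_photo callable that blocks until the image is ready (here replaced by a fake that just sleeps):

```python
import time
from statistics import mean


def average_generation_time(request_photo, prompts):
    """Time each photo request and return the mean duration in seconds."""
    durations = []
    for prompt in prompts:
        start = time.perf_counter()
        request_photo(prompt)
        durations.append(time.perf_counter() - start)
    return mean(durations)


def fake_request_photo(prompt):
    """Placeholder for a real app call; sleeps briefly and returns no image."""
    time.sleep(0.01)
    return b""


avg_seconds = average_generation_time(
    fake_request_photo,
    ["beach at sunset", "cafe, red dress", "low-angle city street"],
)
```

Using time.perf_counter rather than time.time avoids jumps from system clock adjustments during a measurement.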

Voice Features

10%

We test voice calls for naturalness and latency, voice messages for emotional expressiveness, and the range of voice customization options.

Voice call quality: 3-minute conversation rated for naturalness, latency, and conversational flow

Voice message expressiveness: Requested voice messages rated for text-to-speech (TTS) quality and emotional tone

Voice variety: Count and quality of available voice options, accents, and customization

Pricing & Value

10%

We verify pricing, assess what the free tier actually includes, evaluate how aggressive the paywall is, and judge whether the subscription offers fair value.

Free tier assessment: What can you actually do without paying?

Paywall transparency: Is pricing clear before signup? Are there hidden costs?

Cancellation ease: How many steps to cancel? Are there dark patterns?

Value score: Feature quality relative to price. Is it underpriced, fair, or overpriced?

UI/UX & Onboarding

10%

We measure how many steps it takes to start chatting, evaluate the app's performance and stability, and assess how intuitive the interface is.

Signup friction: Steps from install to first message, fields required, time to complete

App performance: Size, launch time, crashes, lag, battery and data usage

Navigation clarity: Can a new user find settings, pricing, and key features without searching?

Safety & Privacy

We read privacy policies, test content filtering boundaries, and evaluate what data controls users have — including deletion and export options.

Content filtering map: Where does the filter draw the line? (platonic → flirting → romantic → explicit)

Data controls: Can you delete messages? Conversations? Your account? Export your data?

Platform & Availability

We note which platforms each app is available on, whether it syncs across devices, and if any offline functionality exists.

Platform check: iOS, Android, web, desktop — which are supported?

Multi-device sync: Start a conversation on one device, continue on another — seamless or not?

Offline capability: Can you read past messages or compose new ones without internet?

Scoring Rubric

We use a consistent 1-5 scale across all criteria. Here's what each score means:

Score	Label	What It Means
5/5	Excellent	Best in class. No meaningful shortcomings.
4/5	Very Good	Strong with minor, situational limitations.
3/5	Good	Solid overall but with notable gaps in certain scenarios.
2/5	Below Average	Functional but significantly limited compared to peers.
1/5	Poor	Barely functional or essentially absent.

The same rubric applies to every app — including Kissable. If our own app scores lower than a competitor on a specific criterion, we say so in the review.
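The weighted score can be illustrated with a short calculation. This is a sketch under stated assumptions, not the exact formula used in the reviews. The six weights listed above sum to 90%, and no weight is listed for Safety & Privacy or Platform & Availability, so the code assumes 5% each. It also assumes the 1-5 rubric maps to the 10-point rating by doubling the weighted mean.

```python
WEIGHTS = {
    "Conversation Quality": 0.25,
    "Memory & Continuity": 0.20,
    "Visual Features": 0.15,
    "Voice Features": 0.10,
    "Pricing & Value": 0.10,
    "UI/UX & Onboarding": 0.10,
    "Safety & Privacy": 0.05,         # ASSUMED: no weight is listed
    "Platform & Availability": 0.05,  # ASSUMED: no weight is listed
}


def overall_rating(scores):
    """Weighted mean of 1-5 dimension scores, rescaled to a 10-point rating."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("scores must cover exactly the 8 dimensions")
    for name, value in scores.items():
        if not 1 <= value <= 5:
            raise ValueError(f"{name}: score {value} is outside 1-5")
    weighted = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)
    return round(weighted * 2, 1)  # ASSUMED mapping: 5/5 -> 10/10


# Hypothetical scores for an imaginary app.
example = {
    "Conversation Quality": 4,
    "Memory & Continuity": 5,
    "Visual Features": 3,
    "Voice Features": 4,
    "Pricing & Value": 3,
    "UI/UX & Onboarding": 4,
    "Safety & Privacy": 4,
    "Platform & Availability": 5,
}
rating = overall_rating(example)  # 8.0
```

Under the doubling assumption the lowest possible rating is 2.0, because every dimension scores at least 1/5.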

When We Retest

Every 90 Days

We re-verify pricing, features, and platform availability quarterly. App stores and pricing pages change fast.

On Major Updates

When an app ships a significant feature (photos, voice calls, memory overhaul), we retest and update the review.

When Users Tell Us

If our readers report an experience that doesn't match our review, we investigate and update — for any app, including our own.
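The three retest triggers can be expressed as a single check. This is an illustrative sketch; the function and parameter names are assumptions, and how a "major update" or a reader report gets recorded is left out.

```python
from datetime import date, timedelta

RETEST_INTERVAL = timedelta(days=90)


def retest_due(last_tested, today, major_update=False, reader_report=False):
    """Return True if any of the three retest triggers applies."""
    overdue = today - last_tested >= RETEST_INTERVAL
    return overdue or major_update or reader_report


# 1 January to 1 April 2026 is exactly 90 days, so a retest is due.
due = retest_due(date(2026, 1, 1), date(2026, 4, 1))
```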

See our methodology in action

Browse our latest comparisons and guides. Every one follows the protocol above.

Last updated: May 2026