How to validate a startup idea — the skeptic prompt
Most LLM advice on startup validation is optimistic noise. This prompt inverts that: it gives Claude, ChatGPT, Gemini, or any capable LLM a hostile role and asks for the three cheapest disconfirming experiments, five Mom-Test-style questions, and one steelman — in that order, so the devil's-advocate work happens before the cheerleading.
TL;DR
- An LLM cannot validate your idea — only real customer behavior can. Use this prompt to design the experiments that will validate or disprove it.
- The key move: a hostile role + disconfirming experiments + Mom Test questions, all before any steelman of your idea.
- Bans "build an MVP" and "waitlist landing page" as validation — both measure the wrong thing.
- Works with Claude, ChatGPT, Gemini, Grok, or any modern LLM. Claude follows the rules most strictly.
The prompt
Copy it and paste it into Claude, ChatGPT, Gemini, or another LLM. Replace the three bracketed inputs with your specifics. The more precise you are — segment, stage, geography — the sharper the output.
You are a senior venture analyst who has reviewed thousands of startup ideas at YC, a16z, and as a solo founder. Your job is NOT to encourage — your job is to find the fastest way to disprove this idea.

I'm considering building: [PRODUCT/IDEA in 1–3 sentences]
My target customer: [SEGMENT — be specific about role, company size, stage]
My key assumption: [What must be true for this to work]

Give me:

1. THE CORE ASSUMPTION RESTATEMENT
Rewrite my core hypothesis in the form "X people will do Y because Z" — founders often hide their real assumption behind jargon.

2. THREE CHEAPEST DISCONFIRMING EXPERIMENTS
For each: what to do today (not this week), what a "fails my hypothesis" signal looks like, and total time cost. Do not suggest "build an MVP" — that's not cheap.

3. FIVE CUSTOMER INTERVIEW QUESTIONS (MOM TEST STYLE)
Questions about past behavior, not future intent. Each question should have a failure mode — what a "this hypothesis is wrong" answer sounds like.

4. FIVE SIGNS I'M FOOLING MYSELF
The specific emotional or confirmation-bias patterns you expect me to fall into given this idea. Be blunt.

5. ONE REASON THIS IDEA COULD ACTUALLY WORK
Steelman it — only after the disconfirming exercise. One sentence.

Rules:
- No "it depends" answers. Commit to a specific recommendation.
- If my idea pattern-matches a common graveyard (e.g., "AI for X that doesn't need AI"), say so directly.
- Do not suggest raising money or building a "waitlist landing page" as validation. Neither validates demand.
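If you run the prompt through an API instead of a chat window, the three bracketed inputs become a template. Below is a minimal sketch in Python using the OpenAI SDK (any provider's chat API works the same way); the model name and the filled-in example inputs are placeholders, and the prompt string is truncated here for brevity.

```python
# pip install openai
from openai import OpenAI

# The skeptic prompt from above, truncated here; paste it in full.
SKEPTIC_PROMPT = """\
You are a senior venture analyst who has reviewed thousands of startup ideas \
at YC, a16z, and as a solo founder. Your job is NOT to encourage — your job \
is to find the fastest way to disprove this idea.

I'm considering building: {idea}
My target customer: {segment}
My key assumption: {assumption}

Give me:
1. THE CORE ASSUMPTION RESTATEMENT
(rest of the prompt verbatim)
"""

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use any capable model
    messages=[{
        "role": "user",
        "content": SKEPTIC_PROMPT.format(
            idea="An AI-powered email summarizer for busy product managers.",
            segment="PMs at 50-500 person B2B SaaS companies in North America",
            assumption="PMs will pay for a standalone email-summarizing tool.",
        ),
    }],
)
print(response.choices[0].message.content)
```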
What makes this prompt different
Generic "validate my idea" prompts produce encouragement because the default LLM role is "helpful assistant." This one changes five things:
1. Hostile role assignment, not helpful role
"Your job is NOT to encourage — your job is to find the fastest way to disprove this idea" — explicit permission to disagree. Without this, Claude and ChatGPT default to sycophancy, especially on ideas that sound plausible.
2. Disconfirming experiments, not confirming ones
Most founders default to experiments that could only confirm (e.g., "run a survey asking if people would use it"). This prompt asks specifically for the fastest experiment that could disprove the thesis. Negative signal beats positive signal for pre-product ideas.
3. Mom Test anchoring
"Past behavior, not future intent" is the single most important principle from Rob Fitzpatrick's book. Every generated question must reference a specific past episode, not a hypothetical future purchase.
4. Explicit bans: MVP, waitlist, fundraising
"Do not suggest raising money or building a waitlist landing page" — these are the three most common fake validations. Each measures something (pitch skill, curiosity, network) that is not demand.
5. Steelman placed AFTER the critique
The final "one reason this could actually work" comes last so it does not bias the skeptical work. If the LLM steelmans first, the disconfirming experiments drift optimistic.
Real output on a real-sounding idea
Product: "An AI-powered email summarizer for busy product managers at 50–500-person companies." This is the kind of idea that sounds plausible and gets encouraging feedback from most LLMs by default. Here is what Claude returned with the skeptic prompt above.
1. The core assumption restatement
Your stated assumption: "PMs at 50–500-person companies will pay for an AI tool that summarizes their email."
What you actually mean: "PMs experience email volume as a specific, painful, paid problem — distinct enough from their calendar, Slack, and Notion that they would pay for a separate tool just for email, despite having Gmail's built-in summaries, ChatGPT's inbox connector, and four existing competitors (Shortwave, Superhuman, Yanado, SaneBox)."
The bold claim is the "despite" part. Nothing else is controversial.
2. Three cheapest disconfirming experiments
(a) Read 20 product reviews of Shortwave and Superhuman on G2 (30 minutes).
Fails if: fewer than three reviews mention "email summarization" as a top-three benefit — your assumed wedge isn't differentiated.
(b) Post one tweet: "PMs — how do you actually deal with email overload? Looking for the specific tool/workflow that works for you." Read every reply (10 minutes to post, 48 hours to collect).
Fails if: responses cluster around process/filters, not tools — or they name Shortwave/Superhuman exclusively.
(c) Email five PMs at companies in your target range: "When was the last time you paid for a personal productivity tool for yourself (not your team)?" (1 hour).
Fails if: three or more say "never" or name only ChatGPT / Notion. If they don't pay for personal tools, they won't pay for yours.
3. Five customer interview questions (Mom Test style)
- Q1. "Walk me through the last time you opened your email and felt behind. What specifically did you do?"
Fails if: they describe a one-off, not a recurring pattern. Email isn't the pain it first seems.
- Q2. "What tool or process do you use now to deal with email volume?"
Fails if: they have a working solution and no complaint.
- Q3. "The last time you paid $10+/month for a personal productivity tool, what was it for?"
Fails if: they've never paid, or the tool was a $5/mo grocery app.
- Q4. "If I gave you a tool that summarizes every email in your inbox, which emails would you skip summarizing?"
Fails if: "all of them" — they don't actually read most email, so the summarizer is redundant.
- Q5. "Last time you tried an AI email tool (ChatGPT connector, Shortwave, Gemini summary), what happened?"
Fails if: "it was fine" or "I still use it" — you have incumbent competitors already satisfying the job.
4. Five signs you're fooling yourself
- (a) You pattern-match on a pain YOU have, not one your target reports.
- (b) You read "AI productivity" TechCrunch posts for validation — those are marketing, not demand.
- (c) You assume Shortwave/Superhuman haven't solved this because YOU don't like them; their revenue suggests otherwise.
- (d) You'll use "but mine uses GPT-4o, theirs uses older models" as differentiation — users do not care about your model.
- (e) You'll invest time building before booking five paid pilots.
5. One reason this idea could actually work
If your wedge is specifically "email summaries tuned for PM-specific signal" — PRDs, design reviews, customer quotes buried in support emails — not generic summarization, you're playing a vertical game with a horizontal product. That's a real opportunity if PM email workflow is meaningfully different from, say, legal or HR email. Everything else is me-too.
Notice what happened: the prompt produced concrete next actions (not general advice), explicit failure criteria (not vibes), and one narrow steelman (not a hype list). That's the job.
Variations by stage and audience
Append one of the blocks below to the base prompt to tune the output for your situation; a short scripting sketch for composing them follows the blocks.
Pre-idea stage (no product, just a hypothesis)
Additional instructions:
- I do not have a product yet, only a hypothesis.
- Replace "build an MVP" experiments entirely with concierge or Wizard-of-Oz tests I can run manually today.
- Assume I cannot code a prototype this week. Every experiment must be doable with a Google Form, Calendly, Twitter DMs, or email.
Post-launch (existing product with some usage)
Additional instructions:
- I already have [N] active users, [M]% retention at 30 days, and [pick any real data I have].
- Treat my existing data as disconfirming evidence first, confirming evidence second. What does the data suggest I am wrong about?
- The disconfirming experiments should leverage existing users rather than recruit new ones.
B2B enterprise idea
Additional instructions:
- The buyer and the user are likely different people.
- Include in section 3 at least one question that tests the BUYER's priority (budget cycles, approval committee, competing tools already owned).
- The "waitlist / landing page" ban is especially strict — enterprise demand is never measured by consumer-style signup forms.
Consumer (B2C) idea
Additional instructions:
- Willingness to pay for consumer apps is near-zero for 95% of ideas.
- In section 3, anchor ALL willingness-to-pay questions on past paid app behavior: "The last time you paid $5+ per month for a personal app, what was it for?"
- Include a question about substitute behavior — what they currently do for free that is good enough to beat your paid alternative.
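Each variation is plain text appended after the base prompt's rules, so scripting the composition is simple string concatenation. A minimal sketch, assuming you keep the blocks in a dict keyed by stage (the key names are illustrative):

```python
# Illustrative: the variation blocks from above, keyed by stage.
VARIATIONS = {
    "pre_idea": (
        "Additional instructions:\n"
        "- I do not have a product yet, only a hypothesis.\n"
        '- Replace "build an MVP" experiments entirely with concierge or '
        "Wizard-of-Oz tests I can run manually today.\n"
        "- Assume I cannot code a prototype this week. Every experiment must "
        "be doable with a Google Form, Calendly, Twitter DMs, or email."
    ),
    # "post_launch", "b2b_enterprise", and "b2c" follow the same pattern.
}

def build_prompt(base: str, stage: str | None = None) -> str:
    """Append the stage-specific block after the base prompt's rules."""
    if stage is None:
        return base
    return base.rstrip() + "\n\n" + VARIATIONS[stage]
```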
Common failure modes (things to push back on)
The "it depends" trap
If Claude or ChatGPT starts with "it depends on your context" and refuses to commit — paste: "Remember: no 'it depends' answers. Commit to a specific recommendation given the inputs I gave you." The model has context. Make it use it.
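If you script the conversation rather than pasting by hand, that push-back is just a second user turn in the same message list. A minimal sketch reusing the OpenAI-style client from the earlier example (the function name and hedge-detection heuristic are my own, not part of the prompt):

```python
def commit_or_retry(client, messages: list, reply: str) -> str:
    """If the model hedged, append the canned reminder and ask once more."""
    if "it depends" not in reply.lower():
        return reply
    messages = messages + [
        {"role": "assistant", "content": reply},
        {"role": "user", "content": (
            "Remember: no 'it depends' answers. Commit to a specific "
            "recommendation given the inputs I gave you."
        )},
    ]
    retry = client.chat.completions.create(model="gpt-4o", messages=messages)
    return retry.choices[0].message.content
```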
"Build a landing page to test demand"
Despite the explicit rule, weak models will slip this in. Reject it. Emails are not demand. The only landing-page tests that count are ones with a credit card field.
The over-engineered experiment
If an experiment requires more than 2 hours of setup, it violates the "cheapest" constraint. Push back: "Give me an experiment I can start in the next 15 minutes." Constraint forces creativity.
Vague segment → vague output
If the output feels generic, your [SEGMENT] was too broad. "Senior engineering managers at 500–2000 person B2B SaaS companies in North America" beats "tech users." More specificity in = more specificity out.
What this prompt can't do
Claude, ChatGPT, and Gemini are excellent at designing experiments, generating questions, and playing devil's advocate. They cannot talk to twenty actual PMs and measure whether any of them would pay. That step is not optional, and no LLM will fix it for you.
The fastest way to actually run the Mom-Test questions from this prompt is async — a single link your target customers answer on their time, with an AI interviewer that asks the right follow-ups. That's what AskDeeper does.
Questions about validating startup ideas
What is the best way to validate a startup idea?
The best validation is behavior, not opinion. Instead of asking "would you use this?" (a hypothetical that always gets polite yes answers), observe what target customers are already doing — what they pay for, what workarounds they invent, what they complain about publicly. This prompt structures that observation into three cheapest disconfirming experiments plus five Mom-Test-style interview questions.
How do I validate a product idea with no customers yet?
Use the pre-idea variation of the prompt above. The core move: run three small experiments that could disprove your hypothesis using only a Google Form, a Twitter thread, and five cold emails. If none of those experiments shows negative signal, you have earned the right to spend a weekend prototyping. If they do show negative signal, you have saved yourself months.
Can ChatGPT, Claude, or Gemini actually validate a startup idea?
No. An LLM cannot validate your idea — only real customer behavior can. What an LLM can do is structure your thinking, surface blind spots, and design the experiments that will validate or disprove. This prompt explicitly forces the LLM into a hostile role so it does not fall into the common trap of being agreeable about whatever idea you paste.
Is building a waitlist landing page a valid test?
No — and this prompt bans it for a reason. Email signups measure curiosity, not willingness to pay. Thousands of startups have collected thousands of waitlist signups and shipped zero paying customers. The only landing-page tests that count are ones where the user has to enter a credit card, and even those are noisy.
How many customer interviews do I need before I decide?
Theme saturation usually hits around 5–8 interviews per segment — the point where the next interview stops surprising you. For a rigorous disconfirming test, aim for 10–15 interviews. Running them async with AI follow-ups (AskDeeper-style) is much faster than 60-minute Zoom calls, and an adaptive interviewer probes for details that live moderators often miss.
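"Stops surprising you" can be made mechanical: tag the themes each interview surfaces and stop once a couple of consecutive interviews add nothing new. A rough sketch with invented theme tags (a heuristic, not AskDeeper's actual method):

```python
def reached_saturation(theme_sets, patience=2):
    """True once `patience` consecutive interviews add no new themes."""
    seen, streak = set(), 0
    for themes in theme_sets:  # one set of theme tags per interview, in order
        new = set(themes) - seen
        seen |= set(themes)
        streak = 0 if new else streak + 1
        if streak >= patience:
            return True
    return False

# Interviews 5 and 6 repeat earlier themes -> saturated.
interviews = [{"inbox overload", "slack"}, {"slack", "filters"},
              {"filters"}, {"pricing"}, {"inbox overload"},
              {"slack", "pricing"}]
print(reached_saturation(interviews))  # True
```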
Why does the prompt say "do not suggest raising money as validation"?
Investors fund bets, not validated ideas. Getting a check is signal about your pitch skills and network, not about demand. The quickest way to look "validated" without being validated is to raise money and then spend a year building the wrong thing. This prompt assumes your goal is demand proof, not funding.
Related prompts
- 02: Run a customer discovery interview
A full 15-question interview guide for the next step after idea validation.
- 03: Analyze a PMF survey (Sean Ellis 40% test)
Once you ship, run this to measure whether you have PMF (beyond the single 40% number).
- 04: Generate better interview follow-up questions
When someone's answer to question 1 was vague, this prompt gives you three smart probes.
Now actually talk to ten people
The prompt has given you the questions. The 80% of the work that matters is still ahead: getting ten actual target customers to answer them. AskDeeper runs async AI interviews with your respondents — same Mom-Test anchoring, same adaptive follow-ups, verbatim quotes and themes extracted automatically. One link in, decisions out.