How to Review a Vibe-Coded App Before Launch: The Complete Checklist
If you built your app with Cursor, Claude Code, or Bolt, there is a specific set of checks you must run before real users touch it. Automated tests can pass while the app still breaks in production. Here is the exact checklist, organized by failure category, to close those gaps before they cost you customers.
Key Takeaways
- AI-generated code contains 1.7x more issues than human-written code — most of them logic bugs, not syntax errors (ingram.tech, 2026).
- The five critical failure categories are: authentication & session, payment edge cases, permission bugs, broken error handling, and core user flows.
- Automated scanners miss runtime authorization flaws — User A seeing User B's data passes every static check.
- 45% of AI-generated apps contain at least one OWASP Top 10 vulnerability (Veracode, 2025).
- Most vibe-coded apps pass CI/CD but fail under real user traffic because CI/CD tests what was specified, not what was forgotten.
- A human reviewer reads your spec, then tries to break your app as an adversarial user — that is what catches the gaps.
The Launch That Broke in Production
You described your app in plain English, watched the AI build it, clicked through the happy path, everything worked. You pushed to production. Three days later you got an email from a user: "I can see someone else's account data on my dashboard."
This is not a hypothetical. In 2025, the Tea dating app — built on Firebase with an AI-generated backend — left 72,000 verification photos, government IDs, and over one million private messages exposed because the default access rules were never configured. The app had passed its own internal testing. It worked fine in the demo.
Real Breach: Tea App (2025)
Firebase backend left on default open rules in a vibe-coded dating app. Exposed: 72,000 images including government IDs and 1M+ private messages. Root cause: AI generated the database reads but never generated the security rules. CI/CD passed. Static scans passed. The bug lived in configuration, not code.
The problem is structural. When you vibe-code, the AI optimizes for making the feature work on the happy path. It does not automatically audit every place where your security model needs to be enforced. That gap — between "works in demo" and "safe under real user traffic" — is exactly what a pre-launch review closes.
Why Automated Tests Are Not Enough
Here is the uncomfortable truth: your CI/CD pipeline is testing the code that was written, not the requirements that were forgotten. Automated tools operate on what exists. They cannot flag what is missing.
CodeRabbit, one of the leading AI code review bots, reports an accuracy rate of roughly 46% on logic bugs. That means more than half of logic-level issues slip through. For a deeper look at why, see our article on why AI code review bots miss bugs.
"The scanner came back clean. It always does. Because the vulnerability was not in the code. It was in how the code behaved at runtime." — getautonoma.com, Vibe Coding Risks for Founders (2025)
Sonar's research found that more than 90% of issues in AI-generated code fall into categories that static analysis tools are specifically weak at — logic errors, mismatched assumptions between layers, and authorization gaps. (MIT Technology Review, Dec 2025)
| Check type | What it catches | What it misses | Time cost |
|---|---|---|---|
| Unit tests | Function-level logic | Integration failures, auth gaps, cross-user data leaks | Low (automated) |
| CI/CD pipeline | Build integrity, known test regressions | Runtime behavior, missing validations, config errors | Low (automated) |
| SAST scanner (Snyk / Semgrep) | Known CVE patterns, hardcoded secrets | Business-logic flaws, missing server-side enforcement | Low (automated) |
| AI code review bot | ~46% of logic bugs, style issues | ~54% of logic bugs, spec-vs-code mismatches | Low (automated) |
| Manual human review | Runtime auth, plan limits, user flow gaps, spec mismatches | Scale / load issues (needs separate load testing) | 4–8 hours |
The 5-Category Pre-Launch Checklist
The following checklist is organized around the five failure categories that appear most consistently in post-mortem analyses of failed vibe-coded launches. Work through each section before you open your app to real users.
1. Authentication & Session Management
Authentication is the most common first-failure point. AI generates login flows correctly for the happy path but frequently misses edge cases that attackers find on day one.
- Log out, then directly visit a URL that requires login — does the app redirect you, or does it load the page anyway? (A server-side sketch of this gate follows this list.)
- Open a protected page in an incognito window without logging in — same test, different surface.
- Check that session tokens expire — log in, wait past expiry (or manually expire the cookie), confirm the next request forces re-auth.
- Verify password reset links expire after use — click the same reset link twice; the second click should be rejected.
- Confirm that "remember me" tokens are stored securely (httpOnly, Secure, SameSite=Lax or Strict cookies).
- Check your auth provider settings directly — if using Supabase, Firebase, or Auth0, log into the dashboard and verify email confirmation is required.
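To make the redirect, expiry, and cookie checks concrete, here is a minimal sketch of the server-side gate those checks are probing. It assumes an Express backend with express-session; every route and name is illustrative, not a prescribed implementation:

```typescript
import express from "express";
import session from "express-session";

// Type augmentation so req.session.userId type-checks.
declare module "express-session" {
  interface SessionData {
    userId?: string;
  }
}

const app = express();

app.use(
  session({
    secret: process.env.SESSION_SECRET!, // never hardcode or commit this
    resave: false,
    saveUninitialized: false,
    cookie: {
      httpOnly: true, // page JavaScript cannot read the cookie
      secure: true, // cookie only travels over HTTPS
      sameSite: "lax", // blocks most cross-site request forgery
      maxAge: 1000 * 60 * 60, // 1 hour: the server enforces expiry
    },
  })
);

// Every protected route goes through this gate, not just the UI router.
function requireLogin(
  req: express.Request,
  res: express.Response,
  next: express.NextFunction
) {
  if (!req.session.userId) {
    return res.redirect("/login"); // or res.status(401) for API routes
  }
  next();
}

app.get("/dashboard", requireLogin, (_req, res) => {
  res.send("protected content");
});

app.listen(3000);
```

The design point: the gate and the cookie flags live on the server, so they hold even when someone bypasses the UI entirely.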
Real Breach: Base44 (2025)
A public app_id in a vibe-coded platform allowed attackers to bypass SSO entirely and register accounts that gained access to internal applications marked "private." The auth flow was AI-generated and worked correctly for normal users — the bypass existed in an unguarded registration endpoint the AI added as a convenience feature.
2. Payment Edge Cases
AI models are excellent at generating Stripe or Paddle integration code for the success case. They are much weaker on the full state machine: failed charges, expired cards, subscription downgrades, and refund handling.
- Use Stripe's test card `4000000000000002` (card declined) — does your app handle this gracefully, or does it throw a generic 500 error?
- Verify that plan limits are enforced in your API, not just in the React component — make a direct API call to a premium endpoint without a valid subscription.
- Test the webhook handler: simulate a `customer.subscription.deleted` event and confirm the user's access is revoked within seconds (a handler sketch follows this list).
- Check that failed webhook deliveries are retried — Stripe retries for up to 72 hours; confirm your endpoint is idempotent (same event processed twice = same result).
- Confirm receipt emails are sent and contain accurate amounts and product descriptions.
- Test the free trial expiry path end-to-end without manually upgrading.
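The webhook and idempotency checks above correspond to a handler shaped roughly like this sketch. It assumes Express and the official `stripe` package; the in-memory `processed` set stands in for a real database table, and `revokeAccess` is a placeholder for your own logic:

```typescript
import express from "express";
import Stripe from "stripe";

const stripe = new Stripe(process.env.STRIPE_SECRET_KEY!);
const app = express();

const processed = new Set<string>(); // replace with a DB table in production

async function revokeAccess(customerId: string): Promise<void> {
  // Placeholder: e.g. UPDATE users SET plan = 'free' WHERE stripe_customer = $1
  console.log(`revoking access for ${customerId}`);
}

// Stripe signature verification needs the raw body, not parsed JSON.
app.post(
  "/webhooks/stripe",
  express.raw({ type: "application/json" }),
  async (req, res) => {
    let event: Stripe.Event;
    try {
      event = stripe.webhooks.constructEvent(
        req.body,
        req.headers["stripe-signature"] as string,
        process.env.STRIPE_WEBHOOK_SECRET!
      );
    } catch {
      return res.status(400).send("invalid signature");
    }

    // Idempotency: Stripe retries failed deliveries for up to 72 hours,
    // so the same event can legitimately arrive more than once.
    if (processed.has(event.id)) return res.sendStatus(200);

    if (event.type === "customer.subscription.deleted") {
      const sub = event.data.object as Stripe.Subscription;
      await revokeAccess(sub.customer as string);
    }

    processed.add(event.id);
    res.sendStatus(200);
  }
);

app.listen(3000);
```

To exercise it, run `stripe trigger customer.subscription.deleted` from the Stripe CLI twice and confirm the second delivery is a no-op.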
3. Authorization & Permission Bugs
This is the single most dangerous category in vibe-coded apps. AI generates UI-level permission checks reliably. It generates API-level enforcement inconsistently. The result: a direct API call bypasses every frontend guard.
- Create two test accounts (User A and User B). Log in as User A, copy the ID of a resource (document, record, project). Log out. Log in as User B. Manually craft a request to that resource ID — can User B read it? (A scriptable version of this probe follows this list.)
- If using Supabase: open the dashboard, navigate to Authentication > Policies, and verify Row Level Security (RLS) is enabled on every table that stores user data.
- Check that admin-only actions (user deletion, plan upgrades) require a server-side role check — not just a hidden button in the UI.
- Verify that your database keys are not exposed in client-side bundles — search your built JS files for your database URL or service role key.
- Test that a free-tier user cannot access premium features by directly calling the API endpoint.
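The two-account test is easy to script so you can rerun it on every deploy. Here is a rough probe using only `fetch`; the base URL, routes, and login payload are assumptions you would replace with your own:

```typescript
// Cross-user access probe: create a resource as User A, read it as User B.
const BASE = "https://staging.example.com"; // placeholder staging URL

async function login(email: string, password: string): Promise<string> {
  const res = await fetch(`${BASE}/api/login`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ email, password }),
  });
  // Simplified: assumes a single session cookie; a bearer token works the same way.
  return res.headers.get("set-cookie") ?? "";
}

async function main() {
  const cookieA = await login("user-a@example.com", "password-a");
  const cookieB = await login("user-b@example.com", "password-b");

  // Create a resource as User A and capture its ID.
  const created = await fetch(`${BASE}/api/documents`, {
    method: "POST",
    headers: { "Content-Type": "application/json", cookie: cookieA },
    body: JSON.stringify({ title: "private doc" }),
  });
  const { id } = await created.json();

  // Now request that exact resource as User B.
  const probe = await fetch(`${BASE}/api/documents/${id}`, {
    headers: { cookie: cookieB },
  });

  // Anything other than 403/404 means User B can read User A's data.
  console.log(
    probe.status === 403 || probe.status === 404
      ? "PASS: cross-user access blocked"
      : `FAIL: User B got status ${probe.status}`
  );
}

main().catch(console.error);
```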
"We saw the HackerNews thread. Can we get on a call?" An investor email arrived at 8 AM on a Tuesday — users could see another user's data in the dashboard. The security scanner had cleared the code three days prior. — Founder story documented by getautonoma.com (2025)
For a full analysis of why this category is so hard to catch automatically, see our article on vibe coding security risks.
4. Error Handling & Information Leakage
AI-generated error handling tends to be optimistic — it handles the errors that were anticipated during generation. Real users find unanticipated error states immediately.
- Submit every form with empty fields — does each required field produce a clear validation message, or does the app crash silently?
- Submit forms with inputs that are too long, contain special characters (< > ' " ;), or contain SQL-like syntax — does the app sanitize gracefully?
- Trigger a network error (use browser DevTools to set the network to "offline") while submitting a form — does the app show a user-friendly message or hang indefinitely?
- Check that production error responses do not include stack traces, file paths, or database schema information — these leak architectural details to attackers. (Both this and the validation checks are sketched after this list.)
- Verify that a 404 page exists and does not expose routing structure.
- Test loading states: simulate a slow network (DevTools > Network > Slow 3G) — does the UI show a spinner, or does it appear frozen?
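On the server side, the validation and stack-trace checks come down to two small pieces. A sketch for Express, using `zod` for input validation; the schema and route are illustrative:

```typescript
import express from "express";
import { z } from "zod";

const app = express();
app.use(express.json());

// Validate input before it reaches your logic.
const SignupSchema = z.object({
  email: z.string().email().max(254),
  name: z.string().min(1).max(100),
});

app.post("/api/signup", (req, res) => {
  const parsed = SignupSchema.safeParse(req.body);
  if (!parsed.success) {
    // A clear, field-level validation message instead of a silent crash.
    return res.status(400).json({ errors: parsed.error.flatten().fieldErrors });
  }
  res.status(201).json({ ok: true });
});

// Final error handler: log the details, never send them to the client.
app.use(
  (err: Error, _req: express.Request, res: express.Response, _next: express.NextFunction) => {
    console.error(err); // full stack trace goes to your logs only
    res.status(500).json({ error: "Something went wrong. Please try again." });
  }
);

app.listen(3000);
```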
5. Core User Flow Validation
The last category is also the one most founders feel confident skipping — "I've clicked through it myself." The key is to test as a new user who has never seen your app, on a device and network you do not normally use.
- Complete the entire onboarding flow from a fresh incognito window on a mobile device — does anything break or look wrong? (This check is automatable with the Playwright sketch after this list.)
- Complete the primary value action (the thing your app is for) as a first-time user, without any prior knowledge of the UI.
- Test the email confirmation flow end-to-end — sign up with a real email address you control, receive the confirmation, click it, and verify you land in the correct state.
- Check that the app works without optional features — if a user skips the optional onboarding step, does the main flow still work?
- Run Google PageSpeed Insights on your main page and verify the score is above 70 — slow apps lose users before the auth bug can even hurt them.
- Check for console errors in browser DevTools during normal use — AI-generated code frequently omits error boundaries, so runtime errors surface silently in the console.
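If you want these flow checks run on every deploy rather than once, a Playwright smoke test can cover the fresh-context onboarding run and the console-error check together. The selectors, routes, and staging URL below are assumptions to replace with your own:

```typescript
import { test, expect } from "@playwright/test";

test("new user completes onboarding on mobile", async ({ browser }) => {
  // Fresh context = no cookies or storage, like a true first visit.
  const context = await browser.newContext({
    viewport: { width: 390, height: 844 }, // phone-sized viewport
  });
  const page = await context.newPage();

  // Capture silent console errors, not just visible breakage.
  const consoleErrors: string[] = [];
  page.on("console", (msg) => {
    if (msg.type() === "error") consoleErrors.push(msg.text());
  });

  await page.goto("https://staging.example.com/signup");
  await page.getByLabel("Email").fill(`smoke+${Date.now()}@example.com`);
  await page.getByLabel("Password").fill("a-strong-test-password-1!");
  await page.getByRole("button", { name: "Sign up" }).click();

  // The primary value action should be reachable without prior knowledge.
  await expect(page.getByRole("heading", { name: "Welcome" })).toBeVisible();
  expect(consoleErrors).toEqual([]);
});
```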
Don't want to do this alone?
We review every push against your spec — auth, permissions, payments, error handling. Real humans, 24-hour turnaround.
Install Vibers — Free
How to Run Each Check (Without Writing Code)
Most of the checks above require no engineering background. Here is the minimal toolset you need:
- Two browser profiles or devices. One as User A, one as User B. This is how you test cross-user permission bugs without any technical setup.
- Browser DevTools (F12). Network tab for slow network simulation, Console tab for silent errors, Application tab for inspecting cookies and storage.
- Stripe test cards. Stripe's test mode includes cards for declined, insufficient funds, expired, and 3DS-required scenarios — all documented at stripe.com/docs/testing.
- Your database dashboard. Supabase, Firebase, PlanetScale — every major backend has a UI where you can verify security rules directly without reading code.
- Google PageSpeed Insights. Free, no account required, gives you a concrete performance score and actionable fixes.
The Autonoma testing framework recommends spending at least 15–30 minutes on the "unhappy paths" immediately after each AI generation session — before the code gets deployed anywhere. Their four-phase framework (before generation, after generation, before deployment, after deployment) is worth reading in full at getautonoma.com.
What Human Review Adds (That Automation Cannot)
There is a class of bug that no static tool can find, because it requires reading your product spec and then comparing what the spec says against what the deployed app actually does.
Consider this scenario: your spec says "only paid users can export data." Your AI generated an export button that is hidden for free users. A human reviewer reads the spec, then directly calls your API export endpoint without a subscription. The button is hidden — but the endpoint is open. That is not a code vulnerability any scanner can flag, because the endpoint code itself is syntactically correct. The bug is a missing requirement, not a code defect.
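The fix for that scenario is worth seeing in code, because it is a single conditional. A sketch for an Express endpoint, where `getUserFromSession` and `buildExport` are placeholders for your own session and export logic:

```typescript
import express from "express";

const app = express();

app.get("/api/export", async (req, res) => {
  const user = await getUserFromSession(req);
  if (!user) return res.status(401).json({ error: "not logged in" });

  // Server-side enforcement of the spec, independent of any UI state.
  if (user.plan !== "paid") {
    return res.status(403).json({ error: "export requires a paid plan" });
  }

  res.json(await buildExport(user.id));
});

// Placeholders -- wire these up to your session store and export logic.
async function getUserFromSession(
  _req: express.Request
): Promise<{ id: string; plan: string } | null> {
  return null; // always null in this sketch
}
async function buildExport(userId: string): Promise<unknown> {
  return { userId, rows: [] };
}

app.listen(3000);
```

The spec's rule lives in the endpoint itself, so it holds no matter what the UI shows or hides.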
Human review adds three things that automation fundamentally cannot provide:
- Spec comparison. A human can read what you intended and verify it is what shipped. AI tools review only what is in the code.
- Adversarial user simulation. A human reviewer thinks like a user who is trying to get something they should not have. Fuzz tests hit known patterns; humans probe novel paths.
- Context about your specific product. A generic scanner does not know that your "admin" role should only be assigned by other admins, not by users editing their own profiles. A reviewer who has read your spec does.
For a deeper comparison of automated bots versus human reviewers, see CodeRabbit alternative: human review.
Before You Launch: The 45-Minute Escape Hatch
If you have limited time and need to ship today, here are the five checks most likely to prevent a critical incident in week one:
- Test cross-user data access — two accounts, try to read each other's data via direct URL or API call.
- Verify RLS / security rules are on in your database dashboard — not assumed to be on.
- Search your deployed JS bundle for your database URL, service key, or API secret (a scan script follows this list).
- Test the payment declined path end-to-end with a Stripe test card.
- Submit every form with empty data and confirm no 500 errors or exposed stack traces.
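Check 3 is also easy to automate against your local build output. A rough Node script, assuming a `dist/` directory; the patterns are illustrative starters, and you should add your own project's URLs and key prefixes:

```typescript
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

// Illustrative patterns -- extend with your database URL and key prefixes.
const SUSPECT_PATTERNS: [string, RegExp][] = [
  ["Stripe secret key", /sk_live_[0-9a-zA-Z]+/],
  ["Supabase service role", /service_role/],
  ["Private key block", /-----BEGIN (RSA )?PRIVATE KEY-----/],
];

// Recursively collect every file under a directory.
function walk(dir: string): string[] {
  return readdirSync(dir).flatMap((name) => {
    const full = join(dir, name);
    return statSync(full).isDirectory() ? walk(full) : [full];
  });
}

for (const file of walk("dist").filter((f) => f.endsWith(".js"))) {
  const source = readFileSync(file, "utf8");
  for (const [label, pattern] of SUSPECT_PATTERNS) {
    if (pattern.test(source)) {
      console.log(`LEAK? ${label} matched in ${file}`);
    }
  }
}
```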
These five checks take roughly 45 minutes and will catch the majority of week-one incidents. The full checklist above is what you run when you have a full day.
We do this review for you
Install the Vibers GitHub App. We check every push against your spec — auth, payments, permissions, error handling. Human reviewers, 24-hour turnaround, free to start.
Install Vibers on GitHub