Vibe Coded to Pwned: A Red Team Wake-Up Call for the AI Development Era
By Matthew Martin, Cybersecurity Advisor
It took a team of developers four hours to build the application. It took our red team twenty minutes to break it.
That asymmetry should make every executive, product owner, and engineer pause. In the race to capitalize on the promise of AI-assisted development (commonly called vibe coding), organizations are shipping software faster than ever. What they are not shipping, in many cases, is security, resilience, or long-term maintainability. Our recent red team exercise against an internally developed, vibe-coded application made that painfully clear.
What Happened: Twenty Minutes to Compromise
Our engagement was straightforward: assess a business application built in roughly four hours using an AI coding assistant, following a typical vibe coding workflow of natural language prompts, minimal manual review, and fast iteration toward a working product.
The application worked. It launched. Users could interact with it. Executives were pleased with the speed of delivery.
It was also riddled with critical vulnerabilities.
Within twenty minutes, we had identified and exploited authentication weaknesses, bypassed access controls, and exfiltrated data the application was never supposed to expose. There was no exotic tradecraft required. These were textbook vulnerabilities, the kind that appear consistently in industry benchmarks of AI-generated code.
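To make "bypassed access controls" concrete, here is a minimal sketch of one common pattern in this class: an endpoint that authenticates the caller but never checks whether the requested record belongs to them. It is an illustration, not a reproduction of the code we tested. The `User`, `Invoice`, `INVOICES`, and `Forbidden` names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: int
    is_admin: bool = False


@dataclass(frozen=True)
class Invoice:
    invoice_id: int
    owner_id: int
    amount_cents: int


# Hypothetical in-memory store standing in for a real database.
INVOICES = {
    1: Invoice(invoice_id=1, owner_id=100, amount_cents=4200),
    2: Invoice(invoice_id=2, owner_id=200, amount_cents=9900),
}


class Forbidden(Exception):
    """Raised when an authenticated user requests a record they do not own."""


def get_invoice_insecure(current_user: User, invoice_id: int) -> Invoice:
    # Missing check: any logged-in user can read any invoice by guessing its ID.
    return INVOICES[invoice_id]


def get_invoice(current_user: User, invoice_id: int) -> Invoice:
    invoice = INVOICES[invoice_id]
    # Authorization is enforced per object, not only per endpoint.
    if invoice.owner_id != current_user.user_id and not current_user.is_admin:
        raise Forbidden(
            f"user {current_user.user_id} may not read invoice {invoice_id}"
        )
    return invoice
```

Both functions "work" in a demo with a single user, which is why the insecure version tends to survive a quick review. The difference only shows up when a second user asks for the first user's data.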
When we sat down with the development team for the debrief, the explanation was disarmingly honest: they knew. They knew the security controls were incomplete. But they had been told to move fast. There were no formal security requirements attached to the project. Executive pressure and user enthusiasm for the rapid delivery model had pushed governance, threat modeling, and secure code review to the side. The vibe was great. The security posture was not.
The Research Agrees: This Is Not an Isolated Incident
What our team experienced is not an edge case. It is the dominant pattern.
Security startup Tenzai assessed five of the best-known vibe coding platforms (Claude Code, OpenAI Codex, Cursor, Replit, and Devin) by building the same three test applications on each. Across all fifteen resulting applications, they found 69 vulnerabilities, with many rated “high” severity and roughly half a dozen rated “critical.” The conclusion: these tools handle generic, well-defined security patterns reasonably well, but fail where context determines the difference between safe and dangerous, which is precisely the nuanced judgment that experienced human engineers provide.
The numbers from broader research are starker still. A survey of developers who had shipped AI-generated code found that 53% later discovered security issues that had passed initial review, not hypothetical risks but real vulnerabilities already running in production. Georgetown’s Center for Security and Emerging Technology found that 86% of AI-generated code failed XSS defense mechanisms. CodeRabbit’s analysis found AI-generated code is 2.74 times more likely to introduce cross-site scripting vulnerabilities than human-written code. Researchers at Escape.tech scanned over 5,600 publicly available vibe-coded applications and found more than 2,000 vulnerabilities, over 400 exposed secrets (API keys, credentials, and tokens sitting in accessible code), and 175 instances of personally identifiable information, including medical records, IBANs, and phone numbers, exposed through application endpoints.
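Since cross-site scripting features in two of the figures above, a short sketch of the underlying mistake may help. It assumes a hypothetical comment-rendering function and uses only the Python standard library's `html.escape`; the function names are invented for illustration.

```python
import html


def render_comment_insecure(author: str, body: str) -> str:
    # User input is interpolated straight into markup, so a <script> tag in
    # `body` would run in every visitor's browser.
    return f"<p><b>{author}</b>: {body}</p>"


def render_comment(author: str, body: str) -> str:
    # html.escape converts &, <, > and (with quote=True) both quote characters
    # into entities, so the input is displayed as text rather than parsed.
    safe_author = html.escape(author, quote=True)
    safe_body = html.escape(body, quote=True)
    return f"<p><b>{safe_author}</b>: {safe_body}</p>"
```

Escaping of this kind only covers text placed in an HTML body. Values placed in attributes, URLs, or inline scripts need encoding suited to that context, which is one reason most teams rely on a template engine with auto-escaping rather than hand-built strings.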
Georgia Tech’s Systems Software and Security Lab launched the Vibe Security Radar in 2025 specifically to track CVEs directly introduced by AI coding tools into real production systems. The founder noted: “Everyone is saying AI code is insecure, but nobody is actually tracking it. We want real numbers. Not benchmarks, not hypotheticals — real vulnerabilities affecting real users.” Their early findings suggest the true number of attributable CVEs is likely five to ten times higher than what metadata alone can confirm.
Wiz Research found that one in five vibe-coded applications has vulnerabilities or configuration errors serious enough to warrant immediate remediation. Studies consistently show that between 68 and 73 percent of code samples from popular AI tools contain security flaws when manually reviewed. Even the OWASP Top 10, the most basic benchmark of application security hygiene, is violated in roughly 45% of AI-generated code samples, a figure that has seen little improvement over two years of AI coding tool evolution.
There is also a deeply uncomfortable irony in the confidence gap: 59% of developers express security concerns about AI-generated code, yet 76% believe AI tools produce more secure code than humans. That misplaced confidence is itself a vulnerability.
The Governance Gap: Speed as an Excuse to Skip Process
The root cause in our red team engagement was not the AI tool. It was the governance vacuum that surrounded it.
Vibe coding is seductive precisely because it eliminates friction. But friction in software development is not always waste. Much of it is risk mitigation. Threat modeling, security requirements, code review, static and dynamic analysis, and secure SDLC checkpoints are the processes that transform functional code into trustworthy software. When executives and users celebrate speed and implicitly (or explicitly) deprioritize everything that slows delivery, the result is not a faster, leaner process. It is a process with the safety removed.
Security cannot be bolted on after the fact. As one industry expert noted, the idea that humans can meaningfully review AI-generated code after it is already written begins to collapse at the scale and velocity of vibe coding. Security must move into the act of creation. That means governance frameworks must define where AI tools are appropriate, such as rapid prototyping and boilerplate generation, versus where they require additional rigor, including security-critical code, authentication systems, data handling, and core architecture.
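One way such a framework could be made enforceable is a merge gate that flags changes to sensitive areas for human sign-off. The sketch below is an assumption about how that might look, not a description of any particular tool. The glob patterns and function names are hypothetical, and a real policy would come from the organization's own threat model.

```python
from fnmatch import fnmatchcase

# Hypothetical policy: glob patterns for code that needs a named human reviewer.
HUMAN_REVIEW_REQUIRED = [
    "auth/*",
    "*/auth/*",
    "payments/*",
    "*/crypto/*",
    "migrations/*",
]


def requires_human_review(path: str) -> bool:
    # fnmatchcase treats "*" as matching any characters, including "/".
    return any(fnmatchcase(path, pattern) for pattern in HUMAN_REVIEW_REQUIRED)


def review_gate(changed_paths: list[str]) -> list[str]:
    """Return the changed paths that must not merge without human sign-off."""
    return sorted(p for p in changed_paths if requires_human_review(p))
```

A gate like this does not make the code secure. It only ensures that the decision to ship AI-generated changes in security-critical paths is made by a person, on the record.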
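The EU AI Act is also now taking effect in phases. It entered into force in August 2024, its obligations for general-purpose AI models began applying in August 2025, and most of its remaining provisions apply from August 2026. Its human-oversight expectations raise the stakes for organizations that let AI-generated code reach production without meaningful human review. Regulatory exposure is now joining business risk and reputational risk as consequences of governance failures in AI-assisted development.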
When someone tells your development team to “just vibe it out” and skip the security requirements, they are not streamlining the process. They are accepting a liability on behalf of the organization, one that may not surface until your data is in a threat actor’s hands.
Code as Craft: What We Lose When We Sideline Skilled Engineers
There is a dimension to this problem that does not appear in vulnerability databases, but that seasoned practitioners feel acutely.
Luke Gaskell, Senior Solutions Architect at Imaginex Digital, puts it plainly: “Development is an art form.” He is not speaking romantically. He is describing something functional and consequential. Skilled developers do not just make things work; they make things right. They reason about edge cases, failure modes, data flows, and the implications of architectural choices that will not reveal their costs until months or years later.
Gaskell’s concern with vibe coding is not philosophical. It is practical. “AI likes to make a mess of things,” he observes, “especially when you inject excessive context and set the expectations that it’s going to come out right. It’s not… It’s going to forget what it’s doing and it’ll undo everything you’re trying to accomplish.” That observation captures something the benchmarks can only partially measure: the compounding, often invisible degradation that happens when AI systems are handed complex, context-rich codebases and asked to navigate them carefully. They frequently cannot.
This tracks directly with what the data shows. GitClear’s 2025 analysis of 211 million lines of code changes reported roughly fourfold growth in duplicated code blocks since AI coding tool adoption began. Over the same period, copy-pasted lines rose from 8.3% of changed lines in 2021 to 12.3% by 2024, an increase of nearly half. Simultaneously, refactoring activity, the work developers do to consolidate code into coherent, maintainable structures, dropped by 60%. These are not just metrics. They are signals that a generation of software is being built without the architectural thinking that makes it last.
AI models generate code based on probabilistic patterns rather than architectural context. The result is often what practitioners call “Frankenstein code,” functionally active but structurally incoherent. It works until it doesn’t, and when it doesn’t, the humans left holding it often cannot understand why.
Telling a senior engineer that their role is to approve whatever the AI produces in thirty seconds is not an efficiency gain. It is the squandering of hard-won expertise in the name of throughput.
The Hidden Costs: Subsidies, Overhead, and the Supply Chain Reckoning
The vibe coding pitch is irresistible in the short term: build faster, spend less. The longer-term financial picture is considerably more complicated.
Processing overhead is real and growing. Newer reasoning models, the kind used by advanced vibe coding platforms, run multiple hidden reasoning steps before returning output. Unlike traditional models that scaled cost primarily with input and output length, these systems charge for “thinking intensity,” making costs unpredictable at enterprise scale. Research from Faros AI found that as AI adoption grew, organizations saw 154% larger pull requests, 91% longer code review times, and 9% more bugs per developer, negating a significant portion of the speed gain.
The subsidy window is closing. Current AI coding platform pricing is being substantially underwritten by venture capital and hyperscaler investment. The gap between a $40 vibe-coded prototype and $6,000 to $32,000 in Year 1 production costs is where most such projects collapse. PlanetScale eliminated its free tier in 2024. Hyperscalers are collectively committing over $360 billion annually in AI infrastructure, and that capital must eventually generate returns. The question identified across the industry is not whether costs will shift to enterprise users, but when and by how much. Organizations building business processes on subsidized AI tooling are accumulating a supply chain liability that has not yet come due.
Technical debt compounds invisibly. A METR randomized controlled trial with experienced open-source developers found that they took 19% longer on tasks where AI tools were permitted than on tasks where they were not. Before the study, participants predicted a 24% speedup. After experiencing the slowdown, they still believed AI had made them 20% faster. That 39-percentage-point gap between perception and reality is perhaps the most important finding in recent AI development research. It suggests that the productivity narrative around vibe coding is significantly shaped by how it feels rather than what it delivers.
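The 39-point figure comes from putting the measured slowdown and the perceived speedup on a single axis. The short calculation below makes that explicit. Treating "19% longer" as a speedup of minus 19 is the same simplification the headline number uses, and the variable names are only for illustration.

```python
measured_extra_time = 19   # percent: tasks took 19% longer with AI permitted
predicted_speedup = 24     # percent: what participants forecast beforehand
perceived_speedup = 20     # percent: what participants believed afterwards

# Express everything as "percent speedup", where a slowdown is negative.
measured_speedup = -measured_extra_time

perception_gap_points = perceived_speedup - measured_speedup
forecast_gap_points = predicted_speedup - measured_speedup

print(f"Perception gap: {perception_gap_points} percentage points")
print(f"Forecast gap:   {forecast_gap_points} percentage points")
```

On the same axis, the gap between the forecast and the measured result is wider still, at 43 points.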
The Elephant Graveyard: Unmaintainable Apps and the Late-Stage AI Problem
Vibe coding’s most enthusiastic proponents celebrate it as a democratizing force, and for rapid prototyping, there is real truth to that. But the accelerating ease of starting applications has created a landscape littered with what might be called an elephant graveyard of half-built, half-tested software.
These applications were born quickly, deployed quickly, and then largely forgotten or, worse, left running. They are difficult to audit because the humans who launched them did not write the code and often cannot read it. They are difficult to update because AI tools, when handed a late-stage codebase requiring careful, surgical modification, tend to introduce regressions. This is precisely the dynamic Gaskell describes: the AI forgets what it was doing, loses the thread of the original intent, and begins undoing the very things the developer was trying to preserve.
ThoughtWorks has documented this at the organizational level: “The rise of coding agents further amplifies these risks, since AI now generates larger change sets that are harder to review.” Late-stage maintenance is fundamentally different from early-stage generation. It demands precise understanding of subtle interdependencies that AI systems, at their current level of maturity, demonstrably struggle with.
The open source ecosystem is already showing the stress fractures. Vibe-coded contributions are flooding maintainer inboxes with submissions that look legitimate but embed subtle defects. Daniel Stenberg shut down the cURL bug bounty program after AI-generated submissions reached 20% of total volume and the overall valid rate dropped to 5%. The signal-to-noise ratio in collaborative development is degrading, and the humans responsible for sustaining critical infrastructure are paying the price.
The Path Forward: Governance in the Loop, Always
None of this is an argument against AI-assisted development. These tools are powerful, and used responsibly, they genuinely accelerate certain categories of work. The issue is not the technology. It is the organizational and governance context in which it is deployed.
Based on our red team experience and the broader research landscape, here is what responsible use looks like:
Define where AI assistance is appropriate. Prototyping, boilerplate generation, documentation, and code scaffolding are well-suited to AI assistance. Authentication systems, data handling, core business logic, and security-critical paths require experienced human authorship and rigorous review.
Security requirements must precede development, not follow it. No project, vibe-coded or otherwise, should begin without a minimum viable security specification. Threat modeling does not have to be a multi-week exercise; even a one-hour structured conversation about data flows, trust boundaries, and access controls is vastly better than nothing.
Governance cannot be overridden by speed pressure. When executives or stakeholders push to skip security review in the name of delivery velocity, that decision needs to be documented, owned, and escalated. The risk belongs to the organization, not the development team.
Test AI-generated code as adversarially as possible. Static analysis, dynamic analysis, and human penetration testing are not optional quality gates. They are the mechanism by which AI-generated vulnerabilities are caught before attackers find them first.
Invest in your engineers. The goal of AI tooling should be to amplify human expertise, not replace it. A senior engineer working with AI, applying architectural judgment and security intuition to AI-generated output, is dramatically more valuable and safer than an AI agent working alone toward a demo deadline.
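As one small example of the automated testing recommended above, the sketch below shows the shape of a secret-scanning check that could run before code is merged. It is deliberately minimal. The three patterns are illustrative assumptions, not a complete rule set, and an established scanner should be used in practice.

```python
import re

# Illustrative patterns only; real scanners ship far larger, tuned rule sets.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(
        r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"
    ),
    "hardcoded_credential": re.compile(
        r"(?i)\b(?:api[_-]?key|secret|password|token)\b\s*[:=]\s*"
        r"['\"][^'\"]{8,}['\"]"
    ),
}


def scan_for_secrets(source: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) for every line that matches a pattern."""
    findings = []
    for line_number, line in enumerate(source.splitlines(), start=1):
        for rule_name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, rule_name))
    return findings


def gate(source: str) -> bool:
    """Return True when the source is clean enough to proceed to review."""
    return not scan_for_secrets(source)
```

A check like this catches only the most obvious exposures. It complements, and does not replace, static analysis, dynamic analysis, and human penetration testing.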
Conclusion: The Vibe Is Not a Strategy
When our red team walked out of that debrief, the mood was surprisingly good. This particular application was never meant to be a long-term asset. It was a team-building exercise that tested whether speed-first development holds up under adversarial scrutiny. The leaders were pleased with the outcome, not despite the fact that we broke it, but because of it. That is the point. The exercise was an opportunity to make the lesson tangible, and it did.
The twenty minutes it took to compromise that application represents something important: it is the minimum amount of time an attacker needs if your organization treats governance as optional. Security debt, like financial debt, does not disappear when you stop thinking about it. It accrues interest, and the interest rate in cybersecurity is measured in breaches, not basis points.
Vibe coding is a tool. Like every tool, its value depends entirely on how it is used, by whom, and within what structure of accountability. The vibe cannot be your security strategy. Governance has to stay in the loop, not because it is bureaucratically satisfying, but because twenty minutes is a very short window between deployment and disaster.
About the Author
Matthew brings over 15 years of security leadership, specializing in risk management, compliance, and security operations. He transforms complex requirements, from SOC 2 and NIST to AI governance, into practical programs that align governance, operations, and business priorities. Known for building frameworks that work in the real world, Matthew establishes ownership, workflows, and metrics that teams can execute confidently. He bridges the gap between executives seeking clarity and technical teams needing practical solutions, excelling in high-stakes scenarios such as incident response planning, tabletop exercises, and audits.