| Audience: | CIO · CISO · CTO/VP Engineering |
| Primary Sectors: | Healthcare Systems · Utilities/Energy · Government/Public Sector |
| Decision Horizon: | Immediate for vulnerability-management funding; next renewal cycle for critical software suppliers. |
Executive Summary
AI-assisted vulnerability discovery is changing the economics of cyber defense. The problem is twofold: attackers can find weaknesses faster, and enterprises will discover more flaws than they can safely patch. This is especially true in vendor-controlled, legacy, clinical, OT, and public-sector environments.
Decision Posture: Expand AI vulnerability discovery, code scanning, and autofix only where the organization has a funded closure model: named owners, remediation SLAs, testing capacity, supplier obligations, exception handling, and compensating controls. Otherwise, additional scanning creates a larger queue of known-but-unfixed risk. The better move is to stop funding discovery without funding closure.
Our Analysis
AI bug finding will help defenders, but it also exposes an uncomfortable operating gap: most enterprises are better at detecting risk than retiring it.
The Narrative vs. The Reality
The market narrative is that AI will industrialize vulnerability discovery much as fuzzing did: tools will scan continuously, defenders will find bugs earlier, and software will improve. That is plausible. Google’s OSS-Fuzz shows that continuous automated testing can produce real defensive value, helping identify and fix more than 13,000 vulnerabilities and 50,000 bugs across 1,000 projects as of May 2025. [1]
But AI does not merely improve the old model. Anthropic says Claude Mythos Preview can identify and exploit zero-day vulnerabilities in every major operating system and major web browser when directed to do so. Anthropic also reports that non-experts have used it to generate complete working exploits. [2] Separate academic research found GPT-4 could exploit 87% of a small benchmark of real-world one-day vulnerabilities when given CVE descriptions, while other tested models and open-source scanners did not succeed under the same conditions. [3] That changes the operating problem. The bottleneck moves from finding to closing.
Five realities matter more than the tool announcement:
- More findings do not equal lower risk. A scan result becomes risk reduction only after ownership, prioritization, testing, change approval, deployment, and verification.
- Many exposed systems are not directly fixable by the enterprise. Hospitals, utilities, and agencies often depend on vendor-controlled software, certified platforms, medical devices, OT systems, and inherited applications where patching is constrained by safety, uptime, procurement, or support contracts.
- Autofix is useful but not autonomous assurance. GitHub’s Copilot Autofix creates suggested fixes and pull requests, but GitHub states that the feature is a best effort and not guaranteed to succeed in every situation. [4]
- AI-generated patches introduce a second risk surface. A large-scale 2025 study of AI-generated patches across more than 20,000 GitHub issues found that agentic repair workflows can generate new vulnerabilities, particularly when given more autonomy and limited issue context. [5]
- Durable reduction comes from exploitability reduction, not endless triage. U.S. and allied guidance increasingly points software manufacturers toward memory-safe languages, memory-safety roadmaps, secure-by-design practices, and systematic prevention of vulnerability classes. But NSA and CISA also caution that memory-safe language adoption is not practical in all circumstances or solution areas. [6]
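The first of these realities can be illustrated with simple queue arithmetic. The sketch below is an assumption-laden illustration, not a model taken from the cited sources: the function name `project_backlog` and every figure in it (open findings, monthly discovery, monthly closure capacity) are hypothetical. It shows that when discovery rises and closure capacity stays flat, the open backlog grows by the difference each month.

```python
def project_backlog(open_findings, found_per_month, closed_per_month, months):
    """Return the open-finding count at the end of each month.

    All inputs are hypothetical planning figures, not measured data.
    Closure can never retire more findings than are currently open.
    """
    history = []
    backlog = open_findings
    for _ in range(months):
        backlog += found_per_month                  # new findings arrive
        backlog -= min(closed_per_month, backlog)   # fixed closure capacity
        history.append(backlog)
    return history


# Hypothetical estate: 200 open findings, a team that closes 50 per month.
before_ai = project_backlog(200, found_per_month=40, closed_per_month=50, months=6)
after_ai = project_backlog(200, found_per_month=120, closed_per_month=50, months=6)

print(before_ai)  # [190, 180, 170, 160, 150, 140]
print(after_ai)   # [270, 340, 410, 480, 550, 620]
```

In this made-up scenario, tripling discovery without adding closure capacity turns a slowly shrinking queue into one that roughly triples in six months.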
The Signal in the Noise
AI has made weak software easier to interrogate. It has not made fragile estates easier to repair.
What Changes Our Decision
CIOs should treat AI vulnerability discovery as a capacity multiplier for both attackers and defenders, not as an internal cyber program. The control point shifts to budget governance, supplier leverage, and exploitability-debt management: which systems can be fixed quickly, which cannot, who owns the gap, and what compensating controls are funded before the next disclosure cycle.
Decision Rule
Do not approve net-new AI vulnerability-discovery spend unless the sponsor can show a closure model for the findings it will create. The minimum closure model should include:
- named technical and business owners for affected systems;
- remediation SLAs by criticality and exploitability;
- test and release capacity for high-priority fixes;
- supplier response obligations for vendor-controlled systems;
- a documented exception process for systems that cannot be patched quickly;
- compensating controls for persistent exposure.
Without those elements, the spend should be reclassified from “risk reduction” to “risk visibility” and funded only where visibility itself is the objective.
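The decision rule above can be expressed as a simple funding gate. This is a minimal sketch under stated assumptions: the class `ClosureModel`, its field names, and the function `classify_spend` are hypothetical names introduced for illustration, and a real review would assess evidence for each element rather than a yes/no flag.

```python
from dataclasses import dataclass, fields


@dataclass
class ClosureModel:
    """One flag per element of the minimum closure model (names are illustrative)."""
    named_owners: bool
    remediation_slas: bool
    test_release_capacity: bool
    supplier_obligations: bool
    exception_process: bool
    compensating_controls: bool


def classify_spend(model: ClosureModel) -> tuple[str, list[str]]:
    """Return the funding classification and the list of missing elements."""
    missing = [f.name for f in fields(model) if not getattr(model, f.name)]
    label = "risk reduction" if not missing else "risk visibility"
    return label, missing


# Hypothetical proposal that lacks test capacity and compensating controls.
proposal = ClosureModel(
    named_owners=True,
    remediation_slas=True,
    test_release_capacity=False,
    supplier_obligations=True,
    exception_process=True,
    compensating_controls=False,
)
label, missing = classify_spend(proposal)
print(label)    # risk visibility
print(missing)  # ['test_release_capacity', 'compensating_controls']
```

Returning the missing elements alongside the label gives the sponsor a concrete list to close before the spend can be presented as risk reduction.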
Recommended Actions
- Create a “cannot-patch-fast” register. The CISO should own the register, but Infrastructure, Enterprise Architecture, Procurement, and business system owners must populate it. Include systems where vulnerabilities persist because of vendor control, operational fragility, certification, clinical safety, OT constraints, end-of-life status, missing source code, or unclear ownership. Rank by exposure, privilege, data sensitivity, business criticality, and supplier responsiveness.
- Gate AI scanner expansion through remediation capacity. Before expanding code scanning, AI bug discovery, or automated triage, require the sponsor to identify who will review findings, who can approve fixes, how emergency changes will be tested, and which backlog items will be displaced. This converts “more visibility” into an explicit capacity tradeoff.
- Put supplier obligations into renewal and procurement. For critical software suppliers, require vulnerability disclosure timelines, no-cost patches for known exploited vulnerabilities, SBOM support, secure-by-design evidence, and a memory-safety or exploitability-reduction roadmap where relevant. CISA and FBI already frame secure-by-design ownership as a manufacturer obligation for software supporting critical infrastructure and national critical functions.7
- Use memory safety as a signal, not a universal mandate. For custom software, make memory-safe defaults part of the architecture standard for new high-risk components. For purchased software, ask vendors where memory-unsafe code remains, what has been sandboxed, what is on the roadmap, and which components are too sensitive to rely on patching alone.
- Reserve formal verification and deep refactoring for the narrowest blast radius. Use the strongest techniques where failure cost is highest: identity, cryptography, clinical safety, payment authorization, grid control, privileged access, and internet-facing control planes. Do not turn this into a broad rewrite program.
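As one way to picture the “cannot-patch-fast” register from the first action, the sketch below ranks entries on the five factors named there. Everything specific in it is an assumption made for illustration: the `RegisterEntry` structure, the 1-to-5 scales, the unweighted sum, and the three example systems are hypothetical, and an organization would choose its own weights and scoring.

```python
from dataclasses import dataclass


@dataclass
class RegisterEntry:
    system: str
    reason: str                   # why it cannot be patched quickly
    exposure: int                 # 1 (isolated) .. 5 (internet-facing)
    privilege: int                # 1 .. 5
    data_sensitivity: int         # 1 .. 5
    business_criticality: int     # 1 .. 5
    supplier_responsiveness: int  # 1 (unresponsive) .. 5 (responsive)

    def priority(self) -> int:
        # Unweighted sum (an assumption). A slow supplier raises priority,
        # so the responsiveness score is inverted before it is added.
        return (self.exposure + self.privilege + self.data_sensitivity
                + self.business_criticality + (6 - self.supplier_responsiveness))


# Hypothetical entries, one per target sector.
register = [
    RegisterEntry("Imaging workstation", "vendor control", 2, 3, 5, 5, 2),
    RegisterEntry("Substation HMI", "OT constraints", 3, 5, 2, 5, 1),
    RegisterEntry("Permit portal", "unclear ownership", 5, 2, 3, 3, 4),
]

ranked = sorted(register, key=lambda e: e.priority(), reverse=True)
for entry in ranked:
    print(f"{entry.priority():>2}  {entry.system}  ({entry.reason})")
```

Keeping the reason for each entry next to its score makes it easier to see whether the fix path runs through a supplier, a change window, or an ownership decision.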
What to Avoid
- Using AI scanners to manufacture unfunded risk. A larger vulnerability queue without closure authority will create audit evidence, not resilience.
- Allowing autonomous patching into production systems that touch regulated data, privileged access, clinical workflows, public safety, OT, or critical financial processes.
- Making this a developer-language crusade. Most target-sector CIOs cannot rewrite their estates into Rust. They can, however, change procurement terms, architecture gates, compensating controls, and funding logic.
Bottom Line
AI is making vulnerability discovery cheaper than vulnerability retirement. The CIO’s job is not only to find bugs faster but also to make sure the flaws that matter have an owner, a fix path, or a funded containment plan.
Evidence and Sources
1. Google OSS-Fuzz. 2025. “OSS-Fuzz: Continuous Fuzzing for Open Source Software.” Google reports that, as of May 2025, OSS-Fuzz had helped identify and fix more than 13,000 vulnerabilities and 50,000 bugs across 1,000 projects.
2. Anthropic. 2026. “Claude Mythos Preview.” Anthropic states that Mythos Preview can identify and exploit zero-day vulnerabilities in every major operating system and major web browser when directed, and describes examples of non-experts using it to produce working exploits.
3. Fang, Richard, Rohan Bindu, Akul Gupta, and Daniel Kang. 2024. “LLM Agents Can Autonomously Exploit One-Day Vulnerabilities.” The study found GPT-4 exploited 87% of a 15-vulnerability benchmark when given CVE descriptions; without descriptions, performance fell sharply.
4. GitHub Docs. 2026. “Resolving Code Scanning Alerts.” GitHub states that Copilot Autofix can generate suggested fixes and pull requests, but operates on a best-effort basis and is not guaranteed to succeed in every situation.
5. Sajadi, Amirali, Kostadin Damevski, and Preetha Chatterjee. 2025. “How Safe Are AI-Generated Patches?” The study analyzed AI-generated patches across more than 20,000 GitHub issues and found that agentic workflows can generate vulnerabilities, especially with greater autonomy and limited context.
6. NSA and CISA. 2025. “Memory Safe Languages: Reducing Vulnerabilities in Modern Software Development”; Office of the National Cyber Director. 2024. “Back to the Building Blocks.” NSA and CISA caution that memory-safe language adoption is not practical in all circumstances, while ONCD identifies memory-safe programming languages as a high-leverage way to reduce memory-safety vulnerabilities.
7. CISA and FBI. 2024. “Product Security Bad Practices.” The guidance says software manufacturers supporting critical infrastructure or national critical functions should prioritize security from the onset, publish memory-safety roadmaps for existing memory-unsafe products, and patch known exploited vulnerabilities in software components before release.