
Proof, Not Guesswork: An AI Audit Pipeline That Finds What Other Web3 Audits Miss

2026/03/03 14:07
6 min read

Over the past few months I’ve been building and running an AI pipeline that only reports what it can prove. On codebases that had already passed multiple audits, it still uncovered exploitable vulnerabilities. In one run alone, it surfaced eight reproducible issues, including High and Critical findings. That outcome is not luck. It is what a reproducible, multi-stage process produces: a short report and executable proof-of-concept files committed alongside the code. Every finding is backed by a runnable exploit that demonstrates the vulnerability in practice. Proof, not guesswork. In a sector where exploits are real and trust in “potential” findings is low, that bar is not optional.

Early on, I experimented with single-model scans and various AI audit tools. The pattern was consistent: long lists of potential issues, many false positives, and very little that could be demonstrated concretely. Closing the gap between “possible” and “provable” became the goal. The current pipeline grew out of that frustration with maybe-lists and unverified claims.

This is not a replacement for a formal audit. It is an additional, reproducible second opinion at the code level, a code review at scale. You still need a full human audit before mainnet. A reproducible, exploit-backed second pass should be standard practice for code that keeps evolving. This is the pass I run when I want to know what slipped through, or what appeared after the last audit.

What I Built (And Kept Iterating On)

The pipeline produces a report and executable proof-of-concept files in the repository. Every finding includes a runnable exploit that demonstrates the issue. If it is in the report, there is a concrete proof-of-concept that shows how it can be triggered.

The filter is not heuristic. Every dropped finding receives a documented rejection reason: factually wrong, no valid attack path, design choice, duplicate, or out of scope. That is a structured quality gate, not gut feel. Only findings with a runnable, non-trivial proof-of-concept survive to the final report. When severity sources disagree, the lower severity is used.

When there is a prior audit report, such as a PDF, I ingest it so the pipeline does not re-flag already reported issues. The run is self-contained and sets up the test environment, such as Foundry, if needed. Output is structured to fit how teams and bug bounty platforms operate.
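Suppressing already-reported issues can be as simple as normalizing titles extracted from the prior report and filtering candidates against them. A minimal sketch, assuming titles have already been extracted from the PDF; the helper names are hypothetical:

```python
import re

def normalize(title: str) -> str:
    """Crude canonical form: lowercase, alphanumerics only."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def new_findings(candidates, prior_report_titles):
    """Drop candidates whose normalized title matches a previously reported issue."""
    seen = {normalize(t) for t in prior_report_titles}
    return [c for c in candidates if normalize(c) not in seen]
```

In practice the matching would be fuzzier than exact title equality, but the idea is the same: the prior report defines a "known" set, and only what falls outside it is surfaced as new.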

This is a distinct service with its own scope and pricing, positioned between single-model scans and full audits. Full human audits typically range from $50k to $200k or more. This sits in between: repeatable, proof-driven, and scoped to code-level risk.

Scale and Models: What I’ve Been Running

Runs and codebases: Over the past few months I have executed dozens of full pipeline runs across more than 15 codebases. Many had already been audited by two or more teams. This is not a handful of reviews, but repeated application across real projects. The pipeline has been calibrated over many real runs; the current configuration is the result of continuous refinement, not a one-off design.

Getting to this point required substantial experimentation across models and configurations, as well as real spend. The current setup reflects what held up under adversarial challenge and what consistently produced reproducible results.

Explorers per run: Each run executes 8 to 10 explorer agents in parallel. A typical run takes several hours and produces 40 or more candidate findings before deduplication and challenge. The funnel reduces that to a single-digit or low-teens final report. The candidate-to-report ratio is often 3:1 or 4:1.

One run produced 11 findings with executable proof-of-concepts in the final report, including 3 High, 5 Medium, and 3 Low. Another produced 10 findings with proof-of-concepts, 2 overlapping with prior audits, and 8 new findings including High and Critical. Most of that report was new to the client.

Models in the mix: I do not rely on a single model. Multiple model families are used for exploration, challenge, and proof-of-concept construction. The mix is not trial and error. I track which models contribute unique High findings, which generate mostly noise, and which hold up under adversarial challenge. Model selection has been refined over many runs; the current set remains because the data supports it. Remove one and you can miss a finding. Retries and fallbacks ensure that each run completes even if a step encounters issues.
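The retry-and-fallback behavior mentioned above can be sketched as follows. This is a hypothetical illustration, not the actual implementation: `call_model` stands in for a real API call, and `model-a`/`model-b` are placeholder names.

```python
def call_model(model: str, prompt: str) -> str:
    # Stand-in for a real model API call; one model simulates a transient failure.
    if model == "model-a":
        raise TimeoutError("simulated outage")
    return f"{model}: analysis of {prompt!r}"

def run_step(prompt: str, models: list, retries: int = 2) -> str:
    """Try each model in fallback order, with a retry budget per model."""
    for model in models:
        for _ in range(retries):
            try:
                return call_model(model, prompt)
            except TimeoutError:
                continue
    raise RuntimeError("all models exhausted")
```

The step only fails if every model in the fallback chain is exhausted, which is what lets a multi-hour run complete even when an individual call does not.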

Why a Second Pass Finds Things

Human audits are finite. Edge cases slip through. Code changes. Refactors introduce new paths. A single audit is a snapshot; code is a moving system. Relying on one pass is comfortable, but risky when the attack surface is large.

A second pass does not imply the first audit was poor. It acknowledges that coverage is difficult.

The pipeline is not just another opinion. It works differently: multiple independent exploration paths, an adversarial challenger that questions every finding, conservative deduplication rules, and explicit validation where only issues backed by a working proof-of-concept survive. That is a methodologically different perspective.

Explorers perform structured analysis across protocol design, logic, economics, and attack paths. Findings are merged and deduplicated. A challenger attempts to invalidate them. Proof-of-concepts are built and executed in your framework, such as Foundry or Hardhat. Only findings backed by a working exploit remain in the final report. Everything rejected is logged with a documented reason.
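The stages above can be sketched as a single orchestration function. This is a deliberately simplified sketch (the article notes the real pipeline has many more stages); the callables passed in are stand-ins for the actual explorer, challenger, and PoC-building components.

```python
def run_pipeline(explorers, challenger, build_poc):
    # 1. Multiple independent exploration paths produce candidate findings.
    candidates = [f for explore in explorers for f in explore()]
    # 2. Conservative deduplication (order-preserving here).
    unique = list(dict.fromkeys(candidates))
    # 3. Adversarial challenge: challenger returns a rejection reason or None.
    survivors, rejection_log = [], []
    for f in unique:
        reason = challenger(f)
        if reason:
            rejection_log.append((f, reason))   # every drop is logged with a reason
        else:
            survivors.append(f)
    # 4. Only findings with a working exploit reach the final report.
    report = [f for f in survivors if build_poc(f)]
    return report, rejection_log
```

Each stage only narrows the set, so the final report is a strict subset of what exploration produced, and every removal along the way is accounted for.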

The real pipeline includes significantly more stages than the simplified diagram below. Behind each stage are validation and control mechanisms. It is staged quality assurance, not a linear AI workflow. A full run spans multiple structured analysis and validation phases, emphasizing depth rather than runtime.

The Numbers

Dozens of candidates per run are systematically reduced to a small, defensible set. Only findings backed by runnable proof-of-concepts survive. That is what it takes to get to proof instead of guesswork.

If it is in the report, there is a working proof-of-concept demonstrating it. That is the bar.

When It Fits

Pre-launch as an additional pass. After a refactor or upgrade. Following governance or tokenomics changes. Before a raise or external audit. Narrow scopes such as a single module or integration.

The same fixed categories are applied every run. I do not publish the list. What ultimately appears in the report depends on the specific attack surface of your codebase.

What Kind of Hardening Shows Up

Across dozens of runs, certain patterns consistently surface. Governance parameters that can be zero or invalid on production paths. Withdrawal and queue logic that cannot progress under loss scenarios. First-depositor and reward front-running in share-based systems. Withdrawal flows that stall when one market becomes illiquid.

These are recurring classes of issues that appear under structured analysis and adversarial challenge. If your system has similar surface area, that is the kind of coverage this pipeline is designed to stress.

If a reproducible, exploit-backed second pass makes sense for your codebase, you know where to find me. Telegram: @Kurt0x


Proof, Not Guesswork: An AI Audit Pipeline That Finds What Other Web3 Audits Miss was originally published in Coinmonks on Medium, where people are continuing the conversation by highlighting and responding to this story.

