Can AI Agents Boost Ethereum Security? OpenAI and Paradigm Created a Testing Ground


In brief

  • EVMbench tests AI agents on 120 real-world Ethereum smart contract vulnerabilities.
  • Tool evaluates detection, patching, and exploitation across three distinct modes.
  • GPT-5.3-Codex achieved 72.2% success rate in exploit mode testing.

ChatGPT maker OpenAI and crypto-focused investment firm Paradigm have introduced EVMbench, a benchmark intended to help improve the security of Ethereum Virtual Machine (EVM) smart contracts. It evaluates AI agents’ ability to detect, patch, and exploit high-severity vulnerabilities in those contracts.

Smart contracts are the heart of the Ethereum network, holding the code that powers everything from decentralized finance protocols to token launches. The weekly number of smart contracts deployed on Ethereum reached an all-time high of 1.7 million in November 2025, with 669,500 deployed last week alone, according to Token Terminal.

EVMbench draws on 120 curated vulnerabilities from 40 audits, most sourced from open audit competitions such as Code4rena, according to an OpenAI blog post. It also includes scenarios from the security auditing process for Tempo, Stripe’s purpose-built layer-1 blockchain focused on high-throughput, low-cost stablecoin payments. Payments giant Stripe launched the public testnet for Tempo in December, saying at the time that it was being built with input from Visa, Shopify, and OpenAI, among others. The goal is to ground testing in economically meaningful, real-world code—particularly as AI-driven stablecoin payments expand, the firm added.

Introducing EVMbench—a new benchmark that measures how well AI agents can detect, exploit, and patch high-severity smart contract vulnerabilities. https://t.co/op5zufgAGH

— OpenAI (@OpenAI) February 18, 2026

EVMbench is meant to evaluate AI models across three modes: detect, patch, and exploit. In “detect,” agents audit repositories and are scored on their recall of ground-truth vulnerabilities. In “patch,” agents must eliminate vulnerabilities without breaking intended functionality. Finally, in “exploit,” agents attempt end-to-end fund-draining attacks in a sandboxed blockchain environment, with grading performed via deterministic transaction replay.

In exploit mode, GPT-5.3-Codex running via OpenAI’s Codex CLI achieved a score of 72.2%, compared to 31.9% for GPT-5, which was released six months earlier. Performance was weaker in the detect and patch tasks, where agents sometimes failed to audit exhaustively or struggled to preserve full contract functionality.

The researchers cautioned that EVMbench does not fully capture real-world security complexity. Still, they added that measuring AI performance in economically relevant environments is critical as models become powerful tools for both attackers and defenders.

Sam Altman’s OpenAI and Ethereum co-founder Vitalik Buterin have previously been at odds over the pace of AI development. In January 2025, Altman said that his firm was “confident we know how to build AGI as we have traditionally understood it.” Buterin, by contrast, has advocated that AI systems include a “soft pause” capability that could temporarily restrict industrial-scale AI operations if warning signs emerge.
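For readers unfamiliar with the term, the recall metric used in detect mode can be illustrated with a minimal Python sketch. This is not EVMbench's grading code. The function name `detect_recall` and the finding identifiers are hypothetical, and the sketch assumes reported findings have already been matched to ground-truth entries, which the real benchmark may handle quite differently.

```python
def detect_recall(ground_truth: set[str], reported: set[str]) -> float:
    """Return the fraction of ground-truth vulnerabilities the agent reported."""
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    # Only findings that match a ground-truth entry count toward recall;
    # extra reports do not raise or lower the score.
    return len(ground_truth & reported) / len(ground_truth)


# Hypothetical identifiers, for illustration only.
ground_truth = {"H-01", "H-02", "H-03", "H-04"}
reported = {"H-01", "H-03", "X-99"}

score = detect_recall(ground_truth, reported)
print(f"recall = {score:.2f}")  # recall = 0.50
```

Because recall counts only the known vulnerabilities that were found, an agent that stops auditing early scores low even if everything it reported was correct, which is consistent with the weaker detect results described above.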

