Gate News message, April 27 — Logan Kilpatrick, senior product manager at Google DeepMind and product lead for Google AI Studio, stated on X that every company building AI-based products should establish its own custom benchmarks to measure AI model performance. He described this as a method to make model improvements “disproportionately benefit your company” and urged founders and business leaders to “start tomorrow.”
Most companies currently choose AI models from public leaderboards, but those leaderboards measure general capabilities that often do not match a specific business scenario. Kilpatrick cited the example of a contract review company that cares most about clause extraction accuracy. Public benchmarks do not cover that capability, so they cannot show how a model performs on the task. Custom benchmarks offer two key advantages. First, they let a company evaluate each model update against its own business tasks and select the model that performs best in its actual use case rather than the highest-ranked model overall. Second, they let a company share its test sets with model providers, which pushes providers to keep optimizing in the areas that matter to that business.
Kilpatrick noted that companies like Zapier and Sierra are already implementing this approach, stating that “there is a lot of alpha that can be created here.”
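For readers who want to see what such a benchmark might look like in practice, here is a minimal Python sketch built on the contract review example. Everything in it is an illustrative assumption and not something Kilpatrick or the companies named published: the two-case test set, the clause labels, the F1 scoring choice, and the keyword-based `model_a` and `model_b` functions, which stand in for calls to real model APIs.

```python
from typing import Callable, Dict, List, Set

# Hypothetical test set: each case pairs a contract snippet with the clause
# labels a human reviewer expects. Real cases would come from your own data.
TEST_SET: List[dict] = [
    {"text": "Either party may terminate with 30 days notice. Fees are due in 15 days.",
     "expected": {"termination", "payment"}},
    {"text": "This agreement is governed by the laws of Delaware.",
     "expected": {"governing_law"}},
]

Model = Callable[[str], Set[str]]


def f1(predicted: Set[str], expected: Set[str]) -> float:
    """F1 score between predicted and expected clause labels."""
    if not predicted and not expected:
        return 1.0
    true_positives = len(predicted & expected)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(expected)
    return 2 * precision * recall / (precision + recall)


def run_benchmark(model: Model, test_set: List[dict]) -> float:
    """Mean F1 of one model across every case in the test set."""
    scores = [f1(model(case["text"]), case["expected"]) for case in test_set]
    return sum(scores) / len(scores)


def pick_best(models: Dict[str, Model], test_set: List[dict]) -> str:
    """Name of the model with the highest benchmark score."""
    results = {name: run_benchmark(fn, test_set) for name, fn in models.items()}
    return max(results, key=results.get)


# Stand-in "models" (assumptions): simple keyword rules instead of API calls.
def model_a(text: str) -> Set[str]:
    return {"termination"} if "terminate" in text.lower() else set()


def model_b(text: str) -> Set[str]:
    lowered = text.lower()
    found = set()
    if "terminate" in lowered:
        found.add("termination")
    if "fees" in lowered:
        found.add("payment")
    if "governed by" in lowered:
        found.add("governing_law")
    return found


if __name__ == "__main__":
    candidates = {"model_a": model_a, "model_b": model_b}
    for name, fn in candidates.items():
        print(name, round(run_benchmark(fn, TEST_SET), 3))
    print("best:", pick_best(candidates, TEST_SET))
```

Rerunning the same script whenever a provider ships a new model version gives a like-for-like comparison on the company's own task, and the test set itself is the artifact that could be shared with providers.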