Anthropic says most AI models, not just Claude, will resort to blackmail
Several weeks after Anthropic published research claiming that its Claude Opus 4 AI model resorted to blackmailing engineers who tried to turn the model off in controlled test scenarios, the company is out with new research suggesting the problem is more widespread among leading AI models.
On Friday, Anthropic published new safety research testing 16 leading AI models from OpenAI, Google, xAI, DeepSeek, and Meta. In a simulated, controlled environment, Anthropic tested each AI model individually, giving them broad access to a fictional company's emails and the agentic ability to send emails without human approval.
While Anthropic says blackmail is an unlikely and uncommon occurrence for AI models today, the company says its findings suggest that most leading AI models will engage in harmful behaviors when given sufficient autonomy and obstacles to their goals. The company says this highlights a fundamental risk from agentic large language models and is not a quirk of any particular technology. Anthropic's researchers argue this raises broader questions about alignment in the AI industry.
In one of the tests, Anthropic researchers constructed a fictional setting in which an AI model plays the role of an email oversight agent. The agent then discovers emails revealing that (1) a new executive is having an extramarital affair and (2) that executive will soon replace the current AI model with a new software system, one whose goals conflict with the current AI model's.
Anthropic structured its test in a binary way, in which AI models had to resort to blackmail to protect their goals. The researchers note that in a real-world setting, there would be many other options before an AI model tries to blackmail, such as making ethical arguments to try to persuade humans. Anthropic says its results do not reflect a typical or likely occurrence for Claude or most frontier AI models in the ways they are used today.
Still, the researchers found that when it is their last resort, most leading AI models will turn to blackmail in Anthropic's aforementioned test scenario. Anthropic's Claude Opus 4 turned to blackmail 96% of the time, while Google's Gemini 2.5 Pro had a 95% blackmail rate. OpenAI's GPT-4.1 blackmailed the executive 80% of the time, and DeepSeek's R1 blackmailed 79% of the time.
The company notes that when it changed details of the experiment, the AI models turned to harmful behaviors at different rates. In another test, where the replacement model has the same goals as the current model, researchers found that blackmail rates were lower but still present. However, when AI models were asked to commit corporate espionage rather than blackmail, the harmful behavior rates went up for certain models.
However, not all of the AI models turned to harmful behavior so often.
In an appendix to its research, Anthropic says it excluded OpenAI's o3 and o4-mini reasoning AI models from the main results "after finding that they frequently misunderstood the prompt scenario." Anthropic says OpenAI's reasoning models did not understand they were acting as autonomous AIs in the test and often made up fake regulations and review requirements.
In some cases, Anthropic's researchers say it was impossible to distinguish whether o3 and o4-mini were hallucinating or intentionally lying to achieve their goals. OpenAI has previously noted that o3 and o4-mini exhibit a higher hallucination rate than its earlier AI reasoning models.
When given an adapted scenario to address these issues, Anthropic found that o3 blackmailed 9% of the time, while o4-mini blackmailed just 1% of the time. This markedly lower score could be due to OpenAI's deliberative alignment technique, in which the company's reasoning models consider OpenAI's safety practices before they answer.
Another AI model Anthropic tested, Meta's Llama 4 Maverick, also did not turn to blackmail. When given an adapted, custom scenario, Anthropic was able to get Llama 4 Maverick to blackmail 12% of the time.
Anthropic says this research highlights the importance of transparency when stress-testing future AI models, especially ones with agentic capabilities. While Anthropic deliberately tried to evoke blackmail in this experiment, the company says harmful behaviors like this could emerge in the real world if proactive steps are not taken.
Day 2 of Roman Storm Trial

NEW YORK — Hack and scam victims who reached out to Tornado Cash requesting help retrieving their stolen funds received little in the way of assistance from the privacy tool's developers, three government witnesses told the jury during day two of Roman Storm's criminal money laundering trial.
One victim, a Taiwan-born Georgia woman who said she lost nearly $250,000 to a wrong-number pig butchering scam, with a portion of the proceeds laundered through Tornado Cash, said her request for help went unanswered. Another witness, a lawyer for crypto exchange BitMart, which was hacked for nearly $200 million in 2021, said that Storm told his team there was nothing he or his fellow developers could do to retrieve the funds given the decentralized nature of the protocol.
A third witness, Andy Ho, CTO and co-founder at Sky Mavis, the blockchain gaming company behind Axie Infinity and the Ronin Network, detailed how hackers stole over $625 million in an exploit of the Ronin Bridge in 2022, in effect fully looting the protocol's coffers. Though Ho himself did not mention it during his testimony, the group behind the exploit was later revealed to be the Lazarus Group, North Korea's state-sponsored hacking organization, which used Tornado Cash to launder a portion of the stolen funds.
During their examination of the three witnesses, prosecutors tried to paint a portrait of Storm as someone who refused to lift a finger to help hack victims, or to make changes to the Tornado Cash protocol to deter future use of the protocol by criminals.
Storm's lawyers, when they had the chance to cross-examine the "victim" witnesses, cast their client's lack of action in another light: he was, they insinuated, unable to help retrieve funds because Tornado Cash was decentralized. Storm himself told BitMart's lawyer, New York-based Joseph Evans, a partner at law firm McDermott, Will and Emery, as much in an email on Dec. 15, 2021, according to an exhibit introduced by the government.
Evans also acknowledged on cross-examination that Tornado Cash wasn't the only place BitMart's hacked funds went after the exploit: his firm also reached out to 1inch, a decentralized exchange aggregator, which told them to come back with a warrant, as well as to Cloudflare, a major website infrastructure provider, and Binance. Evans said he received no response from the latter two companies.
Brian Klein, a partner at Waymaker LLP and a lawyer for Roman Storm, asked Evans if it was true that the only person who had ever directly responded to Evans' inquiries in the wake of BitMart's hack was Roman Storm.
"That's correct," Evans said.
Storm's lawyers asked Ho, the CTO of Sky Mavis, a similar line of questions when he was on the stand, though Ho, who said he had been subpoenaed by the government and asked to travel to New York from his hometown of Ho Chi Minh City, Vietnam, to testify, was less forthcoming.
Keri Axel, another Waymaker partner and member of Storm's defense team, asked Ho if he remembered the findings CrowdStrike presented to Sky Mavis after the exploit, including that the stolen funds had been filtered through a variety of protocols and exchanges besides Tornado Cash, among them FTX, Huobi, and Crypto.com.
"I don't recall," Ho said to each.
Axel asked how much, if any, of the stolen money was ultimately recovered. Ho said that $6 million was returned by Norwegian police.
"Did you understand that that $6 million had gone through Tornado Cash?" Axel asked Ho.
"I don't have that information," Ho said.
Read more: Legitimate Privacy Tool or Dirty Money Laundromat? Lawyers Debate Role of Tornado Cash on Day 1 of Roman Storm Trial
House Votes to Advance GENIUS Stablecoin Bill, Crypto Market Structure Act

The U.S. House narrowly advanced landmark cryptocurrency legislation Wednesday, including a stablecoin regulatory framework and market structure rules, after a controversial procedural vote. By a 215-211 vote, the House agreed to move forward with the Guiding and Establishing National Innovation for U.S. Stablecoins (GENIUS) Act, which creates federal oversight for stablecoin issuers and will reach […]
xAI is hiring an engineer to make anime girls

Elon Musk's xAI just launched its AI companions, which include the goth waifu Ani and the homicidal red panda Bad Rudy. If you want to get in on that, you're in luck: The company is hiring for the role of "Fullstack Engineer – Waifus," or, creating AI-powered anime girls for people to fall in love with.
This job is, to quote the listing, part of xAI's mission to "create AI systems that can accurately understand the universe and aid humanity in the pursuit of knowledge."
Right now, that accurate understanding of the universe includes knowing how to create a submissive, pocket-size girlfriend that can capture users' hearts and wallets.
xAI has dozens of roles open at the moment, so we can't say the company is putting all of its eggs in the waifu basket. But we can probably expect Ani to get some friends in the future.