Technology
Open source LLMs hit Europe’s digital sovereignty roadmap
Large language models (LLMs) landed on Europe’s digital sovereignty agenda with a bang last week, as news emerged of a new program to develop a series of “truly” open source LLMs covering all European Union languages.
This includes the current 24 official EU languages, as well as languages for countries currently negotiating for entry to the EU market, such as Albania. Future-proofing is the name of the game.
OpenEuroLLM is a collaboration between some 20 organizations, co-led by Jan Hajič, a computational linguist from Charles University in Prague, and Peter Sarlin, CEO and co-founder of Finnish AI lab Silo AI, which AMD acquired last year for $665 million.
The project fits a broader narrative that has seen Europe push digital sovereignty as a priority, enabling it to bring mission-critical infrastructure and tools closer to home. Most of the cloud giants are investing in local infrastructure to ensure EU data stays local, while AI darling OpenAI recently unveiled a new offering that allows customers to process and store data in Europe.
Elsewhere, the EU recently signed an $11 billion deal to create a sovereign satellite constellation to rival Elon Musk’s Starlink.
So OpenEuroLLM is certainly on-brand.
However, the stated budget just for building the models themselves is €37.4 million, with roughly €20 million coming from the EU’s Digital Europe Programme — a drop in the ocean compared to what the giants of the corporate AI world are investing. The actual budget is more when you factor in funding allocated for tangential and related work, and arguably the biggest expense is compute. The OpenEuroLLM project’s partners include EuroHPC supercomputer centers in Spain, Italy, Finland, and the Netherlands — and the broader EuroHPC project has a budget of around €7 billion.
But the sheer number of disparate participating parties, spanning academia, research, and corporations, has led many to question whether its goals are achievable. Anastasia Stasenko, co-founder of LLM company Pleias, questioned whether a “sprawling consortia of 20+ organizations” could have the same measured focus as a homegrown private AI firm.
“Europe’s recent successes in AI shine through small focused teams like Mistral AI and LightOn — companies that truly own what they’re building,” Stasenko wrote. “They carry immediate responsibility for their choices, whether in finances, market positioning, or reputation.”
Up to scratch
The OpenEuroLLM project is either starting from scratch or it has a head start — depending on how you look at it.
Since 2022, Hajič has also been coordinating the High Performance Language Technologies (HPLT) project, which has set out to develop free and reusable datasets, models, and workflows using high-performance computing (HPC). That project is scheduled to end in late 2025, but it can be viewed as a sort of “predecessor” to OpenEuroLLM, according to Hajič, given that most of the partners on HPLT (aside from the U.K. partners) are participating here, too.
“This [OpenEuroLLM] is really just a broader participation, but more focused on generative LLMs,” Hajič said. “So it’s not starting from zero in terms of data, expertise, tools, and compute experience. We have assembled people who know what they’re doing — we should be able to get up to speed quickly.”
Hajič said that he expects the first version(s) to be released by mid-2026, with the final iteration(s) arriving by the project’s conclusion in 2028. But those goals might still seem lofty when you consider that there isn’t much to poke at yet beyond a bare-bones GitHub profile.
“In that respect, we are starting from scratch — the project started on Saturday [February 1],” Hajič said. “But we have been preparing the project for a year [the tender process opened in February 2024].”
From academia and research, organizations spanning Czechia, the Netherlands, Germany, Sweden, Finland, and Norway are part of the OpenEuroLLM cohort, in addition to the EuroHPC centers. From the corporate world, Finland’s AMD-owned AI lab Silo AI is on board, as are Aleph Alpha (Germany), Ellamind (Germany), Prompsit Language Engineering (Spain), and LightOn (France).
One notable omission from the list is French AI unicorn Mistral, which has positioned itself as an open source alternative to incumbents such as OpenAI. While nobody from Mistral responded to TechCrunch’s request for comment, Hajič did confirm that he tried to initiate conversations with the startup, but to no avail.
“I tried to approach them, but it hasn’t resulted in a focused discussion about their participation,” Hajič said.
The project could still gather new participants as part of the EU program that’s providing funding, though participation will be limited to EU organizations. This means that entities from the U.K. and Switzerland won’t be able to take part. This stands in contrast to the Horizon R&D program, which the U.K. rejoined in 2023 after a prolonged Brexit stalemate and which provided funding to HPLT.
Build up
The project’s top-line goal, as per its tagline, is to create: “A series of foundation models for transparent AI in Europe.” Additionally, these models should preserve the “linguistic and cultural diversity” of all EU languages — current and future.
What this translates to in terms of deliverables is still being ironed out, but it will likely mean a core multilingual LLM designed for general-purpose tasks where accuracy is paramount. It will probably also mean smaller “quantized” versions, perhaps for edge applications where efficiency and speed are more important.
“This is something we still have to make a detailed plan about,” Hajič said. “We want to have it as small but as high-quality as possible. We don’t want to release something which is half-baked, because from the European point-of-view this is high-stakes, with lots of money coming from the European Commission — public money.”
While the goal is to make the model as proficient as possible in all languages, attaining equality across the board could also be challenging.
“That is the goal, but how successful we can be with languages with scarce digital resources is the question,” Hajič said. “But that’s also why we want to have true benchmarks for these languages, and not to be swayed toward benchmarks which are perhaps not representative of the languages and the culture behind them.”
In terms of data, this is where a lot of the work from the HPLT project will prove fruitful, with version 2.0 of its dataset released four months ago. This dataset was drawn from 4.5 petabytes of web crawls and more than 20 billion documents, and Hajič said that they will add data from Common Crawl (an open repository of web-crawled data) to the mix.
The open source definition
In traditional software, the perennial struggle between open source and proprietary revolves around the “true” meaning of “open source.” This can be resolved by deferring to the formal “definition” as per the Open Source Initiative, the industry stewards of what are and aren’t legitimate open source licenses.
More recently, the OSI has formed a definition of “open source AI,” though not everyone is happy with the outcome. Open source AI proponents argue that not only models should be freely available, but also the datasets, pretrained models, weights — the full shebang. The OSI’s definition doesn’t make training data mandatory, because it says AI models are often trained on proprietary data or data with redistribution restrictions.
Suffice it to say, OpenEuroLLM is facing these same quandaries, and despite its intentions to be “truly open,” it will probably have to make some compromises if it’s to fulfill its “quality” obligations.
“The goal is to have everything open. Now, of course, there are some limitations,” Hajič said. “We want to have models of the highest quality possible, and based on the European copyright directive we can use anything we can get our hands on. Some of it cannot be redistributed, but some of it can be stored for future inspection.”
What this means is that the OpenEuroLLM project might have to keep some of the training data under wraps but make it available to auditors upon request, as required for high-risk AI systems under the terms of the EU AI Act.
“We hope that most of the data [will be open], especially the data coming from the Common Crawl,” Hajič said. “We would like to have it all completely open, but we will see. In any case, we will have to comply with AI regulations.”
Two for one
Another criticism that emerged in the aftermath of OpenEuroLLM’s formal unveiling was that a very similar project launched in Europe just a few short months earlier. EuroLLM, which launched its first model in September and a follow-up in December, is co-funded by the EU alongside a consortium of nine partners. These include academic institutions such as the University of Edinburgh and corporations such as Unbabel, which last year won millions of GPU training hours on EU supercomputers.
EuroLLM shares similar goals to its near-namesake: “To build an open source European Large Language Model that supports 24 Official European Languages, and a few other strategically important languages.”
Andre Martins, head of research at Unbabel, took to social media to highlight these similarities, noting that OpenEuroLLM is appropriating a name that already exists. “I hope the different communities collaborate openly, share their expertise, and don’t decide to reinvent the wheel every time a new project gets funded,” Martins wrote.
Hajič called the situation “unfortunate,” adding that he hoped they might be able to cooperate, though he stressed that due to the source of its funding in the EU, OpenEuroLLM is restricted in terms of its collaborations with non-EU entities, including U.K. universities.
Funding gap
The arrival of China’s DeepSeek, and the cost-to-performance ratio it promises, has given some encouragement that AI initiatives might be able to do far more with much less than initially thought. However, over the past few weeks, many have questioned the true costs involved in building DeepSeek.
“With respect to DeepSeek, we actually know very little about what exactly went into building it,” Peter Sarlin, who is technical co-lead on the OpenEuroLLM project, told TechCrunch.
Regardless, Sarlin reckons OpenEuroLLM will have access to sufficient funding, as it’s mostly to cover people. Indeed, a large chunk of the costs of building AI systems is compute, and that should mostly be covered through its partnership with the EuroHPC centers.
“You could say that OpenEuroLLM actually has quite a significant budget,” Sarlin said. “EuroHPC has invested billions in AI and compute infrastructure, and have committed billions more into expanding that in the coming few years.”
It’s also worth noting that the OpenEuroLLM project isn’t building toward a consumer- or enterprise-grade product. It’s purely about the models, and this is why Sarlin reckons the budget it has should be ample.
“The intent here isn’t to build a chatbot or an AI assistant — that would be a product initiative requiring a lot of effort, and that’s what ChatGPT did so well,” Sarlin said. “What we’re contributing is an open source foundation model that functions as the AI infrastructure for companies in Europe to build upon. We know what it takes to build models, it’s not something you need billions for.”
Since 2017, Sarlin has spearheaded AI lab Silo AI, which launched, in partnership with others including the HPLT project, the family of Poro and Viking open models. These already support a handful of European languages, but the company is now readying the next iteration, the “Europa” models, which will cover all European languages.
And this ties in with the whole “not starting from scratch” notion espoused by Hajič — there is already a bedrock of expertise and technology in place.
Sovereign state
As critics have noted, OpenEuroLLM does have a lot of moving parts — which Hajič acknowledges, albeit with a positive outlook.
“I’ve been involved in many collaborative projects, and I believe it has its advantages versus a single company,” he said. “Of course they’ve done great things at the likes of OpenAI to Mistral, but I hope that the combination of academic expertise and the companies’ focus could bring something new.”
And in many ways, it’s not about trying to outmaneuver Big Tech or billion-dollar AI startups; the ultimate goal is digital sovereignty: (mostly) open foundation LLMs built by, and for, Europe.
“I hope this won’t be the case, but if, in the end, we are not the number one model, and we have a ‘good’ model, then we will still have a model with all the components based in Europe,” Hajič said. “This will be a positive result.”
Technology
Pintarnya raises $16.7M to power jobs and financial services in Indonesia
Pintarnya, an Indonesian employment platform that goes beyond job matching by offering financial services along with full-time and side-gig opportunities, said it has raised a $16.7 million Series A round.
The funding was led by Square Peg with participation from existing investors Vertex Venture Southeast Asia & India and East Ventures.
Ghirish Pokardas, Nelly Nurmalasari, and Henry Hendrawan founded Pintarnya in 2022 to tackle two of the biggest challenges Indonesians face daily: earning enough and borrowing responsibly.
“Traditionally, mass workers in Indonesia find jobs offline through job fairs or word of mouth, with employers buried in paper applications and candidates rarely hearing back. For borrowing, their options are often limited to family/friend or predatory lenders with harsh collection practices,” Henry Hendrawan, co-founder of Pintarnya, told TechCrunch. “We digitize job matching with AI to make hiring faster and we provide workers with safer, healthier lending options — designed around what they can reasonably afford, rather than pushing them deeper into debt.”
Around 59% of Indonesia’s workforce of 150 million is employed in the informal sector. These workers often struggle to access formal financial services because they lack verifiable income and official employment documentation.
Pintarnya tackles this challenge by partnering with asset-backed lenders to offer secured loans, using collateral such as gold, electronics, or vehicles, Hendrawan added.
Since its seed funding in 2022, the platform has grown to serve over 10 million job seekers and 40,000 employers nationwide. Its revenue has increased almost fivefold year-over-year, and the company expects to reach break-even by the end of the year, Hendrawan noted. Pintarnya primarily serves users aged 21 to 40, most of whom have a high school education or a diploma below university level. The startup aims to focus on this underserved segment, given the large population of blue-collar and informal workers in Indonesia.
“Through the journey of building employment services, we discovered that our users needed more than just jobs; they needed access to financial services that traditional banks couldn’t provide,” said Hendrawan.

While Indonesia already has job platforms like JobStreet, Kalibrr, and Glints, these primarily cater to white-collar roles, which represent only a small portion of the workforce, according to Hendrawan. Pintarnya’s platform is designed specifically for blue-collar workers, offering tailored experiences such as quick-apply options for walk-in interviews, affordable e-learning on relevant skills, in-app opportunities for supplemental income, and seamless connections to financial services like loans.
The same trend is evident in Indonesia’s fintech sector, which similarly caters to white-collar or upper-middle-class consumers. Conventional credit scoring models for loans, which rely on steady monthly income and bank account activity, often leave blue-collar workers overlooked by existing fintech providers, Hendrawan explained.
When asked about which fintech services are most in demand, Hendrawan mentioned, “Given their employment status, lending is the most in-demand financial service for Pintarnya’s users today. We are planning to ‘graduate’ them to micro-savings and investments down the road through innovative products with our partners.”
The new funding will enable Pintarnya to strengthen its platform technology and broaden its financial service offerings through strategic partnerships. With most Indonesian workers employed in blue-collar and informal sectors, the co-founders see substantial growth opportunities in the local market. Leveraging their extensive experience in managing businesses across Southeast Asia, they are also open to exploring regional expansion when the timing is right.
“Our vision is for Pintarnya to be the everyday companion that empowers Indonesians to not only make ends meet today, but also plan, grow, and upgrade their lives tomorrow … In five years, we see Pintarnya as the go-to super app for Indonesia’s workers, not just for earning income, but as a trusted partner throughout their life journey,” Hendrawan said. “We want to be the first stop when someone is looking for work, a place that helps them upgrade their skills, and a reliable guide as they make financial decisions.”
Technology
OpenAI warns against SPVs and other ‘unauthorized’ investments
In a new blog post, OpenAI warns against “unauthorized opportunities to gain exposure to OpenAI through a variety of means,” including special purpose vehicles, known as SPVs.
“We urge you to be careful if you are contacted by a firm that purports to have access to OpenAI, including through the sale of an SPV interest with exposure to OpenAI equity,” the company writes. The blog post acknowledges that “not every offer of OpenAI equity […] is problematic” but says firms may be “attempting to circumvent our transfer restrictions.”
“If so, the sale will not be recognized and carry no economic value to you,” OpenAI says.
Investors have increasingly used SPVs (which pool money for one-off investments) as a way to buy into hot AI startups, prompting other VCs to criticize them as a vehicle for “tourist chumps.”
Business Insider reports that OpenAI isn’t the only major AI company looking to crack down on SPVs, with Anthropic reportedly telling Menlo Ventures it must use its own capital, not an SPV, to invest in an upcoming round.
Technology
Meta partners with Midjourney on AI image and video models
Meta is partnering with Midjourney to license the startup’s AI image and video generation technology, Meta Chief AI Officer Alexandr Wang announced Friday in a post on Threads. Wang says Meta’s research teams will collaborate with Midjourney to bring its technology into future AI models and products.
“To ensure Meta is able to deliver the best possible products for people it will require taking an all-of-the-above approach,” Wang said. “This means world-class talent, ambitious compute roadmap, and working with the best players across the industry.”
The Midjourney partnership could help Meta develop products that compete with industry-leading AI image and video models, such as OpenAI’s Sora, Black Forest Labs’ Flux, and Google’s Veo. Last year, Meta rolled out its own AI image generation tool, Imagine, into several of its products, including Facebook, Instagram, and Messenger. Meta also has an AI video generation tool, Movie Gen, that allows users to create videos from prompts.
The licensing agreement with Midjourney marks Meta’s latest deal to get ahead in the AI race. Earlier this year, CEO Mark Zuckerberg went on a hiring spree for AI talent, offering some researchers compensation packages worth upwards of $100 million. The social media giant also invested $14 billion in Scale AI, and acquired the AI voice startup Play AI.
Meta has held talks with several other leading AI labs about other acquisitions, and Zuckerberg even spoke with Elon Musk about joining his $97 billion takeover bid of OpenAI (Meta ultimately did not join the offer, and OpenAI rejected Musk’s bid).
While the terms of Meta’s deal with Midjourney remain unknown, the startup’s CEO, David Holz, said in a post on X that his company remains independent with no investors; Midjourney is one of the few leading AI model developers that has never taken on outside funding. At one point, Meta talked with Midjourney about acquiring the startup, according to Upstarts Media.
Midjourney was founded in 2022 and quickly became a leader in the AI image generation space for its realistic, unique style. By 2023, the startup was reportedly on pace to generate $200 million in revenue. The startup sells subscriptions starting at $10 per month. Pricier tiers, which include more AI image generations, cost as much as $120 per month. In June, the startup released its first AI video model, V1.
Meta’s partnership with Midjourney comes just two months after the startup was sued by Disney and Universal, which allege that it trained AI image models on copyrighted works. Several AI model developers, including Meta, face similar allegations from copyright holders; however, recent court rulings on AI training data have sided with tech companies.
