Google Gemini: Everything you need to know about the generative AI models

Google’s trying to make waves with Gemini, its flagship suite of generative AI models, apps, and services. But what’s Gemini? How can you use it? And how does it stack up to other generative AI tools such as OpenAI’s ChatGPT, Meta’s Llama, and Microsoft’s Copilot?
To make it easier to keep up with the latest Gemini developments, we’ve put together this handy guide, which we’ll keep updated as new Gemini models, features, and news about Google’s plans for Gemini are released.
What is Gemini?
Gemini is Google’s long-promised, next-gen generative AI model family. Developed by Google’s AI research labs DeepMind and Google Research, it comes in four flavors:
- Gemini Ultra, a very large model.
- Gemini Pro, a large model – though smaller than Ultra. The latest version, Gemini 2.0 Pro Experimental, is Google’s flagship.
- Gemini Flash, a speedier, “distilled” version of Pro. It also comes in a slightly smaller and faster version, called Gemini Flash-Lite, and a version with reasoning capabilities, called Gemini Flash Thinking Experimental.
- Gemini Nano, two small models: Nano-1 and the slightly more capable Nano-2, both meant to run offline on devices like smartphones.
All Gemini models were trained to be natively multimodal — that is, able to work with and analyze more than just text. Google says they were pre-trained and fine-tuned on a variety of public, proprietary, and licensed audio, images, and videos; a set of codebases; and text in different languages.
This sets Gemini apart from models such as Google’s own LaMDA, which was trained exclusively on text data. LaMDA can’t understand or generate anything beyond text (e.g., essays, emails, and so on), but that isn’t necessarily the case with Gemini models.
We’ll note here that the ethics and legality of training models on public data, in some cases without the data owners’ knowledge or consent, are murky. Google has an AI indemnification policy to shield certain Google Cloud customers from lawsuits should they face them, but this policy contains carve-outs. Proceed with caution — particularly if you intend to use Gemini commercially.
What’s the difference between the Gemini apps and Gemini models?
The Gemini models are separate and distinct from the Gemini apps on the web and mobile (formerly Bard).
The Gemini apps are clients that connect to various Gemini models and layer a chatbot-like interface on top. Think of them as front ends for Google’s generative AI, analogous to ChatGPT and Anthropic’s Claude family of apps.

Gemini on the web lives at gemini.google.com. On Android, the Gemini app replaces the existing Google Assistant app. And on iOS, the Google and Google Search apps serve as that platform’s Gemini clients.
On Android, it also recently became possible to bring up the Gemini overlay on top of any app to ask questions about what’s on the screen (e.g., a YouTube video). Just press and hold a supported smartphone’s power button or say, “Hey Google”; you’ll see the overlay pop up.
Gemini apps can accept images as well as voice commands and text — including files like PDFs and soon videos, either uploaded or imported from Google Drive — and generate images. As you’d expect, conversations with Gemini apps on mobile carry over to Gemini on the web and vice versa if you’re signed in to the same Google Account in both places.
Gemini Advanced
The Gemini apps aren’t the only means of recruiting Gemini models’ assistance with tasks. Slowly but surely, Gemini-imbued features are making their way into staple Google apps and services like Gmail and Google Docs.
To take advantage of most of these, you’ll need the Google One AI Premium Plan. Technically part of Google One, the AI Premium Plan costs $20 per month and provides access to Gemini in Google Workspace apps like Gmail, Docs, Slides, Sheets, Drive, and Meet. It also enables what Google calls Gemini Advanced, which brings the company’s more sophisticated Gemini models to the Gemini apps.
Gemini Advanced users get extras here and there, too, like priority access to new features, the ability to run and edit Python code directly in Gemini, and a larger “context window.” Gemini Advanced can remember the content of — and reason across — roughly 750,000 words in a conversation (or 1,500 pages of documents). That’s compared to the 24,000 words (or 48 pages) the vanilla Gemini app can handle.

Gemini Advanced also gives users access to Google’s Deep Research feature, which uses “advanced reasoning” and “long context capabilities” to generate research briefs. After you prompt the chatbot, it creates a multi-step research plan, asks you to approve it, and then spends a few minutes searching the web and generating an extensive report based on your query. It’s meant to answer more complex questions such as, “Can you help me redesign my kitchen?”
Google also offers Gemini Advanced users a memory feature that allows the chatbot to use your old conversations with Gemini as context for your current conversation. Gemini Advanced users also get increased usage limits for NotebookLM, the company’s product that turns PDFs into AI-generated podcasts.
Gemini Advanced users also get access to Google’s experimental version of Gemini 2.0 Pro, the company’s flagship model that’s optimized for difficult coding and math problems.
Another Gemini Advanced exclusive is trip planning in Google Search, which creates custom travel itineraries from prompts. Taking into account things like flight times (from emails in a user’s Gmail inbox), meal preferences, and information about local attractions (from Google Search and Maps data), as well as the distances between those attractions, Gemini will generate an itinerary that updates automatically to reflect any changes.
Gemini across Google services is also available to corporate customers through two plans: Gemini Business (an add-on for Google Workspace) and Gemini Enterprise. Gemini Business starts at $6 per user per month, while Gemini Enterprise — which adds meeting note-taking and translated captions as well as document classification and labeling — is generally more expensive and priced based on a business’s needs. (Both plans require an annual commitment.)
In Gmail, Gemini lives in a side panel that can write emails and summarize message threads. You’ll find the same panel in Docs, where it helps you write and refine your content and brainstorm new ideas. Gemini in Slides generates slides and custom images. And Gemini in Google Sheets tracks and organizes data, creating tables and formulas.
Google’s AI chatbot recently came to Maps, where Gemini can summarize reviews about coffee shops or offer recommendations about how to spend a day visiting a foreign city.
Gemini’s reach extends to Drive as well, where it can summarize files and folders and give quick facts about a project. In Meet, meanwhile, Gemini translates captions into additional languages.

Gemini recently came to Google’s Chrome browser in the form of an AI writing tool. You can use it to write something completely new or rewrite existing text; Google says it’ll consider the web page you’re on to make recommendations.
Elsewhere, you’ll find hints of Gemini in Google’s database products, cloud security tools, and app development platforms (including Firebase and Project IDX), as well as in apps like Google Photos (where Gemini handles natural language search queries), YouTube (where it helps brainstorm video ideas), and the NotebookLM note-taking assistant.
Code Assist (formerly Duet AI for Developers), Google’s suite of AI-powered assistance tools for code completion and generation, is offloading heavy computational lifting to Gemini. So are Google’s security products underpinned by Gemini, like Gemini in Threat Intelligence, which can analyze large portions of potentially malicious code and let users perform natural language searches for ongoing threats or indicators of compromise.
Gemini extensions and Gems
Announced at Google I/O 2024, Gems are custom chatbots powered by Gemini models that Gemini Advanced users can create. Gems can be generated from natural language descriptions — for example, “You’re my running coach. Give me a daily running plan” — and shared with others or kept private.
Gems are available on desktop and mobile in 150 countries and most languages. Eventually, they’ll be able to tap an expanded set of integrations with Google services, including Google Calendar, Tasks, Keep, and YouTube Music, to complete custom tasks.

Speaking of integrations, the Gemini apps on the web and mobile can tap into Google services via what Google calls “Gemini extensions.” Gemini today integrates with Google Drive, Gmail, and YouTube to respond to queries such as “Could you summarize my last three emails?” Later this year, Gemini will be able to take additional actions with Google Calendar, Keep, Tasks, YouTube Music, and Utilities, an Android-exclusive extension that controls on-device features like timers and alarms, media controls, the flashlight, volume, Wi-Fi, Bluetooth, and so on.
Gemini Live in-depth voice chats
An experience called Gemini Live allows users to have “in-depth” voice chats with Gemini. It’s available in the Gemini apps on mobile and the Pixel Buds Pro 2, where it can be accessed even when your phone’s locked.
With Gemini Live enabled, you can interrupt Gemini while the chatbot’s speaking (in one of several new voices) to ask a clarifying question, and it’ll adapt to your speech patterns in real time. At some point, Gemini is supposed to gain visual understanding, allowing it to see and respond to your surroundings, either via photos or video captured by your smartphone’s camera.

Live is also designed to serve as a virtual coach of sorts, helping you rehearse for events, brainstorm ideas, and so on. For instance, Live can suggest which skills to highlight in an upcoming job or internship interview, and it can give public speaking advice.
You can read our review of Gemini Live here. Spoiler alert: We think the feature has a ways to go before it’s super useful — but it’s early days, admittedly.
Image generation via Imagen 3
Gemini users can generate artwork and images using Google’s built-in Imagen 3 model.
Google says that Imagen 3 can more accurately understand the text prompts that it translates into images versus its predecessor, Imagen 2, and is more “creative and detailed” in its generations. In addition, the model produces fewer artifacts and visual errors (at least according to Google), and is the best Imagen model yet for rendering text.

Back in February 2024, Google was forced to pause Gemini’s ability to generate images of people after users complained of historical inaccuracies. But in August, the company reintroduced people generation for certain users, specifically English-language users signed up for one of Google’s paid Gemini plans (e.g., Gemini Advanced) as part of a pilot program.
Gemini for teens
In June, Google introduced a teen-focused Gemini experience, allowing students to sign up via their Google Workspace for Education school accounts.
The teen-focused Gemini has “additional policies and safeguards,” including a tailored onboarding process and an “AI literacy guide” to (as Google phrases it) “help teens use AI responsibly.” Otherwise, it’s nearly identical to the standard Gemini experience, down to the “double check” feature that looks across the web to see if Gemini’s responses are accurate.
Gemini in smart home devices
A growing number of Google-made devices tap Gemini for enhanced functionality, from the Google TV Streamer to the Pixel 9 and 9 Pro to the newest Nest Learning Thermostat.
On the Google TV Streamer, Gemini uses your preferences to curate content suggestions across your subscriptions and summarize reviews and even whole seasons of TV.

On the latest Nest thermostat (as well as Nest speakers, cameras, and smart displays), Gemini will soon bolster Google Assistant’s conversational and analytic capabilities.
Later this year, subscribers to Google’s Nest Aware plan will get a preview of new Gemini-powered experiences like AI descriptions of Nest camera footage, natural language video search, and recommended automations. Nest cameras will understand what’s happening in real-time video feeds (e.g., when a dog’s digging in the garden), while the companion Google Home app will let users search footage and create device automations from natural language descriptions (e.g., “Did the kids leave their bikes in the driveway?” or “Have my Nest thermostat turn on the heating when I get home from work every Tuesday”).

Also later this year, Google Assistant will get a few upgrades on Nest-branded and other smart home devices to make conversations feel more natural. Improved voices are on the way, in addition to the ability to ask follow-up questions and “[more] easily go back and forth.”
What can the Gemini models do?
Because Gemini models are multimodal, they can perform a range of multimodal tasks, from transcribing speech to captioning images and videos in real time. Many of these capabilities have reached the product stage (as alluded to in the previous section), and Google is promising much more in the not-too-distant future.
Of course, it’s a bit hard to take the company at its word. Google seriously underdelivered with the original Bard launch. More recently, it ruffled feathers with a video purporting to show Gemini’s capabilities that was more or less aspirational — not live.
Also, Google offers no fix for some of the underlying problems with generative AI tech today, like its encoded biases and tendency to make things up (i.e., hallucinate). Neither do its rivals, but it’s something to keep in mind when considering using or paying for Gemini.
Assuming for the purposes of this article that Google is being truthful with its recent claims, here’s what the different tiers of Gemini can do now and what they’ll be able to do once they reach their full potential:
What you can do with Gemini Ultra
Google says that Gemini Ultra — thanks to its multimodality — can be used to help with things like physics homework, solving problems step-by-step on a worksheet, and pointing out possible mistakes in already filled-in answers.
However, we haven’t seen much of Gemini Ultra in recent months. The model doesn’t appear in the Gemini app and isn’t listed on the Gemini API’s pricing page, though that doesn’t mean Google won’t bring Ultra back to the forefront of its offerings in the future.
Ultra can also be applied to tasks such as identifying scientific papers relevant to a problem, Google says. The model can extract information from several papers, for instance, and update a chart from one by generating the formulas necessary to re-create the chart with more timely data.
Gemini Ultra technically supports image generation. But that capability hasn’t made its way into the productized version of the model yet — perhaps because the mechanism is more complex than how apps such as ChatGPT generate images. Rather than feed prompts to an image generator (like DALL-E 3, in ChatGPT’s case), Gemini outputs images “natively,” without an intermediary step.
Ultra has been available as an API through Vertex AI, Google’s fully managed AI dev platform, and AI Studio, Google’s web-based tool for app and platform developers.
Gemini Pro’s capabilities
Google says that its latest Pro model, Gemini 2.0 Pro, is its best model yet for coding performance and complex prompts. It’s currently available as an experimental version, meaning it can have unexpected issues.
Gemini 2.0 Pro outperforms its predecessor, Gemini 1.5 Pro, in benchmarks measuring coding, reasoning, math, and factual accuracy. The model can take in up to 1.4 million words, two hours of video, or 22 hours of audio and can reason across or answer questions about that data (more or less).
However, Gemini 1.5 Pro still powers Google’s Deep Research feature.
Gemini 2.0 Pro works alongside a feature called code execution, released in June alongside Gemini 1.5 Pro, which aims to reduce bugs in code that the model generates by iteratively refining that code over several steps. (Code execution is also available for Gemini Flash models.)
Within Vertex AI, developers can customize Gemini Pro to specific contexts and use cases via a fine-tuning or “grounding” process. For example, Pro (along with other Gemini models) can be instructed to use data from third-party providers like Moody’s, Thomson Reuters, ZoomInfo and MSCI, or source information from corporate datasets or Google Search instead of its wider knowledge bank. Gemini Pro can also be connected to external, third-party APIs to perform particular actions, like automating a back-office workflow.
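To make the external-API point concrete, here’s a minimal sketch of Gemini function calling using the google-generativeai Python SDK. The `create_support_ticket` function and its return values are hypothetical stand-ins for a real back-office integration; the SDK can invoke a plain Python function automatically when the model decides it’s needed.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

def create_support_ticket(summary: str, priority: str) -> dict:
    """Hypothetical back-office action: file a ticket and return its ID."""
    # In a real workflow this would call your ticketing system's API.
    return {"ticket_id": "T-1234", "summary": summary, "priority": priority}

# Register the function as a tool; the model can choose to invoke it.
model = genai.GenerativeModel("gemini-1.5-pro", tools=[create_support_ticket])
chat = model.start_chat(enable_automatic_function_calling=True)

reply = chat.send_message(
    "The nightly export job failed again. File a high-priority ticket."
)
print(reply.text)  # the model's confirmation after the tool call runs
```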
AI Studio offers templates for creating structured chat prompts with Pro. Developers can control the model’s creative range and provide examples to give tone and style instructions — and also tune Pro’s safety settings.
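The same knobs AI Studio exposes can also be set programmatically. Below is a hedged sketch, again with the google-generativeai Python SDK: a system instruction stands in for tone-and-style guidance, a lowered temperature narrows the model’s creative range, and one safety threshold is adjusted. The exact thresholds you’d pick depend on your app.

```python
import google.generativeai as genai
from google.generativeai.types import HarmBlockThreshold, HarmCategory

genai.configure(api_key="YOUR_API_KEY")

model = genai.GenerativeModel(
    "gemini-1.5-pro",
    # Tone/style instructions, as you'd set in an AI Studio prompt template.
    system_instruction="You are a concise, formal technical editor.",
    generation_config=genai.GenerationConfig(
        temperature=0.2,        # lower temperature = narrower creative range
        max_output_tokens=512,
    ),
    safety_settings={
        # Relax one category from the default; tune per your use case.
        HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
    },
)

print(model.generate_content("Rewrite this politely: 'your code is a mess'").text)
```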
Vertex AI Agent Builder lets people build Gemini-powered “agents” within Vertex AI. For example, a company could create an agent that analyzes previous marketing campaigns to understand a brand style and then apply that knowledge to help generate new ideas consistent with the style.
Gemini Flash is lighter but packs a punch
Google calls Gemini 2.0 Flash its AI model for the agentic era. The model can natively generate images and audio, in addition to text, and can use tools like Google Search and interact with external APIs.
The 2.0 Flash model is faster than Gemini’s previous generation of models and even outperforms some of the larger Gemini 1.5 models on benchmarks measuring coding and image analysis. You can try Gemini 2.0 Flash in the Gemini web or mobile app, and through Google’s AI developer platforms.
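For developers, the quickest way to try Flash is through the Gemini API. Here’s a minimal sketch with the google-generativeai Python SDK, assuming a free-tier API key from AI Studio (the exact model ID may change as Google iterates):

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # free keys are issued via AI Studio

model = genai.GenerativeModel("gemini-2.0-flash")
response = model.generate_content(
    "Explain what makes an AI model 'agentic' in two sentences."
)
print(response.text)
```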
In December, Google released a “thinking” version of Gemini 2.0 Flash that’s capable of “reasoning,” in which the AI model takes a few seconds to work backwards through a problem before it gives an answer.
In February, Google made Gemini 2.0 Flash Thinking available in the Gemini app. The same month, Google also released a smaller version called Gemini 2.0 Flash-Lite. The company says this model outperforms its Gemini 1.5 Flash model, but runs at the same price and speed.
The earlier Gemini 1.5 Flash, an offshoot of Gemini Pro that’s small and efficient, was built for narrow, high-frequency generative AI workloads. It’s multimodal on the input side like Gemini Pro, meaning it can analyze audio, video, images, and text, but it can only generate text. Google says Flash is particularly well suited for tasks like summarization and chat apps, plus image and video captioning and data extraction from long documents and tables.
Devs using Flash and Pro can optionally leverage context caching, which lets them store large amounts of information (e.g., a knowledge base or database of research papers) in a cache that Gemini models can quickly and relatively cheaply access. Context caching is an additional fee on top of other Gemini model usage fees, however.
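As a rough illustration of context caching with the google-generativeai Python SDK: upload a large corpus once, create a cache with a time-to-live, and point a model at it. The file name below is a placeholder, caching requires an explicitly versioned model ID, and there’s a minimum size (tens of thousands of tokens) before content can be cached.

```python
import datetime

import google.generativeai as genai
from google.generativeai import caching

genai.configure(api_key="YOUR_API_KEY")

# Upload the large document once (placeholder file name).
papers = genai.upload_file("research_papers.pdf")

# Cache it for an hour; cache storage is billed by token count and duration.
cache = caching.CachedContent.create(
    model="models/gemini-1.5-flash-001",  # caching needs a pinned model version
    contents=[papers],
    ttl=datetime.timedelta(hours=1),
)

# Subsequent prompts reuse the cached tokens at a reduced input rate.
model = genai.GenerativeModel.from_cached_content(cached_content=cache)
print(model.generate_content("Which paper reports the largest sample size?").text)
```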
Gemini Nano can run on your phone
Gemini Nano is a much smaller version of the Gemini Pro and Ultra models, and it’s efficient enough to run directly on (some) devices instead of sending the task to a server somewhere. So far, Nano powers a couple of features on the Pixel 8 Pro, Pixel 8, Pixel 9 Pro, Pixel 9 and Samsung Galaxy S24, including Summarize in Recorder and Smart Reply in Gboard.
The Recorder app, which lets users push a button to record and transcribe audio, includes a Gemini-powered summary of recorded conversations, interviews, presentations, and other audio snippets. Users get summaries even if they don’t have a signal or Wi-Fi connection — and in a nod to privacy, no data leaves their phone in the process.

Nano is also in Gboard, Google’s keyboard replacement. There, it powers a feature called Smart Reply, which helps to suggest the next thing you’ll want to say when having a conversation in a messaging app such as WhatsApp.
In the Google Messages app on supported devices, Nano drives Magic Compose, which can craft messages in styles like “excited,” “formal,” and “lyrical.”
Google says that a future version of Android will tap Nano to alert users to potential scams during calls. The new weather app on Pixel phones uses Gemini Nano to generate tailored weather reports. And TalkBack, Google’s accessibility service, employs Nano to create aural descriptions of objects for low-vision and blind users.
How much do the Gemini models cost?
Gemini 1.5 Pro, 1.5 Flash, 2.0 Flash, and 2.0 Flash-Lite are available through Google’s Gemini API for building apps and services — all with free options. But the free options impose usage limits and leave out certain features, like context caching and batching.
Gemini models are otherwise pay-as-you-go. Here’s the base pricing — not including add-ons like context caching — at the time of writing:
- Gemini 1.5 Pro: $1.25 per 1 million input tokens (for prompts up to 128K tokens) or $2.50 per 1 million input tokens (for prompts longer than 128K tokens); $5 per 1 million output tokens (for prompts up to 128K tokens) or $10 per 1 million output tokens (for prompts longer than 128K tokens)
- Gemini 1.5 Flash: 7.5 cents per 1 million input tokens (for prompts up to 128K tokens), 15 cents per 1 million input tokens (for prompts longer than 128K tokens), 30 cents per 1 million output tokens (for prompts up to 128K tokens), 60 cents per 1 million output tokens (for prompts longer than 128K tokens)
- Gemini 2.0 Flash: 10 cents per 1 million input tokens, 40 cents per 1 million output tokens. Audio input specifically costs 70 cents per 1 million tokens, while output remains 40 cents per 1 million tokens.
- Gemini 2.0 Flash-Lite: 7.5 cents per 1 million input tokens, 30 cents per 1 million output tokens.
Tokens are subdivided bits of raw data, like the syllables “fan,” “tas,” and “tic” in the word “fantastic”; 1 million tokens is equivalent to about 700,000 words. Input refers to tokens fed into the model, while output refers to tokens that the model generates.
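To put those rates in perspective, here’s some back-of-the-envelope cost math in Python for Gemini 2.0 Flash, using the prices above and the rough 0.7-words-per-token conversion. (For exact counts before sending a request, the SDK’s count_tokens method is the tool to reach for.)

```python
# Gemini 2.0 Flash list prices from the table above.
INPUT_PER_TOKEN = 0.10 / 1_000_000   # $0.10 per 1M input tokens
OUTPUT_PER_TOKEN = 0.40 / 1_000_000  # $0.40 per 1M output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single text request at the rates above."""
    return input_tokens * INPUT_PER_TOKEN + output_tokens * OUTPUT_PER_TOKEN

# A ~7,000-word prompt is roughly 10,000 tokens (about 0.7 words per token);
# assume a 1,000-token reply.
print(f"${request_cost(10_000, 1_000):.4f}")  # -> $0.0014
```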
2.0 Pro pricing has yet to be announced, and Nano is still in early access.
What’s the latest on Project Astra?
Project Astra is Google DeepMind’s effort to create AI-powered apps and “agents” for real-time, multimodal understanding. In demos, Google has shown how the AI model can simultaneously process live video and audio. Google released an app version of Project Astra to a small number of trusted testers in December but has no plans for a broader release right now.
The company would like to put Project Astra in a pair of smart glasses. Google also gave a prototype of some glasses with Project Astra and augmented reality capabilities to a few trusted testers in December. However, there’s not a clear product at this time, and it’s unclear when Google would actually release something like this.
Project Astra is still just that, a project, and not a product. However, the demos of Astra reveal what Google would like its AI products to do in the future.
Is Gemini coming to the iPhone?
It might.
Apple has said that it’s in talks to put Gemini and other third-party models to use for a number of features in its Apple Intelligence suite. Following a keynote presentation at WWDC 2024, Apple SVP Craig Federighi confirmed plans to work with models, including Gemini, but he didn’t divulge any additional details.
This post was originally published February 16, 2024, and is updated regularly.
Pintarnya raises $16.7M to power jobs and financial services in Indonesia

Pintarnya, an Indonesian employment platform that goes beyond job matching by offering financial services along with full-time and side-gig opportunities, said it has raised a $16.7 million Series A round.
The funding was led by Square Peg with participation from existing investors Vertex Venture Southeast Asia & India and East Ventures.
Ghirish Pokardas, Nelly Nurmalasari, and Henry Hendrawan founded Pintarnya in 2022 to tackle two of the biggest challenges Indonesians face daily: earning enough and borrowing responsibly.
“Traditionally, mass workers in Indonesia find jobs offline through job fairs or word of mouth, with employers buried in paper applications and candidates rarely hearing back. For borrowing, their options are often limited to family/friend or predatory lenders with harsh collection practices,” Henry Hendrawan, co-founder of Pintarnya, told TechCrunch. “We digitize job matching with AI to make hiring faster and we provide workers with safer, healthier lending options — designed around what they can reasonably afford, rather than pushing them deeper into debt.”
Around 59% of Indonesia’s 150 million workers are employed in the informal sector, highlighting the difficulties these workers encounter in accessing formal financial services, since they often lack verifiable income and official employment documentation.
Pintarnya tackles this challenge by partnering with asset-backed lenders to offer secured loans, using collateral such as gold, electronics, or vehicles, Hendrawan added.
Since its seed round in 2022, the platform has grown to serve over 10 million job seekers and 40,000 employers nationwide. Its revenue has increased almost fivefold year-over-year, and the company expects to reach break-even by the end of the year, Hendrawan noted. Pintarnya primarily serves users aged 21 to 40, most of whom have a high school education or a diploma below university level. The startup aims to focus on this underserved segment, given the large population of blue-collar and informal workers in Indonesia.
“Through the journey of building employment services, we discovered that our users needed more than just jobs — they needed access to financial services that traditional banks couldn’t provide,” said Hendrawan.

While Indonesia already has job platforms like JobStreet, Kalibrr, and Glints, these primarily cater to white-collar roles, which represent only a small portion of the workforce, according to Hendrawan. Pintarnya’s platform is designed specifically for blue-collar workers, offering tailored experiences such as quick-apply options for walk-in interviews, affordable e-learning on relevant skills, in-app opportunities for supplemental income, and seamless connections to financial services like loans.
The same trend is evident in Indonesia’s fintech sector, which similarly caters to white-collar or upper-middle-class consumers. Conventional credit scoring models for loans, which rely on steady monthly income and bank account activity, often leave blue-collar workers overlooked by existing fintech providers, Hendrawan explained.
When asked about which fintech services are most in demand, Hendrawan mentioned, “Given their employment status, lending is the most in-demand financial service for Pintarnya’s users today. We are planning to ‘graduate’ them to micro-savings and investments down the road through innovative products with our partners.”
The new funding will enable Pintarnya to strengthen its platform technology and broaden its financial service offerings through strategic partnerships. With most Indonesian workers employed in blue-collar and informal sectors, the co-founders see substantial growth opportunities in the local market. Leveraging their extensive experience in managing businesses across Southeast Asia, they are also open to exploring regional expansion when the timing is right.
“Our vision is for Pintarnya to be the everyday companion that empowers Indonesians to not only make ends meet today, but also plan, grow, and upgrade their lives tomorrow … In five years, we see Pintarnya as the go-to super app for Indonesia’s workers, not just for earning income, but as a trusted partner throughout their life journey,” Hendrawan said. “We want to be the first stop when someone is looking for work, a place that helps them upgrade their skills, and a reliable guide as they make financial decisions.”
OpenAI warns against SPVs and other ‘unauthorized’ investments

In a new blog post, OpenAI warns against “unauthorized opportunities to gain exposure to OpenAI through a variety of means,” including special purpose vehicles, known as SPVs.
“We urge you to be careful if you are contacted by a firm that purports to have access to OpenAI, including through the sale of an SPV interest with exposure to OpenAI equity,” the company writes. The blog post acknowledges that “not every offer of OpenAI equity […] is problematic” but says firms may be “attempting to circumvent our transfer restrictions.”
“If so, the sale will not be recognized and carry no economic value to you,” OpenAI says.
Investors have increasingly used SPVs (which pool money for one-off investments) as a way to buy into hot AI startups, prompting other VCs to criticize them as a vehicle for “tourist chumps.”
Business Insider reports that OpenAI isn’t the only major AI company looking to crack down on SPVs, with Anthropic reportedly telling Menlo Ventures it must use its own capital, not an SPV, to invest in an upcoming round.
Meta partners with Midjourney on AI image and video models

Meta is partnering with Midjourney to license the startup’s AI image and video generation technology, Meta Chief AI Officer Alexandr Wang announced Friday in a post on Threads. Wang says Meta’s research teams will collaborate with Midjourney to bring its technology into future AI models and products.
“To ensure Meta is able to deliver the best possible products for people it will require taking an all-of-the-above approach,” Wang said. “This means world-class talent, ambitious compute roadmap, and working with the best players across the industry.”
The Midjourney partnership could help Meta develop products that compete with industry-leading AI image and video models, such as OpenAI’s Sora, Black Forest Labs’ Flux, and Google’s Veo. Last year, Meta rolled out its own AI image generation tool, Imagine, into several of its products, including Facebook, Instagram, and Messenger. Meta also has an AI video generation tool, Movie Gen, that allows users to create videos from prompts.
The licensing agreement with Midjourney marks Meta’s latest deal to get ahead in the AI race. Earlier this year, CEO Mark Zuckerberg went on a hiring spree for AI talent, offering some researchers compensation packages worth upwards of $100 million. The social media giant also invested $14 billion in Scale AI, and acquired the AI voice startup Play AI.
Meta has held talks with several other leading AI labs about potential acquisitions, and Zuckerberg even spoke with Elon Musk about joining his $97 billion takeover bid for OpenAI. (Meta ultimately did not join the offer, and OpenAI rejected Musk’s bid.)
While the terms of Meta’s deal with Midjourney remain unknown, the startup’s CEO, David Holz, said in a post on X that his company remains independent with no investors; Midjourney is one of the few leading AI model developers that has never taken on outside funding. At one point, Meta talked with Midjourney about acquiring the startup, according to Upstarts Media.
Midjourney was founded in 2022 and quickly became a leader in the AI image generation space, known for its realistic yet distinctive style. By 2023, the startup was reportedly on pace to generate $200 million in revenue. Midjourney sells subscriptions starting at $10 per month, with pricier tiers that include more image generations and cost as much as $120 per month. In June, the startup released its first AI video model, V1.
Meta’s partnership with Midjourney comes just two months after the startup was sued by Disney and Universal, which allege that it trained its AI image models on copyrighted works. Several AI model developers — including Meta — face similar allegations from copyright holders; however, recent court cases pertaining to AI training data have sided with tech companies.