Technology
Researchers suggest OpenAI trained AI models on paywalled O’Reilly books
OpenAI has been accused by many parties of training its AI on copyrighted content sans permission. Now a new paper by an AI watchdog organization makes the serious accusation that the company increasingly relied on nonpublic books it didn’t license to train more sophisticated AI models.
AI models are essentially complex prediction engines. Trained on a lot of data — books, movies, TV shows, and so on — they learn patterns and novel ways to extrapolate from a simple prompt. When a model “writes” an essay on a Greek tragedy or “draws” Ghibli-style images, it’s simply pulling from its vast knowledge to approximate. It isn’t arriving at anything new.
While a number of AI labs, including OpenAI, have begun embracing AI-generated data to train AI as they exhaust real-world sources (mainly the public web), few have eschewed real-world data entirely. That’s likely because training on purely synthetic data comes with risks, like worsening a model’s performance.
The new paper, out of the AI Disclosures Project, a nonprofit co-founded in 2024 by media mogul Tim O’Reilly and economist Ilan Strauss, draws the conclusion that OpenAI likely trained its GPT-4o model on paywalled books from O’Reilly Media. (O’Reilly is the CEO of O’Reilly Media.)
In ChatGPT, GPT-4o is the default model. O’Reilly doesn’t have a licensing agreement with OpenAI, the paper says.
“GPT-4o, OpenAI’s more recent and capable model, demonstrates strong recognition of paywalled O’Reilly book content … compared to OpenAI’s earlier model GPT-3.5 Turbo,” wrote the co-authors of the paper. “In contrast, GPT-3.5 Turbo shows greater relative recognition of publicly accessible O’Reilly book samples.”
The paper used a method called DE-COP, first introduced in an academic paper in 2024, designed to detect copyrighted content in language models’ training data. Also known as a “membership inference attack,” the method tests whether a model can reliably distinguish human-authored texts from paraphrased, AI-generated versions of the same text. If it can, it suggests that the model might have prior knowledge of the text from its training data.
The co-authors of the paper — O’Reilly, Strauss, and AI researcher Sruly Rosenblat — say that they probed GPT-4o, GPT-3.5 Turbo, and other OpenAI models’ knowledge of O’Reilly Media books published before and after their training cutoff dates. They used 13,962 paragraph excerpts from 34 O’Reilly books to estimate the probability that a particular excerpt had been included in a model’s training dataset.
According to the results of the paper, GPT-4o “recognized” far more paywalled O’Reilly book content than OpenAI’s older models, including GPT-3.5 Turbo. That’s even after accounting for potential confounding factors, the authors said, like improvements in newer models’ ability to figure out whether text was human-authored.
“GPT-4o [likely] recognizes, and so has prior knowledge of, many non-public O’Reilly books published prior to its training cutoff date,” wrote the co-authors.
It isn’t a smoking gun, the co-authors are careful to note. They acknowledge that their experimental method isn’t foolproof and that OpenAI might’ve collected the paywalled book excerpts from users copying and pasting it into ChatGPT.
Muddying the waters further, the co-authors didn’t evaluate OpenAI’s most recent collection of models, which includes GPT-4.5 and “reasoning” models such as o3-mini and o1. It’s possible that these models weren’t trained on paywalled O’Reilly book data or were trained on a lesser amount than GPT-4o.
That being said, it’s no secret that OpenAI, which has advocated for looser restrictions around developing models using copyrighted data, has been seeking higher-quality training data for some time. The company has gone so far as to hire journalists to help fine-tune its models’ outputs. That’s a trend across the broader industry: AI companies recruiting experts in domains like science and physics to effectively have these experts feed their knowledge into AI systems.
It should be noted that OpenAI pays for at least some of its training data. The company has licensing deals in place with news publishers, social networks, stock media libraries, and others. OpenAI also offers opt-out mechanisms — albeit imperfect ones — that allow copyright owners to flag content they’d prefer the company not use for training purposes.
Still, as OpenAI battles several suits over its training data practices and treatment of copyright law in U.S. courts, the O’Reilly paper isn’t the most flattering look.
OpenAI didn’t respond to a request for comment.
Technology
The Case for Custom eLearning Platforms: Why Organizations Are Making the Switch
The corporate eLearning market has exploded in recent years, growing over 800% since 2000. As the demand for eLearning continues to accelerate, more and more organizations are finding that off-the-shelf solutions cannot keep pace with their training needs. This has led many companies to make the switch to custom-built eLearning platforms tailored specifically for their requirements.
There are several key reasons driving the demand for customized eLearning tools:
Greater Flexibility and Scalability
Generic eLearning software packages often impose rigid constraints that limit their ability to adapt to an organization’s evolving needs. Meanwhile, the “one-size-fits-all” approach fails to support the personalized learning critical for employee development. Custom platforms provide flexibility to add and modify features to match ever-changing business goals. As companies scale training across global workforces, custom solutions built on cloud infrastructure can scale seamlessly to handle growing demand.
Deeper Integration Across Systems
Smooth integration with existing HR, LMS, and other business systems is critical for optimizing training workflows. However, off-the-shelf tools rarely integrate well, creating data and process siloes. Custom platforms can tightly integrate role-based learning paths with core business applications, sync user profiles, enable single sign-on, and more. This level of integration catalyzes more impactful training function.
Better Data and Analytics
Generic software severely limits access to data insights that drive improvement. Custom platforms unlock a trove of analytics on content consumption, learner progression, platform adoption, and real-time feedback. Integrated analytics dashboards and APIs allow businesses to derive deep visibility across the learner lifecycle. These insights help continuously enhance learner experience, target development gaps, and demonstrate direct training ROI.
Enhanced Learner Engagement
For modern learners accustomed to consumer-grade digital experiences, poor platform usability quickly erodes engagement. Custom designs allow companies to incorporate familiar features from popular apps and websites while optimizing for their audience. Adaptive learning approaches further personalize content to individual styles and needs. With modular component architecture, custom platforms stay on the cutting edge of new modalities like AR/ VR to captivate learners.
Brand and Culture Alignment
Off-the-shelf tools impose a generic and often disruptive experience that clashes with existing brand identity and culture. In contrast, custom platforms allow organizations to carry over familiar styling, voice, and workflow patterns. Consistency in experience preserves brand recognition while smoother onboarding leads to wider adoption across all employee groups. Over time, the platform can evolve alongside cultural changes as well.
While custom elearning tools require greater upfront investment, for enterprise training needs, the long-term benefits far outweigh the costs. The ability to mold platforms to current and future needs results in greater leverage from learning spend.
As businesses demand ever-more from their learning technology, custom solutions provide the agility needed for true scale. Rather than forcing training functions into the constraints of generic software, custom elearning development keeps the focus on nurturing talent and capabilities. For any organization looking to drive workforce transformation through learning, custom elearning represents the way forward.
Technology
Pintarnya raises $16.7M to power jobs and financial services in Indonesia
Pintarnya, an Indonesian employment platform that goes beyond job matching by offering financial services along with full-time and side-gig opportunities, said it has raised a $16.7 million Series A round.
The funding was led by Square Peg with participation from existing investors Vertex Venture Southeast Asia & India and East Ventures.
Ghirish Pokardas, Nelly Nurmalasari, and Henry Hendrawan founded Pintarnya in 2022 to tackle two of the biggest challenges Indonesians face daily: earning enough and borrowing responsibly.
“Traditionally, mass workers in Indonesia find jobs offline through job fairs or word of mouth, with employers buried in paper applications and candidates rarely hearing back. For borrowing, their options are often limited to family/friend or predatory lenders with harsh collection practices,” Henry Hendrawan, co-founder of Pintarnya, told TechCrunch. “We digitize job matching with AI to make hiring faster and we provide workers with safer, healthier lending options — designed around what they can reasonably afford, rather than pushing them deeper into debt.”
Around 59% of Indonesia’s 150 million workforce is employed in the informal sector, highlighting the difficulties these workers encounter in accessing formal financial services because they lack verifiable income and official employment documentation.
Pintarnya tackles this challenge by partnering with asset-backed lenders to offer secured loans, using collateral such as gold, electronics, or vehicles, Hendrawan added.
Since its seed funding in 2022, the platform currently serves over 10 million job seeker users and 40,000 employers nationwide. Its revenue has increased almost fivefold year-over-year and expects to reach break-even by the end of the year, Hendrawn noted. Pintarnya primarily serves users aged 21 to 40, most of whom have a high school education or a diploma below university level. The startup aims to focus on this underserved segment, given the large population of blue-collar and informal workers in Indonesia.
Techcrunch event
San Francisco
|
October 27-29, 2025
“Through the journey of building employment services, we discovered that our users needed more than just jobs — they needed access to financial services that traditional banks couldn’t provide,” said Hendrawan. “We digitize job matching with AI to make hiring faster and we provide workers with safer, healthier lending options — designed around what they can reasonably afford, rather than pushing them deeper into debt.”

While Indonesia already has job platforms like JobStreet, Kalibrr, and Glints, these primarily cater to white-collar roles, which represent only a small portion of the workforce, according to Hendrawan. Pintarnya’s platform is designed specifically for blue-collar workers, offering tailored experiences such as quick-apply options for walk-in interviews, affordable e-learning on relevant skills, in-app opportunities for supplemental income, and seamless connections to financial services like loans.
The same trend is evident in Indonesia’s fintech sector, which similarly caters to white-collar or upper-middle-class consumers. Conventional credit scoring models for loans, which rely on steady monthly income and bank account activity, often leave blue-collar workers overlooked by existing fintech providers, Hendrawan explained.
When asked about which fintech services are most in demand, Hendrawan mentioned, “Given their employment status, lending is the most in-demand financial service for Pintarnya’s users today. We are planning to ‘graduate’ them to micro-savings and investments down the road through innovative products with our partners.”
The new funding will enable Pintarnya to strengthen its platform technology and broaden its financial service offerings through strategic partnerships. With most Indonesian workers employed in blue-collar and informal sectors, the co-founders see substantial growth opportunities in the local market. Leveraging their extensive experience in managing businesses across Southeast Asia, they are also open to exploring regional expansion when the timing is right.
“Our vision is for Pintarnya to be the everyday companion that empowers Indonesians to not only make ends meet today, but also plan, grow, and upgrade their lives tomorrow … In five years, we see Pintarnya as the go-to super app for Indonesia’s workers, not just for earning income, but as a trusted partner throughout their life journey,” Hendrawan said. “We want to be the first stop when someone is looking for work, a place that helps them upgrade their skills, and a reliable guide as they make financial decisions.”
Technology
OpenAI warns against SPVs and other ‘unauthorized’ investments
In a new blog post, OpenAI warns against “unauthorized opportunities to gain exposure to OpenAI through a variety of means,” including special purpose vehicles, known as SPVs.
“We urge you to be careful if you are contacted by a firm that purports to have access to OpenAI, including through the sale of an SPV interest with exposure to OpenAI equity,” the company writes. The blog post acknowledges that “not every offer of OpenAI equity […] is problematic” but says firms may be “attempting to circumvent our transfer restrictions.”
“If so, the sale will not be recognized and carry no economic value to you,” OpenAI says.
Investors have increasingly used SPVs (which pool money for one-off investments) as a way to buy into hot AI startups, prompting other VCs to criticize them as a vehicle for “tourist chumps.”
Business Insider reports that OpenAI isn’t the only major AI company looking to crack down on SPVs, with Anthropic reportedly telling Menlo Ventures it must use its own capital, not an SPV, to invest in an upcoming round.
-
Trending2 weeks agoApril 15, 2026 – MJF vs. Darby Allin for AEW World Title, More
-
Trending2 weeks agoTwenty-six true difference-makers in the 2026 NFL Draft; plus, 5 must-pick sleepers
-
Trending3 weeks agoWe’re 75 With $3.2 Million. Our Grandchild Needs Help Paying for College, but It’s Not Our Fault She Picked a School That’s $90k a Year!
-
Trending2 weeks agoCincinnati Reds Minor League Game Review: April 18, 2026
-
Trending1 week ago2026 NFL draft intel, notes: What Adam Schefter is hearing
-
Trending3 weeks agoTwitch star calls out Mr. Beast over “most uncomfortable” livestream ever
-
Trending1 week ago2026 NBA playoffs: Western Conference first-round takeaways
-
Entertainment2 weeks agoChristina Applegate’s friends fearing the worst as ‘hellish’ details of star’s hospitalization are revealed: report
