The NBA x OpenAI Deal
When will we see the first deal between a major sports league and a generative AI platform?
From Sam Altman to Mark Zuckerberg to Marc Andreessen… it feels like everyone’s discussing how generative AI will change the future of media.
But few are discussing how an emerging market for AI training data is impacting media today.
In the past two years, media companies have driven significant revenues by licensing data to AI companies to train their models. Reddit signed a $60 million annual deal with Google. News Corp signed a $250 million, 5-year deal with OpenAI. Shutterstock signed $25-50 million deals with multiple Big Tech companies.
These all made me wonder:
When will we see the first deal between a major sports league and a generative AI platform?
Sports leagues sit on troves of proprietary, high-quality video and image data. For example, every year the NBA produces roughly 1,300 games, each running about 2.5 hours. OpenAI, or any other foundation model developer, would pay a high price for that dataset.
Here’s a thought exercise around how much the NBA could charge OpenAI for its library of games. It’s based on $2-4/minute of HD video – the market rate AI companies are paying media publishers.
For perspective, the resulting estimate of $87 million approaches the annual value of the NBA’s largest brand sponsorships with Nike (~$125 million) and PepsiCo (~$100 million).
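As a rough check on the scale involved, the per-season arithmetic can be sketched in Python. It uses only the figures quoted above (1,300 games, 2.5 hours each, $2-4 per minute). The breakdown behind the $87 million estimate isn't spelled out in the text, so the last step simply backs out how many minutes of video that figure implies at $4 per minute. How those minutes split across seasons, camera feeds or other footage is an open assumption this sketch does not resolve.

```python
# Inputs quoted in the article.
GAMES_PER_SEASON = 1_300
HOURS_PER_GAME = 2.5
RATE_LOW, RATE_HIGH = 2, 4          # dollars per minute of HD video
HEADLINE_ESTIMATE = 87_000_000      # dollars; derivation not given in the text


def season_minutes(games=GAMES_PER_SEASON, hours=HOURS_PER_GAME):
    """Minutes of single-feed game video produced in one season."""
    return games * hours * 60


def license_value(minutes, rate_per_minute):
    """Licensing value of a video library at a flat per-minute rate."""
    return minutes * rate_per_minute


minutes = season_minutes()
low = license_value(minutes, RATE_LOW)
high = license_value(minutes, RATE_HIGH)

# Back out what the headline figure implies at the top of the rate range.
implied_minutes = HEADLINE_ESTIMATE / RATE_HIGH
implied_seasons = implied_minutes / minutes

print(f"One season, single feed: {minutes:,.0f} minutes")
print(f"Value at $2-4/minute: ${low:,.0f} to ${high:,.0f}")
print(f"Minutes implied by $87M at $4/minute: {implied_minutes:,.0f}")
print(f"Equivalent single-feed seasons: {implied_seasons:.1f}")
```

One season of single-feed game video comes to 195,000 minutes, worth $390,000 to $780,000 at the quoted rates. An $87 million package at $4/minute would therefore need well over a hundred season-equivalents of footage, which suggests the estimate covers a deep archive, multiple camera feeds, or both (an inference from the arithmetic, not a stated input).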
It’s also worth noting $87 million is a conservative estimate, for a few reasons. First, the NBA could sell an even larger data package, including practice, press conference and off-court video, as well as images. Second, the NBA could command a premium beyond the $4/minute market rate because it’s valuable IP. Third, the NBA could sell the same data package to multiple AI companies. Fourth, the NBA could earn a revenue share from any consumer applications built on top of the models (more on this later).
That leads to the question:
Why hasn’t the NBA, or any other major league, licensed its data to major AI platforms yet?
To answer the question, I spoke with media companies that have licensed their data, as well as sports leagues that haven’t. A few themes became clear…
The media companies that have licensed their data are primarily print media companies (e.g. The New York Times & News Corp) and stock libraries (e.g. Shutterstock & Getty). Both categories face major headwinds and stagnating revenues, so they’ve lunged at this new income stream.
In contrast, major sports leagues enjoy growing revenues that are derisked by long-term TV deals. Leagues like the NBA don’t need the incremental revenue from AI companies. So they have the luxury of waiting and seeing how the market develops.
For now, leagues view generative AI with major concern, which I’d bucket into three categories: commercial, brand and political. I’ll explain each.
Concern #1: Commercial
Sports leagues don’t want to set commercial precedent for their datasets before the market fully develops.
In 2023, The New York Times sued OpenAI, alleging the company used millions of copyrighted articles without permission to train its models. More than 20 similar lawsuits have emerged. Each raises the same legal question: Does using copyrighted content to train AI models constitute fair use or copyright infringement?
There is little settled legal precedent here. If the courts rule copyright infringement, media companies will be able to increase the price of their datasets. If the courts rule fair use, media companies might have few buyers at all. $2-4 per minute is the current market rate for licensed video. But that could drastically change based on how the courts rule.
The last thing leagues want to do is set a negative precedent and get unfavorable terms before the market stabilizes. Leagues are also averse to deals that aren’t time-bound. Leagues have historically grown by selling media and sponsorship deals for a set timeframe, and upping their value after each term. However, it’s difficult to time-bound an AI data licensing deal because once data is used to train an AI model, it becomes part of the model’s foundational knowledge and can influence outputs beyond any licensing period.
Concern #2: Brand
Sports leagues also worry that AI content made with their IP could be brand-dilutive.
Leagues are incredibly protective of their IP. At the Big 4 leagues, all creative assets made by partners (from a Nike ad to a Fanatics t-shirt) run through committees of lawyers, marketers and other stakeholders to ensure the assets meet strict brand guidelines.
AI tools undermine that level of control. Users will inevitably try to produce inappropriate content with IP, and IP holders cannot possibly track the near-infinite number of outputs.
Grok, the AI chatbot developed by xAI and integrated into X, is a pressing case study. Most major models like ChatGPT prevent you from generating images with popular IP to avoid liability for copyright infringement. But Grok ignores those legal concerns and lets users run wild. We’ve already seen viral trends of AI-generated content, from Patrick Mahomes colluding with NFL refs, to NBA players depicted with Hitler.
Until AI tools can effectively moderate creation and distribution, leagues will hesitate to embrace them.
Concern #3: Political
Lastly, leagues have political concerns around licensing video data to AI companies.
Technically, according to how CBAs are structured, most leagues retain the right to license video and data to new partners. However, the leagues also act as middlemen between multiple stakeholders: players’ unions, team owners and commercial partners.
If the NBA were to license data to OpenAI… and then ChatGPT generated an image including LeBron James’ face, or ESPN’s logo… those derivative works could violate NIL rights and copyright law. Those stakeholders might make an angry call to the league office. In fact, players’ unions have already called leagues to express frustration around Grok generations on X, which has commercial partnerships with the major leagues.
##
Despite these commercial, brand and political concerns, many league executives believe they will eventually strike deals with AI companies.
Check out these sports scenes from three of the most advanced AI video models. All are nearing consumer-grade quality.
As image and video models improve, fan engagement applications will emerge. In a previous article, I theorized about those applications. Fans could generate video of LeBron James and Michael Jordan playing 1v1, or do a personal video chat with Lionel Messi. That future isn’t here yet. But some AI researchers claim full-fidelity and customizable video and image models are close.
As the first consumer AI applications gain traction… I’m paying attention to: What building blocks and tools will allow sports leagues to license their data to AI companies?
I believe sports leagues will have fundamental needs around…
Selling data to AI companies
Attributing data used to generate AI content
Limiting data from creating unwanted AI content
I’ll explain and highlight startups building around each need.
1. Selling data
First, leagues will need tools to effectively sell their data libraries.
We’ve already seen multiple venture-backed startups emerge as data aggregators and brokers. Examples include Tollbit, Protege, and Human Native AI. These startups are consolidating data libraries from rights holders, and then selling them to AI companies. Some of these startups are targeting niches of rights holders. For example, Troveo focuses on content creator data, RightsTrade on film & TV, and Created by Humans on books.
At small scale, these companies are effectively services companies. Most are now helping smaller rights holders – think a collection of local magazines, which might own decades of private data, but lack the access or bargaining power to sell to an AI company.
But at full scale, they can evolve into marketplaces and offer useful features for both sides. For example, they can let rights holders place specific governance controls on their libraries, and also let AI companies filter for more granular datasets. As they scale, these companies will also gain leverage to take a larger cut of each transaction.
Rights aggregators are a popular business model in the sports industry. For example, OneTeam Partners works across players’ unions from the NFLPA to the MLBPA to commercialize athlete rights. Will Ventures invested in Athletes.org, which has a similar vision for college athletes. Bundling rights helps rights holders scale their market value – and the same principle could apply to AI training data.
2. Attributing data
After selling their data, sports leagues must ensure that AI companies are accurately attributing the data used in their outputs. For example, if ChatGPT generates an image of a basketball player, the NBA must know what percentage was based on its data – and how much it should be compensated.
Last year, ChatGPT integrated link-outs to media publishers in its responses. These link-outs were part of OpenAI’s licensing deals with many media publishers, allowing the publishers to earn variable compensation based on usage. However, media publishers have expressed frustration that their content isn’t being linked out sufficiently or accurately.
ProRata is building technology that can analyze a piece of generative content and attribute each source of contributing data. For example, ProRata could measure that a ChatGPT response about the NFL was 50% lifted from NFL.com and 50% from ESPN.
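To make the compensation side concrete, here is a minimal Python sketch of a pro-rata payout split. It is not ProRata's actual method, which isn't described here. The function name `split_payout` and the $10 payout pool are hypothetical, and the 50/50 weights simply mirror the example above.

```python
def split_payout(total: float, weights: dict[str, float]) -> dict[str, float]:
    """Divide a payout pool across sources in proportion to attribution weights.

    `weights` maps each source to its estimated share of contribution.
    Weights are normalized, so they need not sum to 1.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("attribution weights must sum to a positive number")
    return {source: total * w / total_weight for source, w in weights.items()}


# Hypothetical pool of $10 for one response, attributed 50/50 as in the example.
shares = split_payout(10.0, {"NFL.com": 0.5, "ESPN": 0.5})
for source, amount in shares.items():
    print(f"{source}: ${amount:.2f}")
```

The hard part in practice is producing the weights, not dividing the pool. This sketch assumes an attribution system has already estimated them.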
3. Limiting data
Lastly, sports leagues must be able to limit the spread of deepfakes and other inappropriate content generated with their IP. Even if leagues partner with AI companies and set strict controls on generation, open-source models will still pose a threat.
That’s why we’re seeing startups emerge around deepfake detection. Companies like Reality Defender, Blackbird, and Smart Protection help rights holders identify AI-generated images, video and voice content that use their IP. Some startups have focused on specific use cases. For example, Loti monitors deepfakes for public figures (e.g. athletes & artists), whereas Deep Media focuses on government agencies and major tech companies.
In addition to detecting deepfakes, these startups help with enforcement. They have relationships with publishers like Meta to automate takedowns of the content their systems detect. Long-term, these startups could serve as a “blue check mark” for rights holders and publishers by tagging what is and isn’t AI-generated content.
##
Truth is: The above-mentioned commercial, brand and political concerns aren’t unique to sports leagues like the NBA.
They’re relevant to all valuable IP holders – from film & TV companies like Netflix, to game publishers like EA, to music labels like Warner, to every individual artist and creator.
I’ll be closely watching which consumer AI applications emerge using recognizable IP across entertainment. And I’ll be paying attention to which tools give IP holders the confidence to engage with these applications.
If you’re building at this intersection of generative AI and IP holders, I’d love to hear your thoughts.
Thought exercises are fun, but I would be interested to see a bit more second-order thought applied to the back-of-envelope estimates. For example, why would someone need every single minute of every single broadcast from every single camera to train a model? Couldn't Turkish basketball leagues be an excellent substitute for NBA footage at a lower cost? I also wouldn't assume that there are actually monetizable consumer use cases out there.
- one person's personal non-professional opinion