Media Briefing: How 3 publishers are making their sites more/less habitable to AI crawlers
This Media Briefing covers the latest in media trends for Digiday+ members and is distributed over email every Thursday at 10 a.m. ET.
Many publishers, like 404 Media and The Washington Post, have grown wary of AI crawler bots and their ability to scrape and take original content for unapproved uses, including training large language models or altogether regurgitating the articles with a new headline and no credit.
Meanwhile, other publishers, like Politico EU, are choosing to welcome AI crawlers with open arms.
The publishers’ varied approaches likely relate to their respective business models, according to Melissa Chowning, founder and CEO of audience development and marketing firm Twenty-First Digital. 404 Media is reliant on subscriptions, whereas Politico EU and The Washington Post would want to strike a balance between using generative AI bots as upper-funnel traffic sources and using paywalls to block the bots and protect their subscription businesses.
In this piece, we look at the strategies that the three publishers have taken to make their websites more or less habitable to AI crawler bots and the pros and cons behind each of those decisions. — Kayleigh Barber and Sara Guaglione
404 Media’s walled-off approach
Tech media start-up 404 Media is currently only blocking GPTBot, according to its robots.txt file. Rather than add more bots to that list, the company’s founders decided to put up a registration wall in an attempt to take sweeping action against all current and future bots.
“The reality is that OpenAI is not the only AI out there and it’s obviously not the only scraper out there … It’s very much a whack-a-mole solution. I don’t really want to be in a place where I have to ask the developer every week or go into GitHub myself and add a block to a new AI tool,” said Joseph Cox, co-founder of 404 Media, which launched in August with only four full-time employees, all co-founders and journalists.
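The whack-a-mole problem Cox describes follows from how robots.txt works: rules are written per user agent, so any crawler that is not named is unaffected. The sketch below illustrates this with Python's standard `urllib.robotparser`. The robots.txt contents, the `example.com` URL and the `SomeNewAIBot` name are made up for illustration and are not taken from 404 Media's actual file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that names only OpenAI's GPTBot.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""


def allowed_agents(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, for each user agent, whether the rules permit fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


# "SomeNewAIBot" is an invented name standing in for a crawler launched later.
result = allowed_agents(
    ROBOTS_TXT,
    ["GPTBot", "Google-Extended", "SomeNewAIBot"],
    "https://example.com/story",
)
for agent, allowed in result.items():
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Only the named bot is turned away. Every other crawler stays allowed until someone adds a rule for it, and even then robots.txt is a request that a crawler can choose to ignore.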
The registration wall requires readers to provide their email address before accessing the site’s content. The co-founders shared their reasoning for putting up the registration wall in a note to readers on the website, explaining that 404 Media has had a particular problem with AI bots scraping its content and regurgitating it under different headlines on other websites, which then rank higher on Google than the original reporting.
Pros:
- Taking a firm stance to protect a publisher’s content from being scraped by large tech companies and used for free to train their LLMs
- Gaining more leverage to negotiate with these companies for licensing deals down the line
Publishers who take this approach “want to be paid for their content,” said Yoram Wurmser, eMarketer principal analyst, Insider Intelligence. This gives publishers leverage to negotiate with AI companies on licensing deals.
Chowning said a number of her firm’s clients are choosing to “have their defenses up” like 404 Media.
Cons:
- Content could be harder to find, with limited reach
- Creates friction for readers
- It’s not foolproof. Someone could enter an email and allow a bot to subvert the reg wall
Making the “difficult choice” to lock all content away from crawlers means it could be harder for readers to find 404 Media’s coverage, for instance if its articles aren’t surfaced in AI-generated search results, Chowning said. “You don’t want subscription product content to be that available, but then you do lose some of that accessibility,” she said.
The Washington Post’s selective strategy
The Post’s engineering team examines LLM-based bots and determines when to block them based on how they will affect the Post’s “SEO metrics,” a spokesperson said.
The engineering team “analyzes several factors, including traffic patterns, to determine when to deny or slow crawler access across categories like web archiver bots, enterprise data aggregator bots and more,” they added. The spokesperson declined to answer further questions.
Pros:
- Evaluating each crawler’s impact on a site’s traffic can help determine whether the value exchange is worth it
Arvid Tchivzhel, managing director at Mather Economics’ digital consulting practice, called this approach “very pragmatic and data-driven.” By evaluating referral traffic from different platforms, the Post can decide on a value exchange that works in its favor, he said. And if a platform is crawling the Post’s content without driving much traffic, it can block those crawlers knowing that it won’t have a significant impact on its referral stats. (At publishing time, the Washington Post was blocking both OpenAI’s GPTBot and Google’s Google-Extended bots.)
The Post – and other publishers – can do this by A/B testing certain browsers or geographies, or blocking and unblocking bots at different times to measure the changes to referral traffic, Tchivzhel said.
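As a rough illustration of the block-and-unblock comparison Tchivzhel describes, the sketch below compares average daily referral visits from one platform across two periods. The function name and all of the traffic numbers are hypothetical, and nothing here reflects how the Post actually runs its analysis.

```python
from statistics import mean


def referral_change(unblocked: list[int], blocked: list[int]) -> float:
    """Percent change in mean daily referrals while the crawler is blocked."""
    baseline = mean(unblocked)
    if baseline == 0:
        raise ValueError("baseline period has no referral traffic")
    return (mean(blocked) - baseline) / baseline * 100


# Made-up daily referral visits from a single AI platform.
unblocked_days = [120, 135, 128, 140, 127]  # crawler allowed
blocked_days = [118, 130, 125, 138, 124]    # crawler blocked

change = referral_change(unblocked_days, blocked_days)
print(f"Referral change while blocked: {change:.1f}%")
```

A simple before-and-after average like this does not control for news cycles or seasonality, which is one reason to test across browsers or geographies at the same time rather than rely on timing alone.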
Cons:
- Not all publishers have the resources to do this
The Washington Post is in a unique position to do this evaluation because of the resources it has on hand as a large publisher, Chowning said. “Not everybody has a team that can evaluate the impact of various crawlers. So most publishers have to make a gut-level decision [about blocking AI web crawlers],” she said.
Going forward, large publishers are going to have to calculate if the AI web crawlers are “a marketing tool or taking our proprietary information?” Wurmser said.
Politico’s open embrace
Politico is taking an entirely different approach. The publisher – whose parent company Axel Springer signed a licensing deal with OpenAI last year – recently made changes to the design of its EU websites to actually make it easier for crawlers to access its content.
In an interview with Press Gazette, Politico’s vp for product and design Max Leroy said his team organized the website with clearer site mapping (with more sections and subsections) in the hopes that content would show up in search result pages and generated answers in AI chat interfaces. Leroy said he wants Politico EU’s content to appear in Google’s new Search Generative Experience answer formats. Leroy and Politico EU declined an interview request.
Pros:
- The ability to draw in readers from search and AI-powered platforms
- Potential to increase scale, before readers come up against a membership paywall
Tchivzhel said Axel Springer’s OpenAI deal likely is an incentive to keep content open to AI crawlers. And publishers that do keep their sites available to those crawlers have the potential to build brand awareness if they appear as the original source links below generated responses to users’ questions in AI-powered search results or chatbots, he added.
Chowning said Politico EU’s decision to organize content on its website with different subheadings has human user experience benefits as well, as it’s also “organizing [the site] in a way that makes it more readable by humans.”
Cons:
- Remains to be seen how much this strategy will really benefit publishers
This approach only works for Politico EU because the publisher’s freemium content model gives it a “distinct monetization strategy,” Chowning said. If a publisher’s business model is entirely subscription- or membership-based, it may need to be more careful to block AI crawlers and protect its content, she said.
Politico EU’s strategy may work for now, but Wurmser believes generative AI will reduce publishers’ referral traffic in the long run, and moves to try to maintain traffic might not work for very long. It’s also not clear how willing users will be to click through to the links below AI-generated search results to access more information on publishers’ sites, Wurmser said.
What we’ve heard
“Publishers once trading on scale can no longer trade on scale because of the referral traffic disruption. So those publishers that remain – and I hope that there are a lot of us – will be ones who have a very strong connection to a very qualified and engaged audience … We’re not losing because of scale anymore.”
– Lindsey Abramo, World of Good Brand’s CEO, on the shift in her media selling mindset
Dotdash Meredith finally reports digital revenue growth
For Dotdash Meredith, 2023 may have been another mediocre year on the whole, but the fourth quarter marked a turn for the better.
While other companies reported declines in digital ad revenue during Q4, it was the first time since Meredith was acquired at the end of 2021 that the combined company’s digital revenue – which includes advertising, performance marketing and licensing – grew year over year, according to IAC’s Q4 earnings report published on Tuesday.
Although Q4 2023 saw a 9% year-over-year increase in digital revenue compared to Q4 2022, that was still a decline of almost 7% from two years earlier, when Meredith’s and Dotdash’s combined pro forma digital revenue for Q4 2021 was $303.7 million.
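The two-year comparison can be checked with simple percent-change arithmetic. The sketch below uses the $283.6 million Q4 2023 digital revenue figure reported in this briefing's Q4 numbers. The implied Q4 2022 value is an estimate that assumes growth of exactly 9%, which is a rounded figure, so it should not be read as a reported number.

```python
def pct_change(current: float, previous: float) -> float:
    """Percent change from previous to current."""
    return (current - previous) / previous * 100


# Figures in $ millions, as cited in this briefing.
q4_2023_digital = 283.6
q4_2021_pro_forma_digital = 303.7

two_year_change = pct_change(q4_2023_digital, q4_2021_pro_forma_digital)

# Assumption: growth was exactly 9%, so this back-calculation is approximate.
implied_q4_2022_digital = q4_2023_digital / 1.09

print(f"Change vs. Q4 2021: {two_year_change:.1f}%")
print(f"Implied Q4 2022 digital revenue: ${implied_q4_2022_digital:.1f} million")
```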
Notable full year 2023 numbers:
- DDM’s total revenue in 2023 was just under $1.7 billion, down about 12% year over year, from the $1.9 billion in 2022.
- Adjusted EBITDA in 2023 was up 46% year over year to $222.8 million.
- Total advertising revenue for the year was down almost 10% year over year, to $560.8 million, according to IAC’s Grids and Metrics Q4 2023 document.
- Performance marketing revenue was up about 16% year over year, to $231.1 million.
- The licensing and other revenues category was down almost 10% year over year to $100.6 million.
- Print totaled $823.5 million in 2023, down almost 20% from a little over $1 billion in 2022.
Notable Q4 numbers:
- Total revenue for DDM in the fourth quarter 2023 was $475.9 million, roughly flat compared with the $477.6 million generated in Q4 2022.
- Digital revenue increased by 9% year over year to $283.6 million.
- Print revenue was down 12% year over year to $198 million, due to a planned reduction in circulation of certain publications and the shift in ad spend from print to digital mediums.
- Adjusted EBITDA in Q4 was up 69% year over year to $123.5 million.
The bright light in Q4
In its latest earnings report, Dotdash Meredith attributed the digital revenue growth to an increase in both programmatic and direct-sold advertising revenue. Digital advertising revenue totaled $185.5 million in the quarter, up 3.7% year over year; IAC did not break out print ad revenue.
Programmatic advertising revenue was up by an undisclosed amount due to a 10% increase in core sessions traffic year over year and higher ad rates, per the earnings report. Premium direct-sold advertising (which IAC’s CEO Joey Levin said during the earnings call represented about two-thirds of DDM’s ad revenue) increased primarily due to increased spend in the beauty, travel and technology advertising categories. Performance marketing revenue grew by 31% in the quarter to $71.1 million as a result of a 54% increase in affiliate commerce. The growth was partially offset by declines in this category concentrated in the finance and health categories.
DDM’s cookieless, intent-targeting ad tool D/Cipher is now being used in more than 30% of the company’s direct-sold ad campaigns, representing over 150 deals since it was launched last year. DDM maintains that D/Cipher is better at driving campaign performance and conversions than third-party cookies.
2024 outlook
Due in part to a promising Q4 and the fact that both digital traffic and monetization have “continued their momentum into the first quarter of ’24,” IAC CFO and COO Christopher Halpin said during the company’s earnings call on Wednesday that DDM is expected to have a total adjusted EBITDA of $280-300 million in 2024, up from $222.8 million in 2023. He said this will largely, if not entirely, come from the digital business.
Throughout 2024, digital revenue is expected to grow by 10% or more year over year while print revenue is expected to decline at a similar rate to the 12% decline it saw in Q4, particularly in the first half of the year, said Halpin.
“Now the focus is building on the momentum, taking more share with D/Cipher, and establishing Dotdash Meredith as a digital leader in both publishing and advertising. We’re sitting at the right table now, working our way towards the head,” said Levin in the letter to shareholders.
Numbers to know
£39 million (about $49 million): The amount of money that The Guardian is on track to lose during this fiscal year, which will end next month.
28%: The amount that Slate’s total full-year revenue grew by year over year in 2023, which was the most profitable year in the company’s 27-year history.
20: The number of CBS News journalists laid off as part of Paramount’s widespread layoffs, including several correspondents, many of whom are based in the newsroom’s Washington, D.C., bureau.
4.86 million: The size of Dow Jones’s digital subscription base as of January. Approximately 80% of the company’s overall revenue comes from consumer and enterprise subscriptions as of today.
7: The number of full-time Fatherly employees laid off on Friday. While BDG has not formally shut down the parenting title, the brand will significantly decrease its editorial output as a result of the layoffs.
What we’ve covered
Why New York Magazine’s the Cut is expanding at a time when many media companies are cutting costs:
- New York Magazine’s the Cut is expanding this year, adding four full-time editorial staff, verticals and inventory as it chases new and existing advertiser dollars.
- But how can the Vox Media-owned title afford to expand at a time when most large digital publishers are undergoing layoffs?
Read more about why the Cut is on an expansion trajectory here.
Most publishers grew their ad offerings last year, with a focus on branded content:
- Digiday’s survey of more than 300 publisher professionals found that, overall, more than half of publishers (56%) grew their ad products last year.
- The increase, however, wasn’t an overwhelming one as far as how many ad products publishers added. Fifty-three percent of publisher pros said the number of ad products they offered increased only somewhat last year.
Learn more about publishers’ ad offerings in the latest survey from Digiday+ Research here.
WTF are Related Website Sets (RWS) in Google’s Privacy Sandbox?
- RWS is the proposed means for publishers to declare a relationship between their various web domains (and associated ones) after Google Chrome pulls support for third-party cookies.
- To further ensure user privacy, the latest proposals limit publishers to listing five associated domains (formerly, this was three) in a set.
Watch a video explainer of RWS here.
The Trade Desk is rolling out OpenPath to CTV:
- The Trade Desk is extending OpenPath to CTV media owners, with separate sources from both the buy- and sell-sides of the industry telling Digiday TTD began opening such negotiations in recent months.
- Cox Media Group and Vizio have already been confirmed as trading their CTV inventory on the platform.
Learn more about this expansion of OpenPath here.
The New York Times expects ad revenue to continue to decline in 2024:
- The Times’ 2023 fourth quarter earnings report showed the company isn’t entirely immune from the volatile ad market. In fact, the company doesn’t expect to improve in the first quarter of this year.
- Digital ad sales fell by 3.7% to $107.7 million in Q4 2023, down from $111.9 million in Q4 2022.
Read more about the Times’ fourth quarter performance here.
What we’re reading
The Trade Desk launches SP500+, a new tool that helps buyers target premium publishers:
Launched in beta, the new tool gives media buyers the ability to target about 500 sellers and publishers (hence the name SP500+) that are deemed high-quality inventory, according to Adweek. The New York Times, Disney+, Hulu, Spotify, ABC and The Wall Street Journal are all making their ad inventory available through the tool.
Jimmy Finkelstein explains why The Messenger folded:
In an interview with Axios, The Messenger’s founder said that, had he been able to raise $20 million in funding, the media start-up would have been able to achieve profitability by August. Finkelstein said the company had a full-year revenue projection of $60 million in 2024, compared to the $3 million made in 2023.
How Betches is covering the U.S. election for an online Gen Z audience:
The digital media company Betches is expanding its largely lifestyle and entertainment podcast network to include a new political podcast called “American Fever Dream.” It will be co-hosted by internet personality Vitus Spehar, who goes by the handle @underthedesknews on TikTok and Instagram, reported The Washington Post.
Rolling Stone’s top editor is leaving after reported editorial differences with CEO:
As of March 1, Noah Shachtman will no longer be the editor-in-chief of Rolling Stone, a role he’s held since 2021. According to a report by The New York Times, the resignation comes after editorial differences between Shachtman and the publication’s CEO Gus Wenner.