Bluesky users debate plans around user data and AI training – TechCrunch

Social network Bluesky recently published a proposal on GitHub outlining new options it could give users to indicate whether they want their posts and data to be scraped for things like generative AI training and public archiving.

CEO Jay Graber discussed the proposal earlier this week, while on-stage at South by Southwest, but it attracted fresh attention on Friday night, after she posted about it on Bluesky. Some users reacted with alarm to the company’s plans, which they saw as a reversal of Bluesky’s previous insistence that it won’t sell user data to advertisers and won’t train AI on user posts.

“Oh, hell no!” the user Sketchette wrote. “The beauty of this platform was the NOT sharing of information. Especially gen AI. Don’t you cave now.”

Graber replied that generative AI companies are “already scraping public data from across the web,” including from Bluesky, since “everything on Bluesky is public like a website is public.” So she said Bluesky is trying to create a “new standard” to govern that scraping, similar to the robots.txt file that websites use to communicate their permissions to web crawlers.

Debates about AI training and copyright have dragged robots.txt into the spotlight, among other things highlighting the fact that it’s not legally enforceable.

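
For readers unfamiliar with the robots.txt mechanism mentioned above, the following is a minimal sketch of how a well-behaved crawler might consult it, using the `urllib.robotparser` module from Python's standard library. The crawler name `ExampleAIBot`, the rules, and the URL are hypothetical, chosen only for illustration; they are not taken from Bluesky's proposal or from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the crawler name "ExampleAIBot" and these rules
# are illustrative assumptions, not taken from any real site.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/profile/alice/post/1"
    print(is_allowed("ExampleAIBot", url))   # the crawler that was asked to stay out
    print(is_allowed("SomeOtherBot", url))   # any other crawler
```

The check is entirely voluntary: nothing in this code stops a crawler from skipping it and fetching the page anyway, which is the enforceability gap the debate centers on.
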
Bluesky frames its proposed standard as one that would have a similar “mechanism and expectations,” providing “a machine-readable format, which good actors are expected to abide, and does carry ethical weight, but is not legally enforceable.”

Under the proposal, users of the Bluesky app, or other apps that use the underlying ATProtocol, could go into their settings and allow or disallow the usage of their Bluesky data across four categories: generative AI, protocol bridging (i.e., connecting different social ecosystems), bulk datasets, and web archiving (such as the Internet Archive’s Wayback Machine).

If a user indicates that they don’t want their data used to train generative AI, the proposal says, “Companies and research teams building AI training sets are expected to respect this intent when they see it, either when scraping websites, or doing bulk transfers using the protocol itself.”

Molly White, who writes the Citation Needed newsletter and Web3 is Going Just Great blog, described this as “a good proposal,” and said it was “weird to see people flaming BlueSky for it,” since it’s not so much “welcoming in AI scraping” but rather “trying to add a consent signal to allow users to communicate preferences for the scraping that is already happening.”

“I think the weakness with this and [Creative Commons’] similar proposal for ‘preference signals’ is that they rely on scrapers to respect these signals out of some desire to be good actors,” White continued. “We’ve already seen some of these companies blow right past robots.txt or pirate material to scrape.”

Anthony Ha is TechCrunch’s weekend editor. Previously, he worked as a tech reporter at Adweek, a senior editor at VentureBeat, a local government reporter at the Hollister Free Lance, and vice president of content at a VC firm. He lives in New York City.

Source: https://techcrunch.com/2025/03/15/bluesky-users-debate-plans-around-user-data-and-ai-training/