We keep an eye out for the most interesting stories about Labby subjects: digital media, startups, the web, journalism, strategy, and more. Here’s some of what we’ve seen lately.
“Over the past year, many of the most important web sources used for training A.I. models have restricted the use of their data, according to a study published this week by the Data Provenance Initiative, an M.I.T.-led research group. The study, which looked at 14,000 web domains that are included in three commonly used A.I. training data sets, discovered an ‘emerging crisis in consent,’ as publishers and online platforms have taken steps to prevent their data from being harvested.” —
Tags: AI, Common Crawl, crawlers, Data Provenance Initiative, Robots Exclusion Protocol