As artificial intelligence (AI) becomes more prevalent, the regulation of AI data scraping is raising complex questions for both website owners and data collection companies. While many websites have tried to ban AI data scraping through their terms of use, a recent federal court decision has clarified that the regulation of public data scraping from social media platforms should be governed by federal copyright law.
Data scraping involves extracting and copying data from websites, including social media platforms, often for training AI models. Since scraped data can contain user-generated content and personal information, many companies, including X Corp. (the owner of the social media platform X, formerly Twitter), have sought to enforce anti-scraping policies through their terms of use. X Corp. recently challenged data-scraping company Bright Data Ltd. in federal court for violating these policies.
In X Corp. v. Bright Data Ltd., the US District Court for the Northern District of California sided with Bright Data and dismissed X Corp.’s breach-of-contract claims. The court ruled that X Corp.’s attempt to enforce its anti-scraping terms was preempted by federal copyright law.
X Corp. aimed to hold Bright Data accountable for accessing its systems to scrape and sell data from the X website. X Corp. argued that Bright Data violated “browsewrap” and “clickwrap” agreements that prohibited scraping. The court, however, found that the rights to the content on X belong to the users, not X Corp. It emphasized that the Copyright Act, not X Corp.’s terms of service, governs the rights to such public data, and it dismissed the claims on that basis.
The court noted that X Corp.’s terms of service specify that X users retain copyright ownership of the content they post, while X Corp. has a non-exclusive, royalty-free license to use that content. The court criticized X Corp. for trying to expand its rights beyond this license, which could restrict others from using or distributing the content. It also rebuked X Corp. for attempting to implement a private copyright system conflicting with the Copyright Act, which governs public data use.
DC IP Lawyers Note: This decision highlights the Copyright Act’s role in protecting the broad rights of copyright owners, irrespective of the content’s medium. The court reinforced that X users, as copyright owners, have the power to exclude others from using their content, even if it’s accessible through X. The case also underscored the importance of fair use in digital contexts and ensuring that content Congress intended to be freely accessible remains so.
Although the court dismissed the case, Judge William Alsup noted that not all state law interests would be preempted by copyright law, particularly those focused on privacy protection.
For data-scraping companies, this ruling is likely favorable, as it clarifies that such companies may not be liable for violating website terms of service that prohibit scraping if copyright law is not infringed. Companies should still analyze the copyright status of the data they scrape and consider fair use defenses.
For website owners and social media companies, this case demonstrates the limitations of the Copyright Act in protecting against data scraping, particularly when the data involved is subject to fair use or not covered by copyright. Terms of use alone may not suffice to prevent data scraping.
DC IP Lawyers advise that copyright law and its application to data scraping, especially in the context of AI, remain crucial issues, and that courts across the nation are still developing their approaches to them.