Government, 9th U.S. Circuit Court of Appeals
Mar. 27, 2018
‘Scraping’ is just automated access, and everyone does it
If courts allow companies to use the Computer Fraud and Abuse Act to block automated access by competitors, it will threaten open access to information for everyone.
Jamie Lee Williams
Staff Attorney, Electronic Frontier Foundation
For tech lawyers, one of the hottest questions this year is: Can companies use the Computer Fraud and Abuse Act -- an imprecise and outdated criminal anti-"hacking" statute intended to target computer break-ins -- to block their competitors from accessing publicly available information on their websites? The answer to this question has wide-ranging implications for everyone: It could impact the public's ability to meaningfully access publicly available information on the open web. In a world of algorithms and artificial intelligence, lack of access to data is a barrier to entry, and blocking access to data means blocking any chance for meaningful competition.
The CFAA was enacted in 1986, when there were only about 2,000 computers connected to the internet. The law makes it a crime to access a computer connected to the internet "without authorization" but fails to explain what this means. It was passed with the aim of outlawing computer break-ins, but has since metastasized in some jurisdictions into a tool to enforce computer use policies, like terms of service, which no one reads.
Efforts to use the CFAA to threaten competitors increased in 2016 following the 9th U.S. Circuit Court of Appeals' confusing Facebook v. Power Ventures decision. The case involved a social media aggregator's access to Facebook users' accounts. Power Ventures was a product that allowed Facebook users to post simultaneously to multiple social media websites. Facebook didn't like this and sent a cease and desist letter, tried to block Power Ventures' IP address, and eventually filed a lawsuit. The court held that Power Ventures violated the CFAA by continuing to access Facebook after receiving the cease and desist letter and after Facebook instituted its IP address block -- despite having ongoing authorization from Facebook's users to access their accounts via their still valid login credentials.
After the decision was issued, companies -- almost immediately -- started citing the case in cease and desist letters, demanding that competitors stop using automated methods to access publicly available information on their websites. Some of these disputes have made their way to court, the highest profile of which is hiQ v. LinkedIn, which involves automated access of publicly available LinkedIn data. Note, however, that Power Ventures didn't involve public data; it involved private data, stored behind a username and password barrier. As law professor Orin Kerr has explained, posting information on the web and then telling someone they are not authorized to access it is "like publishing a newspaper but then forbidding someone to read it."
The web is the largest, ever-growing data source on the planet. It's a critical resource for journalists, academics, businesses, and users alike. But meaningful access sometimes requires the assistance of technology, automating and expediting an otherwise tedious process of accessing, collecting and analyzing public information. This process of using a computer to automatically load and read the pages of a website for later analysis is often referred to as "web scraping."
As a technical matter, web scraping is simply machine-automated web browsing. There is nothing that can be done with a web scraper that cannot be done by a human with a web browser. And it is important to understand that web scraping is a widely used method of interacting with the content on the web: Everyone does it -- even (and especially) the companies trying to convince courts to punish others for the same behavior.
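To see how little mystery there is, consider a minimal sketch of a scraper, written in Python using only the standard library. The URL is a placeholder (example.com stands in for any public site; nothing here is specific to the companies in this article). The program issues the same ordinary HTTP request a browser issues, then reads the HTML that comes back:

```python
# A minimal, hypothetical illustration of web scraping using only
# Python's standard library. The URL below is a placeholder.
from urllib.request import urlopen
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag on a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Fetch the page exactly as a browser would: one HTTP GET request.
with urlopen("https://example.com/") as response:
    html = response.read().decode("utf-8", errors="replace")

# Parse the HTML and pull out the links for later analysis.
collector = LinkCollector()
collector.feed(html)
print(collector.links)
```

A browser performs exactly the same fetch; the only difference is that it renders the HTML for a human eye instead of handing it to a program.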
Companies use automated web browsing products to gather web data on the performance ranking of their products in Amazon search results, or to monitor information posted publicly on social media to keep tabs on issues that require consumer support, track competitors, stay up to date on news relevant to their businesses, and influence product development. E-commerce businesses use automated web browsers to monitor competitors' pricing and inventory, and to aggregate information to help manage supply chains. Businesses also use automated web browsers to detect fraud, to perform due diligence checks on their customers and suppliers, and to collect market data to help plan for the future.
Journalists and information aggregators also rely on scraping. The San Francisco Chronicle used automated web browsing to assess the impact of Airbnb listings on the San Francisco rental market, and ProPublica used automated web browsing to uncover that Amazon's pricing algorithm was hiding the best deals from its customers. The Internet Archive's web crawlers -- one specialized use of web scraping -- work to archive as much of the public web as possible. Google's Crisis Map aggregated information about traffic, shelter availability and resource needs during California's 2017 wildfires. And Google's web crawlers, which power the search tool most of us rely on every day, are simply web scraping "bots."
During a recent 9th Circuit hearing in hiQ v. LinkedIn, LinkedIn tried to analogize the case to United States v. Jones, arguing that hiQ's use of automated tools to access public information is different "in kind" from manually accessing that same information, just as long-term GPS monitoring of someone's public movements is different from merely observing those movements.
But LinkedIn itself acknowledges in its privacy policy that it, too, uses automated tools to "collect public information about you, such as professional-related news and accomplishments," and makes that information available on its own website -- unless a user opts out by adjusting their default privacy settings. Question: How does LinkedIn gather that data on its users? Answer: web scraping. The only thing that makes hiQ's access different is that LinkedIn doesn't like it.
Of course LinkedIn doesn't like it; it wants to block a competitor's ability to meaningfully access information that its users post publicly online. But just because LinkedIn or any other company doesn't like automated access, that doesn't mean it should be a crime.
As law professor Michael J. Madison wrote, resolving the debate about the CFAA's scope "is linked closely to what sort of Internet society has and what sort of Internet society will get in the future." If courts allow companies to use the CFAA to block automated access by competitors, it will threaten open access to information for everyone.
Some have argued that allowing scraping is what will doom access to public information, because websites will simply place their data behind an authentication gate. But it's naïve to think that LinkedIn would put up barriers to access; that would mean less data collection and fewer eyes for advertisers. Its default is public, and it wants to keep it that way. It wants to participate in the open web while using the CFAA to avoid accepting the web's open access norms.
The public is already losing access to information. With the rise of proprietary algorithms and artificial intelligence, both private companies and governments are making high-stakes decisions that impact lives with little to no transparency. In this context, it is imperative that courts not take lightly attempts to use the CFAA to limit access to public information on the web.