Everyone's Doing It, But Is It Legal?

Watch out. Are your big data feeds in blatant violation of terms of service agreements?

The Big Data movement taking hold in financial services is based in large part on aggregating data from different sources, often using web scraping tools such as bots or spiders. And while it's easy to see the web as an open source of information for the taking, it's worth remembering that it really isn't. So before you find yourself on the wrong end of a cease and desist letter or court order, take a good look at the terms of service.

These words of warning come from the Practising Law Institute's lecture, "Everyone's Doing It, But Is It Legal? Web Scraping & Online Data Harvesting," presented as part of the Social Media 2014: Addressing Corporate Risks seminar.

Web scraping is when a company uses automated programs to retrieve information or content made public by another company. If the scraping company makes competitive use of the other firm's information, it could be liable for copyright infringement or breach of contract (the terms of use), among other potential violations.

"You may be surprised how few websites have prohibitions in place from web scraping bots," said Dale Cendali, Esq., of Kirkland & Ellis LLP. However, a site can always change its terms of use, and recent court cases against crawlers show there can be real consequences for violators.

Some Pointers

Are you scraping data from other sites? Interested in scraping? Or worried about being scraped? Cendali and Anthony Dreyer, Esq., of Skadden, Arps, Slate, Meagher & Flom LLP, offer this advice:

If you are interested in scraping:
- Be aware of terms of use for sites you may want to crawl
- Consider what information you need to crawl, and how you intend to use it (is it copyrighted?)
- Consider how often you need to crawl (repeated crawling can weigh on a site's servers, potentially triggering liability)
- Consider what others in the same industry are doing
- Respect robots.txt files (robots.txt is a text file that website owners can place in the site's hierarchy to instruct automated software not to crawl the site)
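
For readers who build crawlers, here is a minimal Python sketch of the robots.txt and crawl-frequency pointers above, using the standard library's `urllib.robotparser`. The sample file contents, the `example-bot` user agent, the `example.com` URLs and the helper names are all hypothetical, chosen only for illustration. Passing this check says nothing about a site's terms of use, which still have to be read separately.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would download the file
# from the target site before requesting any other page.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /quotes/
Crawl-delay: 10
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been retrieved."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawl_decision(parser, user_agent, url, default_delay=5.0):
    """Return (allowed, seconds_to_wait_between_requests) for one URL.

    default_delay is an assumed fallback used when the site publishes
    no Crawl-delay value for this user agent.
    """
    allowed = parser.can_fetch(user_agent, url)
    delay = parser.crawl_delay(user_agent)
    return allowed, float(delay) if delay is not None else default_delay


if __name__ == "__main__":
    parser = build_parser(SAMPLE_ROBOTS_TXT)
    for url in ("https://example.com/quotes/aapl", "https://example.com/about"):
        print(url, crawl_decision(parser, "example-bot", url))
```

A crawler built this way would skip any URL reported as not allowed and pause for the returned delay between requests, which addresses the concern about weighing on a site's servers.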

If you are concerned about being scraped:
- Draft terms of use to prevent scraping
- Prominently post terms of use and/or have people click to accept terms of use
- Use robots.txt
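
For site owners, the following Python sketch drafts a robots.txt file and checks that it blocks the paths intended before it is published. The helper names, the `any-bot` user agent and the `example.com` URLs are hypothetical. Keep in mind that robots.txt is a request to well-behaved bots rather than a technical barrier, so it works alongside terms of use, not in place of them.

```python
from urllib.robotparser import RobotFileParser


def render_robots_txt(disallowed_paths, user_agent="*"):
    """Build robots.txt text that asks the given user agent to stay out."""
    lines = [f"User-agent: {user_agent}"]
    lines += [f"Disallow: {path}" for path in disallowed_paths]
    return "\n".join(lines) + "\n"


def is_blocked(robots_txt, url, user_agent="any-bot"):
    """Check whether a compliant crawler would treat the URL as off limits."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Ask all bots to stay out of a hypothetical /data/ section only.
    draft = render_robots_txt(["/data/"])
    print(draft)
    print(is_blocked(draft, "https://example.com/data/prices.csv"))
    print(is_blocked(draft, "https://example.com/about"))
```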

3 Kinds of Contract Agreements for Websites

Perhaps the best defense, for the scraper or the scrapee, is the terms of service. If the site's user agreement does not explicitly prohibit scraping, the scraper may have firmer legal ground to stand on in court. Naturally, ironclad terms of service (coupled with cease and desist orders) help protect the site being scraped.

Of course, things aren't always that simple. According to Dreyer, court cases in this arena have demonstrated that how the agreement is displayed is also an important element in a defense. There are three kinds of online agreements for websites:

- Click-wrap: These require users to consent to terms and conditions by clicking that pesky "I Agree" or "I Accept" button before they can proceed to use a website. These are generally considered enforceable because the click is a clear act of assent. Courts acknowledge that users rarely read the terms, but users skip them at their own risk.
- Browse-wrap: This is a link to the terms and conditions posted on a website for users to click if interested; clicking it is not required to use the site. It is usually found at the very bottom of a webpage on a toolbar. In this case user consent is implied by continued use of the site. However, the visibility and accessibility of the link play an extremely significant role in court.
- Contract implied by conduct: Less common, this is when terms of use are presented after a user first accesses information on a website. On subsequent visits it is understood the user is on notice. Consider it a one-free-pass situation.

Cendali adds that while marketers are often at battle with legal over the size and prominence of terms of service, a company's best defense is to make sure all the terms of service are prominent. Use all the relevant terms, such as scraping, crawling, spidering and data harvesting, and don't feel bad about bothering viewers with bigger notices.

Case in point: after a few interesting court cases of its own over data scraping, Ticketmaster now has what can be considered a very clear, very bold and all-capitalized browse-wrap link at the bottom of its webpage. "People may not like it on their site but you find creative ways to show it," offers Cendali.

In the world of financial services, where data is being pulled from beyond US borders, it would also be wise to consider international laws around terms of use and copyright infringement.

Becca Lipman is Senior Editor for Wall Street & Technology. She writes in-depth news articles with a focus on big data and compliance in the capital markets. She regularly meets with information technology leaders and innovators and writes about cloud computing, datacenters, ...