BIG DATA AND COPYRIGHT
Author: Shubhangi Chhaya, III year of B.A., LL.B. from Christ (deemed to be) University
According to the Berne Convention, copyright protects literary and artistic works that must first fulfill the “originality” requirement. Depending on the jurisdiction, such works may also have to fulfill the requirement of “fixation” and/or “human intellectual creations”.
An original work, in contrast to copies, reproductions, plagiarism, or derivative works, is a work created by the author that reflects the author’s own intellectual creation. Works, as the object of copyright, are expressions of the author’s ideas and emotions. The intangibility of this object is the essential characteristic that distinguishes intellectual property rights, copyright included, from other property rights. However, such intangible objects can usually be fixed in a tangible form.
Article 2(2) of the Berne Convention provides that “It shall, however, be a matter for legislation in the countries of the Union to prescribe that works in general or any specified categories of works shall not be protected unless they have been fixed in some material form.”
The TRIPS Agreement recognizes computer software as “literary work” under the Berne Convention. Ordinarily, copyright laws protect software and computer programs used to gather and analyze Big Data. Original data analysis tools used to mine, clean, separate, and transform data can also be copyrighted.
To be eligible for protection, software and other data analysis tools must have been reduced to writing or expressed in a fixed medium, and must possess some level of originality. On what constitutes "originality", the Berne Convention states that "Collections of literary or artistic works such as encyclopaedias and anthologies which, by reason of the selection and arrangement of their contents, constitute intellectual creations shall be protected as such, without prejudice to the copyright in each of the works forming part of such collections."
DEFINING BIG DATA
The origins of large data sets go back to the 1960s and ’70s, when the world of data was just getting started with the first data centers and the development of the relational database.
Big data refers to data that is so large, fast or complex that it’s difficult or impossible to process using traditional methods. The act of accessing and storing large amounts of information for analytics has been around for a long time.
The concept of big data gained momentum in the early 2000s, when industry analyst Doug Laney articulated the now-mainstream definition of big data as the three V’s:
Volume - Organizations collect data from a variety of sources, including transactions, smart (IoT) devices, industrial equipment, videos, images, audio, social media and more. In the past, storing all that data would have been too costly, but cheaper storage using data lakes, Hadoop and the cloud has eased the burden.
Velocity - With the growth in the Internet of Things, data streams into businesses at an unprecedented speed and must be handled in a timely manner. RFID tags, sensors and smart meters are driving the need to deal with these torrents of data in near-real time.
Variety - Data comes in all types of formats, from structured, numeric data in traditional databases to unstructured text documents, emails, videos, audio, stock ticker data and financial transactions.
Around 2005, people began to realize just how much data users generated through Facebook, YouTube, and other online services. Put simply, big data is larger, more complex data sets, especially from new data sources. These data sets are so voluminous that traditional data processing software just can’t manage them. But these massive volumes of data can be used to address business problems you wouldn’t have been able to tackle before.
With the advent of the Internet of Things (IoT), more objects and devices are connected to the internet, gathering data on customer usage patterns and product performance. The emergence of machine learning has produced still more data.
Big data is a term that describes large, hard-to-manage volumes of data, both structured and unstructured. But it is not just the type or amount of data that is important; what matters is what organizations do with the data. Big data can be analyzed for insights that improve decisions and give confidence for making strategic business moves.
Intellectual property is defined by the Oxford English Dictionary as "intangible property that is the result of creativity". Intellectual property rights are the rights that adhere to such creations and that grant the holder thereof a monopoly on the use of that creation for a specified period and subject to certain exceptions.
The underlying aim of granting such (temporary) monopoly, which entails a certain social cost, is to incentivize creators to share their creation with the public, and to achieve the social benefits of increased creative activity.
Copyright refers to the legal right of the owner of intellectual property. In simpler terms, copyright is the right to copy. This means that the original creators of a work, and anyone they authorize, are the only ones with the exclusive right to reproduce it. As noted at the outset, under the Berne Convention this protection extends to literary and artistic works that fulfill the “originality” requirement.
When someone creates a product that is viewed as original and that required significant mental activity to create, this product becomes an intellectual property that must be protected from unauthorized duplication. Examples of unique creations include computer software, art, poetry, graphic designs, musical lyrics and compositions, novels, film, original architectural designs, website content, etc. One safeguard that can be used to legally protect an original creation is copyright.
RELATION BETWEEN BIG DATA AND COPYRIGHT
Copyright interfaces with Big Data in several respects: from the computer software applied in data collection and processing, to the data sets (collections of data), to the outcomes generated via Big Data technologies.
In the context of big data projects, it is crucial to understand to what extent the data used can be copyright protected. In all likelihood, most of the data collected and processed in a big data analytics context will not be considered original and will therefore not benefit from copyright protection. Having said that, it cannot be excluded that the individual data can gain originality once they are connected with other information or presented in an original way (by means of different possible forms of expression).
Copyright comes into the picture because the law safeguards the computer software and programs that are used to collect and analyze big data. In most countries, data analytics tools that aid in mining, cleaning, segregating, and transforming data can be protected; examples include the copyright laws of Nigeria, India, and the USA.
As noted earlier, to be eligible for protection, software and other data analysis tools must have been reduced to writing or expressed in a fixed medium, and must possess some level of originality. On what constitutes "originality", the Berne Convention's rule on collections is instructive: "Collections of literary or artistic works such as encyclopaedias and anthologies which, by reason of the selection and arrangement of their contents, constitute intellectual creations shall be protected as such, without prejudice to the copyright in each of the works forming part of such collections."
CASE STUDIES OF BIG DATA AND COPYRIGHT
Data is IP; data is critical to our survival and our competitive edge. Data could reveal something about power density in a fuel cell. That data, if it got into the wrong hands, could result in us losing our competitive edge in the market.
“What people don’t understand is that data is a commodity now, and that it will be a more valuable commodity than property in the future,” says Jonathan Shaw.
In the Google Books case, the database basically consists of word-searchable scans of the books. From a copyright standpoint, therefore, it is doubtful whether a Big Data corpus of this sort, or a “dump” of personal data scraped from online search engines or social media sites would benefit from copyright protection. Hacking and other methods of unauthorized access to such corpora might be better handled via computer crimes and torts.
Websites can plant small pieces of data known as cookies to identify the user, and these cookies can be used to record the user’s browsing activity on that site. The cookies can then be shared and the data in them consolidated, enabling the behavioral advertising industry to broadcast, in real time, the usage patterns and interests of the user, and thereby to facilitate real-time bids by online advertisers for personalized advertising on the user’s browser page.
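The cookie mechanism described above can be illustrated with a minimal sketch. It assumes a hypothetical site that issues a cookie named visitor_id and keeps an in-memory log of page views. The cookie name, the handle_request function and the browsing_log store are illustrative assumptions and are not drawn from any particular website or advertising platform.

```python
from collections import defaultdict
from http.cookies import SimpleCookie
import uuid

# Hypothetical in-memory store: visitor id -> pages viewed on this site.
browsing_log = defaultdict(list)


def handle_request(cookie_header, page):
    """Record one page view and return (visitor_id, Set-Cookie value or None)."""
    cookie = SimpleCookie()
    cookie.load(cookie_header or "")

    if "visitor_id" in cookie:
        # Returning visitor: the browser sent back the identifier planted earlier.
        visitor_id = cookie["visitor_id"].value
        set_cookie = None
    else:
        # First visit: plant a new identifier in the user's browser.
        visitor_id = uuid.uuid4().hex
        outgoing = SimpleCookie()
        outgoing["visitor_id"] = visitor_id
        outgoing["visitor_id"]["path"] = "/"
        set_cookie = outgoing["visitor_id"].OutputString()

    # The site can now tie this page view to the same visitor over time.
    browsing_log[visitor_id].append(page)
    return visitor_id, set_cookie


# Example: two page views by the same browser end up in one profile.
first_id, header = handle_request(None, "/fuel-cells")
second_id, _ = handle_request(header.split(";")[0], "/pricing")
```

In this sketch the accumulated entries in browsing_log are the kind of raw behavioral data whose copyright status the discussion here calls into question: they are records of facts, collected automatically, with no evident creative selection or arrangement.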
Netflix implements data analytics models to discover customer behavior and buying patterns. Using this information, it recommends movies and TV shows to its customers. That is, it analyzes each customer’s choices and preferences and suggests shows and movies accordingly.
According to Netflix, around 75% of viewer activity is based on personalized recommendations. Netflix collects enough data to create a detailed profile of each of its subscribers. These profiles help the company know its customers better and grow its business.
Google uses big data to optimize and refine its core search and ad-serving algorithms. And Google continually develops new products and services that have big data algorithms.
Google generally uses big data from its Web index to initially match the queries with potentially useful results. It uses machine-learning algorithms to assess the reliability of data and then ranks the sites accordingly.
Google optimized its search engine to collect the data from us as we browse the Web and show suggestions according to our preferences and interests.
The leading e-commerce company Amazon, Inc. (Amazon) utilized its big data resources to improve its performance. Being the dominant retailer on the Internet, Amazon had a vast database regarding the tastes, preferences, and previous purchasing history of its customers. Amazon leveraged its big data resources to give more relevant product recommendations and improve its customer care quality. Banking heavily on its big data resources, it upgraded its customer recommendation system.
The copyright holder is granted several exclusive economic rights that allow controlling the protected work's use and facilitate enforcement in case a third party uses the work without authorization. The rights of reproduction, communication to the public and distribution are indeed a useful toolkit which, balanced by the copyright exceptions, allows for an optimal protection of right holder's interests.
Copyright law therefore provides for a wide scope of measures securing the rights of the author in case of dissemination of his work and the use of these works by third parties. The rules governing copyright protection aim at enabling further use of the works, securing at the same time the legitimate interests of the author.
For a work to be protected, it must be fixed in some material (concrete) form. In a data context, 'fixation' means that the specific information needs to be saved in a tangible form. The form in which the data is saved can range from handwritten notes (files), through photographic documentation (images) or recorded testimonies (sound), to digitized archives (digital files), as long as the data remains concrete and can be easily identified and described. Results that have not yet been produced (future data), or results that cannot yet be described (e.g. because there are no means yet to express them), cannot benefit from copyright protection for as long as they have not materialized.
This can present some difficulties in a big data context, given that big data tends to involve dynamic datasets and notably relies on cloud computing services. In a data environment, the most important hindrance resulting from copyright protection is the need to obtain authorization from the copyright holder of each individual item of data. In the context of big data projects, to the extent copyright applies, this would require identifying the authors of hundreds, if not hundreds of thousands, of works. In many cases, it might be difficult to identify or find the right holder, or to establish whether the right holder has authorized use of the work. In practice, this means that time-consuming analyses need to be performed before the data gathered can be used.
A final question emerges from the discussion above: will big data qualify for copyright protection? The analysis suggests, first, that since big data outputs are visualizations of data processing, they can be expressed in a material form. Thus, they meet the “fixation” requirement.
Secondly, it appears that these outputs will possess originality – either as compilations (outcomes of selection and arrangement of raw data according to an algorithm), or as a work of more creativity (articles, poems, painting, etc.).
In addition, what actually counts as authorship in the context of big data has to be reconsidered. After all, the majority of the ‘work’ involved is undertaken by computing software, creating a sizable grey area between human creators and the tools they use in the digital age.
Thus, current IP laws are not adequate to guard the precious pools of data available in the digital universe, and legislation is needed either to expand the scope of existing IP laws or to develop an entirely new IP protection regime.
Berne Convention (n 11) art. 2(5)
Berne Convention art. 2(2)
TRIPS (n 1), art. 27(1)
Berne Convention art. 2(5)
Oracle, https://www.oracle.com/in/big-data/what-is-big-data/#challenges (last visited on Jan. 26, 2022)
Jenn Cano, 'The V's of Big Data: Velocity, Volume, Value, Variety, and Veracity' (March 11, 2014), https://www.xsnet.com/blog/bid/205405/ (last visited on Jan. 26, 2022)
SAS, https://www.sas.com/en_in/insights/big-data/what-is-big-data.html (last visited on Jan. 26, 2022)
Karen Hallenstein & Jane Perrier, 'Big Data & Intellectual Property – Strategic Alignment for Commercial Success' [Vol. 8, No. 31, Spring 2015, 1] http://www.iicj.net/subscribersonly/15april/iicj4april-ip-karenhallenstein-telstra-australia.pdf (last visited on Jan. 28, 2022)
Will Kenton, Copyright, https://www.investopedia.com/terms/c/copyright.asp (last visited on Jan. 29, 2022)
Berne Convention (n 11) art. 2(5)
Bernard Brode, When big data collides with IP Law (Jan. 15, 2021), Insidebigdata, https://insidebigdata.com/2021/01/15/when-big-data-collides-with-intellectual-property-law/ (last visited on Jan. 30, 2022)
Jonathan Shaw, 'Why "Big Data" Is a Big Deal' [Harvard Magazine, March-April 2014] http://harvardmagazine.com/2014/03/why-big-data-is-a-big-deal (last visited on Jan. 30, 2022)
Article 2(5), Berne Convention for the Protection of Literary and Artistic Works (as amended on September 28, 1979) (Authentic text) https://wipolex.wipo.int/en/treaties/textdetails/12214 (last visited on Jan. 31, 2022)
Jonathan Shaw, 'Why "Big Data" Is a Big Deal' [Harvard Magazine, March-April 2014] http://harvardmagazine.com/2014/03/why-big-data-is-a-big-deal (last visited on Feb. 2, 2022)
Julien Debussche, Jasmien César, Big data: issues and opportunities, https://www.twobirds.com/en/news/articles/2019/global/big-data-and-issues-and-opportunities-ip- (last visited on Feb. 3, 2022)
Daniel Seng, Big Data and Copyright, SSRN (SSRN-id3913015), 7-9 (last visited on Feb. 3, 2022)
Data flair, https://data-flair.training/blogs/big-data-case-studies/ (last visited on Jan. 31, 2022)
Data Science and Machine learning, https://towardsdatascience.com/the-most-important-supreme-court-decision-for-data-science-and-machine-learning-44cfc1c1bcaf (last visited on Feb. 4, 2022)
Charlotte Kalpatrick, How to protect Big data, https://www.managingip.com/article/b1kbljy4tbktcm/how-to-protect-big-data (last visited on Feb. 4, 2022)