Is data really that important to a business?
A quick survey of the top companies by market capitalisation readily reveals that data is key.
Global | Publication | February 2021
Disruptive technologies, such as AI, IoT, AVs, distributed ledger technology (DLT), cryptocurrencies and smart contracts, generate many different forms of data. What are the particular characteristics of such data, and to what extent can intellectual property rights or other rights protect them?
Machine-generated data, particularly those collected from large populations on a frequent basis (such as those captured by smart meters, AVs, and connected devices), can constitute valuable Big Data from which businesses can draw useful insights.
Determination of rights in relation to machine-generated data may not be straightforward because there are multiple actors within any particular data-generating ecosystem (for an example of this, see Autonomous Vehicles).
Machine-generated data will often be in the form of unstructured data. From a European perspective, it would not fall within the definition of a “database” in the Database Directive, although it may be possible to arrange the unstructured data so as to attract the EU database right protection.
The EU Commission considers that the sui generis database right does not apply to databases that are the by-product of the main activity of an organization (such as machine-generated data, IoT devices, Big Data, AI, etc.); it only covers databases that contain data obtained from external sources (for example, industries like publishers, who seek out data in order to commercialize databases).1
The contents of a machine-generated database are unlikely to attract copyright from a European perspective. Practically speaking, Big Data sets comprised of machine-generated data will best be protected contractually, or simply by keeping the datasets secure inside the business.
The US Patent and Trademark Office requires the inventor of a patentable invention to be a natural person, not an AI. This limits the scope for protecting an AI-generated invention.
As at the date of publication, the U.S. Copyright Office has not yet ruled specifically in relation to works generated by AI but has previously required that works must be created by humans to be copyrightable, and will not register works created by nature, animals or plants.2
Canada is, at the date of publication, in the process of clarifying protection for machine-generated works, including data. In 2019, Parliament’s Standing Committee on Industry, Science and Technology presented a Statutory Review of the Copyright Act to Parliament, containing 36 recommendations for legislative changes, including changes to provide clarity around machine-generated works.
Data that is purely machine-generated will not be protected under Singapore copyright law, as the Singapore courts have recognized that only natural persons may be considered authors of copyright works. (Singaporean law also does not provide for a sui generis database right, such as the one recognized under EU law.)
Machine-generated data may potentially be protected under the common law of confidence if it meets the qualifying criteria, i.e. it possesses the necessary quality of confidentiality and was imparted in circumstances importing an obligation of confidence. However, this has yet to be tested in the Singaporean courts.
It can be difficult to determine the copyright holder for machine-generated data in China. Consequently, it can be difficult to obtain protection under the PRC Copyright Law in respect of such data, especially where the data is co-generated by multiple actors. Big Data sets comprised of machine-generated data will best be protected contractually, or simply by keeping the datasets secret.
Australian legislation does not specifically provide for protecting machine-generated data. A database may be capable of limited copyright protection under the Copyright Act 1968 (Cth). However, if the particular database is purely machine-generated, it will not be capable of such protection, as a human author is required.
A database can generally be protected as confidential information but obligations of confidentiality are difficult to enforce against third parties in Australia.
The coding which analyses data and turns it into meaning often comes in the form of an algorithm. The make-up of AI software which enables a machine to learn and to make predictions or decisions is also prescribed by the underlying algorithm.
Algorithms expressed in any form (be it in the form of natural language or software code) may in some jurisdictions attract copyright, by virtue of the author’s intellectual creativity to create that expression.
However, it is less clear whether a machine-made algorithm would attract copyright. Such an algorithm might arise as a result of machine learning, where the human-created software (which would usually attract copyright) enables the machine to create its own algorithm (see Who controls the works produced by artificial intelligence?).
The outputs of an algorithm would not be likely to attract any copyright protection in many jurisdictions because the output is determined by the algorithm – no degree of (human) intellectual creation would have been required to arrive at the output.3 However, particular jurisdictions may have legislation conferring copyright on computer-generated works, as in the case of the UK’s CDPA 1988 (see Ways of Protecting Data).
Both algorithms and output of algorithms are capable of being protected by trade secrets law and by contract.
In Europe, training and testing datasets for AI are unlikely to fall within the narrow definition of “database” within the EU’s Database Directive – being “a collection of independent works, data or other materials arranged in a systematic or methodical way and individually accessible by electronic or other means.”
Such datasets are valuable and can be protected: (1) by trade secret law, provided they are kept secret by restricting third party access; or (2) contractually by sharing under the terms of a non-disclosure agreement.
Datasets in a:
The positions set out above are separate from the question whether the content of a particular ledger block may itself attract copyright (whether as one or several copyright works).
Statistics
In some cases, it may be possible for statistics to attract copyright in some jurisdictions.4 It would typically depend on whether the statistics reflect the author’s intellectual creation.
From a European perspective, the position can be summarized as follows:
Type of data | Patent right | Copyright | Trade Secrets | Copyright in Database | Sui Generis Database right | Contract |
---|---|---|---|---|---|---|
Structured Dataset | Possible | Possible | 🗸 | Possible | Possible | 🗸 |
Unstructured Dataset | ✘ | Possible | 🗸 | ✘ | ✘ | 🗸 |
Algorithm | Possible | 🗸 | 🗸 | ✘ | ✘ | 🗸 |
Output of algorithm | ✘ | ✘ | 🗸 | ✘ | ✘ | 🗸 |
Dataset collected by IoT | ✘ | ✘ | 🗸 | ✘ | ✘ | 🗸 |
Dataset in Permissionless Distributed ledger | ✘ | ✘ | ✘ | ✘ | ✘ | ✘ |
Training Dataset for AI | ✘ | ✘ | 🗸 | ✘ | ✘ | 🗸 |
Statistics | ✘ | Possible | 🗸 | ✘ | ✘ | 🗸 |
The value that can be gained from data by businesses will inevitably lead to an increase in the use of data to improve daily operations and to develop new products, services and processes.
In many jurisdictions pure information, or data, is not considered to be property. This is because a claim to property in intangible information presents obvious definitional difficulties.
There is a patchwork of different rights, intellectual property rights and contract rights that may apply to data. Understanding the way in which these rights come into play enables a business to understand how its data assets can be protected.
In this section, we review the EU’s position with regard to industrial and non-personal data and look at whether other jurisdictions have similar initiatives.
Data location laws (in relation to industrial and non-personal data) can be restrictive (as in banking secrecy laws, which may require some types of data to remain onshore or to be “localised”) or liberalising (as in laws that ban the prohibition of export of data from a locality).
In furtherance of the objective of leveraging existing datasets paid for by public funds, a number of jurisdictions have sought to make public sector information (PSI) available to industry.
The exclusive possession or control of data can have antitrust / competition law considerations, giving rise to access disputes.
The uncertain nature of intellectual property rights in data means that “contract is king” in data transactions.
Data is an incredibly valuable resource for businesses, enabling organisations to effectively operate and to make business improvements. In order to exploit this value most effectively, businesses must invest in good data management.
Errors, incompleteness or biases within data may flow through, and be amplified by, data analytics process outputs upon which a business's strategic and investment decisions may depend, potentially causing business losses. In this section we deal with liability arising out of use of data / datasets that are in some respect sub-optimal.