Disruptive technology data: case studies

Global Publication February 2021

Disruptive technologies, such as artificial intelligence (AI), the Internet of Things (IoT), autonomous vehicles (AVs), distributed ledger technology (DLT), cryptocurrencies and smart contracts, generate many different forms of data. What are the particular characteristics of such data, and to what extent can intellectual property rights or other rights protect them?


Machine-generated data

Machine-generated data, particularly those collected from large populations on a frequent basis (such as those captured by smart meters, AVs, and connected devices), can constitute valuable Big Data from which businesses can draw useful insights.

Determination of rights in relation to machine-generated data may not be straightforward because there are multiple actors within any particular data-generating ecosystem (for an example of this, see Autonomous Vehicles).

EU and the UK

Machine-generated data will often be unstructured. From a European perspective, such data will generally not fall within the definition of a “database” in the Database Directive, although it may be possible to arrange the unstructured data so that it attracts protection under the EU sui generis database right.

The European Commission considers that the sui generis database right does not apply to databases that are a by-product of an organization’s main activity (for example, in the context of machine-generated data, IoT devices, Big Data and AI); it covers only databases containing data obtained from external sources (for example, by publishers, who seek out data in order to commercialize databases).1

The contents of a machine-generated database are unlikely to attract copyright from a European perspective. In practice, Big Data sets composed of machine-generated data are best protected contractually, or simply by keeping the datasets secure within the business.

United States

The US Patent and Trademark Office requires a patentable invention to be made by a natural person, not by AI. This limits the protection available for AI-generated inventions.

As at the date of publication, the U.S. Copyright Office has not ruled specifically on works generated by AI, but it has previously required that works be created by humans to be copyrightable, and it will not register works created by nature, animals or plants.2


Canada

At the date of publication, Canada is in the process of clarifying the protection of machine-generated works, including data. In 2019, Parliament’s Standing Committee on Industry, Science and Technology presented its Statutory Review of the Copyright Act to Parliament, containing 36 recommendations for legislative change, including changes to provide clarity around machine-generated works.


Singapore

Data that is purely machine-generated will not be protected under Singapore copyright law, as the Singapore courts have recognized that only natural persons may be considered authors of copyright works. (Singaporean law also does not provide for a sui generis database right, such as the one recognized under EU law.)

Machine-generated data may potentially be protected under the common law of confidence if it meets the qualifying criteria, i.e. it has the necessary quality of confidence and was imparted in circumstances importing an obligation of confidence. However, this has yet to be tested in the Singapore courts.


China

It can be difficult to determine who holds copyright in machine-generated data in China. Consequently, it can be hard to obtain protection under the PRC Copyright Law for such data, especially where the data is co-generated by multiple actors. Big Data sets composed of machine-generated data are best protected contractually, or simply by keeping the datasets secret.


Australia

Australian legislation does not specifically provide for the protection of machine-generated data. A database may be capable of limited copyright protection under the Copyright Act 1968 (Cth). However, a purely machine-generated database will not be capable of such protection, as a human author is required.

A database can generally be protected as confidential information, but obligations of confidentiality are difficult to enforce against third parties in Australia.


Algorithms and output of algorithms

The code that analyses data and extracts meaning from it often takes the form of an algorithm. The make-up of AI software, which enables a machine to learn and to make predictions or decisions, is likewise prescribed by its underlying algorithm.

Algorithms expressed in any form (whether natural language or software code) may, in some jurisdictions, attract copyright by virtue of the author’s intellectual creativity in creating that expression.

It is less clear, however, whether a machine-made algorithm would attract copyright. Such an algorithm might arise through machine learning, where human-created software (which would usually attract copyright) enables the machine to create its own algorithm (see Who controls the works produced by artificial intelligence?).

The outputs of an algorithm are unlikely to attract copyright protection in many jurisdictions, because the output is determined by the algorithm: no (human) intellectual creation is required to arrive at it.3 However, some jurisdictions have legislation conferring copyright on computer-generated works, as in the case of the UK’s CDPA 1988 (see Ways of Protecting Data).

Both algorithms and output of algorithms are capable of being protected by trade secrets law and by contract.


Training and testing datasets for AI

In Europe, training and testing datasets for AI are unlikely to fall within the narrow definition of “database” within the EU’s Database Directive – being “a collection of independent works, data or other materials arranged in a systematic or methodical way and individually accessible by electronic or other means.”

Such datasets are valuable and can be protected: (1) by trade secret law, provided they are kept secret by restricting third-party access; or (2) contractually, by sharing them only under the terms of a non-disclosure agreement.


Datasets in a blockchain or distributed ledger

Datasets in a:

  • Permissionless blockchain may not be protectable at all. They are distributed across multiple nodes, available for all to see, and there are typically no contractual terms attached to them.
  • Permissioned (gated) distributed ledger may be protected by trade secret law or contractually.

The positions set out above are separate from the question of whether the content of a particular ledger block may itself attract copyright (whether as one copyright work or several).


Statistics

In some jurisdictions, statistics may attract copyright.4 This typically depends on whether the statistics reflect the author’s intellectual creation.

From a European perspective, the position can be summarized as follows:

Type of data  Patent right  Copyright  Trade Secrets  Copyright in Database  Sui Generis Database right  Contract
Structured Dataset  Possible  Possible  🗸  Possible  Possible  🗸
Unstructured Dataset  Possible  🗸
Algorithm  Possible  🗸
Output of algorithm
Dataset collected by IoT
Dataset in Permissionless Distributed ledger
Training Dataset for AI  Possible  🗸



1   Executive Summary of the Evaluation of Directive 96/9/EC on the legal protection of databases

2   Compendium of U.S. Copyright Office Practices, § 313.2 (Third Edition (2017)).

3   Bookmakers’ Afternoon Greyhound Services Ltd v Wilf Gilbert (Staffordshire) Ltd [1994] FSR 723.

4   In PCR Ltd v Dow Jones Telerate Ltd [1998] EMLR 407, a summary of pod counts (the number of cocoa pods per tree) and of the likely size of the crop in a given season was held to attract copyright, because it was one of the most important parts of a report. In that case the report was leaked, and the defendant was held not to have known that the leaked report was confidential.