Contractual considerations in sharing data

Global Publication February 2021

The license clause and restrictions

The uncertain nature of intellectual property rights in data means that “contract is king” in data transactions. Accordingly, in practice, licensors will usually seek to rely on contractual undertakings, rather than a claim for infringement of intellectual property rights, when seeking to protect a valuable dataset.

The core of a data license is normally a set of contractual undertakings on the part of the licensee not to do certain things with the licensed data.


Usage restrictions

Data licensing models can often be complex, driven by the commercial imperatives of the licensor. Common restrictions include:

  • Restrictions on distribution or publication of data: This is the key restriction for a licensor, because free distribution of data will undercut the licensor’s own business. This might be an outright prohibition, or a restriction on the volume of data that can be distributed, so that the licensor retains control over the availability of the dataset in the market.
  • User or device restrictions: For example, data may only be accessed by a certain number of users or used on a certain number of devices.
  • Restrictions on the purpose or types of use of the data: For example, the licensee may be restricted to internal business purposes only, or prohibited from performing certain types of (or any) manipulation of the data, combining the licensed data with other data or using it in certain products.


A data license will often include provisions to ensure that the licensee is complying with the restrictions, such as rights of audit. From the licensee’s perspective, it is important not only to agree restrictions that allow for its contemplated use, but also to have a data management procedure in place to ensure that it remains in compliance.

This can prove especially difficult where data is placed in “data lakes” with the potential for multiple use cases, some of which may not have been contemplated at the time that the license was executed.
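
As an illustration only, a licensee’s data management procedure might record the key restrictions of each license in a structured form and test each proposed use against them before data is drawn from the data lake. The sketch below assumes a simplified, hypothetical model: the names LicenseTerms, ProposedUse and check_use, and the particular restrictions captured, are invented for this example and do not reflect any standard or any particular license.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LicenseTerms:
    """Hypothetical structured summary of the restrictions in one data license."""
    dataset: str
    permitted_purposes: frozenset
    max_users: int
    redistribution_allowed: bool = False
    combination_allowed: bool = False


@dataclass(frozen=True)
class ProposedUse:
    """Hypothetical description of a use case proposed inside the licensee."""
    dataset: str
    purpose: str
    user_count: int
    redistributes: bool = False
    combines_with_other_data: bool = False


def check_use(terms: LicenseTerms, use: ProposedUse) -> list:
    """Return a list of potential breaches; an empty list means none were found."""
    if use.dataset != terms.dataset:
        raise ValueError("license terms and proposed use refer to different datasets")
    issues = []
    if use.purpose not in terms.permitted_purposes:
        issues.append(f"purpose '{use.purpose}' is not a licensed purpose")
    if use.user_count > terms.max_users:
        issues.append(f"{use.user_count} users exceeds the limit of {terms.max_users}")
    if use.redistributes and not terms.redistribution_allowed:
        issues.append("distribution or publication is not permitted")
    if use.combines_with_other_data and not terms.combination_allowed:
        issues.append("combining with other data is not permitted")
    return issues


# Example: a use case that was not contemplated when the license was executed.
terms = LicenseTerms(
    dataset="market_prices",
    permitted_purposes=frozenset({"internal_reporting"}),
    max_users=25,
)
use = ProposedUse(
    dataset="market_prices",
    purpose="model_training",
    user_count=40,
    combines_with_other_data=True,
)
for issue in check_use(terms, use):
    print(issue)
```

A check of this kind can only flag uses for review. Whether a given use is in fact permitted remains a question of interpreting the license itself.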

Contracts are generally enforceable only against the parties to them (although some jurisdictions permit enforceable third party rights). In order to protect the value of a dataset, licensors will often seek to build a robust contractual framework across the whole data supply chain. This can be achieved either by:

  • Mandating terms in any downstream distribution contract (for example, restrictions on further distribution or on uses for which the licensor would not itself license the data); or
  • Adopting “pass-through” models, which require a distributor to ensure that any party to whom it distributes data executes a separate license with the original licensor.

In both cases, the objective is to preserve the contractual restrictions on usage of data, either through a chain of contracts or by creating a direct contractual relationship with the end user.


Data quality

The quality of a dataset is often a complex issue, in terms of both defining what quality looks like and the contractual assurances that a licensor is willing or able to give.

Licensees will often want contractual assurances that the dataset is suitable for their envisaged use case, especially if they are placing reliance on the data. Depending on that use case, the attributes that might be relevant include:

  • Size: Is the dataset of a suitable size, for example, for training an AI or to constitute a representative sample?
  • Completeness: Does the dataset contain all of the data points expected by the licensee?
  • Accuracy: Is the data correct and free of errors?
  • Timeliness: Is the dataset current? Where the licensee is taking a data feed, how often is the data updated and with what latency?
  • Source: Must the dataset include certain sources of input data - for example, to ensure that it is representative?
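
Some of these attributes lend themselves to objective measurement, which can make a contractual quality standard easier to define and to test. The following is a minimal sketch, assuming a hypothetical record layout in which each record is a dictionary with "source" and "updated_at" fields. The function name quality_report and the thresholds are invented for illustration. Accuracy is deliberately left out, because it cannot be measured without an independent reference to compare against.

```python
from datetime import datetime, timedelta, timezone


def quality_report(records, required_fields, min_records, max_age, now):
    """Summarize size, completeness, timeliness and source coverage of a dataset."""
    total = len(records)
    # Completeness: share of records in which every required field is populated.
    complete = sum(
        1 for r in records
        if all(r.get(f) is not None for f in required_fields)
    )
    # Timeliness: age of the most recently updated record.
    timestamps = [r["updated_at"] for r in records if r.get("updated_at") is not None]
    newest = max(timestamps, default=None)
    return {
        "size_ok": total >= min_records,
        "completeness": complete / total if total else 0.0,
        "fresh": newest is not None and (now - newest) <= max_age,
        "sources": sorted({r["source"] for r in records if r.get("source")}),
    }


# Example with two hypothetical records, one of which is missing a price.
utc = timezone.utc
records = [
    {"id": 1, "price": 10.5, "source": "exchange_a",
     "updated_at": datetime(2021, 2, 1, 12, 0, tzinfo=utc)},
    {"id": 2, "price": None, "source": "exchange_b",
     "updated_at": datetime(2021, 2, 1, 11, 0, tzinfo=utc)},
]
report = quality_report(
    records,
    required_fields=("id", "price"),
    min_records=2,
    max_age=timedelta(hours=2),
    now=datetime(2021, 2, 1, 13, 0, tzinfo=utc),
)
print(report)
```

In this example the dataset meets the size and timeliness thresholds and draws on two sources, but only half of the records are complete.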


The licensor’s position

For a licensor, it may be difficult to give assurances as to quality. Licensors are generally conscious that the licensee will most likely be “putting the data to work”, for example, for training an AI or as an input to a statistical model used in finance or industry.

In such a scenario, licensors are often unwilling to underwrite reliance on the data. This might be because:

  • There are practical challenges in ensuring the quality of a large dataset;
  • The licensor has itself not received assurances of quality from the suppliers of input data;
  • The licensor cannot quantify the risk of providing poor quality data where it does not have total control over the use to which it is put; or
  • Underwriting reliance is not justified by the commercial balance of risk and reward.


A licensor will often provide data on an “as is” basis or, rather than providing warranties that the output is of a specific quality, may warrant that it has followed specific processes when preparing the data – for example, that the data has been created in accordance with specified rules or algorithms or that specific quality assurance procedures have been followed.