AI and drug discovery, Part 3: Protecting the value of data in collaborations

September 23, 2019

Part of Artificial Intelligence and Drug Discovery – A Three-Part Update

In Part 2 of our 2019 AI and Pharma update, we looked at the Melloddy project—a ground-breaking collaboration between pharmaceutical companies, technology companies, and academic institutions to accelerate drug discovery by sharing pharmaceutical industry data. As we explained, the Melloddy project uses federated learning and blockchain technology to train machine learning (ML) engines using a decentralized architecture that protects each partner’s proprietary data. The Melloddy project is part of a growing trend of using AI/ML in the pharmaceutical industry, which we investigated in Part 1 of our update.

In this Part 3, we highlight some of the legal issues surrounding the use of AI/ML in collaborative drug discovery and some solutions for addressing these issues.

Data are Valuable Assets

According to The Economist, data are the world’s most valuable resource. Moreover, the value of data increases tremendously when the data are shared or pooled. Larger data sets allow more powerful ML engines to be trained and more valuable insights to be derived, such as predictions of how molecules will interact with the human body.

Value creation through data sharing is a major driver of collaborations. Another major driver is the need for companies with disparate strengths and resources, including proprietary technology such as AI models and algorithms, or expertise in computational and medicinal chemistry or biology, to work together in interdisciplinary teams.

Given the value of data, it is important for companies to protect these assets, particularly when engaging in collaborations.

Protecting Data as Intellectual Property

Data can be protected as a form of intellectual property. Under Canadian law, although mere information is not subject to copyright protection, data sets that are assembled with the exercise of skill and judgment can be protected as copyrighted works.[1]

Canadian law also protects data as trade secrets if certain requirements are met. For example, steps must be taken to keep the data confidential. This means that everyone with access to the data (e.g., employees, contractors, or collaborators) should be contractually bound to keep the data confidential. Also, measures must be taken to safeguard the confidentiality of the data, for example, using data storage that is encrypted or otherwise secured. Further, trade secrets must have certain prescribed characteristics, often possessed by data useful for drug discovery: they must be identifiable, original, not already in the public domain, valuable to the business, valuable to those who do not have it, and difficult to reproduce.

It bears mentioning that any inventions underlying the data, such as new ways of encoding or processing data, advancements in AI/ML models or training methodologies, or AI-aided discoveries, may be subject to patent protection.

Ownership of Derived Data

In addition to maintaining ownership of original data sets (e.g., data used to train ML models), companies should also consider ownership of derived data, i.e., data derived through computations performed on training data. Derived data may include predictions generated by trained models, the parameters of such models including features and weights, and various forms of intermediate data, all of which may have significant economic value. As such, it is important for companies engaging in collaborations to establish who owns what types of derived data and how derived data may be used, including after the collaboration ends.

Protecting Privacy Interests

Companies should also keep in mind that relevant stakeholders in the data can extend beyond the collaborators to include individuals whose private information might be shared. For instance, when sharing data obtained from clinical trials, pharmaceutical companies should take steps to comply with requirements imposed by legislation aimed at protecting personal data and privacy, such as the GDPR (European Union) and PIPEDA (Canada).
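As one illustration of such a step, direct identifiers in clinical trial records can be pseudonymized before the records are shared. The sketch below is a minimal example only: the record layout (`patient_id`, `age`, `outcome`), the function names, and the key are hypothetical, and the key is assumed to be held solely by the data owner. It replaces the identifier with a keyed hash (HMAC-SHA256) so that collaborators can link records belonging to the same patient without seeing the original identifier.

```python
import hashlib
import hmac


def pseudonymize_id(patient_id: str, secret_key: bytes) -> str:
    """Return a keyed hash (HMAC-SHA256) of a patient identifier."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def pseudonymize_record(record: dict, secret_key: bytes) -> dict:
    """Copy a record, replacing the hypothetical 'patient_id' field with a pseudonym."""
    shared = dict(record)  # leave the original record untouched
    shared["patient_id"] = pseudonymize_id(record["patient_id"], secret_key)
    return shared


# Hypothetical example data; the key stays with the data owner and is never shared.
key = b"replace-with-a-secret-key-kept-by-the-data-owner"
record = {"patient_id": "CA-000123", "age": 54, "outcome": "responder"}
shared = pseudonymize_record(record, key)
```

Pseudonymized data generally remain personal data under the GDPR, because the data owner can still re-identify individuals using the key. A step like this is therefore a safeguard that supports, rather than replaces, the legal obligations described above.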

Legal Instruments

Companies interested in collaborations should be aware of legal instruments that can be used to address legal issues surrounding data ownership and privacy, which include:

  • Collaboration Agreements – As the name implies, a collaboration agreement governs how two or more parties work together. The agreement delineates each party’s obligations, including the data that each party brings to the collaboration, the form of that data (e.g., anonymized or pseudonymized), how and where that data can be stored and used, and how the data must be handled, including confidentiality and privacy obligations and required technological safeguards. Contractual obligations prescribing how data are handled are essential for preserving trade secret protection for the data and complying with privacy legislation. If applicable, the collaboration agreement should also have clauses governing ownership and permitted uses of any derived data generated by the collaborators.
  • Data License Agreements – License agreements are contractual instruments by which an owner of an asset grants another party, i.e., a licensee, the right to use that asset in a prescribed way. In the data licensing context, the asset being licensed is data, either in the form of data sets or data streams. Ownership remains with the owner, and the parties can establish any royalties payable by the licensee. As in the case of a collaboration agreement, a data license agreement should also include clauses governing how data should be handled, and clauses governing the ownership and use of any derived data.
  • Data Trusts – While still in their infancy, data trusts hold the potential to enable sharing of data in ways that are safer, fairer, and more protective of sensitive information. Various formulations of data trusts exist. In one formulation, a data trust is a legal instrument under which a trustee holds data in trust for the benefit of third-party beneficiaries (e.g., collaborators seeking to share data or patients providing private information). In some cases, as with traditional trust instruments, the trustee of a data trust may have fiduciary obligations, including the duty of loyalty, to the beneficiaries. The duty of loyalty would prevent the trustee from using the object of the trust (i.e., the data) for its own benefit or in a manner that is detrimental to the beneficiaries.

Legal instruments should be used in tandem with technological solutions, such as tools for filtering, storing, and transmitting sensitive information, to provide the requisite protection of data. Moreover, as we have seen with the Melloddy project, emerging technologies like federated learning and blockchain can provide new ways for data to be shared safely between collaborators.
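To make the idea of federated learning concrete, the following is a minimal sketch of federated averaging in Python. It is not the Melloddy project's implementation: the one-parameter model (y = weight × x), the function names, and the toy data sets are all assumptions chosen for illustration. Each partner trains on its own private data, and only the resulting model weight and a sample count are sent to the coordinator.

```python
from typing import List, Tuple

# (feature, label) pairs held privately by one partner; hypothetical toy data type
Dataset = List[Tuple[float, float]]


def local_update(weight: float, data: Dataset, lr: float = 0.01, steps: int = 20) -> float:
    """Run gradient descent on one partner's private data for the model y = weight * x."""
    for _ in range(steps):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_average(updates: List[Tuple[float, int]]) -> float:
    """Combine partner weights, weighting each by that partner's number of samples."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


def train(partners: List[Dataset], rounds: int = 50) -> float:
    """Coordinate training; raw data never leaves the partner that holds it."""
    global_weight = 0.0
    for _ in range(rounds):
        # Each partner trains locally; only (weight, sample count) is shared.
        updates = [(local_update(global_weight, data), len(data)) for data in partners]
        global_weight = federated_average(updates)
    return global_weight


# Hypothetical private data sets for two partners
partner_a = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]
partner_b = [(1.5, 3.0), (2.5, 5.2)]
weight = train([partner_a, partner_b])
```

In this sketch, the weights exchanged in each round and the final shared model are examples of the derived data discussed earlier, which is why ownership and permitted use of such data should be settled in the collaboration agreement.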


Conclusion

Along with the rise of AI/ML use in drug discovery, we are seeing an increase in collaborative sharing of proprietary or sensitive data. ML engines are ushering in a new era of drug discovery, and data useful for training ML engines have become tremendously valuable. Accordingly, companies should be aware that, in many cases, the value of data can be protected under Canadian law through copyright and trade secret protection. Additionally, companies should pay careful attention to using appropriate legal instruments and technological solutions that facilitate safe collaborations.

The authors would like to thank students Malcolm Woodside, Roohie Sharma, Nareesa Nathoo, and Alexandra David for their assistance in preparing this update.


[1] CCH Canadian Ltd v Law Society of Upper Canada, 2004 SCC 13 at para 16.