AI and Drug Discovery, Part 2: In-Depth Look at the Melloddy Project
Part of Artificial Intelligence and Drug Discovery – A Three Part Update
In Part 1 of our 2019 AI and Pharma update, we looked at how AI is being deployed in the pharmaceutical space both pre- and post-development. We also highlighted some of the recent pharma-AI collaborations. In this part, we look at a historic cross-competitor collaboration in the EU that is also leveraging blockchain technology to share large amounts of data to improve predictive models in drug discovery.
One of the largest and most exciting partnerships in 2019 in this space is the Machine Learning Ledger Orchestration for Drug Discovery project (Melloddy) that launched on June 1, 2019 in the EU. Melloddy is a collaboration between:
- 10 pharmaceutical companies: Amgen, Astellas, AstraZeneca, Bayer, Boehringer Ingelheim, GlaxoSmithKline, Janssen Pharmaceutica NV, Merck KGaA, Novartis, and Institut de Recherches Servier;
- 2 universities: KU Leuven and Budapesti Muszaki es Gazdasagtudomanyi Egyetem;
- 4 subject matter experts: Owkin, Substra Foundation, Loodse, and Iktos; and
- 1 AI computing company: NVIDIA
Each of the 10 pharmaceutical companies involved is providing access to data from its extensive chemical library. Together, these libraries amount to approximately 10 million small molecules, and the associated data includes hundreds of terabytes of image data.
But how do 10 competitors share valuable, commercially sensitive, and proprietary information without compromising their own competitive edge?
Melloddy avoids any transfer of a partner’s proprietary information by using a relatively new type of machine learning that, in this project, intersects with blockchain: federated learning.
Traditional machine learning algorithms are trained on data centralized in one location or data centre. Federated learning trains predictive models on data that is stored in multiple locations. The purpose is to expand the data available to a machine learning model while maintaining privacy, anonymity, and integrity of any individual user’s data. In a federated learning model, the data never leaves the owner’s infrastructure, but the machine learning algorithm can train on multiple sets of data.
More data means both faster development and more accurate predictive ability.
The best analogy (and indeed, the typical use case for federated learning) is training a machine learning model across a network of smartphones. The model trains on data stored locally on each phone. It then summarizes the changes as an update to the model, which is sent to a central server (e.g., in the cloud), where it is integrated with the updates from training on all of the other smartphones in the network. The only thing leaving any individual’s device is a summary of how the model was updated. The user’s data always remains on the user’s device and remains anonymous.
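To make the smartphone example concrete, here is a minimal Python sketch of one common form of federated learning, usually called federated averaging. It assumes a toy linear model and synthetic data, and the names `local_update`, `aggregate`, and `make_client_data` are hypothetical; nothing here is taken from the Melloddy platform itself. Each "device" trains on its own data and returns only a summary of how the model changed, which the server then combines.

```python
import random

def local_update(weights, data, lr=0.1, epochs=5):
    """Train on one client's private data and return only the weight change."""
    w, b = weights
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x / len(data)
            grad_b += 2 * err / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    # The raw (x, y) pairs never leave this function; only the delta does.
    return (w - weights[0], b - weights[1]), len(data)

def aggregate(weights, updates):
    """Server-side step: average the client deltas, weighted by dataset size."""
    total = sum(n for _, n in updates)
    dw = sum(d[0] * n for d, n in updates) / total
    db = sum(d[1] * n for d, n in updates) / total
    return (weights[0] + dw, weights[1] + db)

def make_client_data(seed, n):
    """Synthetic private dataset following y = 3x + 1 (illustrative only)."""
    rng = random.Random(seed)
    return [(x, 3.0 * x + 1.0) for x in (rng.uniform(-1, 1) for _ in range(n))]

# Three hypothetical data owners with datasets of different sizes.
clients = [make_client_data(seed, n) for seed, n in [(1, 20), (2, 30), (3, 50)]]

global_weights = (0.0, 0.0)
for _ in range(200):  # training rounds
    updates = [local_update(global_weights, data) for data in clients]
    global_weights = aggregate(global_weights, updates)

print(global_weights)
```

In this sketch the shared model ends up close to the slope and intercept used to generate the data, even though the aggregation step only ever sees weight changes and dataset sizes.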
The model of federated learning deployed in the Melloddy project is enabled by a blockchain architecture provided by Owkin. Little is known publicly about the inner workings of this blockchain; however, if it follows a federated learning model, it is likely that the machine learning algorithms are permitted “on-site” access to each competitor’s data to train the model. As the model learns, these adaptations are then summarized as updates and recorded on a transparent, decentralized ledger distributed across all members.
Since the ledger is distributed, there is no central authority: any interaction between the algorithm and the data must be approved by all partners before it can proceed and, once complete, is logged in the ledger. In keeping with principles of blockchain, although the interaction between the model and the data itself is visible to all, no one can reverse engineer the interaction to recreate the underlying data that was accessed.
Translation: The blockchain ledger logs all of the algorithm’s access and training activities, thus providing each company with complete visibility over the use of its data; however, only the AI algorithms can directly access any company’s data, and no company can access the data of another company.
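The logging idea can be illustrated with a minimal Python sketch of a hash-chained, append-only log. This is an assumption-laden illustration only: the `Ledger` class, the record fields, and the partner names are hypothetical, and the sketch omits the distribution across members and the approval step described above. Since the inner workings of the Melloddy blockchain are not public, it should not be read as a description of that system.

```python
import hashlib
import json

class Ledger:
    """Append-only log in which each entry commits to the one before it."""

    def __init__(self):
        self.entries = []

    def append(self, record):
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        payload = json.dumps(record, sort_keys=True)
        digest = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        self.entries.append(
            {"record": record, "prev_hash": prev_hash, "hash": digest}
        )

    def verify(self):
        """Recompute every hash; any edit to an earlier record breaks the chain."""
        prev_hash = "0" * 64
        for entry in self.entries:
            payload = json.dumps(entry["record"], sort_keys=True)
            expected = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
            if entry["prev_hash"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True

ledger = Ledger()
# Records describe what the algorithm did, never the underlying data.
ledger.append({"partner": "company_a", "action": "train", "round": 1})
ledger.append({"partner": "company_b", "action": "train", "round": 1})
intact_before = ledger.verify()

# Tampering with a logged record is detectable by anyone holding a copy.
ledger.entries[0]["record"]["partner"] = "company_c"
intact_after = ledger.verify()

print(intact_before, intact_after)
```

The design point is that each entry's hash depends on everything logged before it, so a participant holding a copy of the log can detect after-the-fact changes without ever seeing another company's data.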
The result of this process is that Melloddy will be able to leverage over one billion sensitive data points across competitors, without compromising the competitive edge the data may provide to these companies. This in turn will accelerate the development of more accurate predictive models for identifying promising compounds, which all companies can benefit from. Indeed, after the Melloddy project concludes on May 31, 2022, it plans to prepare a fee-for-service price model to eventually make the technology available to the rest of the pharmaceutical industry.
As AI and blockchain technologies continue to develop, we will undoubtedly see a growing number of partnerships between technology and pharma companies. This innovative approach exemplifies a desire for faster and better advances in the life sciences space. Melloddy demonstrates a new era of complex collaborations which will drive better healthcare outcomes and further pharma company pipelines.
The authors would like to thank students Malcolm Woodside, Roohie Sharma, Nareesa Nathoo, and Alexandra David for their assistance in preparing this update.