Evaluation of a predictive AI algorithm for imaging probe design and development
Introduction
Artificial intelligence (AI) is now used in many fields, and a growing number of algorithms aim to shorten the lengthy process of drug discovery. These efforts are still at an early stage of practical application, and very few reports have challenged such algorithms with a real drug discovery scenario. REINVENT,¹ an algorithm originally developed for drug discovery, predicts a variety of physicochemical properties of drug candidates. In this work, we evaluate REINVENT as the basis of an “automatic” workflow intended to minimize the time-consuming trial-and-error steps that limit the throughput of imaging probe development. As a proof of concept, we focus on predicting the binding affinity of molecules to stearoyl-CoA desaturase-1 (SCD-1), which is overexpressed in many diseases, including cancer.² For SCD-1-targeted positron emission tomography (PET) imaging, we selected a lead molecule for radiolabeling with 18F.
Methods
The generative model of REINVENT uses reinforcement learning on a recurrent neural network (RNN) that contains long short-term memory (LSTM) and gated recurrent unit (GRU) cells and is trained on canonical SMILES from ChEMBL. REINVENT also offers an extensive list of scoring functions whose weights can be customized to the user's needs. The scoring components include: predictive property, Tanimoto similarity, Jaccard distance, matching substructure, custom alerts, Quantitative Estimate of Druglikeness (QED) score, molecular weight, topological polar surface area (TPSA), rotatable bonds, number of hydrogen bond donors, number of rings, and selectivity. The main challenge set for the algorithm was to produce diverse chemical structures using the predictive property and Tanimoto similarity scoring components. We evaluated: 1) structural uniqueness (percentage of unique SMILES strings), 2) validity (percentage of generated SMILES strings that are valid), 3) novelty (percentage of unique and valid molecules not included in the training set), and 4) performance of the generative model when trained on small datasets (which better reflect the size of reported receptor binding datasets). t-Distributed stochastic neighbor embedding (t-SNE) was used for 2D visualization.
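As an illustration of how the uniqueness, validity, and novelty percentages and the Tanimoto similarity could be computed, the following is a minimal sketch and not the REINVENT implementation. The function names, the choice of denominators (valid over generated, unique over valid, novel over unique), and the toy_canonicalize helper are assumptions made for this example; in practice a cheminformatics toolkit would supply SMILES parsing, canonicalization, and fingerprints.

```python
from typing import Callable, Iterable, Optional, Set


def generation_metrics(
    generated: Iterable[str],
    training_set: Iterable[str],
    canonicalize: Callable[[str], Optional[str]],
) -> dict:
    """Return validity, uniqueness and novelty as fractions in [0, 1].

    Assumed denominators: validity = valid / generated,
    uniqueness = unique / valid, novelty = novel / unique.
    """
    generated = list(generated)
    # canonicalize() is assumed to return None for SMILES that cannot be parsed.
    valid = [c for c in map(canonicalize, generated) if c is not None]
    unique = set(valid)
    train = {c for c in map(canonicalize, training_set) if c is not None}
    novel = unique - train

    def ratio(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else 0.0

    return {
        "validity": ratio(len(valid), len(generated)),
        "uniqueness": ratio(len(unique), len(valid)),
        "novelty": ratio(len(novel), len(unique)),
    }


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto similarity between two sets of 'on' fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def toy_canonicalize(smiles: str) -> Optional[str]:
    """Hypothetical stand-in for a real toolkit.

    Rejects empty strings and unbalanced parentheses; otherwise it only
    strips whitespace. It does NOT perform real canonicalization.
    """
    s = smiles.strip()
    if not s or s.count("(") != s.count(")"):
        return None
    return s


if __name__ == "__main__":
    sampled = ["CCO", "CCO", "c1ccccc1", "CC(C", "CCN"]  # toy generated set
    training = ["CCO"]                                   # toy training set
    print(generation_metrics(sampled, training, toy_canonicalize))
    print(tanimoto({1, 2, 3}, {2, 3, 4}))
```

Passing the canonicalizer in as a parameter keeps the metric definitions separate from the parsing toolkit, so the same function could be reused with a real SMILES parser.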
Results
Diversity distribution analyses were performed on the 500,000 molecules proposed by the algorithm as SCD-1 binders. Based on the aforementioned criteria, the validity of the structures was 100% and the diversity by Murcko scaffold evaluation was 97%; the model score was 50%. For imaging probe development, we chose one lead molecule identified for SCD-1, Molecule 1 (M1; pIC50 = 7.1, molecular weight = 354.17 g/mol, logP = 3.58). M1 was synthesized by reacting 4-hydroxybenzamide with 1-bromopropane (64% yield), followed by reaction with 1,3-dichloropropan-2-one to form the oxazole derivative (15% yield). To prepare the reference standard, the oxazole derivative was further reacted with 4-fluorophenethylamine (5% yield). The precursor to [18F]M1 was synthesized by reacting the oxazole derivative with 4-bromophenethylamine and then incorporating a pinacol boronate ester on the aromatic ring in the last step (2% yield).
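To put the step yields in perspective, the overall yield of the reference standard route can be estimated by multiplying the per-step yields. The sketch below assumes that the quoted values (64%, 15%, 5%) are isolated yields of three consecutive steps in a linear sequence; the overall_yield helper is a name introduced for this example. The precursor route is left out because the text does not state whether the 2% yield covers one step or two.

```python
from math import prod
from typing import Iterable


def overall_yield(step_yields_percent: Iterable[float]) -> float:
    """Overall yield (%) of a linear sequence from per-step yields (%)."""
    return 100.0 * prod(y / 100.0 for y in step_yields_percent)


if __name__ == "__main__":
    # Alkylation (64%), oxazole formation (15%), amination (5%),
    # assumed to be consecutive steps of a linear route.
    print(f"{overall_yield([64, 15, 5]):.2f}%")  # prints "0.48%"
```

Under these assumptions the route delivers roughly 0.5% of the reference standard from 4-hydroxybenzamide, with the last two steps dominating the loss.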
Conclusion
Our preliminary data for M1 demonstrate the ability of predictive AI algorithms to produce synthetically valid structures. The model score of 50% shows the impact of a small training dataset. The diversity value from the distribution analysis is high because the score depends on the topology of the molecule rather than on atom-level analysis, which limits the actual novelty of the chemical structures. Future work includes the radiolabeling of M1 and its biological evaluation for PET imaging of SCD-1. We will also evaluate our own previously reported algorithm³ on SCD-1 and compare the results.