DTM E4. Drug Discovery using AI/ML – Dr. Sebastian Raschka
From the Deep Tech Musings Podcast - Get actionable and tactical insights to take your Deep Tech startup from 0 to 1 [Idea to Traction]
Listen now on - Spotify, Apple, Google
Dr. Sebastian Raschka was an Assistant Professor of Statistics at the University of Wisconsin-Madison and is currently the Lead AI Educator at Lightning AI. He is the author of bestselling books including Python Machine Learning, Machine Learning with PyTorch and Scikit-Learn, and the recent Machine Learning Q and AI. He is also the creator of the popular "mlxtend" Python library. On the show, he talks about his research article on how ML and AI approaches are being used for drug discovery, a topic that has become especially prevalent and impactful in the COVID era.
Original episode air date: August 2020.
Social Links:
Sebastian Raschka – Sebastian Raschka (LinkedIn), @rasbt (Twitter), sebastianraschka.com (Website)
Pronojit Saha, DTM Podcast - pronojitsaha (LinkedIn), @pronojits (Twitter)
Episode post on twitter - Link
Show Notes & Summary:
(1:23) An overview of Sebastian’s latest article on drug discovery
His article on ML and AI approaches for drug discovery was published on arXiv and on ScienceDirect in an Elsevier journal.
The article discusses drug and ligand discovery, focusing on small molecule drug discovery.
Ligands are small chemicals that bind to proteins in the body and can have various effects on protein function.
The example of coffee is used to explain how a small molecule (caffeine) can bind to a protein receptor (adenosine receptor) and block the effects of another small molecule (adenosine), preventing sleepiness.
The process of drug discovery involves finding molecules that bind to proteins and have desired effects, such as blocking receptors or activating proteins.
There are vast numbers of molecules to consider, so machine learning and deep learning methods are being explored to improve the efficiency of the discovery process.
(7:07) How to discover and design molecules: Fingerprint Vectors
The traditional approach is to use fingerprint vectors, which are fixed-size binary vectors consisting of zeros and ones.
Fingerprint vectors are generated either by a black-box function or with input from a domain expert who identifies specific substructures, atom types, or chemical groups in the molecule.
The choice of fingerprint vector size involves a trade-off: a short vector causes more collisions (different substructures mapped to the same bit), while a long vector increases the dimensionality the machine learning model has to handle.
Advanced methods exist beyond fingerprint vectors, but the focus in this discussion is on beginner-friendly approaches, with the option for interested listeners to explore more advanced methods in the associated paper.
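The trade-off between collisions and dimensionality can be sketched with a toy hashed fingerprint. This is an illustration only, not the method from the article: it hashes short character fragments of a molecule's text notation (here caffeine, written as a SMILES string) into a fixed-size bit vector, whereas real chemical fingerprints hash atom environments or expert-defined substructures. The function names, the fragment scheme, and the bit sizes are assumptions made for this example.

```python
import hashlib


def fragments(text: str, max_len: int = 3) -> set[str]:
    """All distinct substrings of length 1..max_len (a stand-in for substructures)."""
    return {
        text[start:start + size]
        for size in range(1, max_len + 1)
        for start in range(len(text) - size + 1)
    }


def toy_fingerprint(text: str, n_bits: int = 64, max_len: int = 3) -> list[int]:
    """Map each fragment to one bit position of a fixed-size 0/1 vector."""
    bits = [0] * n_bits
    for fragment in fragments(text, max_len):
        digest = hashlib.sha256(fragment.encode("utf-8")).digest()
        bits[int.from_bytes(digest[:8], "big") % n_bits] = 1
    return bits


def collisions(text: str, n_bits: int, max_len: int = 3) -> int:
    """How many fragments landed on a bit that was already set."""
    return len(fragments(text, max_len)) - sum(toy_fingerprint(text, n_bits, max_len))


caffeine = "CN1C=NC2=C1C(=O)N(C(=O)N2C)C"  # caffeine written as a SMILES string
for size in (8, 64, 1024):
    print(size, "bits ->", collisions(caffeine, size), "collisions")
```

A very short vector forces many fragments onto the same bit, while a long vector reduces collisions at the cost of more input dimensions for the downstream model.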
(13:55) What are some other approaches that Sebastian has explored: SMILES Strings
Sebastian discusses alternative approaches to representing molecules, such as SMILES (Simplified Molecular Input Line Entry System) strings.
SMILES strings are a string representation of a molecule that captures the 2D structure and stereochemistry.
Fingerprint vectors are not invertible, meaning you cannot go back to the original molecule from the fingerprint. SMILES strings, on the other hand, can be decoded back into the molecule, preserving its atoms and connectivity.
SMILES strings are concise, capturing all the atom connectivity and information about the molecule in a compact representation.
Sebastian also mentions a paper by Zhu et al. that proposes using RNNs to learn fingerprints from SMILES strings, creating invertible fingerprints. They use an unsupervised approach to generate the fingerprints and then employ traditional machine learning methods to predict certain properties from them.
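As a rough sketch of how a string representation becomes model input, the snippet below splits a molecule string into tokens, maps the tokens to integer IDs (the kind of sequence an RNN could consume), and decodes the IDs back to the original string. The token pattern is a simplified assumption for this example and does not cover the full notation; the function names are hypothetical and are not taken from the paper discussed in the episode.

```python
import re

# Simplified token pattern (an assumption for this sketch): bracket atoms,
# two-letter halogens, two-digit ring closures, then single characters.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|[A-Za-z]|\d|[()=#\-+/\\@.:~*]")


def tokenize(smiles: str) -> list[str]:
    """Split a molecule string into tokens; fail loudly on unsupported input."""
    tokens = TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unsupported character in {smiles!r}")
    return tokens


def build_vocab(corpus: list[str]) -> dict[str, int]:
    """Assign an integer ID to every distinct token in the corpus."""
    symbols = sorted({tok for s in corpus for tok in tokenize(s)})
    return {tok: idx for idx, tok in enumerate(symbols)}


def encode(smiles: str, vocab: dict[str, int]) -> list[int]:
    """String -> integer sequence."""
    return [vocab[tok] for tok in tokenize(smiles)]


def decode(ids: list[int], vocab: dict[str, int]) -> str:
    """Integer sequence -> original string (the encoding is invertible)."""
    id_to_token = {idx: tok for tok, idx in vocab.items()}
    return "".join(id_to_token[i] for i in ids)


molecules = [
    "CN1C=NC2=C1C(=O)N(C(=O)N2C)C",  # caffeine
    "CCO",                           # ethanol
    "C[C@H](N)C(=O)O",               # alanine, with a stereocenter marker
]
vocab = build_vocab(molecules)
ids = encode(molecules[0], vocab)
print(ids)
print(decode(ids, vocab) == molecules[0])
```

Unlike the bit vector of a fingerprint, the integer sequence can be mapped back to the exact input string, which is the property the notes describe as invertibility.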
(21:06) Details into the ML model: Autoencoders
Traditional machine learning methods were trained on fingerprint vectors to predict solubility, but the approach can be applied to other modeling tasks as well.
A more advanced approach, discussed in a paper by Gomez and colleagues, used an autoencoder to generate new molecules with desired properties.
The autoencoder used a latent space where molecules with similar properties were closely packed together.
By sampling from the latent space, molecules with specific structures or properties could be generated.
The autoencoder approach is conceptually similar to matrix factorization and principal component analysis (PCA).
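The analogy to PCA can be made concrete with a minimal sketch in which PCA acts as the simplest possible encoder and decoder pair: the encoder projects a point onto the top principal direction (a one-dimensional latent space), and the decoder maps a latent value back to the original space. This is not the model from the paper, which used a neural network autoencoder on molecules; the data here is made up, and the function names are assumptions for illustration.

```python
import math


def fit_linear_codec(rows, iterations=200):
    """Return the data mean and the top principal direction (unit vector)."""
    n, dim = len(rows), len(rows[0])
    mean = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, mean)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(dim)]
           for i in range(dim)]
    w = [1.0] * dim  # starting guess for power iteration
    for _ in range(iterations):
        w = [sum(cov[i][j] * w[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return mean, w


def encode(x, mean, w):
    """Compress a point to a single latent coordinate."""
    return sum((xi - mi) * wi for xi, mi, wi in zip(x, mean, w))


def decode(z, mean, w):
    """Map a latent coordinate back to the original space."""
    return [mi + z * wi for mi, wi in zip(mean, w)]


# Made-up 3D data that lies close to a line, so one latent dimension suffices.
data = [[t, 2 * t + 0.1 * (-1) ** i, -t] for i, t in enumerate(range(6))]
mean, w = fit_linear_codec(data)

errors = [math.dist(x, decode(encode(x, mean, w), mean, w)) for x in data]
print("max reconstruction error:", max(errors))

# Picking a latent value between two training points yields a new point.
z_new = 0.5 * (encode(data[1], mean, w) + encode(data[2], mean, w))
print("generated point:", decode(z_new, mean, w))
```

Nearby latent values decode to nearby points, which loosely mirrors the idea of sampling the latent space to obtain new molecules with similar properties.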
(27:03) How to evaluate the model's effectiveness and interpret the results?
The methods discussed were evaluated against existing methods and outperformed them at predicting solubility.
They were also assessed on predicting synthetic accessibility and drug-likeness.
The fingerprint and SMILES string approaches are considered workarounds because they cannot fully capture the 3D structure of molecules.
One limitation is that the methods only focus on the ligand and do not consider the protein it binds to, which is important for accurate predictions.
The use of graph neural networks is an advancement that can naturally process molecules represented as graphs, but incorporating protein information remains a challenge.
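To illustrate what processing a molecule as a graph means, the sketch below represents ethanol's heavy atoms as nodes and its bonds as edges, then performs one round of neighbor aggregation, the basic operation that graph neural networks build on. It is a toy under stated assumptions: there are no learned weights, the one-hot features and function names are made up for this example, and the protein is not modeled at all.

```python
def one_hot(symbol, elements):
    """Encode an element symbol as a 0/1 feature vector."""
    return [1.0 if symbol == e else 0.0 for e in elements]


def message_passing_step(features, bonds):
    """Each atom adds the feature vectors of its bonded neighbours to its own."""
    updated = [list(f) for f in features]
    for a, b in bonds:
        for k in range(len(features[a])):
            updated[a][k] += features[b][k]
            updated[b][k] += features[a][k]
    return updated


def readout(features):
    """Sum over atoms: a molecule-level vector that ignores atom ordering."""
    return [sum(col) for col in zip(*features)]


elements = ["C", "O"]
atoms = ["C", "C", "O"]    # ethanol, heavy atoms only
bonds = [(0, 1), (1, 2)]   # C-C and C-O bonds as index pairs

features = [one_hot(a, elements) for a in atoms]
updated = message_passing_step(features, bonds)
print(updated)
print(readout(updated))
```

After one step, each atom's vector describes its immediate chemical neighborhood, and the summed readout stays the same if the atoms are listed in a different order.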
(33:13) Advice on how to get your papers accepted to journals
To increase the chances of getting papers accepted, it is important to have convincing experiments, a good story, and visually appealing figures.
Figures play a crucial role in explaining new or complex concepts and make it easier for readers to understand.
Making the paper look professional by following journal recommendations, including all required sections, and adhering to the formatting guidelines is important.
It is essential to be polite when responding to reviewer feedback and consider incorporating their suggestions if they make sense.
Inspiration for new ideas can come from looking at existing methods or literature in different fields and finding ways to improve or apply them to different domains. Cross-pollination between fields, such as applying deep learning to computational biology problems, can also lead to new ideas.
If you enjoyed this episode, please leave us a rating on Spotify, Apple, or wherever you listen to podcasts. It helps us reach more people who are interested in deep tech & grow the community.
Also, don’t forget to subscribe to Deep Tech Musings Podcast on pronojits.substack.com so you never miss an episode.
Thank you for listening! See you next time!