Yogesh Kalakoti
Ph.D. candidate, IIT Delhi
I am a Ph.D. student working in the field of Computational Biology at IIT Delhi under the supervision of Prof. D. Sundar. My work is primarily aimed at developing programs that harness the potential of high-throughput biomedical data to elucidate the critical factors involved in the progression and prevention of disease.
View Resume (Sep 26, 2022)

Year | Degree | Institution | Grade |
---|---|---|---|
2019 - current | Ph.D. Computational Biology | Indian Institute of Technology, Delhi | 8.1/10 |
2018 - 19 | MS Computational Biology | Indian Institute of Technology, Delhi | 8.2/10 |
2014 - 18 | B.Tech. Biotechnology | GB Pant University of Agriculture & Technology, Pantnagar | 7.3/10 |
SELECTED COURSES

Molecular Biology, Probability/Statistics, Linear Algebra, Machine Learning
Experienced in a wide range of ML architectures and frameworks, including CNNs, recurrent networks and transformers. Additionally, I have experience in computer-aided drug design (CADD), including virtual ligand screening, ligand-based drug design, molecular dynamics and homology modelling.
Our findings from the phenotype prediction models reinforce the idea that an integrative approach can support more accurate and personalized decisions on drug administration and improve overall treatment strategy. Moreover, at the molecular level, we have demonstrated the effectiveness of NLP-based encoding strategies that extract critical information from sequential data, such as protein and drug sequences, to identify possible leads for a given protein target (and vice versa).
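As a concrete illustration of what an NLP-style encoding of a protein sequence can look like, the minimal sketch below splits a sequence into overlapping 3-mers ("words") and maps them to integer ids that an embedding layer or sequence model could consume. The k-mer size, the reserved padding/unknown ids and all function names are illustrative assumptions; this is not necessarily the encoding used in our models.

```python
PAD, UNK = 0, 1  # reserved ids (assumption): padding and unseen k-mers


def kmer_tokens(sequence, k=3):
    """Split a sequence into overlapping k-mers, the 'words' of the sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocab(sequences, k=3):
    """Assign an integer id to every k-mer seen in the training sequences."""
    vocab = {}
    for seq in sequences:
        for token in kmer_tokens(seq, k):
            if token not in vocab:
                vocab[token] = len(vocab) + 2  # ids 0 and 1 are reserved
    return vocab


def encode(sequence, vocab, k=3, max_len=None):
    """Map a sequence to k-mer ids; unseen k-mers become UNK, output optionally padded/truncated."""
    ids = [vocab.get(token, UNK) for token in kmer_tokens(sequence, k)]
    if max_len is not None:
        ids = ids[:max_len] + [PAD] * max(0, max_len - len(ids))
    return ids


if __name__ == "__main__":
    vocab = build_vocab(["MKTAYIAK", "MKTLLV"])  # toy sequences
    print(encode("MKTAY", vocab, max_len=5))  # [2, 3, 4, 0, 0]
```

Fixed-length id vectors like these are what a recurrent or transformer encoder would typically take as input; the same tokenizer could be applied character-wise to drug SMILES strings.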
Further, ML methods have largely been considered black-box models that lack interpretability behind their predictions, especially in identifying probable drug–target pairs. Geometric deep learning has emerged as a natural alternative for building robust and interpretable models, and it is being actively pursued in the lab to make interpretability an essential feature of our prediction models.
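Geometric deep learning treats a molecule as a graph of atoms (nodes) and bonds (edges). The minimal sketch below shows one round of sum-aggregation message passing followed by a sum readout, the basic operation behind graph neural networks; real models add learnable weight matrices and nonlinearities, which are omitted here. The function names and the toy one-hot atom features are hypothetical.

```python
def message_passing_step(features, edges):
    """One round of message passing: h_v' = h_v + sum of h_u over neighbours u of v.

    features: list of per-node feature vectors (lists of numbers)
    edges: list of (u, v) index pairs for an undirected graph, each bond listed once
    """
    neighbours = [[] for _ in features]
    for u, v in edges:
        neighbours[u].append(v)
        neighbours[v].append(u)
    updated = []
    for v, h_v in enumerate(features):
        new = list(h_v)
        for u in neighbours[v]:
            new = [a + b for a, b in zip(new, features[u])]
        updated.append(new)
    return updated


def readout(features):
    """Sum node vectors into one graph-level vector; invariant to node ordering."""
    return [sum(column) for column in zip(*features)]


if __name__ == "__main__":
    # Heavy atoms of ethanol (C-C-O) with hypothetical one-hot features [is_carbon, is_oxygen]
    atoms = [[1, 0], [1, 0], [0, 1]]
    bonds = [(0, 1), (1, 2)]
    h = message_passing_step(atoms, bonds)
    print(h)           # [[2, 0], [2, 1], [1, 1]]
    print(readout(h))  # [5, 2]
```

Because every update is tied to specific atoms and bonds, a prediction built on such representations can, in principle, be traced back to molecular substructures, which is part of why graph-based models lend themselves to interpretation.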
The drug discovery pipeline aims to provide an end-to-end, scalable framework that captures the intricate relationships among drug–target pairs and infers interactions between given drugs and targets using appropriate ML architectures. All of our in-silico solutions and tools are open source and available on GitHub.
Background: Patients with lung adenocarcinoma (LUAD) largely tend to have poor clinical outcomes. A biomarker or gene signature built from multi-omics data, along with clinical features, that could predict survival in these patients would have a significant clinical impact, enabling earlier detection of mortality risk and personalized therapy.
Methods: To identify a novel multi-omics signature, along with clinical features, associated with overall survival, we analyzed LUAD patients' single-omics datasets for copy number variation (CNV), protein, methylation, mutation, RNA and miRNA, extracted from The Cancer Genome Atlas (TCGA). Neighborhood component analysis (NCA), a feature reduction algorithm, was applied to the large feature space of each single-omics dataset to select the optimal combination of best predictor features. The selected features from each single-omics dataset were then coupled into an integrated input and fed into a support vector machine (SVM), a neural network pattern recognizer and a RUSBoost ensemble to build the survival prediction models. An external cohort was used to validate the prediction models.
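The integration step described above, coupling the features selected from each single-omics dataset into one input row per patient, can be sketched as follows. The column indices are assumed to come from a prior NCA run, which is not reimplemented here; the omics names, toy values and the function name are hypothetical. The integrated rows are what a classifier such as an SVM or neural network would then be trained on.

```python
def integrate_selected_features(omics_blocks, selected):
    """Concatenate the selected columns of each single-omics block, patient by patient.

    omics_blocks: dict mapping omics name -> list of per-patient feature rows
    selected: dict mapping omics name -> column indices kept by feature selection
              (assumed to be the output of an earlier NCA step)
    Returns one integrated feature row per patient.
    """
    names = sorted(omics_blocks)  # fixed order so every row has the same layout
    n_patients = {len(omics_blocks[name]) for name in names}
    if len(n_patients) != 1:
        raise ValueError("all omics blocks must cover the same patients")
    integrated = []
    for p in range(n_patients.pop()):
        row = []
        for name in names:
            row.extend(omics_blocks[name][p][j] for j in selected[name])
        integrated.append(row)
    return integrated


if __name__ == "__main__":
    blocks = {
        "rna": [[0.1, 0.5, 0.9], [0.2, 0.4, 0.8]],  # toy values, 2 patients
        "cnv": [[1, 0], [0, 2]],
    }
    keep = {"rna": [0, 2], "cnv": [1]}
    print(integrate_selected_features(blocks, keep))  # [[0, 0.1, 0.9], [2, 0.2, 0.8]]
```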
Results: We identified a critical feature space for multi-omics integration that could effectively stratify LUAD patients into two survival classes with 92.9% accuracy using our neural network-based model, and receiver operating characteristic (ROC) analysis indicated that the signature had strong predictive ability. Moreover, a predictive pipeline was established based on this signature integrated with clinicopathological features. When single-omics data were used as input for validation, prediction accuracy was lower than that of our model, which requires multi-omics data as input; integrating multiple omics layers therefore improves the accuracy of the classifier. Lastly, the signature was validated on our best-performing classifier, the neural network pattern recognizer, using an external cohort of patients excluded from the Group I and II study.

Conclusion: We developed a robust multi-omics signature as a self-sustaining factor that effectively classifies LUAD patients into two survival classes, i.e., alive or dead, with an accuracy of 92.9%, which might provide a basis for personalized treatment of these patients.
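Accuracy and ROC analysis, the two measures quoted above, can be computed as in the minimal sketch below. The AUC is obtained from its rank interpretation: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counting one half. Encoding the event class as 1, the 0.5 threshold and the toy scores are assumptions for illustration only, not values from the study.

```python
def predict_labels(scores, threshold=0.5):
    """Turn predicted probabilities of the positive class into 0/1 labels."""
    return [1 if s >= threshold else 0 for s in scores]


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, scores):
    """Area under the ROC curve via its rank interpretation (U / (n_pos * n_neg))."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:          # O(n_pos * n_neg) pairwise comparison, fine for a sketch
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y_true = [1, 1, 0, 0]          # 1 = hypothetical event class, e.g. "dead"
    scores = [0.9, 0.4, 0.6, 0.1]  # toy predicted probabilities
    print(accuracy(y_true, predict_labels(scores)))  # 0.5
    print(roc_auc(y_true, scores))                   # 0.75
```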
Link to Publication [2019]
Successfully demonstrated that numerical multi-omics data, transformed into latent representations, could identify genetic clusters coregulated in a diseased individual.
Background: Survival and drug response are two highly emphasized clinical outcomes in cancer research that direct the prognosis of a cancer patient. Here, we propose a late multi-omics integrative framework that robustly quantifies survival and drug response for breast cancer patients, with a focus on the relative predictive ability of the available omics data types. Neighborhood component analysis (NCA), a supervised feature selection algorithm, selected relevant features from multi-omics datasets retrieved from The Cancer Genome Atlas (TCGA) and Genomics of Drug Sensitivity in Cancer (GDSC) databases. A neural network framework, fed with the NCA-selected features, was used to develop survival and drug response prediction models for breast cancer patients. The drug response framework used regression and unsupervised clustering (K-means) to segregate samples into responders and non-responders based on their predicted IC50 values (Z-scores).
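A minimal sketch of the responder/non-responder split: predicted IC50 values are converted to Z-scores and clustered into two groups with one-dimensional K-means. It assumes that the lower-IC50 cluster corresponds to responders (a lower IC50 means less drug is needed for 50% inhibition) and uses the population standard deviation; the toy IC50 values and function names are hypothetical, and the study's exact clustering settings may differ.

```python
import statistics


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("values must not all be equal")
    return [(v - mean) / sd for v in values]


def two_means_1d(values, max_iter=100):
    """K-means with k = 2 on one-dimensional data.

    Centroids start at the minimum and maximum; returns 0 for points nearer the
    lower centroid and 1 for points nearer the upper one.
    """
    lo, hi = min(values), max(values)
    if lo == hi:
        return [0] * len(values)
    for _ in range(max_iter):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        new_lo = statistics.mean(v for v, c in zip(values, labels) if c == 0)
        new_hi = statistics.mean(v for v, c in zip(values, labels) if c == 1)
        if (new_lo, new_hi) == (lo, hi):
            break  # centroids stopped moving: converged
        lo, hi = new_lo, new_hi
    return [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]


def label_responders(predicted_ic50):
    """Assumption: the lower-IC50 (more drug-sensitive) cluster is labelled 'responder'."""
    clusters = two_means_1d(z_scores(predicted_ic50))
    return ["responder" if c == 0 else "non-responder" for c in clusters]


if __name__ == "__main__":
    toy_ic50 = [0.5, 0.7, 0.6, 5.0, 5.5, 4.8]  # hypothetical predicted IC50 values
    print(label_responders(toy_ic50))
    # ['responder', 'responder', 'responder', 'non-responder', 'non-responder', 'non-responder']
```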
Results: The survival prediction framework was highly effective in categorizing patients into risk subtypes, with an accuracy of 94%. Compared to single-omics and early integration approaches, our drug response prediction models performed significantly better and predicted IC50 values (Z-scores) with a mean squared error (MSE) of 1.154 and an overall regression value of 0.92, indicating a strong linear relationship between predicted and actual IC50 values.
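The two regression metrics quoted above can be computed as in the sketch below. It assumes that the "regression value" is the Pearson correlation coefficient R between predicted and actual IC50 Z-scores, as reported by common regression plots; the function names and example numbers are illustrative only.

```python
import math


def mse(actual, predicted):
    """Mean squared error between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(actual, predicted):
    """Pearson correlation coefficient R between actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length inputs with at least two values")
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)


if __name__ == "__main__":
    actual = [1.0, 2.0, 3.0, 4.0]  # toy IC50 Z-scores
    predicted = [1.5, 2.0, 2.5, 4.5]
    print(mse(actual, predicted))  # 0.1875
    print(pearson_r(actual, predicted))
```

MSE measures the size of the errors, while R only measures how linearly the predictions track the true values, which is why the two are usually reported together.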
Conclusion: The proposed omics integration strategy provides an effective way of extracting critical information from diverse omics data types enabling estimation of prognostic indicators. Such integrative models with high predictive power would have a significant impact and utility in precision oncology.
Link to Publication [2021]
Framework aimed at estimating a patient's survival as well as their response to common cancer drugs.
Background: The utility of multi-omics in personalized therapy and cancer survival analysis has been debated and demonstrated extensively in the recent past. Most current methods still suffer from data constraints such as high dimensionality, unexplained interdependence and subpar integration methods. Here, we propose SurvCNN, an alternative approach that processes multi-omics data with robust computer vision architectures to predict cancer prognosis for lung adenocarcinoma (LUAD) patients.
Results: Numerical multi-omics data were transformed into image representations and fed into a convolutional neural network (CNN) with a discrete-time model to predict survival probabilities. The framework also dichotomized patients into risk subgroups based on their survival probabilities over time. SurvCNN was evaluated on multiple performance metrics and outperformed existing methods with a high degree of confidence. Moreover, the relative performance of various combinations of omics datasets was examined comprehensively.
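Two steps around the CNN described above, turning a numeric omics vector into an image and turning per-interval hazards into survival probabilities, can be sketched without the network itself. Under a discrete-time model, the survival probability at interval k is S(t_k) = ∏_{j≤k} (1 − h_j), where h_j is the predicted conditional probability of the event in interval j. The min-max scaling, row-major square layout with zero padding, median split at a chosen interval and all function names are assumptions for illustration, not SurvCNN's exact preprocessing.

```python
import math
import statistics


def to_image(vector):
    """Min-max scale a feature vector to [0, 1] and lay it out row by row in the
    smallest square grid that holds it, zero-padding the unused cells."""
    lo, hi = min(vector), max(vector)
    span = hi - lo
    scaled = [(v - lo) / span if span else 0.0 for v in vector]
    side = math.isqrt(len(scaled))
    if side * side < len(scaled):
        side += 1
    padded = scaled + [0.0] * (side * side - len(scaled))
    return [padded[r * side:(r + 1) * side] for r in range(side)]


def survival_curve(hazards):
    """Discrete-time survival: S(t_k) is the product of (1 - h_j) for j <= k."""
    curve, s = [], 1.0
    for h in hazards:
        s *= 1.0 - h
        curve.append(s)
    return curve


def risk_groups(curves, horizon):
    """Label patients 'high' risk if S at interval `horizon` is below the cohort median."""
    at_horizon = [curve[horizon] for curve in curves]
    cutoff = statistics.median(at_horizon)
    return ["high" if s < cutoff else "low" for s in at_horizon]


if __name__ == "__main__":
    print(to_image([2, 4, 6, 8, 10]))
    # [[0.0, 0.25, 0.5], [0.75, 1.0, 0.0], [0.0, 0.0, 0.0]]
    print(survival_curve([0.1, 0.2, 0.5]))  # approximately [0.9, 0.72, 0.36]
    curves = [[0.9, 0.72], [0.8, 0.5], [0.95, 0.9], [0.6, 0.3]]  # toy curves
    print(risk_groups(curves, horizon=1))  # ['low', 'high', 'low', 'high']
```

In a full model, the CNN would consume the grid from `to_image` and output one hazard per time interval, which `survival_curve` then accumulates into a monotonically non-increasing survival curve.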
Conclusion: Critical biological processes, pathways and cell types identified from downstream processing of differentially expressed genes suggested that the framework could elucidate elements detrimental to a patient’s survival. Such integrative models with high predictive power would have a significant impact and utility in precision oncology.
Link to Publication [2021]
Made by YOOtheme with love and caffeine.
Licensed under MIT license.