Yogesh Kalakoti

Ph.D. candidate, IIT Delhi

About me

I am a Ph.D. student in Computational Biology at IIT Delhi, working under the supervision of Prof. D. Sundar. My work is primarily aimed at developing programs that harness the potential of high-throughput biomedical data to elucidate the critical factors involved in the progression and prevention of disease.

View Resume (Sep' 26, 2022)

Year	Degree	Institution	Grade
2019 - current	Ph.D. Computational Biology	Indian Institute of Technology, Delhi	`8.1/10`
2018 - 19	MS Computational Biology	Indian Institute of Technology, Delhi	`8.2/10`
2014 - 18	B.Tech. Biotechnology	GB Pant University of Agriculture & Technology, Pantnagar	`7.3/10`

SELECTED COURSES Molecular Biology, Probability/Statistics, Linear Algebra, Machine Learning

Skills

Experienced with most ML architectures and frameworks, including CNNs, recurrent networks and transformers, among others. Additionally, I have experience in computer-aided drug design (CADD), including virtual ligand screening, ligand-based drug design, molecular dynamics and homology modelling.

Programming languages

Python, MATLAB, R, C++

Machine Learning

TensorFlow, PyTorch, OpenCV, Scikit-Learn

Molecular dynamics

Schrodinger Suites, PyMol, VMD, Chimera, 3D-Slicer

Experimentation

Cell culture, Basic microscopy, HPLC, DNA extraction

Projects

Our findings from the phenotype prediction models reinforce the idea that an integrative approach can support more accurate and personalized decisions for drug administration and improve general treatment strategy. Moreover, at the molecular level, we have demonstrated the effectiveness of NLP-based encoding strategies that extract critical information from sequential data, such as protein and drug sequences, to identify possible leads for a given protein target (and vice versa).

Further, ML methods have largely been regarded as black-box models that lack interpretability, especially when identifying probable drug-target pairs. Geometric deep learning has emerged as a natural route to robust and interpretable models, and we are actively pursuing it in the lab to make interpretability an essential feature of our prediction models.

The drug discovery pipeline aims at developing an end-to-end, scalable framework that can capture the intricate relationships among drug–target pairs and infer interactions between given drugs and targets using appropriate ML architectures. All the in-silico solutions and tools are entirely open-source and available on GitHub.

Regulatory genomics

Machine learning

Structural Biology

Protein modelling

NGS data analysis

Selected Publications

Background: Lung adenocarcinoma (LUAD) patients tend to have poor clinical outcomes. A biomarker or gene signature built from multi-omics datasets together with clinical features that could predict survival in these patients would have a significant clinical impact, enabling earlier detection of mortality risk and personalized therapy.

Methods: To identify a novel multi-omics signature, together with clinical features, associated with overall survival, we analyzed single-omics datasets from LUAD patients covering copy number variation (CNV), protein, methylation, mutation, RNA and miRNA, all extracted from The Cancer Genome Atlas (TCGA). Neighborhood component analysis, a feature reduction algorithm, was applied to the large feature space of each single-omics dataset to select the optimal combination of the best feature predictors. The features selected from each single-omics dataset were then combined into an integrated input and fed into a support vector machine (SVM), a neural network pattern recognizer and a RUS ensemble boost to build the survival prediction model. An external cohort was used to validate the prediction models.
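The select-then-integrate step described in the Methods can be illustrated with a small, self-contained sketch. This is not the published implementation: a simple variance ranking stands in for neighborhood component analysis, and the block names (`rna`, `cnv`), the toy values and the helper functions are all hypothetical.

```python
from statistics import pvariance


def top_k_by_variance(block, k):
    """Return the column indices of the k highest-variance features in one omics block.

    NOTE: variance ranking is only a stand-in for the supervised
    feature reduction (NCA) named in the text.
    """
    n_features = len(block[0])
    variances = [pvariance([row[j] for row in block]) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def integrate(blocks, k):
    """Select k features per omics block, then concatenate them sample by sample."""
    selected = {name: top_k_by_variance(block, k) for name, block in blocks.items()}
    n_samples = len(next(iter(blocks.values())))
    fused = []
    for i in range(n_samples):
        row = []
        for name, block in blocks.items():
            row.extend(block[i][j] for j in selected[name])
        fused.append(row)
    return fused, selected


# Toy data (hypothetical): 3 patients, two omics blocks with 3 and 2 features.
blocks = {
    "rna": [[1.0, 5.0, 0.1], [1.0, 9.0, 0.2], [1.0, 1.0, 0.3]],
    "cnv": [[0.0, 2.0], [4.0, 2.0], [8.0, 2.0]],
}
fused, selected = integrate(blocks, k=1)
print(selected)  # indices kept per block
print(fused)     # one fused feature vector per patient
```

The fused rows are what a downstream classifier such as an SVM or a neural network would receive as input.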

Results: We identified a critical feature space for multi-omics-based integration that could effectively stratify these LUAD patients into our critical survival classes with 92.9% accuracy using our neural network-based model, and receiver operating characteristic (ROC) analysis indicated that the signature had a powerful predictive ability. Moreover, a predictive pipeline was established based on the above signature integrated with clinicopathological features. Prediction accuracy with single-omics data as input was lower than that of our model, which takes multi-omics data as input and thereby improves classifier accuracy. Lastly, the signature was validated on an external cohort of excluded patients retrieved for the Group I and II study, using our best-performing classifier, the neural network pattern recognizer.

Conclusion: We developed a robust multi-omics signature as a self-sustaining factor to effectively classify LUAD patients into two survival classes, i.e., alive or dead, with unprecedented accuracy of 92.9%, which might provide a basis for personalized treatments for these patients.

Link to Publication

Multi-omics Integration based Predictive Model for Survival Prediction of Lung Adenocarcinoma `[2019]`

Successfully demonstrated that numerical multi-omics data, transformed into latent representations, could identify genetic clusters coregulated in a diseased individual.

Background: Survival and drug response are two highly emphasized clinical outcomes in cancer research that direct the prognosis of a cancer patient. Here, we have proposed a late multi-omics integrative framework that robustly quantifies survival and drug response for breast cancer patients, with a focus on the relative predictive ability of available omics data types. Neighborhood component analysis (NCA), a supervised feature selection algorithm, selected relevant features from multi-omics datasets retrieved from The Cancer Genome Atlas (TCGA) and Genomics of Drug Sensitivity in Cancer (GDSC) databases. A neural network framework, fed with NCA-selected features, was used to develop survival and drug response prediction models for breast cancer patients. The drug response framework used regression and unsupervised clustering (K-means) to segregate samples into responders and non-responders based on their predicted IC50 values (Z-score).
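The responder / non-responder split described above can be sketched as K-means with k = 2 on Z-scored predicted IC50 values. This is a minimal illustration under stated assumptions, not the published pipeline: the IC50 values are made up, the helper names are hypothetical, and treating the lower-IC50 cluster as "responders" is an assumption.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def two_means_1d(values, iterations=100):
    """K-means with k=2 on scalars.

    Returns one label per value: 0 for the lower-centroid cluster,
    1 for the higher-centroid cluster. Assumes the values are not all equal.
    """
    low, high = min(values), max(values)  # initial centroids
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        labels = [0 if abs(v - low) <= abs(v - high) else 1 for v in values]
        # Update step: recompute each centroid as the mean of its cluster.
        new_low = mean(v for v, lab in zip(values, labels) if lab == 0)
        new_high = mean(v for v, lab in zip(values, labels) if lab == 1)
        if (new_low, new_high) == (low, high):
            break  # centroids stopped moving
        low, high = new_low, new_high
    return labels


# Hypothetical predicted IC50 values for six samples.
predicted_ic50 = [-2.1, -1.8, -2.4, 1.9, 2.2, 2.0]
z = z_scores(predicted_ic50)
labels = two_means_1d(z)

# Assumption: a lower IC50 Z-score means higher sensitivity, i.e. a responder.
groups = ["responder" if lab == 0 else "non-responder" for lab in labels]
print(groups)
```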

Results: The survival prediction framework was highly effective in categorizing patients into risk subtypes with an accuracy of 94%. Compared to single-omics and early integration approaches, our drug response prediction models performed significantly better and were able to predict IC50 values (Z-score) with a mean square error (MSE) of 1.154 and an overall regression value of 0.92, showing a linear relationship between predicted and actual IC50 values.
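For reference, the two regression metrics quoted above can be computed as follows. This sketch assumes that the "overall regression value" denotes the Pearson correlation coefficient R between predicted and actual values, which the text does not state explicitly; the data are illustrative only.

```python
from math import sqrt


def mse(actual, predicted):
    """Mean squared error between paired actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Illustrative values only, not taken from the study.
actual = [0.0, 1.0, 2.0, 3.0]
predicted = [0.5, 1.0, 1.5, 3.0]

print(mse(actual, predicted))        # 0.125
print(pearson_r(actual, predicted))  # close to 1 for a near-linear relationship
```

An R value near 1 indicates that predictions track the actual values linearly, while the MSE reports the average squared deviation in the units of the Z-scored IC50.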

Conclusion: The proposed omics integration strategy provides an effective way of extracting critical information from diverse omics data types, enabling estimation of prognostic indicators. Such integrative models with high predictive power would have a significant impact and utility in precision oncology.

Link to Publication

Deep learning assisted multi-omics integration for survival and drug response prediction in breast cancer `[2021]`

Framework aimed at estimating a patient's survival as well as response to common cancer drugs.

SurvCNN: a discrete time-to-event cancer survival estimation framework using image representations of omics data `[2021]`
