Gopal Krishna Ramadas Dhondalay PhD
Data Scientist / Bioinformatician / Computational Biologist / Human Geneticist
Project | 18
---------------------------------------------------------------------------------------------
Integrated plasma proteomic and single-cell immune signaling network signatures demarcate mild, moderate, and severe COVID-19
Summary
The biological determinants underlying the range of coronavirus disease 2019 (COVID-19) clinical manifestations are not fully understood. Here, over 1,400 plasma proteins and 2,600 single-cell immune features comprising cell phenotype, endogenous signalling activity, and signalling responses to inflammatory ligands are cross-sectionally assessed in peripheral blood from 97 patients with mild, moderate, and severe COVID-19 and 40 uninfected patients.
To model disease severity, we combined LASSO models in a stacked generalisation framework and compared LASSO regression against other regression strategies: ordinary least squares regression, random forest regression, elastic-net regression, and support vector machine regression. For the univariate analysis, each feature was ranked by its Spearman correlation with the patient's severity at the time of sampling. Multi-omic factor analysis (MOFA) was applied simultaneously across the plasma (Olink) and single-cell proteomics data. For the multivariate analysis, a LASSO model was trained independently on each omics dataset using the caret and glmnet packages. Leave-one-out cross-validation (LOOCV), the limiting case of k-fold cross-validation in which k equals the number of samples, was also used. To characterise the separability of our predictions in a multi-class setting, we used a combined metric of the area under the receiver operating characteristic curve (AUC) in both the training and validation models.
We used a t-SNE layout to visualise all the features. A post-hoc linear regression analysis was used to exclude the likelihood that certain clinical or demographic variables confounded the predictive accuracy of the severity model. We used Reactome to identify biological pathways.
The association of bootstrap-selected features (10%) with “time since symptom onset” was modelled with a generalised additive model (GAM) using the R package “mgcv”. P-values of the smooth terms for the Mild, Moderate, and Severe groups were corrected for false discovery within each data layer using the Benjamini-Hochberg method.
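The per-layer false discovery correction can be illustrated with a minimal sketch. This is not the project's code: the analysis itself was run in R, whereas the sketch below is plain Python, and the function names, layer names, and p-values are hypothetical placeholders chosen only for illustration. It computes Benjamini-Hochberg adjusted p-values separately for each data layer, so that the number of tests in one layer does not affect the adjustment in another.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down to the smallest so that the
    # adjusted values stay monotone (a smaller raw p never gets a larger
    # adjusted p than a bigger raw p).
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        candidate = p_values[idx] * m / rank
        running_min = min(running_min, candidate)
        adjusted[idx] = running_min
    return adjusted


def correct_per_layer(p_by_layer):
    """Apply the BH adjustment independently within each data layer."""
    return {layer: benjamini_hochberg(ps) for layer, ps in p_by_layer.items()}


# Hypothetical smooth-term p-values, NOT results from the study.
example_p_values = {
    "plasma": [0.01, 0.04, 0.03, 0.005],
    "single_cell": [0.20, 0.001],
}
example_adjusted = correct_per_layer(example_p_values)
```

Correcting within each layer, rather than pooling all p-values, keeps a layer with many features from dominating the adjustment applied to a smaller layer.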
This project resulted in a publication in the journal Cell Reports Medicine.