Protein Biomarkers Metabolon platform for metabolomics analysis

Principal investigator: K. Suhre
Year approved: 2017
Status: Completed
Themes: Growth Hormone (GH)

Project description

Code: ISF17SC01KS

Serum samples from 35 individuals (25 males, 10 females) who received human growth hormone (hGH) or placebo for 3 weeks will be analyzed on the SOMAscan biomarker discovery platform to detect potentially novel biomarkers of hGH abuse. Participants were followed for 4 weeks before administration, throughout the 3-week administration period, and for a further 6 weeks after administration.

Deliverables. The primary aims of this study were the following:

• Identify proteins that respond to hGH treatment.

• Identify proteins that distinguish the hGH treated samples from the untreated samples.

a. The Deliverables provided to Client will include:

i. Normalized and, if applicable, calibrated data, measured in Relative Fluorescence Units (RFU), for each of the analytes organized on a per-sample basis, delivered in an Excel-compatible format;

ii. For each marker that reaches the significance threshold: boxplots with longitudinal profiles (line plots) for individuals;

iii. A spreadsheet (CSV) containing model statistics for all markers tested, ordered by significance of the fixed effect for time.

The following analyses are proposed:

Analysis 1: Identify protein biomarkers that show a time- and dose-dependent change in protein signal in response to hGH treatment. Log-transformed RFU values (logRFU) from the baseline, during-treatment and post-treatment periods will be fit with mixed models to account for the correlated structure of the time-series data and to evaluate the fixed effect of time. For proteins with a significant fixed effect, pairwise comparisons to baseline, based on differences in least-squares means, will be performed to identify specific time-dependent differences.
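
As an illustration only, one common way to write such a mixed model is with a random intercept per individual and the study period as a categorical fixed effect. The random-intercept structure, the three-level coding of time and all symbols below are assumptions made for this sketch; the project description does not state the exact model specification, and dose is not modelled here.

```latex
% Assumed notation: y_{ij} is logRFU of one protein for individual i at sample j,
% p(j) is the period of sample j (baseline, during, post), baseline is the reference level.
\begin{align}
  y_{ij} &= \beta_0
          + \beta_{\text{during}}\,\mathbb{1}[p(j)=\text{during}]
          + \beta_{\text{post}}\,\mathbb{1}[p(j)=\text{post}]
          + b_i + \varepsilon_{ij}, \\
  b_i &\sim \mathcal{N}(0,\sigma_b^2), \qquad
  \varepsilon_{ij} \sim \mathcal{N}(0,\sigma^2).
\end{align}
% Least-squares means per period and their difference to baseline:
\begin{align}
  \mu_{\text{baseline}} &= \beta_0, \qquad
  \mu_k = \beta_0 + \beta_k, \qquad
  \mu_k - \mu_{\text{baseline}} = \beta_k
  \quad (k \in \{\text{during}, \text{post}\}).
\end{align}
```

Under this coding, the shared term b_i captures the correlation between repeated samples from the same individual, and each pairwise comparison to baseline reduces to a test of a single coefficient.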

Additional Analysis to be Provided:

Classification models will be built to select proteins that exhibit “stable” differential expression between treatment groups and over the course of the study protocol.

Main findings

We reported quantitative circulating blood levels for over 1,300 proteins in serum from 35 volunteers, sampled at 22 time points. The study includes controls and three levels of doping, and samples were collected at baseline, during the doping phase and in a follow-up period. We identified 66 proteins that displayed a strong association with doping. We characterized these proteins with respect to the influence of diurnal variation, sex, and genetic background of the study participants, and computed multiple ROC curves for each protein, depending on the targeted period and dosage.

We then built multivariate classification models and showed that models with an AUC of up to 93.3% can be constructed using a Random Forest approach with 66 proteins, and 82.2% using glmnet.lasso with stability selection based on five protein markers alone. Furthermore, we evaluated the potential of each of the 66 proteins as a doping biomarker under the requirement that it produce no false positives, using the percentage of doped individuals identified at least once as the quality criterion.

Finally, we tested the performance of selected sets of composite markers. Using 5 markers and aiming at a zero false positive rate, around 25-35% of all doped samples could be identified in a real-world scenario. In a second multi-marker analysis, we evaluated the performance of six multi-marker models under more realistic conditions. Table 4 provides the top-performing models as a function of the selection criteria. The supporting data and plots provided can now be used to further develop and select the most suitable candidate markers for the development of targeted markers.
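
The zero-false-positive criterion described above can be sketched for a single marker as follows. This is a minimal illustration, not the study's actual procedure: it assumes that higher marker values indicate doping, that the cutoff is simply the highest value seen in any control sample, and the function names and toy values are hypothetical.

```python
def zero_fp_threshold(control_values):
    """Lowest cutoff that flags no control sample: the maximum control value."""
    return max(control_values)


def identified_at_least_once(doped_by_subject, threshold):
    """Return the fraction of doped individuals with at least one sample above
    the threshold, together with the sorted list of flagged individuals."""
    flagged = sorted(
        subject
        for subject, values in doped_by_subject.items()
        if any(value > threshold for value in values)
    )
    return len(flagged) / len(doped_by_subject), flagged


# Hypothetical toy data: marker values for control samples and, per doped
# individual, the values of their samples taken during administration.
controls = [1.0, 1.2, 0.9, 1.4]
doped = {
    "S1": [1.1, 1.6, 1.3],
    "S2": [1.0, 1.2],
    "S3": [1.5, 1.7],
    "S4": [0.8, 1.4],
}

threshold = zero_fp_threshold(controls)
fraction, flagged = identified_at_least_once(doped, threshold)
print(threshold, fraction, flagged)
```

A strict comparison is used so that a doped sample equal to the highest control value is not flagged, which keeps the false positive count on the controls at exactly zero.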