Student research project
The Metabolomics laboratory uses state-of-the-art tandem mass spectrometry to obtain metabolic and lipid profiles from cell and animal models, as well as from clinically relevant human samples, in order to develop new approaches to the diagnosis, risk assessment and therapy of diabetes and cardiovascular disease.
The recent advent of high-dimensional plasma lipidomes has been a boon for analyses of the association between lipids and clinical outcomes of interest. It also opens up data analysis questions that could not previously be addressed, such as “How can we impute missing values?”, “How can we extrapolate lipid levels in older datasets where they were not measured?” and “What is the minimal set of lipids we actually need to measure?”.
We hypothesise that the richness of information in these datasets will enable more advanced and accurate methods of data imputation and reduction, providing greater statistical power, increased dataset comparability, and more clinically relevant choices of lipids to measure.
The student will contribute to various facets of this ongoing research, at a level matched to their skills and interests. The overarching aims include:
- Investigate existing statistical methods for missing value imputation [e.g. references 1–3], trial them on several datasets available in the lab (pertaining, for example, to cardiovascular disease, obesity, diabetes, or Alzheimer’s disease), and integrate one or more into the lab’s current data analysis pipeline.
- Extend this work to the extrapolation of levels of lipids that were not measured in earlier datasets.
- Investigate statistical methods for dataset reduction, apply them to determine a minimal set of clinically relevant lipids, and determine whether such sets are pathology-specific or globally applicable.
- Contribute to the resulting papers.
- If the opportunity arises, liaise with our other student project on correlational analyses in lipids.
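As an illustration of the first aim, the nearest-neighbour idea behind reference 2 can be sketched in a few lines. This is only a minimal sketch under stated assumptions, not the lab's pipeline: rows are taken to be samples and columns lipids, missing values are marked with None, the imputed value is an unweighted mean of the k nearest donors (the published method uses a weighted average), and the function name knn_impute and the toy numbers are made up for the example. Python is used here purely for illustration; the same logic carries over to R.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest rows that have the value.

    Hypothetical helper for illustration only.
    """

    def distance(a, b):
        # Root-mean-square difference over features observed in both rows.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    completed = [list(row) for row in rows]  # copy; the input is left untouched
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Candidate donors: other rows in which feature j was measured
            # and which share at least one observed feature with this row.
            donors = [
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            ]
            donors = [d for d in donors if math.isfinite(d[0])]
            donors.sort(key=lambda d: d[0])
            nearest = donors[:k]
            if nearest:  # leave the gap if no usable donor exists
                completed[i][j] = sum(v for _, v in nearest) / len(nearest)
    return completed


# Toy matrix: rows are samples, columns are made-up lipid concentrations.
toy = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, None],
    [0.9, 1.9, 2.9],
    [5.0, 6.0, 7.0],
]
filled = knn_impute(toy, k=2)
print(filled[1])
```

In the toy matrix, the gap in the second sample is filled from the first and third samples, which are its closest neighbours on the two lipids that were measured; the distant fourth sample is ignored. In practice the features would usually be scaled first, and the choice of k would be assessed by masking known values and comparing the imputed values against them.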
The student must be comfortable with the general notions behind frequentist multivariate statistics and have some programming experience, preferably in R. Basic knowledge of organic chemistry and metabolism would be very helpful.
This project is suitable for a Master’s, Honours or PhD student. It will use our laboratory’s novel lipidomic approach to generate lipid profiles from patient cohorts at different stages of disease, in order to identify the lipids and lipid profiles that are specifically associated with disease onset and progression.
References
1. Multiple imputation by chained equations: what is it and how does it work? Int J Methods Psychiatr Res, 2012.
2. Missing value estimation methods for DNA microarrays. Bioinformatics, 2001.
3. MIDA: multiple imputation using denoising autoencoders. Advances in Knowledge Discovery and Data Mining (PAKDD), 2018.