Using Machine Learning to Predict Cell-Type Specific Effects of Genetic Variants Which Influence Genome Regulation

Project Description

Applications are invited for a 3.5-year PhD studentship at the UK DRI Centre at Imperial College London in the research group of Dr Nathan Skene. The studentship is fully funded, providing tuition fees (Home/EU rate) and a tax-free stipend. The project is available with an immediate start date and we anticipate the successful candidate will take up the studentship before March 2021.

The project is based at Imperial’s new White City Campus, the Hub for convergent research. The candidate will be part of the UK DRI, working in the neurogenomics lab and under the supervision of Dr Nathan Skene.

The UK Dementia Research Institute

There are currently around 850,000 people with dementia in the UK. This is projected to rise to 1.6 million by 2040. The UK DRI at Imperial has been established to address a medical research area of the highest importance and future impact. As one of seven national centres of excellence embedded in major UK universities, we intend to transform the diagnosis, treatment and care of people with dementias. The Medical Research Council and charity partners the Alzheimer’s Society and Alzheimer’s Research UK have invested £290m in fulfilment of the ambition identified in the Prime Minister’s 2020 Challenge on Dementia.

The laboratory

The programme of Dr Nathan Skene is focused on identifying the cell types, time points and regulatory mechanisms acted on by genetic variants associated with neurodegenerative diseases. The lab develops statistical methods to integrate single-cell genomic data with genome-wide datasets on genetic associations with brain disorders.


Alzheimer’s disease has a twin heritability of 79%, indicating that genetics plays a significant role in the disorder. A major challenge in biology is to understand how genetic risk factors drive disease: because of the large number of genes now understood to be involved, the neurodegenerative diseases are now considered ‘complex traits’. While genetic studies have so far identified only 29 variants with genome-wide significant associations with the disorder, it is now recognised that almost all variants in the genome will affect disease risk to some degree (given sufficient statistical power). To understand complex diseases, we therefore need good predictions of the functional effect of millions of genetic variants on hundreds of regulatory factors (e.g. transcription factors and histone modifiers). Machine learning techniques, such as long short-term memory (LSTM) recurrent neural networks and decision trees, have shown promising results in their ability to do this.

The Project

This PhD project is focused on using machine learning techniques to develop novel classifiers for predicting how changes in DNA sequences alter genomic regulatory features. Many regulatory proteins recognise particular DNA sequences known as motifs; for instance, the restriction enzyme EcoRI binds only to GAATTC. DNA sequences can be converted into a machine-interpretable format using one-hot encoding, in which each base (A, C, G or T) is represented by a vector containing a single 1. The candidate will use publicly available and in-house datasets of genomic regulatory features to train models that predict the cell-type specific regulatory effects of genetic variants. We will provide several true-positive datasets, in which the effect of genetic mutations on particular regulatory features has been measured. These will serve as validation datasets for evaluating how well the trained classifiers work. We are interested in how improvements in the machine learning approach (e.g. use of transfer learning, recurrent attentional networks or graph convolutional networks) can improve upon existing methods. The candidate will use these techniques to identify causal pathways and candidate drug targets for neurodegenerative diseases.
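
As a rough illustration of the one-hot encoding and motif recognition mentioned above, the following Python sketch encodes a DNA sequence and checks where a motif such as GAATTC occurs. It is not the lab's method: the function names, the A, C, G, T column order and the all-zero encoding of unknown bases are assumptions made for this example, and real models would learn far softer sequence preferences than an exact match.

```python
# Illustrative sketch only. The names below and the A, C, G, T column
# order are assumptions for this example, not part of the project.
BASES = "ACGT"


def one_hot_encode(sequence):
    """Return one row per base, with a single 1 in the column for that base.

    Symbols outside A, C, G, T (for example N) become an all-zero row.
    """
    rows = []
    for base in sequence.upper():
        rows.append([1 if base == b else 0 for b in BASES])
    return rows


def motif_match_positions(sequence, motif):
    """Return start positions where the motif matches the sequence exactly.

    Each window of the one-hot sequence is scored against the one-hot
    motif; a score equal to the motif length means every base agrees.
    """
    seq_rows = one_hot_encode(sequence)
    motif_rows = one_hot_encode(motif)
    width = len(motif_rows)
    positions = []
    for start in range(len(seq_rows) - width + 1):
        window = seq_rows[start:start + width]
        score = sum(
            s * m
            for seq_row, motif_row in zip(window, motif_rows)
            for s, m in zip(seq_row, motif_row)
        )
        if score == width:
            positions.append(start)
    return positions


if __name__ == "__main__":
    reference = "TTGAATTCAA"  # made-up sequence containing GAATTC
    variant = "TTGAGTTCAA"    # same sequence with a single base changed
    print(motif_match_positions(reference, "GAATTC"))
    print(motif_match_positions(variant, "GAATTC"))
```

In this toy case a single-base change removes the only exact match, which is the simplest version of a variant altering a regulatory feature. A trained classifier would instead take the one-hot matrix as input and output a predicted change in the feature of interest.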

The candidate will be encouraged to participate in the Turing Institute’s enrichment scheme and to build collaborations through the DEMON (Deep Dementia Phenotyping) network.

Funding Information

The award is for 42 months (full time) and covers course fees (Home/EU rate) and a tax-free stipend of £19,000 rising to £20,500.

Eligibility Requirements

Applicants must hold (or obtain by October 2020) a First Class or an Upper Second Class degree (or equivalent overseas qualification) in a quantitative discipline, such as mathematics, statistics, computer science or engineering. Imperial would normally expect successful applicants to hold or achieve a Master’s degree in a related field.

Prior experience with programming is essential, but no experience with biology is necessary. Experience using machine learning methods will be beneficial.

Application Process

For informal enquiries please contact Dr Nathan Skene ([email protected]). For application, please send a full CV, stating your nationality, and the full contact details of two academic referees to [email protected].

We regret that due to the large volume of applications received, we are only able to notify those shortlisted for interview. Applications will be considered until November 2020.
