Phenotype preprocessing for a Canadian genome-wide association study
Project with Lloyd Elliott and Lin Zhang
The HostSeq project curates a database of human genomes, phenotypes and outcomes for 10,000 Canadians. HostSeq has been used to study the host genetics of COVID-19. (COVID-19 is caused by the virus SARS-CoV-2, but human genetic variation can modulate the severity of the disease or susceptibility to infection; the study of this effect is called host genetics.) This project aims to allow HostSeq results to be included in global meta-analyses of diseases other than COVID-19. To this end, we must preprocess the phenotype data and map the surveys and electronic health records included in HostSeq to ICD-10, a standard formal coding of diseases. We will also explore operationalizing a long COVID-19 variable using HostSeq, forming a polygenic risk score for COVID-19, and associating this score with outcomes.
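As a rough illustration of the mapping step described above, the following Python sketch converts one participant's yes/no survey answers into ICD-10 codes. The survey field names, the `SURVEY_TO_ICD10` table and the `map_to_icd10` helper are hypothetical and are not taken from the HostSeq schema; only the three ICD-10 category codes are standard. A real mapping would likely also need to handle free-text answers and electronic health record entries.

```python
# Hypothetical survey field names; the actual HostSeq schema may differ.
SURVEY_TO_ICD10 = {
    "type_2_diabetes": "E11",  # Type 2 diabetes mellitus
    "hypertension": "I10",     # Essential (primary) hypertension
    "asthma": "J45",           # Asthma
}


def map_to_icd10(responses):
    """Map yes/no survey responses to ICD-10 codes.

    Returns a pair: the sorted ICD-10 codes for fields answered True,
    and the sorted names of positive fields with no known mapping,
    so that they can be reviewed by hand.
    """
    codes = set()
    unmapped = []
    for field, value in responses.items():
        if not value:
            continue  # only positive answers produce a code
        code = SURVEY_TO_ICD10.get(field)
        if code is None:
            unmapped.append(field)
        else:
            codes.add(code)
    return sorted(codes), sorted(unmapped)


# Example participant record (made-up values for illustration only).
participant = {
    "type_2_diabetes": True,
    "hypertension": False,
    "asthma": True,
    "eczema": True,  # no entry in the table above
}
codes, unmapped = map_to_icd10(participant)
print(codes, unmapped)
```

Returning the unmapped fields separately, rather than dropping them silently, is one possible design choice: it keeps a record of which phenotypes still need a manual ICD-10 assignment.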