Identifying copy-neutral loss-of-heterozygosity in the presence of runs of homozygosity
Introduction
Mosaic chromosomal alterations (mCAs) are structural variants known to drive the clonal expansion of the cells that carry them. Although they have largely been associated with aberrant leukocyte counts and increased risk of hematological and lymphoid malignancies, mCAs can be present in any tissue. The acquisition and survival of mCAs can potentially lead to a cancerous state or provide evidence of an early manifestation of cancer in an individual. Loss of heterozygosity (LOH), a well-studied phenomenon in cancer, involves the deletion of all or part of one of the inherited chromosomes, leading to the inactivation of tumor-suppressor genes. Copy-neutral loss of heterozygosity (cn-LOH), by contrast, leaves the normal diploid copy number unchanged and has also been well studied in cancer. cn-LOH can currently be detected even in samples with very low tumor purity. Yet at a very high mutant cell fraction (MCF > 90-95%), cn-LOH becomes almost indistinguishable from constitutive runs of homozygosity (ROH). Given our increased understanding of the prevalence of mCAs in normal tissue, differentiating ROH from cn-LOH has value beyond mCA detection, since ROH are used in human genetics to map germline risk variants and to study human demographic and evolutionary history. Differentiating these two presentations of homozygosity is thus an open problem worthy of study.
Methods
We scanned for putative ROH using PLINK. To assess a statistical framework for differentiating high-MCF cn-LOH from ROH, we first simulated high-MCF cn-LOH. We took a hypothesis-testing approach that attempts to identify two groups of markers within an observed ROH, with ROH as the null hypothesis and high-MCF cn-LOH as the alternative. Under the alternative, the two groups are expected to be 1) true homozygotes and 2) germline heterozygotes that appear homozygous because of LOH. We considered several ways to identify these two groups, including population allele frequencies, since markers with two copies of a rare allele are candidates for group 2. We then tested for differences in “B allele” frequencies between the groups, under the assumption that markers from group 2 will show greater deviations from zero or one than will markers from group 1.
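The reasoning behind the group comparison can be illustrated with a minimal sketch. It is not the implementation used in this work: the function names, the Gaussian noise model, the noise level, and the permutation test are all assumptions chosen for illustration. The only modeling step is that, at a germline-heterozygous marker, normal cells contribute one B allele out of two and mutant cells contribute two (or zero), so the expected B allele frequency is (1 + MCF)/2 or (1 - MCF)/2.

```python
import random


def expected_baf_cnloh(mcf, retained_b=True):
    """Expected BAF at a germline-heterozygous (AB) marker under cn-LOH.

    Normal cells (fraction 1 - mcf) stay AB; mutant cells (fraction mcf)
    become BB if the B allele is retained, otherwise AA.
    """
    b_copies = mcf * (2.0 if retained_b else 0.0) + (1.0 - mcf) * 1.0
    return b_copies / 2.0


def folded_deviation(baf):
    """Distance of a BAF value in [0, 1] from the nearest of 0 or 1."""
    return min(baf, 1.0 - baf)


def simulate_baf(mean, n, noise_sd, rng):
    """Draw n noisy BAF values around `mean`, clamped to [0, 1].

    Gaussian noise is an illustrative assumption, not a calibrated model.
    """
    return [min(1.0, max(0.0, rng.gauss(mean, noise_sd))) for _ in range(n)]


def permutation_pvalue(group1, group2, n_perm, rng):
    """One-sided permutation test of mean(group2) > mean(group1)."""
    def mean(xs):
        return sum(xs) / len(xs)

    observed = mean(group2) - mean(group1)
    pooled = list(group1) + list(group2)
    n1 = len(group1)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if mean(pooled[n1:]) - mean(pooled[:n1]) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


def demo(mcf=0.95, n=200, noise_sd=0.02, n_perm=1000, seed=1):
    """Compare folded BAF deviations of simulated group 1 and group 2 markers."""
    rng = random.Random(seed)
    group1 = simulate_baf(1.0, n, noise_sd, rng)  # true BB homozygotes
    group2 = simulate_baf(expected_baf_cnloh(mcf), n, noise_sd, rng)
    d1 = [folded_deviation(b) for b in group1]
    d2 = [folded_deviation(b) for b in group2]
    return permutation_pvalue(d1, d2, n_perm, rng)
```

In this simplified model the expected deviation of group 2 markers from zero or one is (1 - MCF)/2, which shrinks to zero as the MCF approaches 1, so the two groups become harder to separate at extreme MCFs and in regions with few markers.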
Results
We analyzed SNP array data previously obtained by genotyping blood- and saliva-derived DNA from an open cohort of Mexican Americans (n=329) at ages of increased risk for developing cancer and other chronic diseases. Preliminary results show abundant regions of homozygosity throughout the genome. Most of the detected regions are short (<10 Mb), but we also detected regions exceeding 100 Mb. Simulation studies showed good classification performance in larger regions of homozygosity (>10 Mb), which led us to prioritize the evaluation of shorter regions. Preliminary results using simulated high-MCF cn-LOH samples demonstrated excellent performance in differentiating our groups in large events (AUC=0.97) and moderate performance in short events (AUC=0.72) when incorporating the probabilities of belonging to either group. The combination of extreme MCFs (0.99-1) and short events showed poor performance under all conditions (AUC=0.67).
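For readers less familiar with the metric, the AUC values above can be read as the probability that a randomly chosen simulated cn-LOH region receives a higher score than a randomly chosen ROH region. The sketch below computes that quantity directly from two lists of scores; the function name and the idea of one score per region are illustrative assumptions, not a description of the scoring used here.

```python
def rank_auc(scores_cnloh, scores_roh):
    """AUC as P(cn-LOH score > ROH score), counting ties as one half."""
    if not scores_cnloh or not scores_roh:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for s_pos in scores_cnloh:
        for s_neg in scores_roh:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(scores_cnloh) * len(scores_roh))
```

An AUC of 0.5 corresponds to scores that do not separate the two classes at all, and 1.0 to perfect separation.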
Conclusion
We believe most of these regions could be consistent with ROH, as they are short and were found in a population of admixed origin, which is typically characterized by such regions. Yet the larger regions suggest acquired cn-LOH events, which might be relevant to cancer risk. We intend to continue characterizing and validating the detected cn-LOH and ROH events in the context of hematological cancer, especially in pathologically normal samples, as a way of identifying potential biomarkers of interest.