Exome Sequencing to Identify Rare Mutations Associated with Breast Cancer Susceptibility


Background – Breast cancer predisposition is known to be influenced by hereditary factors. Newer techniques, particularly exome sequencing, have helped identify novel variants associated with a disease phenotype.

Method – In this review we discuss the advantages of exome sequencing and how it can help in understanding familial breast cancer. In particular, we discuss the studies by Noh et al.(1), Thompson et al.(2), and Kiiski et al.(3), which identified new variants that are potential candidates for breast cancer susceptibility.

Result – Noh et al.(1) performed whole exome sequencing on a Korean family with non-BRCA breast cancer and, by comparing the results of three different algorithms, identified 7 new mutations that are potential candidates for high risk of familial breast cancer. Thompson et al.(2) showed, using intra-family exome sequencing, that FANCC and BLM could be breast cancer susceptibility genes, since mutations in these genes lead to a high risk of breast cancer in the European population. Using exome sequencing, Kiiski et al.(3) demonstrated that a nonsense mutation in FANCM, a DNA repair gene, could confer breast cancer predisposition in the Finnish population, particularly for Triple Negative Breast Cancer (TNBC).

Conclusion – This review compiles studies in which the authors used whole exome sequencing to identify novel variants associated with breast cancer predisposition.

Keywords: Exome sequencing, breast cancer, BRCA1, BRCA2, Fanconi Anemia, variant calling, familial disease


Breast cancer is a heterogeneous disease characterized by different types and subtypes. It is one of the leading causes of cancer death among women, and its incidence continues to rise (4). According to the American Cancer Society, breast cancer accounts for 30% of new cancer diagnoses in women, and about 62,960 cases of female breast carcinoma were expected to be diagnosed in 2019 (4). Moreover, for an American woman, the lifetime risk of developing breast cancer is 12.4%, or about 1 in 8 (4). Successful treatment depends on timely diagnosis and early detection. Diagnosis typically begins with an assessment of symptoms, continues with imaging, and is finally confirmed by histopathology of a biopsy. These methods are limited by high cost and low sensitivity.

It is commonly believed that cancer arises from mutations in dividing cells, which deregulate cell growth and affect hormone production. Changes in the hormone receptors, Estrogen Receptor (ER) and Progesterone Receptor (PR), in women have been associated with multiple molecular types of breast cancer (5). For example, Triple Negative Breast Cancer (TNBC), or basal-like type, is negative for ER, PR and HER-2 (human epidermal growth factor receptor 2) (6). Around 15-20% of breast cancers are triple negative; they are often aggressive and have a poor prognosis (7). On the other hand, Luminal A cancers are positive for ER and PR and negative for HER-2 (8). These cancers have the best prognosis among breast cancer types, with high 5-year survival rates and low recurrence rates. Luminal B cancers, by contrast, are also hormone receptor positive (ER and/or PR) but may be HER-2 positive or negative, and they have a poorer prognosis than Luminal A cancers (8).

BRCA1 and BRCA2 germline mutations have been shown to be associated with breast cancer (9). However, genetic testing for these genes is informative in only about 50% of dominant breast cancer families (10). Other genes such as BRIP1, ATM, CHEK2, and PALB2 are also associated with an increased relative risk of breast cancer (11). Genome-wide association studies (GWAS) have identified common low-penetrance to high-penetrance breast cancer susceptibility genes (12–14). Even so, these genes explain only one-third of the familial risk of breast cancer.

In recent years, the application and advancement of sequencing techniques have greatly improved our understanding of breast cancer, which is mostly sporadic, although about 30% of cases are reportedly caused by hereditary factors (15). High-risk genes include BRCA1, BRCA2, PTEN, RAD51, PALB2, CHEK2 and TP53, and mutations in these genes increase susceptibility to breast and ovarian cancer (16–21). Identifying new variants in high-risk genes would further improve our understanding of susceptibility to breast and ovarian cancer.

Many studies have sought the root causes of breast and ovarian cancer, and technologies such as microarrays and whole genome sequencing (WGS) have helped us understand them better. At present, whole exome sequencing is often preferred over microarrays and WGS because the data are easier to handle, the method is cost-effective, and the data quality is high. Whole exome sequencing is useful for detecting novel disease-causing variants because only the exome (the protein-coding regions) is sequenced; it makes up about 1% of the whole genome, yet approximately 85% of disease-causing variants lie in exonic regions (22).

The three platforms most widely used for exome capture are Agilent, Illumina and NimbleGen (23–27). The Agilent platform uses RNA probes, whereas NimbleGen and Illumina use DNA probes (23, 24, 27). NimbleGen uses overlapping 55-105 bp DNA probes, Agilent uses adjacent 114-126 bp RNA probes to cover the target region, and Illumina uses shorter 95 bp DNA probes with gaps between them across the target region (23–26). Agilent and NimbleGen fragment genomic DNA by ultrasonication, whereas Illumina uses either ultrasonication or a transposon (23–25). NimbleGen has comparatively high specificity and sensitivity and consistent coverage even in GC-rich regions, though it produces more duplicate reads and has a lower alignment rate (23, 25). Agilent, on the other hand, gives the fewest duplicate reads, has a high alignment rate, and is good at detecting indels (23–25). Illumina is better at capturing UTRs and miRNAs but has low target efficiency (24, 27).

Early studies by Easton et al.(28) showed that the factors responsible for breast cancer predisposition include family history and germline mutation, observed in approximately 20% and 5% of breast cancer patients, respectively. Although many studies have examined the factors that contribute to breast and ovarian cancer, much remains unknown.

Sequencing the protein-coding regions can play a major role in identifying novel variants. Exome sequencing is therefore a promising strategy for finding new genes and variants that are potential candidates for breast cancer susceptibility.

In this review we discuss the exome sequencing approaches used by Thompson et al.(2), Kiiski et al.(3), and Noh et al.(1) to understand the genetic predisposition to breast cancer and to detect new variants. We also discuss future prospects and strategies that could improve understanding of the disease.

Noh et al.(1) performed exome sequencing on three Korean sisters with breast cancer and their mother, who had no cancer; the family had tested negative for BRCA1 and BRCA2 mutations. By comparing the predictions of three algorithms (SIFT, PolyPhen-2 and MutationTaster), they identified 7 new variants (in XCR1, DLL1, TH, ACCS, SPPL3, CCNF, and SRL) that are potential candidates for high risk of familial breast cancer.

Thompson et al.(2) used an intra-family exome sequencing approach to identify unique mutations in FANCC and BLM, both DNA repair genes, in 6 of 438 families with a history of breast cancer. They concluded that these genes could be breast cancer susceptibility genes, as mutations in these two genes are rare, especially among Caucasian people (2).

Kiiski et al.(3) performed exome sequencing on 24 individuals from 11 Finnish families with breast cancer and demonstrated that a nonsense mutation in the DNA repair gene FANCM could be a candidate for breast cancer predisposition, especially in triple-negative breast cancer patients.


Noh et al.(1) studied a Korean family with three sisters who had breast or thyroid cancer and used their cancer-free mother as a control. All had undergone genetic testing for BRCA mutations and tested negative.
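One way to picture how an affected-sibling design with an unaffected relative as control narrows a variant list is a simple set operation: keep what every affected member shares and drop what the control also carries. The sketch below is illustrative only and is not the pipeline Noh et al. describe; the function name, the variant keys and the toy genotypes are all hypothetical.

```python
from typing import Dict, Set, Tuple

# A variant is keyed by (chromosome, position, reference allele, alternate allele).
Variant = Tuple[str, int, str, str]


def candidate_variants(affected: Dict[str, Set[Variant]],
                       unaffected: Dict[str, Set[Variant]]) -> Set[Variant]:
    """Return variants carried by every affected relative and by no unaffected one."""
    if not affected:
        return set()
    # Variants present in all affected family members.
    shared = set.intersection(*affected.values())
    # Remove anything also seen in an unaffected relative (a strong assumption).
    for calls in unaffected.values():
        shared -= calls
    return shared


# Hypothetical toy genotypes; positions and alleles are invented for illustration.
sisters = {
    "sister_1": {("chr3", 1000, "A", "G"), ("chr6", 2000, "C", "T"), ("chr11", 3000, "G", "A")},
    "sister_2": {("chr3", 1000, "A", "G"), ("chr6", 2000, "C", "T")},
    "sister_3": {("chr3", 1000, "A", "G"), ("chr6", 2000, "C", "T"), ("chr12", 4000, "T", "C")},
}
mother = {"mother": {("chr6", 2000, "C", "T")}}

print(sorted(candidate_variants(sisters, mother)))
```

Excluding every variant seen in the unaffected relative is a simplification: a carrier of a reduced-penetrance variant may never develop cancer, so a real analysis would likely treat the control more cautiously.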

Thompson et al.(2) selected 15 families with familial breast or ovarian cancer from kConFab and the Peter MacCallum Cancer Centre Familial Cancer Centre in Australia and New Zealand (2). In this study, 238 Caucasian females obtained from kConFab and Princess Anne Hospital, UK, were used as controls. All tested negative for BRCA mutations.

Kiiski et al.(3) selected 24 patients from 11 families with familial breast cancer, and genotyped controls were selected from the same regions in Finland (Helsinki breast and ovarian cancer patients, breast cancer patients from Tampere and Iceland regions of Finland). All subjects in their study had BRCA-negative breast cancer.

Whole Exome Sequencing (WES)

WES is a technique that sequences only the protein-coding regions of the genome, which constitute about 1% of the genome (26). It is used to identify genetic variants related to hereditary diseases, for example breast and ovarian cancer (16, 29).

Noh et al.(1) used the Agilent V4+UTRs exome enrichment kit for exome capture. The enriched fragments were sequenced as 100 bp paired-end reads on the Illumina HiSeq 2000 platform, and the data were analyzed with the Macrogen exome sequencing pipeline. Approximately 96% of target regions had coverage above 10X, and the mean depth of the target regions was 85X (1).

Thompson et al.(2) performed exome enrichment with Roche NimbleGen, followed by sequencing on the Illumina HiSeq platform. Sequence alignment and analysis were done with the Burrows-Wheeler Aligner (BWA) and the Genome Analysis Toolkit (GATK), respectively. On average, more than 70% of reads (target bases) were mapped and aligned to the reference genome, and approximately 89% of target bases had more than 10X coverage (2).

Kiiski et al.(3) used the Agilent SureSelect Human All Exon enrichment kit for exome capture, followed by 100-bp paired-end sequencing on the Illumina HiSeq 2000 platform. Sequence alignment and analysis were done with the Burrows-Wheeler Aligner (BWA), Picard and SAMtools. Approximately 91% of target regions had coverage above 10X, with a mean read coverage of 101 for nonsynonymous variants (3).
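The coverage figures quoted for the three studies (mean depth, and the share of the target covered at 10X or more) are simple summaries of per-base read depth. The following is a minimal sketch of how such numbers can be derived, assuming the per-base depths over the target region are already available as a list of integers; the function name and the toy depths are hypothetical, and the threshold is treated as "at least 10X".

```python
from typing import Sequence, Tuple


def coverage_summary(depths: Sequence[int], min_depth: int = 10) -> Tuple[float, float]:
    """Return (mean depth, fraction of target bases covered at >= min_depth)."""
    if not depths:
        raise ValueError("depths must not be empty")
    mean_depth = sum(depths) / len(depths)
    covered = sum(1 for d in depths if d >= min_depth)
    return mean_depth, covered / len(depths)


# Hypothetical per-base depths over a tiny target region.
toy_depths = [0, 5, 12, 30, 85, 100, 9, 10, 40, 60]
mean_depth, frac_10x = coverage_summary(toy_depths)
print(f"mean depth = {mean_depth:.1f}X, >=10X = {frac_10x:.0%}")
```

A high mean depth alone does not guarantee usable data, which is why the studies report the 10X fraction as well: a few very deep regions can mask many poorly covered ones.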

Variant calling and identification

After the reads are mapped to the reference genome, they are filtered and recalibrated, and variant calling follows. The purpose of variant calling is to identify the mutations present in a region and other genomic variations, e.g., SNPs, SNVs, indels, etc.
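To make the distinction between SNVs and indels concrete, the sketch below tallies variant types from VCF-style records by comparing the lengths of the reference and alternate alleles. It is a simplified illustration, not a variant caller: it assumes tab-separated lines with REF and ALT in the fourth and fifth columns, ignores symbolic alleles, and uses invented toy records.

```python
from collections import Counter


def classify(ref: str, alt: str) -> str:
    """Classify one REF/ALT pair by allele length."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"
    if len(ref) < len(alt):
        return "insertion"
    if len(ref) > len(alt):
        return "deletion"
    return "MNV"  # same length, more than one base changed


def count_variant_types(vcf_text: str) -> Counter:
    """Count variant types in VCF-style text, one count per alternate allele."""
    counts = Counter()
    for line in vcf_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")
        ref, alts = fields[3], fields[4]
        for alt in alts.split(","):  # multi-allelic sites list several ALT alleles
            counts[classify(ref, alt)] += 1
    return counts


# Hypothetical toy records for illustration.
toy_vcf = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\t.",
    "chr1\t200\t.\tAT\tA\t45\tPASS\t.",
    "chr2\t300\t.\tC\tCGG,T\t60\tPASS\t.",
])
print(dict(count_variant_types(toy_vcf)))
```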

Noh et al.(1) used SAMtools to identify variants and ANNOVAR to annotate them, followed by filtering of SNVs and indels. They used dbNSFP for the final annotation. Three different algorithms were used to predict the effect of the variants on protein function (1).
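Comparing the three prediction algorithms named earlier in this review (SIFT, PolyPhen-2 and MutationTaster) amounts to asking how many tools agree that a variant is damaging. The sketch below shows one possible consensus rule. The cutoffs, the record layout and the toy variants are assumptions made for illustration; the source does not state which thresholds or agreement rule Noh et al. applied.

```python
from dataclasses import dataclass
from typing import List

# Illustrative cutoffs (assumptions); real cutoffs depend on tool version and settings.
SIFT_DELETERIOUS_BELOW = 0.05      # lower SIFT score = more likely deleterious
POLYPHEN2_DAMAGING_ABOVE = 0.85    # higher PolyPhen-2 score = more likely damaging
MUTATIONTASTER_DAMAGING = {"A", "D"}  # assumed codes for disease-causing calls


@dataclass
class Prediction:
    variant_id: str
    sift_score: float
    polyphen2_score: float
    mutationtaster_call: str


def damaging_votes(p: Prediction) -> int:
    """Count how many of the three tools call the variant damaging."""
    return sum([
        p.sift_score < SIFT_DELETERIOUS_BELOW,
        p.polyphen2_score > POLYPHEN2_DAMAGING_ABOVE,
        p.mutationtaster_call in MUTATIONTASTER_DAMAGING,
    ])


def consensus_damaging(preds: List[Prediction], min_votes: int = 3) -> List[Prediction]:
    """Keep variants that at least min_votes tools call damaging."""
    return [p for p in preds if damaging_votes(p) >= min_votes]


# Hypothetical variants with invented scores.
toy = [
    Prediction("variant_1", 0.01, 0.99, "D"),  # all three tools agree
    Prediction("variant_2", 0.20, 0.95, "D"),  # two of three
    Prediction("variant_3", 0.03, 0.40, "N"),  # one of three
]
print([p.variant_id for p in consensus_damaging(toy)])
```

Requiring all three tools to agree favors specificity; lowering `min_votes` to 2 keeps more candidates at the cost of more false positives.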

Thompson et al.(2) used the GATK UnifiedGenotyper to identify SNVs and indels, and Ensembl for annotation and realignment of single-end reads. Duplicate reads were filtered out before variant calling, and quality scores were assessed with the Exome Variant Server. Variants were further confirmed by traditional Sanger sequencing (2).
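Duplicate filtering matters because PCR copies of one fragment can make a sequencing error look like a well-supported variant. A minimal sketch of the idea follows, assuming that reads sharing a chromosome, alignment start and strand are duplicates and that the copy with the highest summed base quality is kept. This is a simplification of what dedicated tools do, and the `Read` type and toy reads are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Read:
    name: str
    chrom: str
    start: int             # alignment start position
    strand: str            # "+" or "-"
    base_quality_sum: int  # sum of base qualities, used to pick the best copy


def remove_duplicates(reads: List[Read]) -> List[Read]:
    """Keep one read per (chrom, start, strand), preferring the highest quality sum."""
    best: Dict[Tuple[str, int, str], Read] = {}
    for read in reads:
        key = (read.chrom, read.start, read.strand)
        kept = best.get(key)
        if kept is None or read.base_quality_sum > kept.base_quality_sum:
            best[key] = read
    # Return survivors in their original input order.
    return [r for r in reads if best[(r.chrom, r.start, r.strand)] is r]


# Hypothetical reads: r1 and r2 are duplicates of each other.
toy_reads = [
    Read("r1", "chr1", 100, "+", 300),
    Read("r2", "chr1", 100, "+", 350),
    Read("r3", "chr1", 100, "-", 200),
    Read("r4", "chr2", 50, "+", 310),
]
print([r.name for r in remove_duplicates(toy_reads)])
```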

Kiiski et al.(3) used AmiGO and ANNOVAR for annotation, and Exome Variant Server for filtering (mean coverage [...]

[...] 5’ helicase enzyme activity, which means an inability to repair DNA damage (31). This genetic disorder is predominantly present in Ashkenazi Jews (32). In another comprehensive study of index cases from 438 BRCA-negative breast cancer families, a FANCC mutation and a BLM mutation were each reported in one more family from the kConFab dataset, whereas one family with a FANCC variant was identified among 957 BRCA1/2 uninformative breast cancer families from the Peter MacCallum Cancer Centre (2). The index case who developed breast cancer at the age of 60 carried the FANCC variant as well as a BRCA2 mutation, and genotyping revealed that the individual could carry a breast cancer risk from either of these mutations (2). On the other hand, the index case with the BLM variant developed breast cancer at 33 years of age, and segregation analysis indicated that the father, rather than the mother, carried the mutation and thus the breast cancer risk, although nobody from the paternal generations developed cancer. Both the FANCC and BLM mutations were absent in 464 controls (2). The Exome Variant Server (EVS), NHLBI Exome Sequencing Project, Seattle, WA, reported 3 variants in 3,510 individuals for FANCC and 4 variants in 3,510 individuals for BLM (2). This showed that FANCC and BLM mutations were comparatively rare in European individuals; one mutation was present in the control samples, as they were chosen from a diverse group of individuals (2).
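The EVS counts quoted above can be turned into approximate carrier frequencies to show how rare these variants are in the reference population. The short sketch below does that arithmetic, assuming each reported variant corresponds to one carrier among the 3,510 individuals; the function name is hypothetical.

```python
def carrier_frequency(carriers: int, individuals: int) -> float:
    """Return the fraction of individuals carrying a variant."""
    if individuals <= 0:
        raise ValueError("individuals must be positive")
    return carriers / individuals


# Counts as quoted from the Exome Variant Server in the text above.
evs_counts = {"FANCC": (3, 3510), "BLM": (4, 3510)}

for gene, (carriers, individuals) in evs_counts.items():
    freq = carrier_frequency(carriers, individuals)
    print(f"{gene}: {carriers}/{individuals} = {freq:.3%}")
```

Both frequencies come out at roughly one carrier per thousand individuals, which is the scale the text refers to when it calls these mutations comparatively rare.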

When 24 BRCA-negative patients from 11 Finnish breast cancer families were studied, Kiiski et al.(3) found 80,867 nonsynonymous variants. They then eliminated variants with mean coverage [...]
