Rare coding variants constitute an important class of human genetic variation, but are underrepresented in current databases that are based on small population samples. Recent studies show that variants altering amino acid sequence and protein function are enriched at low variant allele frequency, 2 to 5%, but because of insufficient sample size it is not clear whether the same trend holds for rare variants below 1% allele frequency. The 1000 Genomes Exon Pilot Project has collected deep-coverage exon-capture data in roughly 1,000 human genes for nearly 700 samples. Although medical whole-exome projects are currently afoot, this is still the deepest reported sampling of a large number of human genes with next-generation technologies. In line with the goals of the 1000 Genomes Project, we created effective informatics pipelines to process and analyze the data, and discovered 12,758 exonic SNPs, 70% of them novel and 74% below 1% allele frequency, in the seven population samples we examined. Our analysis confirms that coding variants below 1% allele frequency show increased population specificity and are enriched for functional variants. This study represents a large step toward detecting and interpreting low-frequency coding variation, clearly lays out the technical steps for effective analysis of DNA capture data, and articulates the functional and population properties of this important class of genetic variation.

The allelic spectrum of variants causing common human diseases has long been a topic of debate. Whereas many monogenic diseases are typically caused by extremely rare […] (AF > 10%), intermediate (1% < AF < 10%) and low frequency (AF < 1%) sites. To attain these objectives, while simultaneously improving DNA enrichment methods, we targeted approximately 1,000 genes in 800 individuals from seven populations representing Africa (LWK, YRI), Asia (CHB, CHD, JPT), and Europe (CEU, TSI) in roughly equal proportions (Table 1).

Four data collection centers, the Baylor College of Medicine (BCM), the Broad Institute (BI), the Wellcome Trust Sanger Institute, and Washington University, applied different combinations of solid-phase or liquid-phase capture and Illumina or 454 sequencing procedures to subsets of the samples (Materials and methods). The project was closely coordinated with two related Pilot programs in the ongoing 1000 Genomes Project, the Trio Sequencing Pilot and the Low Coverage Sequencing Pilot, enabling quality control and performance comparisons. We filtered out genes that could not be fully tested because of failed capture or low sequence coverage, as well as samples that showed evidence of cross-contamination. The final sequence data set corresponds to a total of 1.43 Mb of exonic sequence (8,279 exons representing 942 genes) in 697 samples (see section 3, 'Data quality control', and Figure S3 in Additional file 1 for details of our quality control procedures). To aggregate the data for a comparison of analytical methods, we derived a set of consensus exon target regions (Materials and methods; Figure S2 in Additional file 1).
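The following minimal Python sketch makes the allele frequency classes concrete. It computes an alternate-allele frequency from diploid genotype calls and bins it into the low (AF < 1%), intermediate (1% < AF < 10%) and an assumed common (AF of 10% or more) class.

Several parts of the sketch are illustrative assumptions, not the project's actual pipeline:

- the genotype coding (0, 1 or 2 alternate alleles per sample, `None` for a missing call);
- the function names `allele_frequency`, `frequency_class` and `tally_classes`;
- how values exactly at 1% or 10% are assigned.

The example sites are hypothetical and are not data from the study.

```python
from collections import Counter

# Thresholds from the text, as fractions. Which class a value exactly at
# 1% or 10% belongs to is an assumption: the text only gives ranges.
LOW_MAX = 0.01
INTERMEDIATE_MAX = 0.10


def allele_frequency(genotypes):
    """Alternate-allele frequency from diploid genotype calls.

    Each call is the number of alternate alleles a sample carries
    (0, 1 or 2); None marks a missing call and is skipped. This coding
    is a hypothetical convention chosen for the sketch.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this site")
    # Two chromosomes per diploid sample.
    return sum(called) / (2 * len(called))


def frequency_class(af):
    """Place an allele frequency into the low / intermediate / common bins."""
    if af < LOW_MAX:
        return "low"
    if af < INTERMEDIATE_MAX:
        return "intermediate"
    return "common"


def tally_classes(sites):
    """Count sites per frequency class; each site is a list of calls."""
    return Counter(frequency_class(allele_frequency(site)) for site in sites)


if __name__ == "__main__":
    n = 697  # sample count reported in the text
    # Hypothetical sites for illustration, not data from the study.
    sites = [
        [1] + [0] * (n - 1),                     # one heterozygote: AF = 1/1394
        [1] * 20 + [0] * (n - 20),               # 20 heterozygotes: AF ~ 1.4%
        [1] * 150 + [2] * 20 + [0] * (n - 170),  # 190 alt alleles: AF ~ 13.6%
    ]
    for site in sites:
        af = allele_frequency(site)
        print(f"AF = {af:.4f} -> {frequency_class(af)}")
    print(tally_classes(sites))
```

Suppose every one of the 697 samples is called at a site. The smallest non-zero frequency is then 1/1,394, roughly 0.07%, so even a single heterozygote sits well inside the low-frequency class. Much smaller samples cannot resolve frequencies that fine, which illustrates why sample size limits the study of variants below 1%.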