Scientists from the Broad Institute of MIT and Harvard report an in-depth analysis of 42 whole-genome bisulfite sequencing data sets across 30 diverse human cell and tissue types. The study asks when, where, and how many CpGs are involved in genomic regulation.
Within a normal developmental context, the scientists observed dynamic regulation for only 21.8% of autosomal CpGs, most of which are situated away from transcription start sites. These dynamic CpGs co-localize with enhancers and transcription-factor-binding sites, allowing for identification of key lineage-specific regulators.
Differentially methylated regions (DMRs) frequently contain single nucleotide polymorphisms that genome-wide association studies have linked to cell-type-related diseases. The results also highlight the inefficiency of whole-genome bisulfite sequencing: 70–80% of the sequencing reads across these data sets provided little or no relevant information about CpG methylation.
To further demonstrate the utility of their DMR set, the scientists used it to classify unknown samples and to identify representative signature regions that recapitulate major DNA methylation dynamics. In theory, every CpG can change its methylation state, but the results suggest that only a fraction do so as part of coordinated regulatory programs. The selected DMRs can therefore serve as a starting point to guide new, more effective reduced representation approaches that capture the most informative fraction of CpGs, as well as to further pinpoint putative regulatory elements.
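To give a feel for how a DMR set might be used to classify an unknown sample, here is a minimal sketch in Python. It is not the scientists' method: it assumes each sample is summarized as a list of mean methylation levels (0 to 1) over the same DMRs, and it simply picks the reference profile with the highest Pearson correlation. The tissue labels, the five DMRs, and all values are made up for illustration, and the function names `pearson` and `classify_sample` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def classify_sample(unknown, references):
    """Return the best-matching reference label and all correlation scores.

    `unknown` is a list of methylation levels over a fixed set of DMRs;
    `references` maps a label to a profile over the same DMRs, in the same order.
    """
    scores = {label: pearson(unknown, profile)
              for label, profile in references.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical reference profiles: mean methylation over five made-up DMRs
# (0 = unmethylated, 1 = fully methylated). Not data from the study.
REFERENCES = {
    "blood": [0.10, 0.85, 0.90, 0.20, 0.75],
    "liver": [0.80, 0.15, 0.88, 0.70, 0.25],
    "brain": [0.85, 0.80, 0.12, 0.30, 0.90],
}

# Hypothetical unknown sample measured over the same five DMRs.
unknown_sample = [0.15, 0.80, 0.85, 0.25, 0.70]

label, scores = classify_sample(unknown_sample, REFERENCES)
print(label)  # prints: blood
```

Because only the DMRs enter the comparison, a scheme of this kind would need methylation measurements for a small fraction of CpGs, which is the point behind targeted, reduced representation approaches.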