A Parameterized Enumeration Algorithm for Minimum SNPs Removal Model of the Haplotype Assembly Probl

来源 :第五届全国生物信息学与系统生物学学术大会 | 被引量 : 0次 | 上传用户:cainong_111
下载到本地 , 更方便阅读
声明 : 本文档内容版权归属内容提供方 , 如果您对本文有版权争议 , 可与客服联系进行内容授权或下架
论文部分内容阅读
  Background: Given the sequenced fragments from a pair of chromosomes, the goal of the haplotype assembly problem is to reconstruct the two haplotypes for the chromosomes.Many heuristic algorithms and parameterized algorithms have been introduced for the problem.So far for a given input instance, the algorithms all aim to construct a pair of haplotypes that is optimal or near-optimal.When there are many optimal solutions, they can only choose one randomly or the one they find at first.Hence enumerating multiple optimal solutions to the problem is of practical importance.Methods: Based on the characteristics of real DNA sequence fragment data, we propose a parameterized dynamic programming algorithm to enumerate multiple optimal solutions to the Minimum SNPs Removal (MSR) model of the haplotypes assembly problem when the fragments contain no missing data.Since some sequencing technologies tend to produce more errors in some specific DNA sequence sites than others, assuming there are just a few errorprone sites, MSR try to build the real pair of haplotypes with minimum SNPs deleted.We construct initial optimal solutions for the sub-dataset consisting of the data at the first SNP site ; we then consider the data containing the first two SNP sites, and so on.When reaching the last SNP site, we will obtain multiple optimal solutions to the MSR model.Results: The algorithm can find out at most k optimal solutions to the MSR model in the timecomplexity O(nkk1k2+mlogm+mkl) and the space complexity O(mn+kn2), where n is the number of SNPs, m is the number of fragments, kl is the maximum number of SNP sites that a fragment covers, and k2 is the maximum number of fragments that cover a SNP site.Extensive experimental results show that for a general input dataset, the minimum SNP site set whose removal make building a pair of haplotypes feasible is not unique, and that the different haplotypes pairs reconstructed by removing different SNP sets usually show different haplotype reconstruction accuracies.Conclusions: We propose an exact enumeration algorithm for the MSR model of the haplotype assembly problem.The algorithm can effectively provide multiple optimal solutions to biologists for further analyses .
其他文献
会议
会议
  Backgroud: The rapid growth of high-throughput experimental data of biology is providing more and more valuable information on genome-wide molecular enrichm
  Background: Spatiotemporal variation of gene expression can happen extensively.Thanks to current applications of high throughput technologies, e.g.microarra
  Due to the limitation on the calculating power of the computer, it is very difficult to simulate the whole folding process and the large-scal functional mot
  Background: Previous results indicated that the CDK2/Cyclin E1 protein complex, which plays a key role in regulating the cell cycle, could be disrupted by t
  Background: Nucleosome positioning plays an important role in regulation of the gene activity in eukaryotic cell.DNA sequence is believed to be one of the m
  Background: Pathway databases, especially KEGG, have been widely used as a reference knowledge base for biomedical scientists to interpret their experimenta
  Background: Trans-action siRNA (ta-siRNA) is a type of small interfering RNA detected from plant and is reported to play an important role in post transcrip
  Background: Yersinia pestis is a highly pathogenic Gram-negative bacterium.Y.pestis infection causes three deadly diseases: pneumonic plague, septicemic pla