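Establishment of a Next-Generation Sequencing Data Processing Platform on the Windows Operating System and Implementation of Standard High-Throughput Bioinformatic Analysis Workflows on a Personal Computer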

English title: Assembly of a Windows-based NGS Data Processing Platform and Engagement of Standard Protocols for High-throughput Bioinformatic Analyses on a Personal Computer
Authors: HUANG Wen-Qiu (黄文秋); LIU Xu (刘旭); WANG Li-Yong (王俐勇); CHENG Shan (程杉); YE Hai-Hong (叶海虹); DING Wei (丁卫)
Keywords: high-throughput sequencing; bioinformatics; standardized workflow; personal computer; Windows operating system
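Journal code: SWHZ
Journal title: Chinese Journal of Biochemistry and Molecular Biology (中国生物化学与分子生物学报)
Institutions: School of Basic Medical Sciences, Capital Medical University; Central Laboratory, Capital Medical University
Publication date: 2019-06-20
Year: 2019
Volume: 35
Funding: Supported by the National Natural Science Foundation of China (No. 81572704)
Language: Chinese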
Record ID: SWHZ201906015
Page count: 6
Issue: 06
CN: 11-3870/Q
Pages: 118-123

Abstract

Next-generation sequencing (NGS) data are currently processed mainly with analysis methods built on standard Linux distributions. Such systems demand specialist expertise, are costly, and have unfriendly user interfaces, which severely limits the ability of most researchers to analyze their data independently. In this paper, we established a fully functional bioinformatic analysis system for NGS data that runs on the Microsoft Windows operating system, and used it to implement optimized versions of the mainstream standardized analysis workflows for several types of high-throughput sequencing data. Using RNA-Seq as a representative case, we processed real measured data and compared the results with those from conventional Linux-driven analysis. The results indicate that the components and workflows of this system can largely replace the currently dominant Linux server or cloud computing platforms in common data analysis tasks: running efficiency is similar, while operation is much simpler and the cost is greatly reduced. The system, together with the accompanying compiled software and workflow scripts, provides a comprehensive solution for hands-on training in the bioinformatic analysis of sequencing data and can also be applied directly to professional sequencing data analysis.
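The abstract does not say how the Windows and Linux results were compared. As a purely illustrative sketch, one common way to check cross-platform concordance is to correlate per-gene expression values from the two runs. The function names, gene IDs, and numbers below are hypothetical and are not taken from the paper.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def compare_platforms(windows, linux):
    """Compare per-gene expression values from two runs of the same workflow.

    Both arguments are dicts mapping gene ID -> expression value (e.g. FPKM).
    Returns (number of shared genes, Pearson r on log2(value + 1)).
    """
    shared = sorted(set(windows) & set(linux))
    log_w = [math.log2(windows[g] + 1) for g in shared]
    log_l = [math.log2(linux[g] + 1) for g in shared]
    return len(shared), pearson(log_w, log_l)


# Made-up values for illustration only (not data from the paper).
windows_run = {"GENE_A": 10.0, "GENE_B": 250.0, "GENE_C": 0.0, "GENE_D": 42.0}
linux_run = {"GENE_A": 10.2, "GENE_B": 248.5, "GENE_C": 0.0, "GENE_D": 41.7}

n_genes, r = compare_platforms(windows_run, linux_run)
print(f"{n_genes} shared genes, Pearson r = {r:.4f}")
```

A correlation close to 1 on the shared genes would suggest that the two platforms produce essentially the same quantification. The log transform is a design choice that keeps a few highly expressed genes from dominating the coefficient.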

References

[1] Lander ES, Linton LM, Birren B, et al. Initial sequencing and analysis of the human genome[J]. Nature, 2001, 409(6822): 860-921
[2] Venter JC, Adams MD, Myers EW, et al. The sequence of the human genome[J]. Science, 2001, 291(5507): 1304-1351
[3] Cambiaghi A, Ferrario M, Masseroli M. Analysis of metabolomic data: tools, current strategies and future challenges for omics data integration[J]. Brief Bioinform, 2017, 18(3): 498-510
[4] Oliver GR, Hart SN, Klee EW. Bioinformatics for clinical next generation sequencing[J]. Clin Chem, 2015, 61(1): 124-135
[5] Langmead B, Trapnell C, Pop M, et al. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome[J]. Genome Biol, 2009, 10(3): R25
[6] Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2[J]. Nat Methods, 2012, 9(4): 357-359
[7] Li H, Durbin R. Fast and accurate short read alignment with Burrows-Wheeler transform[J]. Bioinformatics, 2009, 25(14): 1754-1760
[8] Kim D, Langmead B, Salzberg SL. HISAT: a fast spliced aligner with low memory requirements[J]. Nat Methods, 2015, 12(4): 357-360
[9] Li H, Handsaker B, Wysoker A, et al. The Sequence Alignment/Map format and SAMtools[J]. Bioinformatics, 2009, 25(16): 2078-2079
[10] Pertea M, Pertea GM, Antonescu CM, et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads[J]. Nat Biotechnol, 2015, 33(3): 290-295
[11] Schmid MW, Grossniklaus U. Rcount: simple and flexible RNA-Seq read counting[J]. Bioinformatics, 2015, 31(3): 436-437
[12] Feng J, Liu T, Qin B, et al. Identifying ChIP-seq enrichment using MACS[J]. Nat Protoc, 2012, 7(9): 1728-1740
[13] McKenna A, Hanna M, Banks E, et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data[J]. Genome Res, 2010, 20(9): 1297-1303
[14] Wang K, Li M, Hakonarson H. ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data[J]. Nucleic Acids Res, 2010, 38(16): e164
[15] Robinson JT, Thorvaldsdóttir H, Winckler W, et al. Integrative genomics viewer[J]. Nat Biotechnol, 2011, 29(1): 24-26
[16] Mortazavi A, Williams BA, McCue K, et al. Mapping and quantifying mammalian transcriptomes by RNA-Seq[J]. Nat Methods, 2008, 5(7): 621-628
[17] Trapnell C, Roberts A, Goff L, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks[J]. Nat Protoc, 2012, 7(3): 562-578
[18] Pertea M, Kim D, Pertea GM, et al. Transcript-level expression analysis of RNA-seq experiments with HISAT, StringTie and Ballgown[J]. Nat Protoc, 2016, 11(9): 1650-1667
[19] Sahraeian SME, Mohiyuddin M, Sebra R, et al. Gaining comprehensive biological insight into the transcriptome by performing a broad-spectrum RNA-seq analysis[J]. Nat Commun, 2017, 8(1): 59
[20] Bray NL, Pimentel H, Melsted P, et al. Near-optimal probabilistic RNA-seq quantification[J]. Nat Biotechnol, 2016, 34: 525-527
[21] De Leeneer K, Hellemans J, Steyaert W, et al. Flexible, scalable, and efficient targeted resequencing on a benchtop sequencer for variant detection in clinical practice[J]. Hum Mutat, 2015, 36(3): 379-387
