23º SINAPE - Simpósio Nacional de Probabilidade e Estatística

Paper Details


Title

RQC: A BIOCONDUCTOR PACKAGE FOR QUALITY CONTROL OF HIGH-THROUGHPUT SEQUENCING DATA

Abstract

As sequencing costs drop with the constant improvements in the field, next-generation
sequencing has become one of the most used technologies in biological research. Sequencing
technology allows the detailed characterization of events at the molecular level, including
gene expression, genomic sequence and structural variants. Such experiments result in
billions of sequenced nucleotides, each of which is associated with a quality score.
Several software tools allow the quality assessment of whole experiments. However, users
need to switch between software environments to perform all steps of data analysis, which
adds an extra layer of complexity to the workflow.
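
To make the notion of a per-nucleotide quality score concrete, the following Python sketch decodes quality strings and averages them by read position (cycle). It is an illustration only: it is not part of Rqc and does not use its API. It assumes Phred scores stored with an ASCII offset of 33, as in the common FASTQ encoding, and the function names and example reads are hypothetical.

```python
from statistics import mean

# Assumption: Phred+33 (Sanger / Illumina 1.8+) FASTQ quality encoding.
PHRED_OFFSET = 33


def decode_quality(quality_string, offset=PHRED_OFFSET):
    """Convert an ASCII-encoded quality string to integer Phred scores."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Phred definition: Q = -10 * log10(p), hence p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def mean_quality_per_cycle(quality_strings, offset=PHRED_OFFSET):
    """Average the Phred score at each read position across all reads.

    Reads may differ in length; each position is averaged only over
    the reads that are long enough to cover it.
    """
    per_cycle = []
    for qs in quality_strings:
        for i, q in enumerate(decode_quality(qs, offset)):
            if i == len(per_cycle):
                per_cycle.append([])
            per_cycle[i].append(q)
    return [mean(scores) for scores in per_cycle]


# Hypothetical quality strings for three reads of different lengths.
example_reads = ["IIII", "II5", "I5"]
print(mean_quality_per_cycle(example_reads))
```

A drop in the per-cycle average toward the end of the reads is the kind of pattern that a quality report is meant to expose before downstream analysis.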

We developed Rqc, a Bioconductor package designed to assist the analyst in assessing
the quality of high-throughput sequencing data. The package uses parallel computing
strategies to speed up the processing of large datasets, regardless of the sequencing
platform. We created new data quality visualization strategies based on established
analytical procedures. These improve the ability to identify patterns that may affect
downstream procedures, including undesired sources of technical variability. The software
provides a framework for writing customized reports that integrate seamlessly with the
R/Bioconductor environment, including publication-ready images. The package also offers
an interactive tool to generate quality reports dynamically.

Rqc is implemented in R and is freely available through the Bioconductor project
(http://bioconductor.org/packages/Rqc) for Windows, Linux and Mac OS X operating
systems.

Keywords

next-generation sequencing, quality assessment, high-performance computing, R

Area

Applied Statistics in Medical Sciences, Health and Environment

Authors

Wélliton Souza, Benilton Sá Carvalho, Iscia Lopes-Cendes