Bioinformatics

Polo GGB offers sequencing services based on NGS technology (both short and long reads) and bioinformatics services.

The bioinformatics analysis service runs on a software infrastructure that combines customized and open-source software to ensure comprehensive, tailored data analysis. Polo GGB's bioinformatics services cover a wide range of NGS applications.

Bioinformatics services

Genomic Data Analysis

Genome-based research allows clinicians and biomedical researchers to considerably increase the amount of genomic data collected on large study populations, to develop improved diagnostics and to discover more effective therapeutic strategies. PoloGGB provides a Genomic Data Analysis service built on our experience in solving a wide variety of bioinformatics problems. Our analysis infrastructure combines custom-built and open-source software to cover a wide range of genomic applications.

Transcriptomic Data Analysis

Transcriptome sequencing is often the method of choice for analyzing differentially expressed genes, for investigating splicing patterns, splicing variants, gene isoforms, single nucleotide polymorphisms and post-transcriptional modifications, and for monitoring the population of expressed transcripts in a given condition at a specific time. PoloGGB provides a Transcriptomic Data Analysis service built on our experience in solving a wide variety of bioinformatics problems. Our analysis infrastructure combines custom-built and open-source software to cover a wide range of genomic applications.

Metagenomic Data Analysis

Metagenomics has been responsible for significant advances in microbial ecology. The analysis is a cost-effective solution for the identification and quantification of genetic material in uncultured microbial communities. It provides insights for phylogenetic and taxonomic studies of complex microbiomes and environmental niches that are hard or impossible to investigate in other ways.

Epigenomic Data Analysis

With improvements in epigenomic profiling assays and with increasing amounts of epigenomic data, new perspectives are available to understand normal epigenomes and their modifications. PoloGGB provides an Epigenomic Data Analysis service built on our experience in solving a wide variety of bioinformatics problems. Our analysis infrastructure combines custom-built and open-source software to cover a wide range of genomic applications.

The Laboratory of Bioinformatics

The Bioinformatics Laboratory is located in Siena, at the bio-incubator of the Toscana Life Sciences Foundation.

This facility provides access to technologies for conducting bioinformatics and biostatistics analyses for biomedical research. It has a particular focus on developing algorithms and tools for measuring parameters related to genomic analysis. Many of the applied algorithms build on freely available software, such as the Genome Analysis Toolkit (GATK).

The specialized skills of the Polo GGB staff make it possible to offer not only standard bioinformatics services but also customized processing, through the design of specific software modules dedicated to personalized genomics analysis.

Address:
c/o Toscana Life Sciences
Strada del Petriccio e Belriguardo 35, 53100 SIENA

Contact:
Chiara Leo ~ c.leo@pologgb.com

Phone:
+39 0577 381310

Genomic Data Analysis

Since the conclusion of the Human Genome Project, there has been an unprecedented increase in genomic sequence data.

As a direct consequence, future medical discoveries will largely depend on our ability to process and analyze huge genomic data sets.
Genome-based research allows clinicians and biomedical researchers to considerably increase the amount of genomic data collected on large study populations, to develop improved diagnostics and to discover more effective therapeutic strategies. It is therefore foreseeable that treatments will be tailored to a patient’s particular genomic makeup. Bioinformatics and new informatics approaches are crucial for analyzing these data streams and for better understanding the genetic bases of drug response and disease.

PoloGGB provides a Genomic Data Analysis service built on our experience in solving a wide variety of bioinformatics problems. Our analysis infrastructure combines custom-built and open-source software to cover a wide range of genomic applications.

Our Genomics solutions

De novo whole genome assembly consists of rebuilding a genome without any reference, as no “map” is available to guide the assembly.

The assembler starts by finding overlaps among the many short sequencing reads in order to generate longer contigs, which are then joined together to create scaffolds. It is a computationally demanding process and requires some parameter tuning based on the specific organism.

Briefly, preprocessing and filtering steps are performed to detect and correct inaccurate reads before the analysis. Short read sequences are used to build a graph abstraction, which is simplified to produce longer sequences; contigs are then built and extended into scaffolds.
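
The overlap-and-merge idea described above can be illustrated with a toy greedy assembler. This is only a sketch under simplifying assumptions (error-free reads, a single strand, no repeats); the function names and the `min_len` parameter are hypothetical, and production assemblers rely on graph-based methods rather than this quadratic search.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the longest such overlap is shorter than `min_len`.
    """
    max_len = min(len(a), len(b))
    for n in range(max_len, min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap.

    Stops when no pair overlaps by at least `min_len` bases and returns
    the remaining sequences (the "contigs" of this toy example).
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return reads
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)


contigs = greedy_assemble(["ATGGCGT", "GCGTGCA", "TGCAATC"])
print(contigs)
```

With the three example reads, each pair of neighbours shares a four-base overlap, so the reads collapse into a single contig.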

An assembled genome can be annotated with descriptions of the regions that can be called genes, including ORFs and the putative biological function of gene products. Genome annotation databases include the ENCyclopedia Of DNA Elements (ENCODE), Entrez Gene, Ensembl, GENCODE, the Gene Ontology Consortium, GeneRIF, RefSeq, UniProt and the Vertebrate Genome Annotation project (Vega).

  WHOLE GENOME SEQUENCING
Whole exome sequencing consists of capturing the exons (EXpressed regiONS) of genes, which represent the coding region of the genome.

It is a quick and effective strategy for finding disease-causing genes in rare Mendelian diseases and for cataloguing variants in complex disorders such as cancer, diabetes and age-related macular degeneration. Many variant-calling tools and best-practice methods have been developed for exome sequencing, since SNVs and indels are the most abundant and significant sources of variation in exons. Reducing false-positive and false-negative rates remains one of the key challenges. Exome sequencing analysis can be divided into these steps:

  1. Base calling and image analysis
  2. Alignment of exome data to reference genome
  3. Sorting, indexing and PCR duplicate removal
  4. SNP and small INDEL calling
  5. Variant annotation
INDELs and SNVs are annotated with their consequences on genes or regulatory regions, and the output data can be visualized with IGV (Integrative Genomics Viewer).

  WHOLE EXOME SEQUENCING
Calling single nucleotide polymorphisms and small insertions/deletions in the region of interest is one of the most commonly implemented types of NGS analysis. Reads are first aligned to a reference genome, then nucleotides that diverge from the reference are highlighted.

A confidence score is usually reported with each call, because NGS data contain some errors. For the same reason, each nucleotide must be sequenced multiple times for a call to be considered valid. Many tools use the Phred score to decide whether a variation is a true SNP or indel, or just sequencing instrument noise. This is usually done by modeling different error types under the homozygous reference, homozygous variant and heterozygous states. Error probabilities are derived from raw sequence data, mapping quality, quality scores and models for correlating errors at a specific site.
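
To make the role of the Phred score concrete, the sketch below converts a quality score Q into an error probability (P = 10^(-Q/10), the standard Phred definition) and compares the three genotype states for one site. The likelihood model is a deliberately simplified assumption (independent reads, errors spread evenly over the three other bases, no mapping quality); the function names are hypothetical, and real variant callers use considerably richer models.

```python
import math


def phred_to_error_probability(q):
    """Standard Phred definition: Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


def genotype_log10_likelihoods(bases, quals, ref, alt):
    """Log10 likelihood of the observed bases under three genotype states.

    Simplified model: reads are independent, and a sequencing error
    turns the true base into each of the other three bases with equal
    probability.
    """
    states = {"hom_ref": 0.0, "het": 0.5, "hom_var": 1.0}  # expected alt fraction
    result = {}
    for name, alt_fraction in states.items():
        total = 0.0
        for base, q in zip(bases, quals):
            e = phred_to_error_probability(q)
            p_if_ref = (1 - e) if base == ref else e / 3
            p_if_alt = (1 - e) if base == alt else e / 3
            total += math.log10((1 - alt_fraction) * p_if_ref
                                + alt_fraction * p_if_alt)
        result[name] = total
    return result


# Six reads covering one site, all with base quality Q30.
likelihoods = genotype_log10_likelihoods("AAGGAG", [30] * 6, ref="A", alt="G")
best_state = max(likelihoods, key=likelihoods.get)
print(best_state, likelihoods)
```

In this example three reads support the reference base and three support the alternative, so the heterozygous state receives the highest likelihood.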

Variant annotation adds supplementary data from public databases to the identified variants. These annotations usually contain the definition of the variant, a measure of likelihood, the genotype, the location (e.g. gene, exon, coding region, …) and the consequence for the encoded amino acids.

  TARGETED RESEQUENCING ANALYSIS
Targeted analysis involves amplifying and sequencing only the genomic sequences of interest, by means of a capture-based method.

Our workflows for targeted resequencing analysis can be fully customized for specific projects and often include the following common steps:

  1. Experiment design consulting
  2. Quality control of sequencing data
  3. Alignment of sequences to a reference genome
  4. SNPs and small INDELs identification
  5. Frequency estimation of SNPs and INDELs
  6. Variant annotation and prediction of mutation effects on genes
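
As an illustration of the frequency estimation step listed above, the sketch below computes the fraction of reads supporting each non-reference allele at a single position. It assumes the bases observed at that position have already been extracted from the alignment; the function name is hypothetical, and base or mapping quality filters are left out for brevity.

```python
from collections import Counter


def variant_allele_frequencies(observed_bases, ref_base):
    """Return {allele: fraction of reads} for each non-reference allele.

    `observed_bases` is the sequence of bases seen at one genomic
    position, one per aligned read. An uncovered position yields {}.
    """
    depth = len(observed_bases)
    if depth == 0:
        return {}
    counts = Counter(observed_bases)
    return {allele: n / depth
            for allele, n in counts.items()
            if allele != ref_base}


# Eight reads cover the site; two of them carry a G instead of the reference A.
print(variant_allele_frequencies("AAAAAAGG", ref_base="A"))
```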
Complete sequencing of single bacterial or fungal genomes. Indispensable for accurate microbial identification.

The complete sequencing of the microbial genome allows an integral evaluation of all the genetic characteristics of an isolated bacterium or fungus. It is a fundamental method for precise identification, for the creation of reference genomes and comparative genomic studies, for the identification of low-frequency variants and rearrangements of the genome. The shotgun de novo sequencing of the entire microbial genome has a vast range of applications including comparative genomics, which compares the sequence with that of a known reference and reveals important differences in the composition and organization of the genome, facilitating the identification of functional genes involved in important biological processes.

  MICROBIAL WHOLE GENOME ASSEMBLY

Transcriptomic Data Analysis

Understanding the transcriptome is needed for the interpretation of functional elements of the genome as well as for the knowledge of the underlying mechanisms of disease and development.

Transcriptome sequencing is often the method of choice for analyzing differentially expressed genes, for investigating splicing patterns, splicing variants, gene isoforms, single nucleotide polymorphisms and post-transcriptional modifications, and for monitoring the population of expressed transcripts in a given condition at a specific time. Processing such large amounts of transcriptome data therefore requires dedicated bioinformatics programs and targeted scientific expertise.

PoloGGB provides a Transcriptomic Data Analysis service built on our experience in solving a wide variety of bioinformatics problems. Our analysis infrastructure combines custom-built and open-source software to cover a wide range of genomic applications.

Our Transcriptomics solutions

De novo whole transcriptome assembly from total RNA sequencing reconstructs the transcriptome without the need for a reference genome, as long as there are sufficient paired-end reads.

Reads are assembled with a short-read assembler, which merges overlapping reads into longer contigs that represent transcripts.
De novo transcriptome assembly is the method commonly chosen for studying non-model organisms, as it is much cheaper than first building a reference genome.
After mapping sequence reads onto a reference genome or assembling a transcriptome de novo, it is possible to identify and quantify putative mRNA transcripts, detect protein-coding regions and perform differential expression analysis.
Finally, annotation of the transcriptome provides information on the biological function of the transcripts and the proteins they encode, using well-known molecular function, gene ontology and pathway tools.

 

WHOLE TRANSCRIPTOME SEQUENCING

The essential first step before small RNAs can be identified and quantified is mapping the sequenced small RNA libraries (miRNA, lincRNA, snoRNA, snRNA, tRNA) onto their reference genome and onto public databases such as miRBase.

Short reads can then be annotated and classified into known small RNA categories, and differential expression analysis can be performed.
In particular, microRNAs (miRNAs) are a class of small non-coding RNAs 18 to 22 nucleotides long. Recent findings about the functions and roles of miRNAs have drawn great attention to this level of gene regulation, which is involved in development and in some diseases. Novel miRNAs can be discovered by aligning high-depth small RNA sequencing data to the reference genome and predicting the secondary structure of the precursor.

 

SmallRNA SEQUENCING

Long noncoding RNAs (lncRNAs) constitute a wide and diverse class of RNA molecules more than 200 nucleotides long that do not encode proteins.

lncRNAs are thought to include approximately 30,000 diverse transcripts in humans, which makes them the most abundant portion of the noncoding transcriptome. Although a number of lncRNAs have been functionally annotated, most of them remain to be characterized.
lncRNA discovery is still at an early phase, and only a minor fraction of lncRNAs has been studied. While diverse kinds of lncRNA functions can be classified, it is not yet possible to predict the function of new lncRNAs. PoloGGB proposes expression profiling as a method to reveal the function of lncRNAs: detecting differentially expressed lncRNAs in specific experimental conditions can shed light on their possible functions.
Our chosen pipeline first aligns and assembles RNA-seq data to build a complete transcriptome assembly for all the samples. Then, using a series of filtering criteria based on gene annotations, sequence length, expression level, coding potential and other features, a list of putative lncRNA candidates is defined, with basic information that includes transcript size, genomic location and, optionally, differential gene expression.
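
The filtering stage described above can be pictured as a chain of simple predicates. The sketch below is illustrative only: the `Transcript` fields, the coding-potential score and the thresholds for coding potential and expression are hypothetical placeholders, not the criteria of the actual pipeline; only the 200-nucleotide length cut-off comes from the definition given in the text.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    transcript_id: str
    length: int                 # transcript length in nucleotides
    overlaps_coding_gene: bool  # matches an annotated protein-coding gene
    coding_potential: float     # 0..1 score from a coding-potential tool (hypothetical)
    expression: float           # abundance estimate, e.g. TPM


def putative_lncrnas(transcripts, min_length=200,
                     max_coding_potential=0.5, min_expression=1.0):
    """Keep transcripts that pass every filter (all thresholds are assumptions)."""
    return [
        t for t in transcripts
        if t.length > min_length
        and not t.overlaps_coding_gene
        and t.coding_potential < max_coding_potential
        and t.expression >= min_expression
    ]


assembled = [
    Transcript("tx1", 1500, False, 0.10, 5.2),  # passes all filters
    Transcript("tx2", 150, False, 0.05, 9.0),   # too short
    Transcript("tx3", 2200, True, 0.95, 30.0),  # annotated coding gene
    Transcript("tx4", 900, False, 0.80, 4.0),   # high coding potential
    Transcript("tx5", 700, False, 0.20, 0.1),   # barely expressed
]
print([t.transcript_id for t in putative_lncrnas(assembled)])
```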

 

LncRNA SEQUENCING
Differential expression analysis encompasses the identification and quantification of genes or transcripts whose expression changes among samples and experimental settings.

Recent approaches to studying RNA-Seq data include quantifying expression within the boundaries of previously annotated genes, as well as algorithms designed to rebuild full-length transcripts.
Differential expression analysis means taking the normalised read count data and performing statistical analysis to discover quantitative changes in expression levels between experimental groups.
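
A minimal sketch of the normalisation idea mentioned above: raw counts are scaled to counts per million (CPM) so that samples with different sequencing depth become comparable, and a log2 fold change is computed between two conditions. This shows the arithmetic only; it omits replicates and the statistical testing that a real differential expression analysis requires. The function names and the pseudocount of 1 are assumptions made for illustration.

```python
import math


def counts_per_million(counts):
    """Scale raw read counts of one sample to counts per million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no reads")
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


def log2_fold_change(control, treated, pseudocount=1.0):
    """log2 ratio of treated over control; the pseudocount avoids log(0)."""
    return math.log2((treated + pseudocount) / (control + pseudocount))


control_cpm = counts_per_million({"geneA": 10, "geneB": 90})
treated_cpm = counts_per_million({"geneA": 40, "geneB": 60})
for gene in control_cpm:
    print(gene, round(log2_fold_change(control_cpm[gene], treated_cpm[gene]), 3))
```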

 


Epigenomic Data Analysis

The emerging role of epigenomics brings to the forefront unprecedented opportunities for the development of new therapies, for the identification of unique targets for pharmacologic and other therapeutic interventions and for a better understanding of the pathologic basis of human diseases.

With improvements in epigenomic profiling assays and with increasing amounts of epigenomic data, new perspectives are available to understand normal epigenomes and their modifications. An increasing number of computational tools and methodological frameworks for predictive modeling have been developed in recent times to analyze these new sets of epigenomic data.

PoloGGB provides an Epigenomic Data Analysis service built on our experience in solving a wide variety of bioinformatics problems. Our analysis infrastructure combines custom-built and open-source software to cover a wide range of genomic applications.

Our Epigenomics solutions

Chromatin immunoprecipitation (ChIP) followed by sequencing is a powerful method to identify DNA loci bound by specific proteins of interest (transcription factors, histones, chaperones and other nuclear proteins).

This assay helps examine the role of protein-DNA interactions in gene expression regulation and other essential cellular processes, contributing to a fuller understanding of biological processes and diseases.
Reads are first aligned to a reference genome; the regions of the genome where the protein is bound can then be predicted (peak calling) by measuring the number of reads that map to each region.
Differential binding can be analyzed to determine which DNA regions are bound in different samples or conditions, while peaks can be annotated when they coincide with known transcription start sites (TSS), promoters or intergenic regions.
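
The read-counting principle behind peak calling can be sketched as follows: aligned read start positions are binned into fixed-size windows, and windows holding at least a minimum number of reads are reported. This is a toy illustration with hypothetical names and thresholds; dedicated peak callers additionally model the background signal, use control samples and assign statistical significance to each peak.

```python
from collections import Counter


def call_enriched_windows(read_starts, window_size=100, min_reads=3):
    """Return (start, end, read_count) for windows with >= min_reads reads.

    `read_starts` holds the alignment start positions of reads on a
    single chromosome; windows are half-open intervals [start, end).
    """
    counts = Counter(pos // window_size for pos in read_starts)
    return [
        (w * window_size, (w + 1) * window_size, n)
        for w, n in sorted(counts.items())
        if n >= min_reads
    ]


# Four reads pile up between positions 100 and 200; one read is isolated.
print(call_enriched_windows([105, 110, 120, 130, 950]))
```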

 

ChIP-SEQ
Cytosine methylation is a well-known epigenetic mark that has significant consequences for normal and disease-related biological processes.

Bioinformatic tools for studying DNA methylation data generally involve the following main steps:

  1. Quality control
  2. Methylation calling
  3. Sequence mapping and methylation peaks visualization
  4. Statistics to identify and interpret sample specific differences

Bisulfite treatment converts unmethylated cytosines to uracil, which allows detailed mapping of methylcytosine positions. The process includes sequence alignment and the quantification of absolute DNA methylation at single-base resolution. To inspect the global distribution of DNA methylation, selected regions can be visualized in a commonly used genome browser such as IGV (Integrative Genomics Viewer). Cluster analysis and the discovery of differentially methylated regions (DMRs) between sample groups can be carried out manually or with different automated workflows.
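
Quantifying methylation at single-base resolution comes down to a ratio. After bisulfite conversion, a read shows C at a methylated cytosine and T at an unmethylated one, so the methylation level of a position is the fraction of covering reads that still show C. The sketch below assumes the two read counts have already been extracted from the alignment; the function name is hypothetical.

```python
def methylation_level(methylated_reads, unmethylated_reads):
    """Fraction of reads supporting methylation at one cytosine.

    methylated_reads:   reads showing C (protected from conversion)
    unmethylated_reads: reads showing T (converted by bisulfite)
    Returns None when the position has no coverage.
    """
    depth = methylated_reads + unmethylated_reads
    if depth == 0:
        return None
    return methylated_reads / depth


# Ten reads cover the cytosine: eight still read C, two read T.
print(methylation_level(8, 2))
```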

 

WHOLE GENOME BISULFITE SEQUENCING

 

TARGETED BISULFITE RESEQUENCING

Metagenomic Data Analysis

Metagenomics has been responsible for significant advances in microbial ecology. The analysis is a cost-effective solution for the identification and quantification of genetic material in uncultured microbial communities. It provides insights for phylogenetic and taxonomic studies of complex microbiomes and environmental niches that are hard or impossible to investigate in other ways.

As more metagenomic datasets are generated, the availability of standardized procedures, data storage and analysis becomes increasingly important. As data volumes grow, metagenome analysis applies a suite of genomic technologies and bioinformatics tools to directly access the genetic content of whole communities of organisms.

PoloGGB provides a Metagenomic Data Analysis service built on our experience in solving a wide variety of bioinformatics problems. Our analysis infrastructure combines custom-built and open-source software to cover a wide range of genomic applications.

Our Metagenomics solutions

Metagenomic analysis is a cost-effective technique for identifying and quantifying the genetic material from uncultured microbial communities with bioinformatics tools, offering a powerful lens for studying the phylogeny and taxonomy of samples from complex microbiomes or environments that are difficult or impossible to study otherwise.

Next-generation sequencing of 16S/18S/ITS rDNA is a well-established amplicon-based method that detects most of the bacteria and/or fungi present in a sample, including those that may not be found using other methods, and assesses their biological diversity.
Metagenomic annotation assigns sequences to known taxonomic units through homology searches against a previously deposited reference database, yielding taxonomic assignments of reads.
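
Once reads have been assigned to taxa, the diversity assessment mentioned above is often summarised with an index. The sketch below computes relative abundances and the Shannon index from per-taxon read counts. The text does not say which diversity measure is used, so the choice of the Shannon index is an assumption made here for illustration, and the function names are hypothetical.

```python
import math


def relative_abundance(taxon_counts):
    """Convert per-taxon read counts into fractions of the total."""
    total = sum(taxon_counts.values())
    if total == 0:
        return {}
    return {taxon: n / total for taxon, n in taxon_counts.items() if n > 0}


def shannon_index(taxon_counts):
    """Shannon diversity H = -sum(p * ln p) over the observed taxa."""
    abundances = relative_abundance(taxon_counts)
    return -sum(p * math.log(p) for p in abundances.values())


sample = {"Bacteroides": 500, "Lactobacillus": 300, "Escherichia": 200}
print(relative_abundance(sample))
print(round(shannon_index(sample), 3))
```

A community dominated by a single taxon gives a value close to 0, while evenly distributed taxa give higher values.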

 

16S/18S/ITS AMPLICON SEQUENCING

Analysis of metagenomic whole genome shotgun data includes three major steps: assembly, annotation and statistical analysis. If the goal is to analyze the genome of a single microorganism rather than its community, short reads should be assembled into longer genomic contigs. Assembly approaches for metagenomic samples fall into two categories: reference-based assembly and de novo assembly. Metagenomic annotation assigns sequences to known taxonomic units through homology searches against a previously deposited reference database, yielding taxonomic assignments of reads.

 

METAGENOMIC SHOTGUN SEQUENCING