Next Generation Sequencing and Bioinformatics
at the University of Edinburgh


Bioinformatics for Next Generation Sequencing | NextGenBUG

Bioinformaticians in the GenePool aim to help users get the best out of their data. This support ranges from preliminary quality control and validation of raw data through to intensive analyses and data mining. We use a range of custom-built tools and off-the-shelf programs, together with the power of the University of Edinburgh's Edinburgh Compute and Data Facility (ECDF) computing cluster. We can also assist you in setting up analyses on your 'home' computer.


We coordinate an open discussion forum, NextGenBUG. We meet every two months in venues across Scotland - please see the NextGenBUG web site for information on forthcoming meetings. We also host an online discussion of many next-generation sequencing bioinformatics issues. NextGenBUG is sponsored by the Scottish Bioinformatics Forum.


Next-generation bioinformatics for next-generation sequence data

Next-generation sequencing projects generate from hundreds of thousands to tens of millions of reads, and these cannot be analysed as easily as small numbers of Sanger reads. We are skilled in the analysis of sequence data from the Sanger, Illumina and Roche 454 sequencing platforms.

Base-calling using high quality algorithms and sequence processing

We perform high-throughput trimming of user sequences to remove vector/adapter and low quality base calls, and can format results as required for subsequent analysis or database submission.
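
As a minimal illustration of the trimming step described above, the following Python sketch removes a 3' adapter by exact match and then clips low-quality bases from the 3' end of a read. The function names, the Phred+33 quality encoding, the quality threshold of 20 and the adapter sequence used in the example are all assumptions chosen for illustration, not a description of the GenePool pipeline.

```python
def phred_scores(quality_string, offset=33):
    """Decode an ASCII quality string into Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality_string]


def clip_adapter(seq, qual, adapter):
    """Remove the first exact occurrence of the adapter and everything after it."""
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual  # adapter not present: read is returned unchanged
    return seq[:pos], qual[:pos]


def trim_low_quality_3prime(seq, qual, min_quality=20):
    """Drop bases from the 3' end until a base with at least min_quality is found."""
    scores = phred_scores(qual)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_quality:
        end -= 1
    return seq[:end], qual[:end]


def clean_read(seq, qual, adapter, min_quality=20):
    """Clip the adapter first, then trim low-quality bases from what remains."""
    seq, qual = clip_adapter(seq, qual, adapter)
    return trim_low_quality_3prime(seq, qual, min_quality)
```

Clipping the adapter before quality trimming matters in this sketch: trimming first could truncate the adapter so that an exact match no longer finds it. Real adapter removal usually also needs to tolerate mismatches and partial adapters, which this example does not attempt.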

Transcriptome analysis

We are experts at clustering Sanger- or Roche 454-sequenced EST datasets and transcriptome short reads into putative gene objects ('unigene sets'). These unigene sets can then be annotated by comparison to reference transcriptomes, and analysed for protein domain, enzymatic function or biological pathway content. Many users are also extracting microsatellite and single nucleotide polymorphism markers from transcriptome data.

Digital transcriptomics and ChIP-sequencing analysis

The massively parallel sequencing of the Illumina platform can be used to determine the expression levels of mRNAs (using techniques called deepSAGE and RNA-Seq), or to identify the binding sites and occupancy levels of proteins interacting with DNA or RNA isolated by immunoprecipitation (ChIP-Seq and RNAIP-Seq). Analyses of these data are facilitated by parallel processing and by mapping reads to reference genomes and transcriptomes.
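
One common way to turn mapped read counts into comparable expression levels is reads per kilobase of transcript per million mapped reads (RPKM). The sketch below is illustrative only: the function name, the input dictionaries and the transcript identifiers are hypothetical, and it assumes that reads have already been assigned to transcripts by a separate mapping tool.

```python
def rpkm(read_counts, transcript_lengths):
    """Compute RPKM for each transcript.

    read_counts: dict of transcript name -> number of mapped reads
    transcript_lengths: dict of transcript name -> length in bases
    """
    total_reads = sum(read_counts.values())
    if total_reads == 0:
        return {name: 0.0 for name in read_counts}
    # RPKM = reads / (length in kb * total reads in millions)
    #      = reads * 1e9 / (length in bases * total reads)
    return {
        name: count * 1e9 / (transcript_lengths[name] * total_reads)
        for name, count in read_counts.items()
    }
```

For example, with hypothetical counts `{"t1": 500, "t2": 1500}` and lengths `{"t1": 1000, "t2": 3000}`, both transcripts receive the same RPKM, because the longer transcript attracts proportionally more reads.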

Assembly of smaller genomes and of large insert clones

De novo genome sequencing is rapidly achieved using the Illumina or Roche 454 platforms. To generate draft genome assemblies we use a variety of tools, and combine data from the Illumina, Roche 454 and Sanger platforms. Generation of assembled sequence for cosmids, fosmids and BACs from Sanger, Illumina or Roche 454 data is also possible.

Resequencing genomes

Next generation technologies are ideally suited to resequencing of even complex (mammalian, 3 Gbase) genomes. GenePool bioinformaticians can map sequence reads to reference genomes and produce annotated lists of single nucleotide polymorphisms and other changes.

Porting applications to High Throughput Computing

Many biologists are moving from one gene-one question problems to programmes of research that require parallel processing of tens of thousands of sequences. We offer skills in the parallelizing of bioinformatics tasks so that they can be carried out efficiently on large computer clusters.
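
The basic pattern behind this kind of parallelisation is to split a large set of sequences into independent chunks, process each chunk in a separate worker and then merge the results in order. The following is a minimal single-machine sketch using Python's standard `concurrent.futures` module, with GC content standing in for a real analysis. The function names, chunk size and worker count are assumptions for illustration; on a large cluster the same split-and-merge idea would typically be expressed as separate batch jobs rather than as this code.

```python
from concurrent.futures import ProcessPoolExecutor


def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def process_chunk(chunk):
    """Work done by one worker: analyse every sequence in its chunk."""
    return [gc_content(seq) for seq in chunk]


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_gc(sequences, chunk_size=1000, workers=4,
                executor_cls=ProcessPoolExecutor):
    """Compute GC content for all sequences, one chunk per task."""
    chunks = chunked(sequences, chunk_size)
    with executor_cls(max_workers=workers) as pool:
        # map() yields results in the same order as the input chunks
        results = pool.map(process_chunk, chunks)
        return [value for chunk_result in results for value in chunk_result]
```

On platforms that start worker processes by spawning a fresh interpreter (for example Windows), `parallel_gc` should be called from inside an `if __name__ == "__main__":` block. Chunking keeps the per-task overhead small relative to the work done, which matters when each individual sequence is cheap to process.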

Other bioinformatics applications, such as:

  • Automated primer design for resequencing projects
  • Phylogenetic analyses using in-house and institutional computing clusters (running, e.g., MrBayes)
  • Motif and microsatellite discovery: We have tools for the discovery of end-user-specified patterns in large sequence datasets.
  • Sequence-based biodiversity assessment using bacterial 16S or DNA barcode targets
  • Bespoke solutions for all your sequencing bioinformatics needs.
  • Web-available databases of results: To aid end-user biological data mining, we have developed web-ready database solutions that permit remote searching.
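
As an illustration of pattern discovery in sequence data (the motif and microsatellite item in the list above), the sketch below uses a regular expression to find perfect microsatellite repeats. The function name, the repeat-unit sizes and the minimum repeat count are assumptions made for the example rather than the settings of any GenePool tool.

```python
import re


def find_microsatellites(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, repeat_count) for perfect tandem repeats.

    A repeat is a unit of min_unit..max_unit bases that occurs at least
    min_repeats times in a row. Start positions are 0-based.
    """
    # The lazy quantifier prefers the shortest unit; \1 must then follow
    # at least (min_repeats - 1) more times.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        repeats = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, repeats))
    return hits
```

Note that this simple pattern also reports long homopolymer runs as repeats of a unit such as "AA", and it does not tolerate interrupted (imperfect) repeats; both would need extra handling in practice. Swapping in a different regular expression turns the same loop into a search for any user-specified motif.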

Please contact us to discuss your bioinformatics needs, and for current pricing.

The GenePool has core support from:
The University of Edinburgh
School of Biological Sciences
The Darwin Trust of Edinburgh

This is version 2.2 of the GenePool website (July 2009)