Next Generation Sequencing and Bioinformatics
Bioinformaticians in the GenePool help users get the most out of their data, from preliminary quality control and validation of raw data through to intensive analysis and data mining. We use a range of custom-built tools and off-the-shelf programmes, backed by the computing cluster of the University of Edinburgh's Edinburgh Compute and Data Facility (ECDF). We can also help you set up analyses on your 'home' computer.
We coordinate an open discussion forum, NextGenBUG, which meets every two months at venues across Scotland; please see the NextGenBUG web site for details of forthcoming meetings. We also host an online forum for discussing next-generation sequencing bioinformatics. NextGenBUG is sponsored by the Scottish Bioinformatics Forum.
Next-generation sequencing projects generate anywhere from hundreds of thousands to tens of millions of reads, which cannot be analysed as easily as a small number of Sanger reads. We are skilled in analysing sequence data from the Sanger, Illumina and Roche 454 sequencing platforms.
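At these volumes, reads are best processed as a stream rather than loaded into memory all at once. The sketch below illustrates the idea in Python; the function names `read_fastq` and `summarise` are hypothetical, and it assumes simple four-line FASTQ records (sequence and quality each on one line). It is an illustration, not the GenePool's own tooling.

```python
import io
from typing import Iterator, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) one record at a time.

    Assumes simple four-line FASTQ records, so memory use stays constant
    however many reads the file contains.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        yield header[1:].strip(), seq, qual


def summarise(handle: TextIO) -> Tuple[int, int]:
    """Count reads and total bases in a single streaming pass."""
    n_reads = n_bases = 0
    for _read_id, seq, _qual in read_fastq(handle):
        n_reads += 1
        n_bases += len(seq)
    return n_reads, n_bases


if __name__ == "__main__":
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nGGCCA\n+\nIIIII\n")
    print(summarise(demo))  # (2, 9)
```

Because `read_fastq` is a generator, the same code works unchanged on a file handle holding tens of millions of reads.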
Base-calling using high-quality algorithms, and sequence processing
We perform high-throughput trimming of user sequences to remove vector/adapter sequence and low-quality base calls, and can format the results as required for downstream analysis or database submission.
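A minimal sketch of the two trimming steps, assuming exact adapter matching and a simple 3' quality cut-off (production trimmers also tolerate mismatches and use sliding-window quality rules). The adapter string is only an example (the start of a common Illumina adapter), the helper names are hypothetical, and `offset=33` assumes Phred+33 quality encoding.

```python
from typing import Optional, Tuple

# Example only: substitute the adapter used in your own library preparation.
EXAMPLE_ADAPTER = "AGATCGGAAGAGC"


def trim_adapter(seq: str, adapter: str, min_overlap: int = 3) -> str:
    """Cut at an exact adapter match, or at a partial adapter of at least
    min_overlap bases running off the 3' end (exact matching only)."""
    if min_overlap < 1:
        raise ValueError("min_overlap must be at least 1")
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    # Try the longest adapter prefixes first so the largest overhang is removed.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k]
    return seq


def trim_low_quality_3prime(seq: str, qual: str, min_q: int = 20,
                            offset: int = 33) -> Tuple[str, str]:
    """Drop trailing bases with Phred score below min_q.
    offset=33 is Phred+33; older Illumina pipelines used Phred+64."""
    end = len(seq)
    while end > 0 and ord(qual[end - 1]) - offset < min_q:
        end -= 1
    return seq[:end], qual[:end]


def trim_read(seq: str, qual: str, adapter: str = EXAMPLE_ADAPTER,
              min_q: int = 20, offset: int = 33,
              min_len: int = 15) -> Optional[Tuple[str, str]]:
    """Adapter-trim, then quality-trim; discard reads shorter than min_len."""
    seq = trim_adapter(seq, adapter)
    qual = qual[:len(seq)]  # keep qualities aligned with the trimmed bases
    seq, qual = trim_low_quality_3prime(seq, qual, min_q, offset)
    return (seq, qual) if len(seq) >= min_len else None


if __name__ == "__main__":
    read = "ACGT" * 5 + "AGATCGG"        # 7-base partial adapter at the 3' end
    quals = "I" * 18 + "##" + "I" * 7    # two low-quality bases before it
    print(trim_read(read, quals))
```

Removing the adapter before quality trimming keeps high-quality adapter bases from shielding poor base calls that sit just upstream of them.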
We are experts at clustering Sanger- or Roche 454-sequenced EST datasets and transcriptome short reads into putative gene objects ('unigene sets'). These unigene sets can then be annotated by comparison to reference transcriptomes, and analysed for protein domain, enzymatic function or biological pathway content. Many users are also extracting microsatellite and single nucleotide polymorphism markers from transcriptome data.
Digital transcriptomics and ChIP-sequencing analysis
The massively parallel sequencing of the Illumina platform can be used to measure mRNA expression levels (using techniques called deepSAGE and RNA-Seq), or to locate the binding sites and measure the occupancy of proteins that interact with DNA or RNA, using material isolated by immunoprecipitation (ChIP-Seq and RNAIP-Seq). Analysis of these data is facilitated by parallel processing and by mapping reads to reference genomes and transcriptomes.
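Once reads are mapped, expression is typically estimated from read counts per gene. A sketch of one widely used normalisation, RPKM (reads per kilobase of transcript per million mapped reads), is shown below; the choice of RPKM and the input layout are assumptions for illustration, not necessarily the method used for any given project.

```python
from typing import Dict


def rpkm(counts: Dict[str, int], lengths_bp: Dict[str, int],
         total_mapped_reads: int) -> Dict[str, float]:
    """RPKM = count * 1e9 / (length_bp * total_mapped_reads).

    Equivalent to dividing the count by the transcript length in kilobases
    and by the library size in millions of mapped reads.
    """
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total_mapped_reads)
        for gene, count in counts.items()
    }


if __name__ == "__main__":
    counts = {"geneA": 500, "geneB": 1000}
    lengths = {"geneA": 2000, "geneB": 500}
    print(rpkm(counts, lengths, total_mapped_reads=1_000_000))
```

Here geneB receives only twice as many reads as geneA but is eight times more abundant per kilobase (2000 vs 250 RPKM), because its transcript is a quarter of the length.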
Assembly of smaller genomes and of large insert clones
De novo genome sequencing is rapidly achieved using the Illumina or Roche 454 platforms. To generate draft genome assemblies we use a variety of tools, and combine data from Illumina, Roche and Sanger platforms. Generation of assembled sequence for cosmids, fosmids and BACs from Sanger, Illumina or Roche 454 data is also possible.
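Draft assemblies are commonly summarised by contig N50: the length L such that contigs of length L or longer contain at least half of the assembled bases. A short sketch of the calculation follows (the function name is illustrative):

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)  # longest contigs first
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:  # this contig takes us past half the assembly
            return length
    raise ValueError("no contigs supplied")


if __name__ == "__main__":
    print(n50([100, 200, 300, 400, 500]))  # 500 + 400 = 900 >= 750, so 400
```

N50 rewards contiguity rather than total size, which makes it useful for comparing assemblies of the same genome built from different combinations of Illumina, Roche 454 and Sanger data.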
Next-generation technologies are well suited to resequencing even large, complex genomes (for example mammalian genomes of around 3 Gbase). GenePool bioinformaticians can map sequence reads to reference genomes and produce annotated lists of single nucleotide polymorphisms and other changes.
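To show what SNP detection from mapped reads involves at its simplest, the sketch below makes a threshold-based call at a single reference position from the bases the aligned reads show there. This is a deliberately naive stand-in: the depth and allele-fraction thresholds are assumptions, and real variant callers also weigh base and mapping qualities within a probabilistic genotype model.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

VALID_BASES = {"A", "C", "G", "T"}


def call_snp(ref_base: str, read_bases: Iterable[str], min_depth: int = 10,
             min_alt_fraction: float = 0.2,
             hom_fraction: float = 0.8) -> Optional[Tuple[str, int, int, str]]:
    """Naive SNP call at one position.

    Returns (alt_base, alt_count, depth, genotype), or None if no call is made.
    """
    ref = ref_base.upper()
    counts = Counter(b.upper() for b in read_bases if b.upper() in VALID_BASES)
    depth = sum(counts.values())  # ambiguous calls such as 'N' are ignored
    if depth < min_depth:
        return None
    alternatives = [(n, base) for base, n in counts.items() if base != ref]
    if not alternatives:
        return None
    alt_count, alt_base = max(alternatives)  # most frequent non-reference base
    fraction = alt_count / depth
    if fraction < min_alt_fraction:
        return None
    genotype = "homozygous" if fraction >= hom_fraction else "heterozygous"
    return alt_base, alt_count, depth, genotype


if __name__ == "__main__":
    print(call_snp("A", "AAAAAAGGGGGN"))  # 5 of 11 usable reads show G
```

Running such a rule along every covered position, then annotating the calls against gene models, gives the kind of SNP list described above; the thresholds trade sensitivity against false positives from sequencing errors.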
Porting applications to High Throughput Computing
Many biologists are moving from one-gene, one-question problems to research programmes that require parallel processing of tens of thousands of sequences. We have expertise in parallelizing bioinformatics tasks so that they run efficiently on large computer clusters.
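The underlying pattern is usually to split the input into chunks, process the chunks independently, and gather the results in order. The single-machine sketch below shows that pattern with Python's `concurrent.futures`, using GC content as a stand-in for a real per-sequence task. On a cluster the same split would normally be submitted as separate batch jobs through the scheduler; the function names and chunk size here are assumptions for illustration.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Iterator, List, Sequence, Tuple

Record = Tuple[str, str]  # (sequence name, sequence)


def gc_content(seq: str) -> float:
    """Fraction of G and C bases (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def chunked(records: Sequence[Record], size: int) -> Iterator[Sequence[Record]]:
    """Split the input into consecutive chunks of at most `size` records."""
    for i in range(0, len(records), size):
        yield records[i:i + size]


def gc_for_chunk(chunk: Sequence[Record]) -> List[Tuple[str, float]]:
    """Work done independently by each worker process."""
    return [(name, gc_content(seq)) for name, seq in chunk]


def parallel_gc(records: Sequence[Record], workers: int = 4,
                chunk_size: int = 1000) -> List[Tuple[str, float]]:
    results: List[Tuple[str, float]] = []
    with ProcessPoolExecutor(max_workers=workers) as pool:
        # map() returns chunk results in input order, so output order is stable.
        for part in pool.map(gc_for_chunk, chunked(records, chunk_size)):
            results.extend(part)
    return results


if __name__ == "__main__":
    records = [("s1", "GGCC"), ("s2", "ATAT"), ("s3", "ACGT")]
    print(parallel_gc(records, workers=2, chunk_size=2))
    # [('s1', 1.0), ('s2', 0.0), ('s3', 0.5)]
```

Sending whole chunks rather than single sequences to each worker keeps inter-process overhead small relative to the work done.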
Other bioinformatics applications, such as:
Please contact us to discuss your bioinformatics needs, and for current pricing.
The GenePool has core support from:
The University of Edinburgh
School of Biological Sciences
The Darwin Trust