Insight into the Dekkera anomalus YV396 genome – part 1

It has been a while since I last looked at Brettanomyces genomes, mostly because there wasn’t much data available to play around with. What is available is mainly B. bruxellensis data, due to its importance in the wine industry. This all changed in June when I came across the draft genome assembly of Dekkera anomalus YV396 deposited by Vervoort, Y. et al. Since no annotation was available for this genome, I quickly decided to give the annotation a shot myself, simply because I am interested in certain pathways in Brettanomyces. All I needed was my Ubuntu notebook (which died during the annotation process), my new Ubuntu workstation (replacing the notebook) and some Python coding. No access to a cluster whatsoever. Is it possible to finish an entire annotation project at home? You will find out very shortly. As I am still compiling data for a post, I want to start by sharing the methods part as well as the first abstract, just to give you a sneak peek into the project. The remaining part of the genome & proteome project will be published very soon. Just give me some additional time to finish up the various pathway analyses and write up the paper. Still a lot to be discovered in the new genome…



I – Methods

Genome assembly

The draft genome assembly of Dekkera anomalus strain YV396 (isolated from a Belgian brewery) was retrieved from GenBank in June 2015 (accession number LCTY00000000.1). It was deposited in May 2015 by KU Leuven [Vervoort et al.]. Illumina HiSeq data (100x coverage) were assembled into a genome using SOAPdenovo v.1.05. The statistics for the obtained assembly are summarized in Tab. 1.
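Summary statistics of this kind can be recomputed from the contig FASTA file with a few lines of Python. The following is only a minimal sketch, not the script used for this project: the function names and the toy contigs are made up for illustration, and GC content is computed over all bases, including any N.

```python
def parse_fasta(lines):
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def n50(lengths):
    """Length of the contig at which the running sum (longest first)
    reaches at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def assembly_stats(lines):
    """Contig count, total size, longest contig, N50 and GC content."""
    seqs = [seq for _, seq in parse_fasta(lines)]
    lengths = [len(s) for s in seqs]
    total = sum(lengths)
    gc = sum(s.count("G") + s.count("C") for s in seqs)
    return {
        "contigs": len(lengths),
        "total_bp": total,
        "longest_bp": max(lengths, default=0),
        "n50_bp": n50(lengths),
        "gc_percent": round(100.0 * gc / total, 2) if total else 0.0,
    }


# Toy contigs (hypothetical, not taken from the YV396 assembly)
toy = """>contig_1
ATGCGC
>contig_2
ATAT
>contig_3
GG
""".splitlines()

print(assembly_stats(toy))
```

For a real assembly, the same function can be fed an open file handle instead of the toy list, e.g. `assembly_stats(open("contigs.fasta"))`, where the file name is a placeholder.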

[Image: Tab. 1, assembly statistics]

Gene prediction

Gene prediction on the contigs was performed using the AUGUSTUS web service (AUGUSTUS parameter project identifier: pichia_stipitis; UTR prediction: false; report genes on: both strands; alternative transcripts: few; allowed gene structure: predict any number of (possibly partial) genes; ignore conflicts with other strand: false) [Stanke et al. 2006, 2008]. The gene prediction statistics are summarized in Tab. 2.
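Gene prediction statistics like these can be tallied from the GFF file that AUGUSTUS produces. The sketch below assumes the usual nine tab-separated GFF columns with the feature type in column 3; the function name and the toy records are hypothetical and only illustrate the idea.

```python
from collections import Counter


def summarize_gff(lines):
    """Count features and summarize 'gene' records from GFF-style lines."""
    feature_counts = Counter()
    strand_counts = Counter()
    gene_lengths = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GFF record
        feature, strand = fields[2], fields[6]
        start, end = int(fields[3]), int(fields[4])
        feature_counts[feature] += 1
        if feature == "gene":
            strand_counts[strand] += 1
            gene_lengths.append(end - start + 1)  # GFF coordinates are inclusive
    n_genes = len(gene_lengths)
    return {
        "genes": n_genes,
        "genes_by_strand": dict(strand_counts),
        "mean_gene_length_bp": sum(gene_lengths) / n_genes if n_genes else 0.0,
        "feature_counts": dict(feature_counts),
    }


def gff_row(seqid, feature, start, end, strand, attributes):
    """Build one toy GFF line (helper for the example only)."""
    return "\t".join(
        [seqid, "AUGUSTUS", feature, str(start), str(end), ".", strand, ".", attributes]
    )


# Toy records (hypothetical, not real predictions)
toy_gff = [
    "# toy example",
    gff_row("contig_1", "gene", 101, 400, "+", "g1"),
    gff_row("contig_1", "CDS", 101, 250, "+", "g1.t1"),
    gff_row("contig_1", "CDS", 300, 400, "+", "g1.t1"),
    gff_row("contig_2", "gene", 50, 149, "-", "g2"),
    gff_row("contig_2", "CDS", 50, 149, "-", "g2.t1"),
]

print(summarize_gff(toy_gff))
```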

[Image: Tab. 2, gene prediction statistics]

Gene annotation

Gene annotation was performed with Blast2GO, including remote blastx against NCBI and InterProScan for domain predictions [Conesa et al., 2005]. GO-term mapping and annotation were performed with the Blast2GO pipeline. Close to 3,000 of the 4,160 predicted sequences could be annotated by Blast2GO (Fig 3). Another subset of about 600 sequences could be mapped to a biological function without a GO term, and about 460 sequences only resulted in BLAST hits that could not be further associated with a protein function.

[Image: Fig 3, Blast2GO annotation statistics]

The most abundant species among the best blastx hits were Dekkera bruxellensis, Ogataea polymorpha and Pichia kudriavzevii (not shown).
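A species tally like this can be reproduced from a list of best hits. The sketch below is an illustration under assumptions, not the Blast2GO export itself: it takes hypothetical (query ID, hit description) pairs sorted best hit first, and relies on the NCBI convention of ending protein descriptions with the organism name in square brackets.

```python
import re
from collections import Counter

# Organism name in trailing square brackets, e.g. "... [Genus species]"
SPECIES_RE = re.compile(r"\[([^\[\]]+)\]\s*$")


def top_hit_species(rows):
    """Count the species of the first (best) hit for each query sequence."""
    counts = Counter()
    seen = set()
    for query_id, description in rows:
        if query_id in seen:
            continue  # keep only the first hit per query
        seen.add(query_id)
        match = SPECIES_RE.search(description)
        counts[match.group(1) if match else "unknown"] += 1
    return counts


# Toy hits (hypothetical IDs and descriptions)
toy_hits = [
    ("g1.t1", "hypothetical protein [Dekkera bruxellensis]"),
    ("g1.t1", "hypothetical protein [Ogataea polymorpha]"),  # second hit, ignored
    ("g2.t1", "beta-glucosidase [Dekkera bruxellensis]"),
    ("g3.t1", "kinase [Ogataea polymorpha]"),
    ("g4.t1", "description without organism label"),
]

for species, n in top_hit_species(toy_hits).most_common():
    print(species, n)
```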

References

  • Conesa, A., Götz, S., García-Gómez, J. M., Terol, J., Talón, M., and Robles, M. (2005). Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research. Bioinformatics, 21(18):3674–3676.
  • Stanke, M., Diekhans, M., Baertsch, R., and Haussler, D. (2008). Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics, 24(5):637–644.
  • Stanke, M., Schoffmann, O., Morgenstern, B., and Waack, S. (2006). Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics, 7(1):62.
  • Vervoort, Y., Herrera-Malaver, B., Mertens, S., Guadalupe Medina, V., Duitama, J., Michiels, L., Derdelinckx, G., Voordeckers, K., and Verstrepen, K. J. (2015). Purification and characterization of a novel Brettanomyces anomalus beta-glucosidase enzyme suitable for food bioflavoring. Unpublished.