Find genes in metagenomes, using Hidden Markov Models
MetaHMM takes multiple UniProt protein accession numbers and searches for similar genes in a metagenome. It works by building a Hidden Markov Model from the aligned sequences.
How does it work?
MetaHMM is a free online tool for finding genes in metagenomes, using Hidden Markov Models.
The input is a list of UniProt accession numbers (such as P12345) that refer to proteins presumed to be homologous. MetaHMM builds a model from these proteins, then searches the selected metagenomes for genes that encode proteins with similar sequences.
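Before submitting a list, it can help to check that every entry looks like a UniProt accession. The sketch below is only an illustration and is not part of MetaHMM: the function name `parse_accessions` is hypothetical, and the pattern is the accession format published in the UniProt help pages. It checks syntax only and cannot tell whether an accession actually exists in UniProt.

```python
import re

# Accession number format as documented in the UniProt help pages.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def parse_accessions(text):
    """Split free text into accessions; return (valid, rejected).

    Hypothetical helper: tokens are separated by whitespace, commas or
    semicolons, upper-cased, and de-duplicated in order of appearance.
    """
    valid, rejected = [], []
    for token in re.split(r"[\s,;]+", text.strip()):
        if not token:
            continue
        token = token.upper()
        if UNIPROT_ACCESSION.fullmatch(token):
            if token not in valid:
                valid.append(token)
        else:
            rejected.append(token)
    return valid, rejected
```

For example, `parse_accessions("P12345, q9y6k9 notanid")` separates the two well-formed accessions from the malformed token, so that a typo can be corrected before the query is sent.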
You can choose one or more metagenomes from a list. All the metagenomes are from the iMicrobe portal.
It is now also possible to search in a custom metagenome. Upload it as a FASTA file to your own HTTP or FTP storage and provide a direct link to it. The file may optionally be compressed with gzip (.gz) or bzip2 (.bz2). The maximum allowed size is 1 GB uncompressed; if the file is larger, only the first 1 GB is considered. Note that links to popular file hosting sites might not work: only direct HTTP or FTP links are accepted. Sequence formats other than FASTA are not supported.
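The following sketch shows one way to check a local file against these constraints before uploading it. It is an assumption-laden illustration, not the server's own code: the names `check_fasta` and `MAX_BYTES` are hypothetical, "1 GB" is read as 2**30 bytes, compression is detected from the file extension, and the file is cut at a line boundary, which may differ from how the server truncates.

```python
import bz2
import gzip

MAX_BYTES = 1 << 30  # assumption: "1 GB" interpreted as 2**30 bytes


def open_maybe_compressed(path):
    """Open a plain, gzip (.gz) or bzip2 (.bz2) file for binary reading."""
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    if path.endswith(".bz2"):
        return bz2.open(path, "rb")
    return open(path, "rb")


def check_fasta(path, limit=MAX_BYTES):
    """Scan at most `limit` uncompressed bytes of a FASTA file.

    Returns (record_count, bytes_seen, truncated). Raises ValueError if
    the first line is not a FASTA header.
    """
    records = 0
    seen = 0
    truncated = False
    with open_maybe_compressed(path) as handle:
        for line in handle:
            if seen + len(line) > limit:
                truncated = True  # the rest of the file would be ignored
                break
            if seen == 0 and not line.startswith(b">"):
                raise ValueError("not FASTA: first line must start with '>'")
            seen += len(line)
            if line.startswith(b">"):
                records += 1
    return records, seen, truncated
```

If `truncated` comes back as `True`, the file exceeds the limit and sequences beyond that point would not be searched, so it may be worth splitting the metagenome into several files.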
Technical details: MetaHMM aligns the input sequences with Clustal Omega, builds a Hidden Markov Model (HMM) from the alignment with hmmbuild, and uses hmmsearch to search the metagenomes for genes encoding similar proteins. The toolchain runs on our high-performance multi-core server.
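The three steps can be reproduced locally along the following lines. This is a minimal sketch and not the server's actual configuration: the function name `build_pipeline` and all file names are hypothetical, the options the server passes to each tool are not documented here, and the target file is assumed to already contain protein sequences, since hmmsearch compares a profile against a protein sequence database.

```python
def build_pipeline(seqs_fasta, target_fasta, workdir="."):
    """Return the three command lines of an align/build/search run.

    Hypothetical helper; the intermediate file names are illustrative.
    """
    aln = f"{workdir}/input.sto"   # Stockholm alignment from Clustal Omega
    hmm = f"{workdir}/model.hmm"   # profile HMM from hmmbuild
    hits = f"{workdir}/hits.tbl"   # per-sequence hit table from hmmsearch
    return [
        # 1. Multiple sequence alignment of the input proteins.
        ["clustalo", "-i", seqs_fasta, "-o", aln, "--outfmt=st", "--force"],
        # 2. Build a profile HMM from the alignment.
        ["hmmbuild", hmm, aln],
        # 3. Search the target sequences with the profile.
        ["hmmsearch", "--tblout", hits, hmm, target_fasta],
    ]
```

With Clustal Omega and HMMER installed, each command can be executed in order, for example with `subprocess.run(cmd, check=True)`, so that a failing step stops the run before the next one starts.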
If you publish anything using MetaHMM, you are kindly requested to cite our publication:
Balazs Szalkai, Vince Grolmusz: MetaHMM: A Webserver for Identifying Novel Genes with Specified Functions in Metagenomic Samples; Genomics, Vol. 111, No. 4, pp. 883-885, (2019) https://doi.org/10.1016/j.ygeno.2018.05.016
References
- The UniProt Consortium: Update on activities at the Universal Protein Resource (UniProt) in 2013, Nucleic Acids Res. 41: D43–D47 (2013)
- Sun et al: Community cyberinfrastructure for Advanced Microbial Ecology Research and Analysis: the CAMERA resource, Nucleic Acids Research 2010; doi: 10.1093/nar/gkq1102
- Sean R. Eddy: A new generation of homology search tools based on probabilistic inference, Genome Inform. 2009 Oct 23(1):205–11
- Sievers et al: Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega, Molecular Systems Biology 7 Article number: 539
- Goujon et al: A new bioinformatics analysis tools framework at EMBL-EBI (2010), Nucleic acids research 2010 Jul, 38 Suppl: W695–9
Terms of use
You may use this service only if you accept the following terms. We do not guarantee anything about this service: we make no claims about its usability, or about whether the results it returns can be used for any purpose. We cannot guarantee that this service will remain available in the future, or that your query will generate any output at all.
Privacy: We will not give your data to anyone, and under normal circumstances only you can retrieve the results of your query, using the unique webpage identifier generated for you. However, we cannot guarantee that others will not intercept the traffic between you and our server. Therefore, do not use our webserver to analyze proprietary data: we cannot guarantee the integrity or safety of your data.