Find genes in metagenomes, using Hidden Markov Models

MetaHMM takes multiple UniProt protein accession numbers, and searches for similar genes in a metagenome. It works by building a Hidden Markov Model on the aligned sequences.

UniProt accession numbers, separated by spaces, newlines, commas or semicolons. The model will be built from these proteins.
Example: Q07911, Q9ZMV8, P02968
Tip: Do a UniProt search, and copy-paste the whole results page here. The accession numbers will be recognized from the text when you submit the job.

Metagenome (where to look?): Please select a metagenome in which you want to discover genes coding for similar proteins.
You may provide a direct HTTP or FTP link to your custom metagenome, which must be a FASTA file. Gzipped (.gz) and bzipped (.bz2) FASTA files are also supported. The maximum allowed size is 1GB uncompressed.
You may also search in one or more example metagenomes, which are from the iMicrobe portal.

How does it work?

MetaHMM is a free online tool for finding genes in metagenomes, using Hidden Markov Models.

The input should be a list of UniProt accession numbers (such as P12345) that refer to presumably homologous proteins. MetaHMM builds a model from these proteins, then searches the selected metagenomes for genes that encode proteins with similar sequences.
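As an illustration of how accession numbers can be picked out of pasted text, here is a minimal Python sketch. It uses the accession pattern documented by UniProt; the function name extract_accessions is hypothetical, and this is not necessarily how MetaHMM itself parses the input. Any token that merely looks like an accession will also match.

```python
import re

# Accession pattern as documented by UniProt (6- or 10-character accessions).
UNIPROT_ACCESSION = re.compile(
    r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
)


def extract_accessions(text):
    """Return unique accession-like tokens in order of first appearance."""
    seen = set()
    result = []
    for match in UNIPROT_ACCESSION.finditer(text):
        accession = match.group(0)
        if accession not in seen:
            seen.add(accession)
            result.append(accession)
    return result


# Separators (spaces, newlines, commas, semicolons) are simply skipped over.
print(extract_accessions("Q07911, Q9ZMV8; P02968\nP12345 Q07911"))
# prints ['Q07911', 'Q9ZMV8', 'P02968', 'P12345']
```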

You can choose one or more metagenomes from a list. All the metagenomes are from the iMicrobe portal.

It is now also possible to search in a custom metagenome. You have to upload it to your own HTTP or FTP storage as a FASTA file and provide a direct link to it. The file may optionally be compressed with gzip (.gz) or bzip2 (.bz2). The maximum allowed size is 1GB uncompressed; if the file is larger than that, only the first 1GB will be considered. Links to popular file hosting sites might not work: only direct HTTP or FTP links are accepted. Sequence formats other than FASTA are not supported.
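The size rule for custom metagenomes can be sketched in Python as follows. This is only an assumed illustration: the names open_fasta and read_capped are hypothetical, the compression type is guessed from the file extension, and the exact byte count behind "1GB" is an assumption. The sketch reads a local file and stops after the limit of uncompressed bytes.

```python
import bz2
import gzip

MAX_BYTES = 1_000_000_000  # "1GB uncompressed"; the exact byte count is an assumption


def open_fasta(path):
    """Open a plain, gzip- or bzip2-compressed FASTA file as a binary stream."""
    if path.endswith(".gz"):
        return gzip.open(path, "rb")
    if path.endswith(".bz2"):
        return bz2.open(path, "rb")
    return open(path, "rb")


def read_capped(path, limit=MAX_BYTES, chunk_size=1 << 20):
    """Yield uncompressed chunks, stopping once `limit` bytes have been read."""
    remaining = limit
    with open_fasta(path) as stream:
        while remaining > 0:
            chunk = stream.read(min(chunk_size, remaining))
            if not chunk:
                break  # end of file reached before the limit
            remaining -= len(chunk)
            yield chunk
```

Note that a cut at a fixed byte offset may fall in the middle of a sequence record, so a real implementation would probably also discard the last, incomplete record.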

Technical details: MetaHMM aligns the input sequences using Clustal Omega, builds a Hidden Markov Model (HMM) from the alignment using hmmbuild, and uses hmmsearch to search the metagenomes for similar encoded proteins. The toolchain runs on our high-performance multi-core server.
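A comparable three-step toolchain could be run locally along the following lines. This is a sketch under stated assumptions, not the server's actual configuration: the file names and the helper functions build_commands and run_pipeline are hypothetical, and target_fasta is assumed to already contain protein sequences derived from the metagenome (how MetaHMM obtains them is not described here).

```python
import shutil
import subprocess


def build_commands(seqs_fasta, target_fasta, workdir="."):
    """Return the three command lines: align, build the HMM, search."""
    alignment = f"{workdir}/aligned.sto"  # hypothetical file names
    model = f"{workdir}/model.hmm"
    hits = f"{workdir}/hits.tbl"
    return [
        # Clustal Omega: multiple alignment written in Stockholm format.
        ["clustalo", "-i", seqs_fasta, "-o", alignment, "--outfmt=st", "--force"],
        # HMMER: build a profile HMM from the alignment.
        ["hmmbuild", model, alignment],
        # HMMER: search the target sequences, writing a per-sequence hit table.
        ["hmmsearch", "--tblout", hits, model, target_fasta],
    ]


def run_pipeline(seqs_fasta, target_fasta, workdir="."):
    """Run the three steps in order, failing early if a tool is missing."""
    commands = build_commands(seqs_fasta, target_fasta, workdir)
    for command in commands:
        if shutil.which(command[0]) is None:
            raise RuntimeError(f"{command[0]} was not found on PATH")
    for command in commands:
        subprocess.run(command, check=True)
```

Calling run_pipeline("proteins.fasta", "metagenome_proteins.fasta") would require Clustal Omega and HMMER to be installed; both input file names are placeholders.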

If you publish anything using MetaHMM, you are kindly requested to cite our publication:

Balazs Szalkai, Vince Grolmusz: MetaHMM: A Webserver for Identifying Novel Genes with Specified Functions in Metagenomic Samples; Genomics, Vol. 111, No. 4, pp. 883-885, (2019)


  • The UniProt Consortium: Update on activities at the Universal Protein Resource (UniProt) in 2013, Nucleic Acids Res. 41: D43–D47 (2013)
  • Sun et al: Community cyberinfrastructure for Advanced Microbial Ecology Research and Analysis: the CAMERA resource, Nucleic Acids Research 2010; doi: 10.1093/nar/gkq1102
  • Sean R. Eddy: A new generation of homology search tools based on probabilistic inference, Genome Inform. 2009 Oct 23(1):205–11
  • Sievers et al: Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega, Molecular Systems Biology 7 Article number: 539
  • Goujon et al: A new bioinformatics analysis tools framework at EMBL-EBI, Nucleic Acids Research 2010 Jul, 38 Suppl: W695–9

Terms of use

You can use this service only if you accept the following terms: We do not guarantee anything about this service: We do not state anything about the usability of this service, and we do not state that the results that we may return can be used for any purpose. We cannot guarantee that this service will be available in the future, and we cannot guarantee that your query would generate any output at all.

Privacy: We will not give out your data to anyone, and normally only you can retrieve the results of your query, using the unique webpage identifier generated for you. However, we cannot guarantee that others do not intercept the traffic between you and our server. Therefore, do not use our webserver for analyzing proprietary data: we cannot guarantee the integrity and safety of your data.