iMetDB

iMetDB (infection Metagenomics DataBase) is a database optimized for metagenomic diagnosis of human infectious diseases. It consists of human and microbial (bacteria, fungi, protozoa, and viruses) records with curated taxonomy annotations.

 

 

Download iMetDB

 

Each database is formatted for BLAST. Please download the databases from the links below.

Identification  Database for nucleotide                            Database for amino acid
iMetDB          iMetDBnt (MD5: a98719f85db543bed9d94d4c17400e9a)   iMetDBnr (MD5: 752295d60f7800780e3ff57a46bd5a8a)
Bacteria        iMetDBntb (MD5: 22107b575ea0f6c949f4c1954b24e859)  iMetDBnrb (MD5: 42838d22ca1bb551d7b3197c20707a5d)
Fungi           iMetDBntf (MD5: d2a2b74495bcee71ef7a3b0aad52ff14)  iMetDBnrf (MD5: 27b8bff6d5221a79d4ecc75733282716)
Protozoa        iMetDBntp (MD5: c241f17610f7f262a52fc77f2065fc75)  iMetDBnrp (MD5: 0d41f9c4865132fa31a4bc5265c33b30)
Viruses         iMetDBntv (MD5: 9be63d5eefa49f24b6cfd383844df278)  iMetDBnrv (MD5: a694c2ed51edfe65f52560f204462eec)

* Databases were updated on 2nd September 2013.
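
The MD5 values listed with each database can be used to check a download for corruption. The following is a minimal sketch, assuming GNU coreutils `md5sum` is installed; the function name `verify_md5` and the archive name in the usage comment are hypothetical, so substitute the name of the file you actually downloaded.

```shell
#!/bin/sh
# verify_md5 FILE EXPECTED
# Succeeds (exit status 0) only when the MD5 digest of FILE equals EXPECTED.
# Assumption: GNU coreutils `md5sum` is available on this system.
verify_md5() {
    actual=$(md5sum "$1" | awk '{print $1}')
    [ "$actual" = "$2" ]
}

# Hypothetical usage ("iMetDBnt.tar.gz" is a placeholder file name):
#   verify_md5 iMetDBnt.tar.gz a98719f85db543bed9d94d4c17400e9a && echo "iMetDBnt: OK"
```

If the check fails, downloading the file again is usually the simplest remedy.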


Usage for BLASTn searching

The metagenomic analysis is based on similarity searching and taxonomic assignment. First, perform a BLAST search against iMetDB.

For metagenomic diagnosis,

blastall -i input.fasta -o output.out -d iMetDBnt -p blastn -b 1 -v 1 -m 7                

For amplicon analysis of bacterial 16S rRNA,

blastall -i input.fasta -o output.out -d iMetDBntb -p blastn -b 1 -v 1 -m 7               
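
After the search finishes, it can be useful to check what fraction of the input reads received a hit. The sketch below assumes that each matched read contributes exactly one Hit_def line to the output, which should hold when the search is run with -b 1 -v 1 as above; the function name `hit_rate` is hypothetical.

```shell
#!/bin/sh
# hit_rate FASTA BLAST_XML
# Prints "<reads with a hit> <total reads> <percentage>".
# Assumption: one Hit_def line per matched read (search run with -b 1 -v 1).
hit_rate() {
    total=$(grep -c '^>' "$1")      # number of sequences in the FASTA input
    hits=$(grep -c 'Hit_def' "$2")  # number of reads with a reported hit
    awk -v h="$hits" -v t="$total" \
        'BEGIN { printf "%d %d %.1f%%\n", h, t, (t > 0 ? 100 * h / t : 0) }'
}

# Usage with the file names from the commands above:
#   hit_rate input.fasta output.out
```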

 

Usage for taxonomic assignment

The relative abundance of detected organisms can be estimated from the hit header (Hit_def) information with a simple sort-and-count procedure.

For the taxonomic classification at species level,

cat output.out | grep Hit_def | awk -F ";" '{print $8}' | sort | uniq -c | sort -k1rn            

To classify at another taxonomic rank, change "$8" to "$7" (genus), "$6" (family), "$5" (order), "$4" (class), "$3" (phylum), or "$2" (kingdom).
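
To avoid editing the field number by hand, the rank-to-field mapping can be wrapped in a small helper. This is only a sketch: the function name `rank_counts` is hypothetical, and it assumes the Hit_def lines carry the semicolon-separated lineage used by the awk command above.

```shell
#!/bin/sh
# rank_counts RANK BLAST_XML
# Counts hits per taxon at RANK, most frequent first.
# Field numbers follow the mapping described above ($2 = kingdom ... $8 = species).
rank_counts() {
    case "$1" in
        kingdom) field=2 ;;
        phylum)  field=3 ;;
        class)   field=4 ;;
        order)   field=5 ;;
        family)  field=6 ;;
        genus)   field=7 ;;
        species) field=8 ;;
        *) echo "unknown rank: $1" >&2; return 1 ;;
    esac
    grep Hit_def "$2" | awk -F ";" -v f="$field" '{print $f}' |
        sort | uniq -c | sort -k1rn
}

# Usage:
#   rank_counts genus output.out | head -10
```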

 

> Example of taxonomic classification (top 10, species) for a human fecal sample

cat output.out | grep Hit_def | awk -F ";" '{print $8}' | sort | uniq -c | sort -k1rn | head -10 

  27573 Clostridium perfringens
   6237 Lactobacillus fermentum
   5975 Alistipes finegoldii
   5821 Streptococcus lutetiensis
   5166 Clostridium sp. AN-AS4B
   4795 Streptococcus infantarius
   4293 Saccharofermentans acetigenes
   4244 Enterococcus durans
   3884 Oscillibacter valericigenes
   3793 Norwalk virus

The first line means that 27,573 sequence reads hit "Clostridium perfringens".
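
The raw counts can also be expressed as percentages. The following sketch extends the species-level pipeline with a second awk step; the function name `species_percent` is hypothetical, and the percentages are relative to the number of hits in the file, not to the number of input reads.

```shell
#!/bin/sh
# species_percent BLAST_XML
# Prints "<percent><TAB><count><TAB><species>" per species, most abundant first.
# Percentages are computed over all Hit_def lines in the file.
species_percent() {
    grep Hit_def "$1" | awk -F ";" '{print $8}' | sort | uniq -c | sort -k1rn |
    awk '{
             count[NR] = $1            # leading number written by uniq -c
             $1 = ""; sub(/^ +/, "")   # what remains is the species name
             name[NR] = $0
             total += count[NR]
         }
         END {
             for (i = 1; i <= NR; i++)
                 printf "%.2f%%\t%d\t%s\n", 100 * count[i] / total, count[i], name[i]
         }'
}

# Usage:
#   species_percent output.out | head -10
```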

 

> For genus

cat output.out | grep Hit_def | awk -F ";" '{print $7}' | sort | uniq -c | sort -k1rn | head -10 

  59861 Clostridium
  13344 Streptococcus
   9526 Enterococcus
   9491 Eubacterium
   9069 Lactobacillus
   9051 Alistipes
   7380 Ruminococcus
   5776 Bifidobacterium
   4901 Bacteroides
   4704 Saccharofermentans

 

> For family

cat output.out | grep Hit_def | awk -F ";" '{print $6}' | sort | uniq -c | sort -k1rn | head -10 

  32680 Clostridiaceae
  20272 Clostridiaceae
  11177 Clostridiaceae
   8992 Ruminococcaceae
   7285 Eubacteriaceae
   6733 Rikenellaceae
   6297 Streptococcaceae
   6204 Enterococcaceae
   5934 Streptococcaceae
   5343 Bifidobacteriaceae

 

> For order

cat output.out | grep Hit_def | awk -F ";" '{print $5}' | sort | uniq -c | sort -k1rn | head -10

 117204 Clostridiales
  38280 Lactobacillales
  17929 Bacteroidales
   8273 Bacillales
   3907 Viruses,Other,Other,Other
   2633 Bifidobacteriales
   2567 Primates
   2444 Bifidobacteriales
   2232 Selenomonadales
   2008 Thermoanaerobacterales

"Viruses,Other,Other,Other" means that no information is available for "phylum", "class", and "order".

 

> For class

cat output.out | grep Hit_def | awk -F ";" '{print $4}' | sort | uniq -c | sort -k1rn | head -10

 120897 Clostridia
  46553 Bacilli
  17929 Bacteroidia
   7765 Actinobacteria
   5104 Gammaproteobacteria
   3912 Viruses,Other,Other
   2567 Mammalia
   2276 Betaproteobacteria
   2247 Negativicutes
   1933 Mollicutes

 

 

> For phylum

cat output.out | grep Hit_def | awk -F ";" '{print $3}' | sort | uniq -c | sort -k1rn | head -10

 169947 Firmicutes
  18886 Bacteroidetes
  10027 Proteobacteria
   7765 Actinobacteria
   3942 Viruses,Other
   2567 Chordata
   1933 Tenericutes
    596 Basidiomycota
    389 Synergistetes
    343 Eukaryota,Other
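
The listings above include placeholder entries such as "Viruses,Other" for hits that lack information at the requested rank. If only classified taxa are of interest, those entries can be dropped before counting. This is a sketch under the assumption that every placeholder contains the text ",Other" and that no real taxon name does; the function name `classified_counts` is hypothetical.

```shell
#!/bin/sh
# classified_counts FIELD BLAST_XML
# Same counting as the pipelines above, but skips hits whose FIELD
# contains an ",Other" placeholder.
# Assumption: placeholders always contain ",Other"; real taxon names never do.
classified_counts() {
    grep Hit_def "$2" |
    awk -F ";" -v f="$1" '$f !~ /,Other/ {print $f}' |
    sort | uniq -c | sort -k1rn
}

# Usage (phylum is field 3):
#   classified_counts 3 output.out | head -10
```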


Cite iMetDB

Motooka D., Gotoh K., Goto N., Yasunaga T., Horii T., Nakaya T., Wichukchinda N., Iida T., and Nakamura S., submitted.