Overview: KOMODO & GROWREC
Culturing microorganisms is a classic microbiology challenge. Most microorganisms in nature still cannot be cultured (~99% by some estimates).
Best practices for culturing new organisms have been developed and are embedded in guides such as Bergey's Manual of Systematic Bacteriology.
However, even with these best practices, the typical procedure for culturing a new microorganism still requires a great deal of experience and trial and error.
In recent years, some culturing efforts, particularly for difficult-to-culture organisms, have begun to include genome and pathway analysis,
as well as high-throughput technologies for determining microbial nutrient needs. Metagenomic sequencing technology, meanwhile,
is now enabling the accumulation of huge quantities of data about currently nonculturable organisms. Integrating all of these areas will require fresh
approaches to rapidly bring new organisms into culture.
To this end, we provide here a large catalogue of lab media that have been manually developed to date,
and tools to explore what insight these known media can give into predicting new organism-media pairings.
The lab media we report are from the collection of proven culture media in the Leibniz Institute DSMZ, a German non-profit center that stores
and disseminates microbes. The DSMZ media catalogue contains around 1500 media, as well as many individualized variations.
These media are listed for around 23,000 microbial strains. Thus, the DSMZ collection covers the majority of media in common use today, and a large portion of named strains.
We have included the entire DSMZ collection in this database (with only a few technical exceptions).
Building KOMODO: the Known Media Database
In their original listing (http://www.dsmz.de/?id=441),
the DSMZ media are provided as recipes for producing media in the lab. For analysis purposes, we wished to know the exact nutrient compositions of these media.
Therefore, we used a partly manual, partly automated process to extract this information from the PDF files provided on the DSMZ website.
The result of this extraction process was KOMODO, the large known media database that is provided on this website. In brief, the building process looked like this:
For detailed information on the building of KOMODO, please refer to the supplement of the accompanying paper (LINK after publish).
The result of this work is a relational database of known lab media, their compositions, and the organisms that grow on them.
This database enables a first-of-its-kind analysis of broad features and trends in proven lab media. Here is an overview of the contents of KOMODO:
Development of GROWREC: a predictor of new organism-media pairings
In analyzing KOMODO, we found a strong correlation between the phylogenetic or ecological closeness of two organisms and the likelihood that they share a lab medium:
Using this property, we developed a phylogeny-based predictor of new organism-media pairings. The predictor, which we call GROWREC,
operates on the principle of collaborative filtering.
Collaborative filtering is, for example, popularly employed by Amazon.com, Inc., to recommend products to consumers, based on products known to have been
purchased by other consumers with similar past buying habits. Similarly, given an input 'test' organism for which we aim to predict growth media,
we first select a set of organisms from within KOMODO that are phylogenetically close to the 'test' organism (which is not required to be in KOMODO);
next, we integrate the known medium preferences of those organisms into a 'collaborative score' that indicates which media the test organism is likely to grow on.
The GROWREC schema looks something like this:
Predicted organism-media pairings from GROWREC are graded by 'collaborative scores', which correspond to how likely the predictions are to be accurate.
These scores are calculated as a weighted sum over the organisms within some phylogenetic cutoff of the 'test' organism that are known to grow on the given medium,
with each organism weighted by its phylogenetic closeness to the 'test' organism (the test organism is organism 3, in the example above).
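The scoring idea described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the GROWREC implementation: the function name `collaborative_scores`, the toy organisms and media, the cutoff value, and in particular the linear closeness weight (1 at distance 0, falling to 0 at the cutoff) are all assumptions made for the example, since the exact weighting is not specified here.

```python
from collections import defaultdict


def collaborative_scores(distances, known_media, cutoff):
    """Rank media for a test organism from its phylogenetic neighbors.

    distances:   {organism: phylogenetic distance to the test organism}
    known_media: {organism: set of media the organism is known to grow on}
    cutoff:      organisms at or beyond this distance are ignored
    """
    scores = defaultdict(float)
    for organism, distance in distances.items():
        if distance >= cutoff:
            continue  # outside the phylogenetic neighborhood
        # ASSUMPTION: closeness weight falls linearly from 1 to 0 at the cutoff.
        weight = 1.0 - distance / cutoff
        for medium in known_media.get(organism, ()):
            scores[medium] += weight
    # Highest collaborative score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: three database organisms and their known media.
distances = {"org_1": 0.02, "org_2": 0.05, "org_4": 0.30}
known_media = {
    "org_1": {"medium_A", "medium_B"},
    "org_2": {"medium_A"},
    "org_4": {"medium_C"},
}

ranked = collaborative_scores(distances, known_media, cutoff=0.10)
for medium, score in ranked:
    print(f"{medium}: {score:.2f}")
```

In this toy case, `medium_A` ranks first because two close neighbors grow on it, while `medium_C` is never proposed because its only supporter lies beyond the cutoff.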
In leave-one-out analysis, we found that the collaborative score corresponds strongly to the number of known organism-medium pairings that were predicted:
Therefore, we recommend that users of GROWREC who are attempting to culture organisms prioritize the predictions with the highest collaborative scores.
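For readers unfamiliar with leave-one-out analysis, the following Python sketch shows the general shape of such a check: each organism is hidden in turn, media are scored from the remaining organisms, and each prediction is compared with the hidden organism's known media. It is a toy illustration under stated assumptions, not the analysis actually performed: the names `score_media` and `leave_one_out`, the one-dimensional positions standing in for phylogenetic distance, the cutoff, and the linear closeness weight are all hypothetical.

```python
def score_media(test, distance, known_media, cutoff):
    """Closeness-weighted vote of the neighbors of `test` within the cutoff."""
    scores = {}
    for other, media in known_media.items():
        if other == test:
            continue  # the held-out organism must not vote for itself
        d = distance(test, other)
        if d >= cutoff:
            continue
        for medium in media:
            # ASSUMPTION: linear closeness weight, 1 at distance 0, 0 at the cutoff.
            scores[medium] = scores.get(medium, 0.0) + (1.0 - d / cutoff)
    return scores


def leave_one_out(known_media, distance, cutoff):
    """Return (score, was_known) for every medium predicted for each held-out organism."""
    results = []
    for organism, true_media in known_media.items():
        predicted = score_media(organism, distance, known_media, cutoff)
        for medium, score in predicted.items():
            results.append((score, medium in true_media))
    return results


# Hypothetical toy data: positions on a line stand in for a phylogeny.
positions = {"org_1": 0.00, "org_2": 0.02, "org_3": 0.04, "org_4": 0.50}
known_media = {
    "org_1": {"medium_A"},
    "org_2": {"medium_A", "medium_B"},
    "org_3": {"medium_A"},
    "org_4": {"medium_C"},
}


def toy_distance(a, b):
    return abs(positions[a] - positions[b])


results = leave_one_out(known_media, toy_distance, cutoff=0.10)
for score, was_known in sorted(results, reverse=True):
    print(f"score={score:.2f} known_pairing={was_known}")
```

In this toy data set, the higher-scoring predictions are the ones that recover known pairings, which is the kind of relationship the leave-one-out analysis is meant to reveal.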
In laboratory tests, we obtained accuracies of up to 90% with our predictions, across a range of collaborative scores. Some of those results can be seen here:
What we provide on this website
KOMODO in a browsable form
A GROWREC-based tool, which allows users to upload a 16S rDNA sequence or an NCBI taxon ID for any organism, and get predictions of likely lab media (from the known DSMZ catalogue).
Please note: If you upload a 16S rDNA sequence for prediction by GROWREC, we perform a BLAST search to find the closest species in our database
and use them for the analysis. If you upload an NCBI taxon ID that is already in our database, we directly check that taxon ID against its neighbors in the NCBI taxonomic tree
(using the collaborative filtering algorithm explained on the homepage).
If you encounter problems, please send the details to firstname.lastname@example.org