This software suite was developed by Dr. Eithon Cadag to predict whether a protein sequence is associated with virulence, using a statistical classifier first trained on virulent and non-virulent proteins [Cadag E., Tarczy-Hornoch P. & Myler P.J., BMC Bioinformatics (2012) 13:321]. The tool collects annotations from various public repositories, such as EntrezProtein, UniProt, KEGG, PDB and InterPro, and produces a network of inter-related data about the protein of interest. The annotations are then weighted, and the resulting scores are submitted as input to a statistical classifier based on the PyML machine learning framework [Ben-Hur A., 2008]. This strategy resulted in the selection of 876 targets for the SSGCID pipeline from eight bacterial genomes and one eukaryotic genome (Internal Target Batch_09).
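
The weighting step described above can be illustrated with a minimal sketch. This is not the suite's actual code: the per-source weights, the function names (weighted_features, to_vector) and the example annotation terms are all hypothetical, chosen only to show how annotations gathered from several repositories might be turned into a numeric score vector of the kind a statistical classifier takes as input. The PyML training step itself is deliberately left out.

```python
from collections import Counter

# Hypothetical per-source weights; the actual weighting scheme is not
# described in the text above.
SOURCE_WEIGHTS = {
    "EntrezProtein": 1.0,
    "UniProt": 1.0,
    "KEGG": 0.5,
    "PDB": 0.5,
    "InterPro": 0.75,
}


def weighted_features(annotations):
    """Sum the source weight of every (source, term) annotation per term."""
    features = Counter()
    for source, term in annotations:
        # Annotations from sources without a weight contribute nothing.
        features[term] += SOURCE_WEIGHTS.get(source, 0.0)
    return dict(features)


def to_vector(features, vocabulary):
    """Order the weighted scores by a fixed vocabulary of annotation terms."""
    return [features.get(term, 0.0) for term in vocabulary]


# Made-up annotations for one protein of interest.
example = [
    ("UniProt", "toxin"),
    ("InterPro", "toxin"),
    ("KEGG", "secretion system"),
]
vocabulary = ["toxin", "secretion system", "ribosome"]

features = weighted_features(example)
vector = to_vector(features, vocabulary)
print(vector)
```

A vector built this way for each training protein, paired with a virulent or non-virulent label, is the sort of input a classifier would be trained on; the choice of a fixed vocabulary keeps the vectors for different proteins comparable.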

