Prediction of secreted proteins

EffectiveELD predicts secreted proteins based on eukaryotic-like domains (ELD). These domains occur in eukaryotic genomes and are more frequent in the genomes of host-associated bacteria than in those of non-host-associated bacteria. This observation suggests that bacterial proteins containing such domains may act as effectors, i.e. that they play their biological role inside the host cell.

Besides updating the genome repository and the protein domain database, we have changed how ELD are presented in the Effective web portal. The mean and standard deviation of each domain's frequency in non-host-associated genomes are now shown and can be exported in several file formats. This is mainly relevant for the analysis of proteins from metagenomic samples, in which assembly artifacts may artificially inflate the frequency of genes that are typically single-copy non-effectors, such as housekeeping genes. In such cases, the reported Z-score would indicate significant enrichment of genes that are certainly not effectors. With the background mean and standard deviation at hand, this type of false-positive match can now be easily recognized and ignored.
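To make the role of the exported mean and standard deviation concrete, here is a minimal Python sketch. It assumes that a domain's frequency is its copy number per genome and that the Z-score is the usual (x - mean) / sd computed against the non-host-associated background. The functions `enrichment_z` and `looks_like_single_copy`, the heuristic thresholds, and all counts are hypothetical illustrations, not Effective's implementation or data.

```python
from statistics import mean, stdev
from typing import Optional, Sequence

def enrichment_z(query_count: float, background: Sequence[float]) -> Optional[float]:
    """Z-score of a domain's copy number in a query genome relative to its copy
    numbers in non-host-associated background genomes (needs >= 2 of them)."""
    mu = mean(background)
    sd = stdev(background)  # sample standard deviation
    if sd == 0:
        return None  # Z-score undefined when the background shows no variation
    return (query_count - mu) / sd

def looks_like_single_copy(background: Sequence[float],
                           mean_range=(0.8, 1.5), max_sd: float = 0.5) -> bool:
    """Hypothetical heuristic: the domain is usually present exactly once per
    background genome, as expected for a single-copy housekeeping gene."""
    mu = mean(background)
    return mean_range[0] <= mu <= mean_range[1] and stdev(background) <= max_sd

# Made-up copy numbers of one domain in six non-host-associated genomes.
background = [1, 1, 1, 2, 1, 1]
# A metagenomic assembly that duplicated the gene reports four copies.
z = enrichment_z(4, background)
print(round(z, 2))                          # 6.94
print(looks_like_single_copy(background))   # True
```

On its own, a Z-score of about 7 would look like strong enrichment; the background mean close to one copy with little spread is what reveals a single-copy gene that the assembly has duplicated.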

In the “protein mode” of Effective, which analyzes arbitrary collections of protein sequences, only ELD that are significantly enriched in at least one host-associated genome from the Effective genome repository are reported. In the new “genome mode” of Effective, Z-scores for the enrichment of ELD are calculated automatically and de novo for all protein domains that occur in eukaryotic genomes. This allows the prediction of novel ELD that have not yet been observed in any host-associated genome from the Effective genome repository.
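The difference between the two modes can be sketched in Python under stated assumptions: `protein_mode` filters the query's domains against a precomputed set of known ELD, while `genome_mode` computes a Z-score de novo for every query domain that also occurs in eukaryotes, using background copy numbers from non-host-associated genomes. The function names, the Z-score cutoff of 2.0, the policy of skipping domains without background variation, and all domain names and counts are hypothetical, not Effective's actual parameters.

```python
from statistics import mean, stdev
from typing import Dict, List, Set

def protein_mode(query_domains: Set[str], known_eld: Set[str]) -> Set[str]:
    # Report only domains already known to be enriched in at least one
    # host-associated reference genome (hypothetical precomputed set).
    return query_domains & known_eld

def genome_mode(query_counts: Dict[str, int],
                background: List[Dict[str, int]],
                eukaryotic_domains: Set[str],
                z_cutoff: float = 2.0) -> Dict[str, float]:
    # Compute Z-scores de novo for every query domain that occurs in eukaryotes.
    hits: Dict[str, float] = {}
    for domain, count in query_counts.items():
        if domain not in eukaryotic_domains:
            continue  # domains absent from eukaryotes are not ELD by definition
        bg = [genome.get(domain, 0) for genome in background]
        sd = stdev(bg)
        if sd == 0:
            continue  # Z-score undefined; this sketch simply skips such domains
        z = (count - mean(bg)) / sd
        if z >= z_cutoff:
            hits[domain] = z
    return hits

# Made-up copy numbers in three non-host-associated background genomes.
background = [
    {"domA": 0, "domB": 1, "domX": 5},
    {"domA": 1, "domB": 1, "domX": 6},
    {"domA": 0, "domB": 2, "domX": 4},
]
query = {"domA": 6, "domB": 2, "domX": 20}
eukaryotic = {"domA", "domB"}  # domX does not occur in eukaryotes
known_eld = {"domB"}           # only domB is a previously known ELD

print(protein_mode(set(query), known_eld))         # {'domB'}
print(genome_mode(query, background, eukaryotic))  # only domA passes the cutoff
```

Here domA is reported although it is not in `known_eld`, mirroring how genome mode can surface ELD not yet seen in any reference genome, while domX is strongly enriched but ignored because it does not occur in eukaryotes.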