Prediction of secreted proteins

An updated version of EffectiveT3 performs the recognition of N-terminal signal peptides in Effective. For the update we assembled new training datasets, combining 504 verified secreted proteins from T3SEdb with our original training data. As before, the new model is based on a Naive Bayesian Classifier; it has simply been trained with more data. A leave-one-out cross-validation test yielded an accuracy of 0.87, which is comparable to our previous report. In addition, a leave-one-taxon-out test was applied to check that the model is still based on ubiquitous features of the signal and can therefore be applied to any taxon. In each round of this test, all proteins from one taxon are withheld from training and then used exclusively as test data. Overall, an average area under the curve (AUC) of 0.80 was obtained.
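The leave-one-taxon-out procedure can be illustrated with a minimal sketch. This is not the code used to build the model; the function name, the record layout (taxon, protein) and the toy data are assumptions made purely for illustration.

```python
from collections import defaultdict


def leave_one_taxon_out(records):
    """Yield (held_out_taxon, train, test) splits, one per taxon.

    `records` is a list of (taxon, protein) pairs. Both fields are
    hypothetical placeholders for whatever the real dataset stores.
    """
    by_taxon = defaultdict(list)
    for taxon, protein in records:
        by_taxon[taxon].append(protein)

    for held_out in sorted(by_taxon):
        # All proteins of the held-out taxon form the test set ...
        test = list(by_taxon[held_out])
        # ... and every other taxon contributes to the training set.
        train = [
            protein
            for taxon in sorted(by_taxon)
            if taxon != held_out
            for protein in by_taxon[taxon]
        ]
        yield held_out, train, test


# Toy data, invented for illustration only.
records = [
    ("taxon_A", "protA1"), ("taxon_A", "protA2"),
    ("taxon_B", "protB1"),
    ("taxon_C", "protC1"), ("taxon_C", "protC2"),
]
splits = list(leave_one_taxon_out(records))
```

In a real evaluation, a classifier would be trained on `train` and scored on `test` in each round, and the per-taxon AUC values would then be averaged.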

The new model is now embedded into Effective and is also available for download. In the new model, the default minimum score from the Naive Bayesian Classifier for the class ‘secreted’ is 0.9999. This default value is called ‘selective’ on the webpage, whereas 0.95 is called ‘sensitive’. The threshold can also be chosen freely.
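As a rough sketch of how the two named thresholds affect the outcome, the snippet below filters a set of classifier scores. The function name, the protein names and the scores are hypothetical, and it is an assumption here that a score equal to the threshold counts as ‘secreted’.

```python
SELECTIVE = 0.9999  # default cut-off, called 'selective' on the webpage
SENSITIVE = 0.95    # lower cut-off, called 'sensitive' on the webpage


def is_predicted_secreted(score, threshold=SELECTIVE):
    """Return True if the 'secreted' class score reaches the threshold.

    Treating the threshold as inclusive (>=) is an assumption.
    """
    return score >= threshold


# Hypothetical scores, invented for illustration only.
scores = {"protein_1": 0.99995, "protein_2": 0.97, "protein_3": 0.40}

selective_hits = [name for name, s in scores.items()
                  if is_predicted_secreted(s, SELECTIVE)]
sensitive_hits = [name for name, s in scores.items()
                  if is_predicted_secreted(s, SENSITIVE)]
```

With these toy scores, the ‘sensitive’ setting reports a superset of the proteins reported by the ‘selective’ setting, which is the expected trade-off between the two.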

Evaluation of the new EffectiveT3 model 2.0 and comparison to the previous model 1.0

Receiver operating characteristic (ROC) curves with model scores for EffectiveT3 models 2.0.1 (new) and 1.0.1 (old)

Performance of the EffectiveT3 model 2.0.1 in cross validation:

  • TN 322 / FN 47 / TP 127 / FP 26
  • specificity = 93%
  • sensitivity = 73%
  • accuracy = 86%
  • F1 score = 78%
  • MCC= 0.66
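For reference, the measures listed above can be derived from the four confusion-matrix counts with the standard textbook formulas. The sketch below is an illustration only, not the evaluation code behind the reported numbers; the function name is hypothetical.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Compute standard binary-classification measures from raw counts."""
    specificity = tn / (tn + fp)
    sensitivity = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    # Matthews correlation coefficient
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "specificity": specificity,
        "sensitivity": sensitivity,
        "accuracy": accuracy,
        "f1": f1,
        "mcc": mcc,
    }


# Counts taken from the cross-validation result listed above.
m = confusion_metrics(tp=127, fp=26, tn=322, fn=47)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

Applied to the listed counts, these formulas reproduce the rounded specificity, sensitivity, accuracy and F1 score. The MCC recomputed this way comes out at roughly 0.68 rather than the listed 0.66; the page does not state how the listed value was derived.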

Release notes:

  • 2015, Sep 21: Model 2.0.2 released (re-compiled for compatibility with Java 1.6, no changes otherwise)
  • 2015, Sep 15: Model 2.0.1 released