PGExpress

Predicting Gene Expression from nucleotide sequence.




Help


The PGExpress server is a regression method that predicts, from nucleotide sequence, the log2-fold-change of a gene's translation efficiency (L2TE) with respect to its median value (2,355). As input, the server takes pairs of DNA sequences, each consisting of the Ribosome Binding Site (RBS) and the Coding Sequence separated by a comma. Multiple predictions can be run at once by providing one pair per line.
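As a minimal sketch of the input format described above, the following Python snippet splits a submission into (RBS, Coding) pairs, one pair per line. The function name parse_pairs, the whitespace handling, and the example sequences are illustrative assumptions and are not part of the PGExpress server.

```python
def parse_pairs(text):
    """Split 'RBS,Coding' lines into a list of (rbs, coding) tuples.

    Hypothetical helper: blank lines are skipped and surrounding
    whitespace is stripped, which is an assumption about the input.
    """
    pairs = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # ignore empty lines between entries
        parts = [part.strip() for part in line.split(",")]
        if len(parts) != 2 or not all(parts):
            raise ValueError(f"line {line_no}: expected 'RBS,Coding'")
        pairs.append((parts[0], parts[1]))
    return pairs


# Arbitrary example sequences, used only to show the layout.
example = "AGGAGGTTAACC,ATGAAACGT\nTTAAGGAGCTAA,ATGCCCGGG\n"
print(parse_pairs(example))
```

Each returned tuple corresponds to one prediction request, so two input lines yield two predictions.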

The PGExpress server internally runs RNAfold and RNAduplex to calculate the 6 folding and 6 anti Shine-Dalgarno binding energies. For each pair of RBS and Coding sequences, our method returns the predicted L2TE. The prediction is also converted into a binary classification: predictions with L2TE≤0 are displayed as Low and those with L2TE>0 as High. Thus, the translation efficiency (TE) threshold for the binary classification is set to 2,355, which represents the median value of the translation efficiency in our training set.
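The relation between TE, L2TE, and the Low/High label can be sketched in a few lines of Python. This is only an illustration of the thresholding rule stated above: the function names are hypothetical, the regression model itself is not shown, and the median is passed as a parameter so that the caller supplies the training-set value.

```python
import math


def l2te_from_te(te, median_te):
    """log2-fold-change of a translation efficiency relative to the median."""
    return math.log2(te / median_te)


def classify(l2te):
    """Binary label used in the output table: Low if L2TE <= 0, else High."""
    return "High" if l2te > 0 else "Low"


# A TE equal to the median gives L2TE = 0, which falls in the Low class;
# a TE twice the median gives L2TE = 1, which falls in the High class.
median_te = 1.0  # placeholder value for illustration only
for te in (0.5, 1.0, 2.0):
    value = l2te_from_te(te, median_te)
    print(te, value, classify(value))
```

Because the boundary is L2TE = 0, a gene whose TE equals the median is labelled Low.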

The PGExpress server returns a table in which each row includes the following data:


  • RBS: the Ribosome Binding Site sequence.

  • Coding: the Coding sequence of the gene.

  • Folding: the folding free energy for the RBS + the first 33 nucleotides of the coding sequence (C33) calculated using RNAfold.

  • Binding: the hybridization free energy of the anti Shine-Dalgarno sequence (CCTCCTTA) for the RBS + the first 33 nucleotides of the coding sequence (C33) calculated using RNAduplex.

  • Prediction: either Low (L2TE≤0) or High (L2TE>0).

  • L2TE: log2-fold-change of the gene translation efficiency with respect to the median value (2,355).
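The Folding and Binding columns are both computed on the RBS followed by the first 33 nucleotides of the coding sequence (C33). A minimal sketch of how that sequence could be assembled is shown below; the function name rbs_plus_c33 is hypothetical, and the actual RNAfold and RNAduplex calls and their options are not shown because they are not specified here.

```python
# Anti Shine-Dalgarno sequence quoted in the Binding entry above.
ANTI_SD = "CCTCCTTA"


def rbs_plus_c33(rbs, coding, n_coding=33):
    """Concatenate the RBS with the first n_coding nucleotides of the coding sequence.

    Hypothetical helper: a coding sequence shorter than n_coding is used
    as-is, which is an assumption about how short inputs are treated.
    """
    return rbs + coding[:n_coding]


# Arbitrary example sequences, used only to show the construction.
rbs = "AGGAGGTTAACC"
coding = "ATG" + "GCT" * 20
target = rbs_plus_c33(rbs, coding)
print(len(target), target)
```

The resulting string is the sequence whose folding energy would be evaluated with RNAfold, and the same string paired with ANTI_SD is what would be evaluated with RNAduplex.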


Clicking the + button expands a row to show the 6 folding and 6 hybridization free energy features calculated for that prediction. They correspond to the 6 blocks into which the RBS and coding sequences are segmented (see Methods). An example of the output is reported below.