ThermoScan

Scan biomedical publications to retrieve protein thermodynamic data.



Server Usage


The ThermoScan web interface scans a set of articles from the PubMed Central Open Access Subset to retrieve protein thermodynamic data. The features of the ThermoScan server are described below.


Server Input

The ThermoScan server takes as input a list of publication identifiers of the following types:

  • PMCID: PubMed Central identifier

  • PMID: PubMed identifier

  • DOI: Digital Object Identifier

The server internally converts PMIDs and DOIs to PMCIDs. Nevertheless, to reduce computing time, the use of PMCIDs is strongly recommended. The Browse button uploads a file containing the list of identifiers, one per line. A maximum of 1,000 articles is allowed per job.
To allow the analysis of articles with restricted access, we also developed a Google Chrome extension that submits the full-text HTML page of the single article currently displayed in the browser directly to the ThermoScan server. The Google Chrome extension is available on GitHub.
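
Before uploading a list, it can help to check locally which identifiers are PMCIDs and whether the list stays within the per-job limit. The following is a minimal sketch, not part of ThermoScan: the function names and the regular expressions are assumptions made for illustration (the DOI pattern in particular is a common heuristic, not an official definition), and the server may accept or reject identifiers differently.

```python
import re

MAX_ARTICLES = 1000  # per-job limit stated in this section

# Assumed identifier shapes; these patterns are illustrative only.
PATTERNS = {
    "PMCID": re.compile(r"^PMC\d+$", re.IGNORECASE),
    "DOI": re.compile(r"^10\.\d{4,9}/\S+$"),
    "PMID": re.compile(r"^\d+$"),
}


def classify_identifier(identifier):
    """Return 'PMCID', 'PMID', 'DOI', or None for an unrecognized string."""
    for kind, pattern in PATTERNS.items():
        if pattern.match(identifier):
            return kind
    return None


def read_identifiers(text):
    """Parse newline-separated identifiers and group them by type."""
    ids = [line.strip() for line in text.splitlines() if line.strip()]
    if len(ids) > MAX_ARTICLES:
        raise ValueError(
            f"{len(ids)} identifiers given; the limit is {MAX_ARTICLES}"
        )
    grouped = {"PMCID": [], "PMID": [], "DOI": [], None: []}
    for identifier in ids:
        grouped[classify_identifier(identifier)].append(identifier)
    return grouped
```

Identifiers grouped under None are the ones worth checking by hand before submission; a list made only of PMCIDs avoids the conversion step mentioned above.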


Server Output

When a list of PMCIDs is provided, the server runs ThermoScan on each article in the list. The output of the ThermoScan web server consists of two tables (see the example). The first table reports the status of the submitted job, including the job ID, the position in the queue and the process ID. When the job is completed, the process ID is replaced with a link for downloading the results of the job. The second table reports a set of scores for each article. In detail, this table includes the following scores:

  • pubid: the identifier of the article;

  • elements: the number of detected elements in the article (paragraphs/tables);

  • mean: the average score for the detected elements;

  • max: the maximum score for the detected elements;

  • binding: the binding score related to the presence of protein-protein interaction data;

  • computational: the computational score indicating the possible presence of in silico data.

For each article, a plus button displays information about the article and a link to the output page of the individual article. To facilitate the analysis of the results, a link to the output file allows the scores associated with each article to be downloaded in TSV format. An example of the ThermoScan web server output is reported below.
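
Separately from the server example, the downloaded TSV file can be post-processed with a few lines of Python. The sketch below assumes that the file has a header row whose column names match the score names listed above (pubid, elements, mean, max, binding, computational) and that the scores are numeric; the actual layout of the file may differ. The function names and the two data rows are made up purely for illustration.

```python
import csv
import io


def load_scores(handle):
    """Read per-article scores from a tab-separated stream with a header row."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        rows.append({
            "pubid": row["pubid"],
            "elements": int(row["elements"]),
            "mean": float(row["mean"]),
            "max": float(row["max"]),
            "binding": float(row["binding"]),
            "computational": float(row["computational"]),
        })
    return rows


def rank_by_max(rows):
    """Sort articles so the one with the highest 'max' score comes first."""
    return sorted(rows, key=lambda r: r["max"], reverse=True)


# Made-up rows, only to show the assumed column layout.
example = io.StringIO(
    "pubid\telements\tmean\tmax\tbinding\tcomputational\n"
    "PMC0000001\t3\t2.5\t4.0\t0.0\t-1.0\n"
    "PMC0000002\t5\t3.1\t6.5\t1.0\t0.0\n"
)
ranked = rank_by_max(load_scores(example))
print([r["pubid"] for r in ranked])
```

With a real file, the StringIO object would be replaced by something like open("scores.tsv", newline=""), where the file name is again hypothetical.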






Single Manuscript Output

When a full-text HTML file is provided, ThermoScan searches paragraphs and tables for significant words related to protein thermodynamic concepts (see Methods). If one of the terms (e.g. two-state, unfolding, denaturant, midpoint, dichroism) is found, the server returns the total score assigned to the manuscript and the partial scores assigned to the detected elements.
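
The term search described above can be illustrated with a simple matcher. This is only a sketch under strong assumptions: it uses the five example terms quoted in this section rather than the server's full vocabulary, it reports which terms occur without computing any score, and the function names are hypothetical. The scoring scheme itself is described in Methods and is not reproduced here.

```python
import re

# The five example terms quoted in this section; the full vocabulary is not listed here.
EXAMPLE_TERMS = ["two-state", "unfolding", "denaturant", "midpoint", "dichroism"]

# Case-insensitive, whole-word match on any of the example terms.
TERM_RE = re.compile(
    r"\b(" + "|".join(re.escape(t) for t in EXAMPLE_TERMS) + r")\b",
    re.IGNORECASE,
)


def find_terms(paragraph):
    """Return the distinct example terms found in a paragraph, lowercased and sorted."""
    return sorted({m.group(1).lower() for m in TERM_RE.finditer(paragraph)})


def detected_paragraphs(paragraphs):
    """Yield (index, terms) for paragraphs containing at least one example term."""
    for index, paragraph in enumerate(paragraphs, start=1):
        terms = find_terms(paragraph)
        if terms:
            yield index, terms
```

A paragraph with no match is simply skipped, which mirrors the idea that only elements containing at least one term are reported.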
In detail, the server output is composed of 2 main tables. The first table includes 5 rows reporting the authors, title, journal, volume and identifier of the manuscript. The row "Summary" contains the total number of detected elements (Paragraphs/Tables), the total score of the manuscript, a binding score and the maximum score associated with the detected elements. A score window menu selects only the paragraphs and tables with a score greater than or equal to a given threshold. The last row includes a link for downloading the output of ThermoScan in plain text format. An example of the summary table is reported below.

The second table includes the paragraphs and tables extracted from the full-text manuscript. The table is composed of 4 columns:

  • element: The type of match (paragraph/table)

  • number: The progressive number of the element in the full text

  • score: The score associated with the selected element

  • text: The text extracted from the full-text HTML

In the text extracted from the paper (4th column), words related to thermodynamic concepts, units and parameters are highlighted in red. Computational concepts (with negative scores) are shown in blue. Words in green are terms related to the presence of protein mutation data. Protein binding concepts are highlighted in brown. To facilitate the extraction of data from the manuscript, a link to the output file allows the scanning report to be downloaded in text format. Furthermore, a link to the Table file allows a table to be downloaded in TSV format. An example of the output table is reported below.
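
The score threshold offered by the web page can also be applied offline to the downloaded table. The sketch below assumes that the TSV file has a header row with the four column names listed above (element, number, score, text); this layout is an assumption and should be checked against a real download. The function name and the sample rows are made up for illustration.

```python
import csv
import io


def filter_elements(handle, threshold):
    """Return rows whose score is greater than or equal to the threshold."""
    kept = []
    # QUOTE_NONE keeps quotation marks inside the extracted text as literal characters.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        score = float(row["score"])
        if score >= threshold:
            kept.append({
                "element": row["element"],
                "number": int(row["number"]),
                "score": score,
                "text": row["text"],
            })
    return kept


# Made-up rows, only to show the assumed column layout.
sample = io.StringIO(
    "element\tnumber\tscore\ttext\n"
    "paragraph\t1\t2.0\tfirst extracted text\n"
    "table\t2\t5.5\tsecond extracted text\n"
)
selected = filter_elements(sample, threshold=5.0)
print([(row["element"], row["number"]) for row in selected])
```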