ThermoScan

Scan biomedical publications to retrieve protein thermodynamic data.

Server Input

The ThermoScan server takes as input HTML or XML files from PubMed Central (PMC). The article can be specified through the following identifiers:

PMCID: PubMed Central identifier
PMID: PubMed identifier
DOI: Digital Object Identifier
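
The three identifier types can usually be told apart by their shape. The following is a minimal sketch, not the server's actual validation logic: the function name `classify_identifier` and the regular expressions are assumptions made for illustration (a "PMC" prefix followed by digits, a DOI beginning with "10." and a registrant code, and a purely numeric PMID).

```python
import re

# Hypothetical patterns: the exact formats accepted by ThermoScan
# are not specified here, so these are illustrative assumptions.
PATTERNS = [
    ("PMCID", re.compile(r"^PMC\d+$", re.IGNORECASE)),
    ("DOI", re.compile(r"^10\.\d{4,9}/\S+$")),
    ("PMID", re.compile(r"^\d+$")),
]


def classify_identifier(value):
    """Return "PMCID", "DOI" or "PMID", or None if nothing matches."""
    value = value.strip()
    for kind, pattern in PATTERNS:
        if pattern.match(value):
            return kind
    return None
```

A caller could use the returned label to decide which retrieval route to follow, and reject the input when the result is None.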

The HTML file can be uploaded directly through the Browse button. If the user has an Elsevier API Developer key, the full-text HTML file can be retrieved through the DOI.
To allow the analysis of articles with restricted access, we developed a Google Chrome extension that submits the displayed full-text HTML page directly to the ThermoScan server. The Google Chrome extension is available online on GitHub.

Server Output

When a full-text HTML file is provided, ThermoScan searches paragraphs and tables for significant words related to protein thermodynamic concepts (see Methods). If one of the terms (e.g. two-state, unfolding, denaturant, midpoint, dichroism) is found, the server returns the total score assigned to the manuscript and the partial scores assigned to the detected elements.
In detail, the server output consists of two main tables. The first table includes five rows reporting the authors, title, journal, volume and identifier of the manuscript. The "Summary" row contains the total number of detected elements (Paragraphs/Tables), the total score of the manuscript, a binding score and the maximum score associated with the detected elements. A score window menu allows the user to select only paragraphs and tables with a score greater than or equal to a given threshold. The last row includes a link for downloading the ThermoScan output in plain text format. An example of the summary table is reported below.
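
The idea of keyword-based scoring can be illustrated with a short sketch. This is not the scoring scheme used by ThermoScan, which is described in Methods: the uniform weights, the substring matching and the names `TERM_WEIGHTS`, `score_text` and `score_manuscript` are all assumptions made for illustration. Only the five example terms come from the text above.

```python
import re

# Example terms taken from the text; the weight of 1.0 per term
# is a hypothetical placeholder, not the server's real weighting.
TERM_WEIGHTS = {
    "two-state": 1.0,
    "unfolding": 1.0,
    "denaturant": 1.0,
    "midpoint": 1.0,
    "dichroism": 1.0,
}


def score_text(text):
    """Sum the weights of all term occurrences in one paragraph or table."""
    lowered = text.lower()
    return sum(
        weight * len(re.findall(re.escape(term), lowered))
        for term, weight in TERM_WEIGHTS.items()
    )


def score_manuscript(elements):
    """Score (kind, number, text) tuples.

    Returns the total score and the elements with a non-zero score,
    each as a (kind, number, score, text) tuple.
    """
    scored = [(kind, number, score_text(text), text)
              for kind, number, text in elements]
    detected = [row for row in scored if row[2] > 0]
    total = sum(row[2] for row in detected)
    return total, detected


elements = [
    ("paragraph", 1, "The unfolding midpoint was measured by circular dichroism."),
    ("table", 1, "Sample sizes for each experiment"),
]
total, detected = score_manuscript(elements)
```

In this toy input only the paragraph contains scoring terms, so it is the only detected element and the manuscript total equals its partial score.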

The second table includes the paragraphs and tables extracted from the full-text manuscript. The table is composed of 4 columns:

element: The type of match (paragraph/table)
number: The sequential number of the element in the full text
score: The score associated with the selected element
text: The text extracted from the full-text HTML
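
For readers who post-process the output, the four columns map naturally onto a small record type, and the score window corresponds to a simple filter. The sketch below is an assumption-laden illustration: the class `Element`, the function `filter_by_score` and the sample rows are hypothetical and are not part of the server or of its plain text output format.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """One row of the second output table (hypothetical representation)."""
    element: str   # "paragraph" or "table"
    number: int    # position of the element in the full text
    score: float   # score assigned to this element
    text: str      # text extracted from the full-text HTML


def filter_by_score(rows, threshold):
    """Keep rows whose score is greater than or equal to the threshold."""
    return [row for row in rows if row.score >= threshold]


# Made-up rows, used only to show the filter in action.
rows = [
    Element("paragraph", 3, 4.0, "Unfolding was monitored by circular dichroism."),
    Element("table", 1, 2.0, "Midpoint of denaturation for each variant."),
    Element("paragraph", 7, 1.0, "The denaturant was added stepwise."),
]
selected = filter_by_score(rows, 2.0)
```

With a threshold of 2.0 the filter keeps the first two rows and drops the last one, mirroring the "greater than or equal to" behaviour of the score window menu.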

In the text extracted from the paper (4th column), words related to thermodynamic concepts, units and parameters are highlighted in red. Computational concepts (with negative scores) are shown in blue. Words in green represent terms related to the presence of protein mutation data. Protein binding concepts are highlighted in brown. An example of the output table is reported below.