PDB Decomposition Tool
Decompositions of all current PDB entries are available for download here: decomp.tar.gz (size: about 10 GB). This version was generated with all of the options listed below enabled, and with missing atoms inserted.
NOTE: Requests are handled on a first-in, first-out basis. If many jobs are queued, your job may take a long time to start; please be patient. Each request may contain at most 20 PDB codes, no IP address may start more than 5 requests within 24 hours, and an IP address may not start a new job while it still has an unfinished one. If you have a large number of PDB codes to decompose, please download the weekly updated decomp.tar.gz file.
The PDB Structural Decomposition Tool repairs and decomposes files in the Protein Data Bank (PDB) format.
The Protein Data Bank began as a depository of crystallographic data that complemented journal publications: researchers solved the structure of a protein, wrote a paper on the result, and deposited the data in the publicly available PDB.
Irregularities in a deposited structure (such as missing atomic coordinates, broken chains, or unidentified substructures) are mostly noted in the cited publications and in the REMARK fields of the PDB file. Such textual annotations, whether in a separate scientific publication or in the REMARK fields of the PDB file itself, make automatic processing of protein structures very difficult.
This statement may be a little confusing: since atoms carrying the HET label are not supposed to be in the peptide chain, one might expect that structures containing HET atoms other than water oxygens would qualify as complexes. Unfortunately, this is not the case. Metal ions, modified residues (in surprisingly large numbers), and small molecules added during crystallization all contain hetero-atoms, yet they are not considered ligands.
What does the program do?
With our program, protein-ligand complexes are identified reliably, and the ligands are written to separate files. Missing residues and atoms in chains are handled properly: even if several atoms are missing from a chain, our algorithm does not treat the remaining parts as distinct chains. Placeholders are inserted into chains for missing residues and atoms, denoting that these objects were not measured crystallographically but, according to the more reliable sequence information, should be there. In this way our algorithm "repairs" faulty PDB files, or recognizes that flexible chain segments are present. Note that missing atoms are usually a sign of a mobile loop or string in the protein crystal, since flexible atoms do not give usable electron density maps. Consequently, mapping missing atoms this way may help to identify flexible protein parts automatically. Ligands are identified without using the HET atom labels, so modified residues and small artifacts of the crystallization protocol are handled properly. CONECT records for the ligand atoms are computed automatically (these records are generally not present for ligands in the PDB file).
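As an illustration of the last point, the following is a minimal sketch of how CONECT records could be derived for a set of ligand atoms. It assumes that covalent bonds are guessed from a single distance cutoff (the 1.9 Å value, and the function names covalent_bonds and conect_records, are hypothetical); the tool itself may use a different and more careful bonding criterion. The record layout follows the fixed-column PDB format: the record name, then the atom serial number and up to four bonded serial numbers, each right-aligned in five columns.

```python
from itertools import combinations
from math import dist


def covalent_bonds(atoms, cutoff=1.9):
    """Guess bonds from distances.

    atoms: dict mapping atom serial number -> (x, y, z) in angstroms.
    cutoff: hypothetical maximum bond length; an assumption, not the tool's rule.
    Returns a set of (serial_a, serial_b) pairs with serial_a < serial_b.
    """
    bonds = set()
    for (a, pos_a), (b, pos_b) in combinations(sorted(atoms.items()), 2):
        if dist(pos_a, pos_b) <= cutoff:
            bonds.add((a, b))
    return bonds


def conect_records(bonds):
    """Format bonds as CONECT lines, at most four partners per line."""
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    lines = []
    for serial in sorted(neighbours):
        partners = sorted(neighbours[serial])
        # An atom with more than four partners needs several records.
        for i in range(0, len(partners), 4):
            chunk = partners[i:i + 4]
            lines.append("CONECT" + f"{serial:5d}" + "".join(f"{p:5d}" for p in chunk))
    return lines
```

Each bond appears twice in the output, once from each of its two atoms, which matches how CONECT records are conventionally written.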
How to separate ligands, ions and co-factors?
Our program first selects the atoms of the PDB entry that are part of a protein or DNA chain (note that for this we do not use the chain identifier, but rather the SEQRES data and some refined graph-theoretical algorithms). Next, the water molecules (those with residue name HOH) are removed from the set of possible ligand atoms. Then metal and other small ions are selected; these are not considered ligands. A complete list of the residue names treated as ions (and therefore not as ligands) is given in the file ion_list.txt. All remaining atoms form the set of ligand atoms. Within this set we run a graph-theoretical component-detection algorithm: a ligand is defined as a connected component of the graph whose vertices are the ligand atoms and whose edges are the covalent bonds between them.
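The filtering and component-detection steps described above can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual code: the input representation, the function name find_ligands, and the short ION_NAMES set are hypothetical (the real ion list is the one in ion_list.txt), and the chain-membership flag is taken as given rather than computed from SEQRES.

```python
from collections import defaultdict, deque

WATER_NAMES = {"HOH"}
# Placeholder subset for illustration only; the real list is in ion_list.txt.
ION_NAMES = {"NA", "CL", "ZN", "MG", "CA"}


def find_ligands(atoms, bonds, ion_names=ION_NAMES):
    """Group ligand atoms into connected components.

    atoms: dict mapping atom serial -> (residue_name, in_chain), where in_chain
           is True for atoms already assigned to a protein or DNA chain.
    bonds: iterable of (serial_a, serial_b) covalent bonds.
    Returns a list of ligands, each a sorted list of atom serials.
    """
    # Keep only atoms that are not chain atoms, water, or ions.
    candidates = {
        serial
        for serial, (residue, in_chain) in atoms.items()
        if not in_chain and residue not in WATER_NAMES and residue not in ion_names
    }

    # Build the graph using only bonds between two ligand atoms.
    adjacency = defaultdict(set)
    for a, b in bonds:
        if a in candidates and b in candidates:
            adjacency[a].add(b)
            adjacency[b].add(a)

    # Breadth-first search: each connected component is one ligand.
    ligands, seen = [], set()
    for start in sorted(candidates):
        if start in seen:
            continue
        seen.add(start)
        component, queue = [], deque([start])
        while queue:
            current = queue.popleft()
            component.append(current)
            for neighbour in adjacency.get(current, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        ligands.append(sorted(component))
    return ligands
```

Bonds that connect a ligand atom to a chain atom are ignored in this sketch, so a covalently attached group still forms its own component.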
Simply provide a list of PDB codes in the appropriate box, and check the desired options. (PDB codes should be separated by spaces or newline characters.) After you press the "Schedule job" button, your request is inserted into a queue. Please be patient, as it may take some time for our server to get to your request. Progress is shown in the "Log window". The result is a link in the "Log window" to a tar.gz file.
The result file contains one directory for each of the PDB entries listed. Each of these directories contains an error log with the ".pdb.error" extension and the decomposed PDB file with the ".pdb" extension; if the "Export ligands" or "Export ions" option was specified, then a separate file is present for each ligand or ion. If a fatal error occurred while processing the PDB file, then only the error file is present!
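Before submitting, a code list can be checked locally against the limits given on this page. The sketch below is only an illustration: the function name parse_pdb_codes is hypothetical, the pattern assumes the classic four-character PDB code (a digit followed by three letters or digits), and lower-casing the codes to remove duplicates is an assumption about how the server treats case.

```python
import re

MAX_CODES_PER_REQUEST = 20  # per-request limit stated in the note on this page
# Assumption: classic four-character PDB codes, e.g. "1abc".
PDB_CODE = re.compile(r"[0-9][A-Za-z0-9]{3}")


def parse_pdb_codes(text):
    """Split on spaces/newlines, validate each code, and drop duplicates."""
    codes = []
    for token in text.split():
        if not PDB_CODE.fullmatch(token):
            raise ValueError(f"not a PDB code: {token!r}")
        code = token.lower()
        if code not in codes:
            codes.append(code)
    if len(codes) > MAX_CODES_PER_REQUEST:
        raise ValueError(
            f"{len(codes)} codes given, at most {MAX_CODES_PER_REQUEST} allowed per request"
        )
    return codes
```

For example, parse_pdb_codes("1abc 2XYZ\n1ABC") returns the two distinct codes, ready to be pasted into the input box.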
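A downloaded result archive can be inspected along these lines. This is a sketch under assumptions: the function names are hypothetical, the directory name is taken from each file's parent folder, and the only naming convention relied on is the ".pdb.error" extension described above (the names of exported ligand and ion files are not assumed). An entry is flagged as fatal when its directory holds nothing but the error log.

```python
import tarfile
from collections import defaultdict
from pathlib import PurePosixPath


def summarize_members(file_names):
    """Group archive file names by entry directory and flag fatal errors.

    file_names: iterable of file paths inside the archive (no directory entries).
    Returns {directory: {"error_logs": [...], "other_files": [...], "fatal": bool}}.
    """
    per_entry = defaultdict(lambda: {"error_logs": [], "other_files": []})
    for name in file_names:
        path = PurePosixPath(name)
        key = "error_logs" if path.name.endswith(".pdb.error") else "other_files"
        per_entry[path.parent.name][key].append(path.name)
    # Only the error file present means processing failed for that entry.
    return {
        entry: {**files, "fatal": not files["other_files"]}
        for entry, files in per_entry.items()
    }


def summarize_archive(archive_path):
    """Read a result tar.gz file and summarize its regular files."""
    with tarfile.open(archive_path, "r:gz") as archive:
        names = [member.name for member in archive.getmembers() if member.isfile()]
    return summarize_members(names)
```

Calling summarize_archive on the downloaded file gives a quick overview of which entries need their error log read.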
Note: Result files are stored for 3 days, and log files are stored for 30 days.
You can use this decomp tool and the related database freely for any non-commercial research and education purposes; no registration is necessary. However, you are kindly requested to cite the following publication in any subsequent article that uses the output generated by the tool:
Zoltan Szabadka, Vince Grolmusz: High Throughput Processing of the Structural Information of the Protein Data Bank, Journal of Molecular Graphics and Modelling 25 (2007) pp. 831-836, http://dx.doi.org/10.1016/j.jmgm.2006.08.004
If you use this decomp tool, or the output generated by the tool, for commercial research (contract research done at a non-profit institution, e.g., at a university, or any research done at a for-profit institution, e.g., a pharmaceutical company), you are required to pay a registration fee.
Please contact us for payment options (commercial registration is handled by our commercial partners).
For both non-commercial and commercial users, the use of this tool is subject to the following conditions; if you do not agree to these conditions, please do not use this tool and the database:
LIMITATIONS OF LIABILITY: IN NO CASE SHALL OUR LIABILITY EXCEED THE AMOUNT OF THE REGISTRATION FEES ACTUALLY PAID BY USER. IN NO EVENT SHALL WE, OR OUR DIRECT OR INDIRECT SUPPLIERS, BE LIABLE FOR ANY INDIRECT, SPECIAL, INCIDENTAL, CONSEQUENTIAL OR PUNITIVE DAMAGES OR OTHER PECUNIARY LOSS ARISING OUT OF OR IN CONNECTION WITH ANY USE OR INABILITY TO USE THIS SOFTWARE TOOL, EVEN IF WE OR SUCH SUPPLIERS HAVE BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. WE AND OUR SUPPLIERS ARE NOT RESPONSIBLE FOR ANY COSTS INCLUDING, WITHOUT LIMITATION, LOSS OF BUSINESS INFORMATION, COST OF RECOVERING SUCH INFORMATION, BUSINESS INTERRUPTION, LOSS OF BUSINESS PROFITS, THE COST OF SUBSTITUTE SOFTWARE, OR CLAIMS BY THIRD PARTIES.