meta-PPISP Input Format: PDB


Protein structures in PDB format can be downloaded from the Protein Data Bank.

Here is an example of the PDB format (the line of digits at the top only marks the column positions; it is not part of the PDB file):


12345678901234567890123456789012345678901234567890123456789012345678901234567890
ATOM      1  N   ASP E   1      36.400  74.848  33.180  1.00 50.94           N
ATOM      2  CA  ASP E   1      36.906  75.294  34.515  1.00 50.47           C
ATOM      3  C   ASP E   1      35.879  76.187  35.223  1.00 49.81           C
ATOM      4  O   ASP E   1      34.970  75.684  35.893  1.00 50.55           O
ATOM      5  CB  ASP E   1      38.248  76.022  34.359  1.00 50.93           C
ATOM      6  N   GLN E   2      36.015  77.505  35.050  1.00 48.24           N
ATOM      7  CA  GLN E   2      35.110  78.493  35.659  1.00 46.03           C
ATOM      8  C   GLN E   2      33.746  78.491  34.960  1.00 44.06           C
ATOM      9  O   GLN E   2      32.850  79.274  35.320  1.00 44.42           O
ATOM     10  CB  GLN E   2      35.700  79.914  35.555  1.00 47.00           C
ATOM     11  CG  GLN E   2      37.155  80.078  36.028  1.00 47.43           C
ATOM     12  CD  GLN E   2      38.178  79.456  35.064  1.00 48.02           C
ATOM     13  OE1 GLN E   2      37.851  79.116  33.909  1.00 47.98           O
ATOM     14  NE2 GLN E   2      39.424  79.298  35.538  1.00 48.25           N
ATOM     15  N   PRO E   3      33.602  77.600  33.973  1.00 41.02           N
ATOM     16  CA  PRO E   3      32.391  77.479  33.163  1.00 37.84           C
ATOM     17  C   PRO E   3      31.095  77.246  33.944  1.00 34.34           C
ATOM     18  O   PRO E   3      31.037  76.403  34.849  1.00 34.24           O
ATOM     19  CB  PRO E   3      32.580  76.398  32.082  1.00 39.10           C
ATOM     20  CG  PRO E   3      31.488  76.381  31.000  1.00 40.44           C

The fields in each line of the PDB file are: record type, atom serial number, atom name, residue name, chain ID, residue number, (x, y, z) coordinates, occupancy, temperature factor, and element symbol.
Note that the chain ID is in column 22 ("E" in the example). In some PDB files, column 22 is blank; in that case, enter an underscore "_" as the chain input.
meta-PPISP uses only lines that begin with 'ATOM'. In addition, the fields after the (x, y, z) coordinates are irrelevant for prediction purposes.
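
To check a file before submitting it, the fixed-column layout described above can be read with a few lines of Python. This is only a minimal sketch, not the parser that meta-PPISP itself uses: the function names parse_atom_line and chain_ids are hypothetical, the column ranges follow the standard PDB layout shown in the example, and the fields after the coordinates are ignored.

```python
# Hypothetical helper names; column ranges assume the standard fixed-width PDB layout.
SAMPLE = "ATOM      1  N   ASP E   1      36.400  74.848  33.180  1.00 50.94           N"


def parse_atom_line(line):
    """Return the fields of one ATOM record, or None for any other line."""
    if not line.startswith("ATOM"):
        return None
    # Column 22 (index 21) holds the chain ID; a blank is reported as "_".
    chain = line[21:22].strip() or "_"
    return {
        "serial": int(line[6:11]),         # columns 7-11
        "atom_name": line[12:16].strip(),  # columns 13-16
        "res_name": line[17:20].strip(),   # columns 18-20
        "chain_id": chain,
        "res_seq": int(line[22:26]),       # columns 23-26
        "x": float(line[30:38]),           # columns 31-38
        "y": float(line[38:46]),           # columns 39-46
        "z": float(line[46:54]),           # columns 47-54
    }


def chain_ids(lines):
    """Return the chain IDs found in the ATOM records, in order of first appearance."""
    seen = []
    for line in lines:
        atom = parse_atom_line(line)
        if atom is not None and atom["chain_id"] not in seen:
            seen.append(atom["chain_id"])
    return seen


if __name__ == "__main__":
    print(parse_atom_line(SAMPLE))
    print(chain_ids([SAMPLE]))
```

Running chain_ids over the lines of a downloaded file lists the values that can be given as chain input; a chain reported as "_" corresponds to a blank column 22.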