Modeling protein glycosylation

AllosMod can also be used to model protein glycosylation. For full details, see Guttman et al. 2013 Structure. This tool is meant to sample the range of conformations accessible to sugars. There are a few ways to run the protocol depending on the desired conformational flexibility of the protein and/or sugars:

  1. Input: a protein without sugars. Output: a semi-rigid protein (motions will be limited to loops and surface side chains) with flexible sugars.
  2. Input: a protein with sugars. Output: a flexible protein with rigid sugars.
  3. Input: a protein without sugars. Output: a flexible protein with flexible sugars.

For option 1, input the protein sequence only and add a file describing sugar connectivity (glyc.dat). See below for more information.

For option 2, run option 1 and then use a single structure from the output as input for another run. In the input sequence, specify the protein sequence and a “.” for each sugar. The “.” specifies a block residue in MODELLER and will remain rigid. Such rigid bodies can cause problems during dynamics so care should be given here. Also include the allosmod.py file that was generated in the first step.

For option 3, first run simulations of the protein only using AllosMod. Then take the resulting structures and do option 1 (one for each structure) using AllosMod. Take all the glycosylated protein structures as input into the FoXS server, if desired.

Glycosylation input files

Glycosylation is controlled by adding a file glyc.dat to the files used in the regular AllosMod protocol. For help making the glyc.dat file from a PDB file that includes sugars, see the GlycoSciences webpage and set the “Assign connections by atom distances” to “HETATOM only.” This tool will also assess the structural quality of the sugars.

Parameter file

A number of options in the AllosMod parameter file (input.dat) pertain specifically to glycosylation. They are all optional.

ATTACH_GAPS = True/False (default True)
Gaps will be inserted into the alignment file at the sugar attachment sites. This allows more flexibility of the protein to accommodate the sugars.
SAMPLING = X
defaults to moderate_cm: Sampling is performed using simulated annealing at moderate temperatures. The energy function for the protein is composed of restraints from MODELLER while the energy function for sugar molecules is a combination of dihedral terms from CHARMM and harmonic restraints as described in the published paper (Guttman et al. 2013 Structure).
REPEAT_OPTIMIZATION = X
is the number of optimization steps to be performed. Default is one.

Glycosylation file

The glycosylation file (which must be called glyc.dat) contains one line per sugar monomer grouped by chains of monomers. The line with the protein-bonded monomer specifies the beginning of a chain and is followed by all monomers within the chain. Three columns define the sugar types and connectivity: 1) monomer name, 2) O1 bond type, and 3) O1 attachment residue index:

  • monomer name:

    • NAG - β-N-Acetyl-D-Glucosamine
    • NGA - β-N-Acetyl-D-Galactosamine
    • GLB - β-Galactose
    • FUC - α-Fucose
    • MAN - α-Mannose
    • BMA - β-Mannose
    • NAN - α-Neuraminic acid
  • O1 bond type:

    • NGLA or NGLB - axial or equatorial bond to ASN*

    • SGPA or SGPB - bond between alpha or beta position and SER*

    • TGPA or TGPB - bond between alpha or beta position and THR*

    • 16ab - (i) 1->6 (i-1) axial at C1 and equatorial at C6

    • 16fu - (i) 1->6 (i-1) axial at C1 and equatorial at C6

    • 14bb - (i) 1->4 (i-1) equatorial at C1 and equatorial at C4

    • 13ab - (i) 1->3 (i-1) axial at C1 and equatorial at C3

    • 13bb - (i) 1->4 (i-1) equatorial at C1 and equatorial at C3

    • 12aa - (i) 1->2 (i-1) axial at C1 and axial at C2

    • 12ba - (i) 1->2 (i-1) equatorial at C1 and axial at C2

    • sa23 - og sialic acid alpha 2->3 equatorial

    • sa26 - og sialic acid alpha 2->6 equatorial

      *entry specifies the beginning of a new chain

  • O1 attachment residue index:

    For every new sugar chain, specify the amino acid index of the residue bound to the first O1 atom in the sugar. The amino acid index must correspond to the input sequence in the alignment, i.e. the first residue in the input sequence is 1 and the Nth is N. For sugars within the chain, specify the index of the sugar monomer bound to the O1 atom defined as such: the first listed monomer in a chain has index 1, the second listed monomer in a chain has index 2,…

Output

On output a directory is created for each run, each containing a glycosylated structure named pm.pdb.B99990001.pdb.