HELP:

Input:

The program expects a PDB file as input. This file must contain at least the HEADER, SEQRES and ATOM fields. It may contain several chains, but only one chain is analyzed, and its name must be provided. If the analyzed protein does not have a PDB code, the PDB code in the HEADER field (first subfield) must be replaced by a dummy code, e.g., 1zzz.

Example of Input:

Domain 690-785 of 1jjcB (file 1JJCB_1zzzB.txt.ent)
	HEADER    LIGASE                                  04-JUL-01   1zzz
	SEQRES  54 B  785  HIS PRO ALA ALA PHE ARG ASP LEU ALA VAL VAL VAL PRO
	...
	ATOM   7499  N   HIS B 690      42.868  12.459  -8.971  1.00 82.16          
	ATOM   7500  CA  HIS B 690      42.787  11.012  -9.167  1.00 82.02          
	...
	TER    8252      PRO B 785                                                  
	END 
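
The input requirements above can be checked before submission with a short script. This is only a sketch, not the server's own validation: the function names `header_id_code` and `check_pdb_input` are hypothetical, and the column positions assume the standard fixed-column PDB layout (ID code in columns 63-66 of HEADER, chain name in column 12 of SEQRES and column 22 of ATOM).

```python
def header_id_code(header_line):
    """Return the ID code of a HEADER line (columns 63-66 in the standard PDB layout)."""
    return header_line[62:66].strip()


def check_pdb_input(lines, chain):
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    if not any(line.startswith("HEADER") for line in lines):
        problems.append("missing HEADER field")

    # Chain names found in the SEQRES and ATOM lines (fixed-column positions).
    seqres_chains = {line[11] for line in lines
                     if line.startswith("SEQRES") and len(line) > 11}
    atom_chains = {line[21] for line in lines
                   if line.startswith("ATOM") and len(line) > 21}

    if not seqres_chains:
        problems.append("missing SEQRES field")
    elif chain not in seqres_chains:
        problems.append(f"chain {chain} not found in SEQRES")

    if not atom_chains:
        problems.append("missing ATOM field")
    elif chain not in atom_chains:
        problems.append(f"chain {chain} not found in ATOM")

    return problems
```

A call such as `check_pdb_input(open("1JJCB_1zzzB.txt.ent").read().splitlines(), "B")` would then return an empty list for a file like the example above.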

Output:

When the task is completed, the user receives an email that allows them to connect to the server web site, view the results and download the folder "tarball results file". This folder contains the following files:

*.mathlab

A MATLAB file listing all the cliques found by VAST with Pcli >= -10 and rmsd < 5 Å, in decreasing order of Pcli. For each clique, the columns give, from left to right:
     1 = name of Query.
     2 = name of Target.
     3 = clique number.
     4 = number of aligned residues.
     5 = number of residues in Query.
     6 = number of residues in Target.
     7 = rms.
     8 = rmsd.
     9 = Zgib.
     10 = Pcli.
     11 = Scli.
     12 = Number of aligned SSE.
     13 = Number of SSE in the Query.
     14 = Number of SSE in the Target.
     15 = strings+ of res. of the Query clique in VAST numbering (SEQRES).
     16 = strings+ of res. of the Target clique in VAST numbering.
     17 = strings+ of res. of the Query clique in pdb numbering (resSeq).
     18 = strings+ of res. of the Target clique in pdb numbering.

+ N-terminal followed by C-terminal residue number of each aligned element (SSEs plus some parts of aligned loops when the Gibbs sampling is activated).
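
The column layout and the selection rule described above (Pcli >= -10, rmsd < 5 Å, decreasing Pcli) can be sketched in Python. The delimiter used in the file is not specified here, so the sketch starts from a row already split into 18 fields. The names `MATHLAB_COLUMNS`, `label_clique` and `select_cliques` are hypothetical and only illustrate the layout.

```python
# Hypothetical field names for the 18 columns, in the order listed above.
MATHLAB_COLUMNS = (
    "query", "target", "clique_number", "aligned_residues",
    "query_residues", "target_residues", "rms", "rmsd", "zgib",
    "pcli", "scli", "aligned_sse", "query_sse", "target_sse",
    "query_ranges_vast", "target_ranges_vast",
    "query_ranges_pdb", "target_ranges_pdb",
)


def label_clique(fields):
    """Map one row, already split into 18 fields, to a dict keyed by column name."""
    if len(fields) != len(MATHLAB_COLUMNS):
        raise ValueError(f"expected {len(MATHLAB_COLUMNS)} fields, got {len(fields)}")
    return dict(zip(MATHLAB_COLUMNS, fields))


def select_cliques(cliques, min_pcli=-10.0, max_rmsd=5.0):
    """Keep cliques with Pcli >= min_pcli and rmsd < max_rmsd, sorted by decreasing Pcli."""
    kept = [c for c in cliques
            if float(c["pcli"]) >= min_pcli and float(c["rmsd"]) < max_rmsd]
    return sorted(kept, key=lambda c: float(c["pcli"]), reverse=True)
```

Since the file is already filtered and sorted this way, `select_cliques` is mainly useful for applying stricter thresholds to the listed cliques.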

*Domains.txt

Contains the domain parsing by the three methods PCM, SMF and SVD. These are independent programs, so the domains D1, D2, ..., Dn do not correspond to each other across methods; they may also differ in their boundaries or even in their number (see the reference Tai et al., below). The view of the N matrix and the list of structural neighbours can help to make a decision (to be published).

*SN.txt

Contains the list of structural neighbours in decreasing order of the percentage of aligned residues, Ar/Tr. For each structural neighbour, the columns give, from left to right:
	 1 = clique index in the *.mathlab file
	 2 = query name
	 3 = target name
	 4 = clique number
	 5 = number of aligned residues (Ar)
	 6 = query number of residues (Qr) 
	 7 = query clique length after including gaps less than 40 residues (Qc)
	 8 = target number of residues (Tr)
	 9 = target clique length after including gaps less than 40 residues (Tc)
	10 = ratio Ar/Tr
	11 = Tc/Tr
	12 = Ar/Qc
	13 = Qc/Qr
	14 = 2*Ar/(Qc+Tc)
	15 = rmsd
	16 and following columns = N-terminal and C-terminal residues of the query clique including gaps (pdb numbering). When gaps are larger than 40 residues, the clique is subdivided into sub-cliques, one more than the number of gaps larger than 40 residues, and the N and C termini of each sub-clique are printed.
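
As an illustration of columns 10 to 14 and of the sub-clique rule, the following Python sketch recomputes the ratios from Ar, Qr, Qc, Tr and Tc and splits a clique at large gaps. It is not the server's code. The function names are hypothetical, and two points are assumptions: a gap is counted as the number of residues strictly between two consecutive aligned segments, and a gap of exactly 40 residues does not split the clique.

```python
def sn_ratios(ar, qr, qc, tr, tc):
    """Recompute columns 10-14 of *SN.txt from columns 5-9."""
    return {
        "Ar/Tr": ar / tr,
        "Tc/Tr": tc / tr,
        "Ar/Qc": ar / qc,
        "Qc/Qr": qc / qr,
        "2*Ar/(Qc+Tc)": 2 * ar / (qc + tc),
    }


def split_sub_cliques(segments, max_gap=40):
    """Merge aligned segments into sub-cliques, splitting at gaps larger than max_gap.

    segments: list of (n_term, c_term) residue numbers, sorted by n_term.
    Returns the (n_term, c_term) termini of each sub-clique.
    """
    if not segments:
        return []
    sub_cliques = []
    start, end = segments[0]
    for n_term, c_term in segments[1:]:
        gap = n_term - end - 1  # residues strictly between the two segments (assumption)
        if gap > max_gap:
            # Gap too large: close the current sub-clique and start a new one.
            sub_cliques.append((start, end))
            start, end = n_term, c_term
        else:
            # Small gap: the gap is included in the current sub-clique.
            end = max(end, c_term)
    sub_cliques.append((start, end))
    return sub_cliques
```

With one gap larger than 40 residues, `split_sub_cliques` returns two sub-cliques, in line with the "one more than the number of gaps" rule.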

*SN.png

Alignments of the structural neighbours on the query sequence, including gaps (pdb numbering), in decreasing order of the ratio of aligned residues (Ar/Tr). Red for a ratio greater than 70%, yellow for a ratio between 50 and 70%, green for a ratio between 45 and 50% and blue for a ratio between 40 and 45%.

The alignments are presented in series of 20 (see fig. 3 in description). Clicking on the pdb name opens PDBsum, and clicking on the corresponding line of the alignment opens the VAST structural alignment. In the VAST alignment window, clicking on "3 D Alignment" gives a Jmol representation of the 3D structure.

In the residue-residue alignments, the aligned residues are in upper case, b stands for beta strand, H for helix, X for a residue without X-ray coordinates and - for a gap. The color code for the 3D structures is: red for the query, orange for the aligned sequence of the query on the target (upper case residues), blue for the target and cyan for the aligned sequence of the target on the query (upper case residues).
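
The color assigned to each alignment line according to its Ar/Tr ratio can be summarized by a small Python function. This is a sketch with a hypothetical name, `neighbour_color`. How the exact boundary values (70, 50 and 45%) are assigned and what happens below 40% are not stated here, so the choices in the code are assumptions.

```python
def neighbour_color(ratio_percent):
    """Return the color for an Ar/Tr ratio given in percent.

    Boundary handling is an assumption: 70% exactly is treated as yellow,
    50% as yellow, 45% as green and 40% as blue. Ratios below 40% return
    None, since no color is described for them.
    """
    if ratio_percent > 70:
        return "red"
    if ratio_percent >= 50:
        return "yellow"
    if ratio_percent >= 45:
        return "green"
    if ratio_percent >= 40:
        return "blue"
    return None
```

For example, a neighbour with Ar = 60 and Tr = 100 has a ratio of 60% and would be drawn in yellow.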

*_Nmatrix.png

A heat-map of the co-occurrence N matrix with its code bar and residue numbers from SEQRES.

*_Nmatrix_contour.png

A contour map of the co-occurrence N matrix with its code bar and residue numbers from SEQRES.

*ent

Your input file.

Please cite:

Tai CH, Sam V, Gibrat JF, Garnier J, Munson PJ and Lee BK.
Protein domain assignment from the recurrence of locally similar structures.
Proteins: Structure, Function, and Bioinformatics 2011; 79: 853-866.

Samson F, Shrager R, Tai CH, Sam V, Lee BK, Munson PJ, Gibrat JF and Garnier J.
DOMIRE: a web server for identifying structural domains and their neighbors in proteins.
Bioinformatics 2012; doi: 10.1093/bioinformatics/bts076

For comments and suggestions, please contact jean.garnier@jouy.inra.fr with a copy to franck.samson@jouy.inra.fr.