Phagonaute is a Genome Browser that displays the genetic context of a selected gene in parallel to the genetic context of its homologs in other sequenced phage genomes.
This tool relies on the homologies detected by the HHSEARCH software (Söding 2005). This kind of homology relationship is inferred from the comparison of protein profiles instead of sequences, which makes it possible to link together genes that are more divergent than traditional "sequence vs sequence" comparison can detect. This is why those homologies are sometimes called "remote homologies".
In the graphical output produced by Phagonaute, the reference gene as well as its homologs in other genomes are coloured in red and aligned in the center. Homologs of the neighbouring genes are displayed in matching colours to allow easy identification of conserved gene neighbours across genomes.
This tool allows the user to explore remote homologies by varying the probability threshold to declare a match.
Distant homologies between each phage gene and Pfam domains (Finn et al 2014) are also indicated on mouse-over. To examine these Pfam matches more closely, see the Neighbourhood parameters section.
The Phagonaute project currently has two databases, which can be chosen at the top of the main page form:
Browser requirements: Your browser must support SVG; Chrome or Firefox is recommended.
Phagonaute can produce SVG output, which can be opened with software such as Inkscape or LibreOffice.
How to use the Phagonaute Form
The main pieces of information required to view the genetic context of a particular reference gene are:
Other parameters can be left as default. If you want to refine the search parameters, you can click on the icon next to each section to display more parameters.
To select a gene, you must first select a genome from the phage dropdown (the genomes are sorted by host, then alphabetically). The gene dropdown menu is then updated to display the genes of the selected phage. The genes are sorted according to their position in the genome. The number to the right of each name is the probability of the best match given by the HHSEARCH software for this gene. The genes are also coloured to reflect this number (the greener, the better).
The Neighbourhood window size represents the number of neighbouring genes on each side of the reference gene.
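As an illustration of the window size, here is a minimal Python sketch. It assumes the genes of a genome are held in a list sorted by genome position and that the genome is treated as linear (no wrap-around at the ends); the function name `neighbourhood` and the gene names are hypothetical and are not part of Phagonaute.

```python
def neighbourhood(genes, reference, window):
    """Return the reference gene with up to `window` genes on each side.

    `genes` is assumed to be sorted by position in the genome.
    Near a genome end, fewer genes are returned on that side.
    """
    i = genes.index(reference)
    start = max(0, i - window)
    return genes[start:i + window + 1]


# Toy genome with hypothetical gene names.
genes = ["g01", "g02", "g03", "g04", "g05", "g06", "g07"]
print(neighbourhood(genes, "g04", 2))  # ['g02', 'g03', 'g04', 'g05', 'g06']
print(neighbourhood(genes, "g01", 2))  # ['g01', 'g02', 'g03']
```

With a window size of 2, at most five genes are shown per genome: the reference gene (or its homolog) plus two neighbours on each side.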
By clicking on the icon, the user can access supplementary parameters:
The Minimum Family Size limits the number of colours displayed. The user specifies the minimum family size required for colour coding. This option deliberately simplifies the visualisation when the screen shows too many shades of colour. For example, if three genes in a Phagonaute visualisation share HHR homology with each other, they form a 'family'. As the default minimum family size is 2, this family contains enough genes to be coloured. However, if the minimum family size were set to 5, this family would not be large enough, so its genes would not be coloured.
The Display HMM-HMM relationships dropdown menu selects the search method used to colour the neighbouring genes. By default, the genetic context is coloured by HHSEARCH homology relationships Among phage proteins. However, each phage protein profile has also been compared to Pfam with HHSearch, using a fixed 95% probability threshold (a mouse-over on any gene will indicate whether a Pfam match occurred). If an interesting annotation shows up with Pfam, selecting Between phage proteins and Pfam domains lets the user visualize exactly which domain of the protein matches the Pfam family. Note, however, that the central genes (in red), which were selected for drawing the scene, will continue to display their initial phage-to-phage relationship.
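The Minimum Family Size rule described above can be sketched as follows. This is only an illustration: it assumes that each displayed gene has already been assigned to a family, and the names `colour_genes`, `gene_to_family` and the gene and family identifiers are hypothetical. How Phagonaute builds families and chooses actual colours is not described here, so colours are represented by simple integer indices.

```python
from collections import Counter


def colour_genes(gene_to_family, min_family_size=2):
    """Map each displayed gene to a colour index, or None if left uncoloured.

    A gene is coloured only if its family has at least
    `min_family_size` members among the displayed genes.
    """
    sizes = Counter(gene_to_family.values())
    kept = sorted(f for f, n in sizes.items() if n >= min_family_size)
    palette = {family: idx for idx, family in enumerate(kept)}
    return {gene: palette.get(family) for gene, family in gene_to_family.items()}


# Hypothetical display: three genes share one family, one gene is alone.
displayed = {"A_12": "fam1", "B_07": "fam1", "C_33": "fam1", "A_13": "fam2"}

default = colour_genes(displayed)                     # fam1 coloured, A_13 not
strict = colour_genes(displayed, min_family_size=5)   # nothing coloured
print(default)
print(strict)
```

With the default value of 2, the three-member family receives a colour and the singleton does not; with a value of 5, no family is large enough.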
This section specifies the probability cut-off values for the reference gene as well as for its neighbours.
The supplementary options unlocked by clicking on the icon are the following:
The Number of iterations parameter sets the number of steps in the remote homology search. A value of 1 means that a given protein profile is searched just once against all other phage protein profiles with HHsearch. Sensitivity can increase if a second search is added, in which all proteins retrieved in the first step are in turn compared with all others, and new members are added to the list of remote homologs. Experience has shown that a value of 2 (two iterations) is sufficient to reach convergence (the user can go up to 3 iterations).
The Restrict by Host input field lets the user define a list of bacterial hosts (as listed in the phage dropdown list and in the database content tabs) so as to restrict the search to the phages infecting these hosts. Genus or species names can be used; they must be separated by a semi-colon followed by a space.
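The two options above can be illustrated with a short Python sketch. It is not Phagonaute's implementation: it assumes the precomputed HHsearch matches are available as a dictionary of (target, probability) pairs, that a genus name matches every species of that genus, and that proteins excluded by the host restriction are not used as intermediates in later iterations. All function names and data below are hypothetical.

```python
def parse_hosts(text):
    """Split the Restrict by Host field on '; ' (semi-colon, then a space)."""
    return [h for h in text.split("; ") if h]


def host_allowed(host, restrictions):
    """Assumption: a genus name also matches any species of that genus."""
    if not restrictions:
        return True
    return any(host == r or host.startswith(r + " ") for r in restrictions)


def iterative_search(query, matches, hosts, restrictions, threshold, iterations=2):
    """Return {protein: iteration at which it was first retrieved}."""
    found = {}
    frontier = [query]
    for step in range(1, iterations + 1):
        next_frontier = []
        for protein in frontier:
            for target, prob in matches.get(protein, []):
                if target == query or target in found:
                    continue  # already known
                if prob < threshold:
                    continue  # below the probability cut-off
                if not host_allowed(hosts[target], restrictions):
                    continue  # phage infects a host outside the restriction
                found[target] = step
                next_frontier.append(target)
        # Proteins retrieved in this step become the queries of the next one.
        frontier = next_frontier
    return found


# Toy data: probabilities are HHsearch-style percentages.
matches = {
    "phageA_001": [("phageB_010", 99.0), ("phageC_020", 60.0)],
    "phageB_010": [("phageD_030", 97.0), ("phageE_040", 98.0)],
}
hosts = {
    "phageB_010": "Escherichia coli",
    "phageC_020": "Escherichia coli",
    "phageD_030": "Escherichia coli",
    "phageE_040": "Bacillus subtilis",
}
restrictions = parse_hosts("Escherichia; Salmonella enterica")
result = iterative_search("phageA_001", matches, hosts, restrictions, threshold=95.0)
print(result)  # {'phageB_010': 1, 'phageD_030': 2}
```

In this toy example, one homolog is retrieved directly in the first iteration, and a second one is reached only through it in the second iteration; the match below the threshold and the match in a phage of an excluded host are both dropped.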
Phagonaute can perform two types of searches: one leading to a visualisation, accessed with the Visualize context button, and another leading to a TSV table file (which can be opened with spreadsheet software such as Excel). Information on how to interpret these outputs is given in the following paragraphs.
The display page shows the genetic context of the reference gene(s) as well as the genetic context of its homologs in other phage genomes. The genetic neighbourhoods across the genomes are colour-coded to identify conserved gene neighbours.
The 3-letter code above each gene is the suffix of its locus tag or gene name in the GenBank file. The large number above the central gene (usually in red) indicates which HHSearch iteration brought the homology signal.
This page allows further exploration: clicking on a new gene places its phage and gene names in the Gene selection box, and pressing the Visualize context button then computes the new result. Other parameters can also be changed directly on the Display Page before launching a new query. You can thus start a new search from this page.
You can also download the Genome Context as an SVG file (which can be edited with LibreOffice) by clicking on the download svg button.
A tabular output of all distant homologs collected with the request gene can be generated by clicking on the Download table of distant homologs button. Each line corresponds to one target and lists information about the HHSEARCH match, including which protein made the connection (after the first iteration, it will be a protein different from the request). Pfam matches are also shown. The 'genetic conservation index' gives the number of gene families in common between the homolog and the request gene, considering the given neighbourhood window size and minimum family size.
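One possible reading of the 'genetic conservation index' is sketched below in Python. It assumes that the family of each neighbouring gene within the window is already known (None for genes without a family), that only families reaching the minimum family size are counted, and that each shared family is counted once. The function name and the data are hypothetical; the exact rule used by Phagonaute may differ.

```python
def conservation_index(query_families, homolog_families, family_sizes, min_family_size=2):
    """Count the gene families shared by two neighbourhoods.

    `query_families` and `homolog_families` list the family of each
    neighbouring gene inside the window (None if the gene has no family).
    Families smaller than `min_family_size` are ignored.
    """
    def eligible(families):
        return {
            f for f in families
            if f is not None and family_sizes.get(f, 0) >= min_family_size
        }

    return len(eligible(query_families) & eligible(homolog_families))


# Hypothetical neighbourhoods around the request gene and one homolog.
query_nb = ["fam1", "fam2", None, "fam3"]
homolog_nb = ["fam2", "fam3", "fam4", "fam3"]
family_sizes = {"fam1": 3, "fam2": 4, "fam3": 2, "fam4": 2}

print(conservation_index(query_nb, homolog_nb, family_sizes))                     # 2
print(conservation_index(query_nb, homolog_nb, family_sizes, min_family_size=3))  # 1
```

Raising the minimum family size from 2 to 3 removes the two-member family from the count, so the index drops from 2 to 1 in this example.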