iHMMune-align web-based access

Help, FAQs and Further Information

iHMMune-align Output

The output provided by iHMMune-align can be presented in two main formats, and the format to select depends on how the results are to be used. If the results are going to be used for further analysis, the 'CSV' format is the most useful, as the data may be placed into a text file for manipulation via scripting languages or imported into an application such as a spreadsheet program. If the results are just to be viewed within the web browser, then the HTML-based outputs are more appropriate.

CSV Output

A brief summary of each sequence is provided on the results page along with the complete CSV results. This summary does not give full details of all the data contained within the CSV results. Green summary boxes show the best matching result for an input sequence; pink boxes display further results when a number of alternative IGHV alignments were requested.

The summary boxes display the germline gene names and the number of mutations for the IGHV, IGHD and IGHJ. If an acceptable D-REGION is found, the N1, D and N2 nucleotides are displayed (each '.' in the D represents a nucleotide lost from the IGHD). In cases where there is a high likelihood that the D-REGION found may be in error, the complete sequence between the IGHV and IGHJ ends is shown. The conditions required for D-REGION identification vary depending on the IGHD Identification criteria being used. Please see here for a description of these options. The productivity of the input sequence is also shown, indicating whether the sequence is in-frame and whether it contains any stop codons.

The CSV results can be accessed from the text area at the bottom of the results page. The CSV format gives the results for each sequence on a single line. Each individual piece of data is enclosed within quotation marks (") and is separated from the next piece of data by a semi-colon (;). The data can be easily imported into Microsoft Excel by using the 'Text Import' function and selecting semi-colon as the delimiter. The file format also easily lends itself to parsing using scripting languages such as Perl or VBScript to manipulate the data. To save the CSV data that is displayed within the text area to a file on your local computer simply click the 'Save to File' button. This will bring up a dialog to ask for a location to save to.
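As a rough illustration of the parsing mentioned above, the following Python sketch reads semicolon-delimited, quote-enclosed lines into lists of fields. The function name `parse_ihmmune_csv` and the sample values are hypothetical; the actual columns in the CSV results are not reproduced here.

```python
import csv
import io


def parse_ihmmune_csv(text):
    """Split saved CSV text into one list of field strings per result line.

    Assumes the layout described above: fields enclosed in double quotes
    and separated by semi-colons, one sequence per line.
    """
    reader = csv.reader(io.StringIO(text), delimiter=";", quotechar='"')
    # Skip empty lines, which csv.reader returns as empty lists.
    return [row for row in reader if row]


# Made-up sample lines for illustration only; real results have more fields.
sample = '"seq1";"IGHV3-23*01";"3"\n"seq2";"IGHV1-69*01";"0"\n'
rows = parse_ihmmune_csv(sample)
for row in rows:
    print(row)
```

To process a file saved with the 'Save to File' button, the file contents could be read and passed to the same function.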

Mutation details and alignment display

The mutation details and alignment display produces a table showing a comparison of the input sequence to its germline components. Where possible, the input sequence V-REGION is gapped to allow for IMGT numbering of the input sequence. This is only possible when using either the iHMMune-align Expressed IGHV Repertoire or the Combined Repertoire provided on the iHMMune-align site. Template sequences are used to introduce gaps at the appropriate positions in an input sequence. The framework (FR) and complementarity determining regions (CDR) displayed use a broad CDR and FR definition that was arrived at by combining the IMGT and Kabat definitions and taking the outermost borders of the CDRs (FR1: 1 -> 26, CDR1: 27 -> 40, FR2: 41 -> 55, CDR2: 56 -> 74, FR3: 75 -> 104, CDR3: 105 ->). Codons are numbered according to IMGT numbering and the amino acid for the germline sequence is displayed.
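The region boundaries listed above can be expressed as a simple lookup. This is a minimal sketch in Python; the names `REGION_BOUNDS` and `region_for_codon` are illustrative and are not part of iHMMune-align.

```python
# Broad FR/CDR boundaries by IMGT codon number, as listed above.
# CDR3 is open-ended (105 onwards), so it is handled as the fallback.
REGION_BOUNDS = [
    ("FR1", 1, 26),
    ("CDR1", 27, 40),
    ("FR2", 41, 55),
    ("CDR2", 56, 74),
    ("FR3", 75, 104),
]


def region_for_codon(position):
    """Return the region name for an IMGT codon number (1-based)."""
    if position < 1:
        raise ValueError("IMGT codon numbers start at 1")
    for name, start, end in REGION_BOUNDS:
        if start <= position <= end:
            return name
    return "CDR3"


print(region_for_codon(30))
```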

The input sequence is displayed complete with N1 and N2 regions if present. Matches between an input nucleotide and a germline nucleotide are indicated by a '.' on the 'Germline' line. At positions where the input sequence and the germline differ, the nucleotides are displayed and highlighted in red. The '-' character is used to pad the start of non-full-length sequences, while the gaps introduced for IMGT numbering are shown as '.' within the input sequence. A '?' is used to indicate N nucleotides that are not sourced from the germline.
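The match and mismatch convention of the 'Germline' line can be sketched as follows. This Python example assumes two already aligned strings of equal length and assumes that the germline nucleotide is what is printed at a mismatch; it does not attempt to reproduce the padding, gap or '?' handling, and `germline_line` is a hypothetical name.

```python
def germline_line(input_seq, germline_seq):
    """Build a 'Germline'-style line: '.' for a match, else the germline base.

    Assumption: both strings are pre-aligned, equal in length, and contain
    only nucleotides (no '-' padding, IMGT gaps or N-region positions).
    """
    if len(input_seq) != len(germline_seq):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(
        "." if i == g else g
        for i, g in zip(input_seq.upper(), germline_seq.upper())
    )


# The fourth nucleotide differs (input G, germline C).
line = germline_line("CAGGTA", "CAGCTA")
print(line)
```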

The region that a nucleotide falls into is indicated by the 'States' line. There are four different states. 'V' indicates that the partitioning process has assigned this nucleotide to the V-REGION, 'D' to the D-REGION and 'J' to the J-REGION. 'N' indicates either N1 or N2 nucleotides. The presence of 'hotspots' within the germline sequences is indicated on the final line of the display. If a hotspot is present at a position, the target nucleotide is displayed in blue. Hotspots are defined as DGYW, WRCH and WAN. They are only denoted within the germline regions and not within the N-REGIONs.
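The hotspot motifs above are written with IUPAC ambiguity codes (D = A/G/T, Y = C/T, W = A/T, R = A/G, H = A/C/T, N = any nucleotide). A minimal Python sketch of scanning a germline sequence for these motifs is given below. It reports the start position of each motif match only; which nucleotide within the motif is treated as the target is not specified here, and the name `find_hotspots` is hypothetical.

```python
# IUPAC ambiguity codes needed for the three motifs.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "D": "AGT", "Y": "CT", "W": "AT", "R": "AG", "H": "ACT", "N": "ACGT",
}
MOTIFS = ("DGYW", "WRCH", "WAN")


def find_hotspots(seq):
    """Return sorted (start_index, motif) pairs for every motif match.

    Indices are 0-based. Overlapping matches are all reported. Characters
    other than A, C, G and T in the sequence never match.
    """
    seq = seq.upper()
    hits = []
    for motif in MOTIFS:
        for start in range(len(seq) - len(motif) + 1):
            window = seq[start:start + len(motif)]
            if all(base in IUPAC[code] for base, code in zip(window, motif)):
                hits.append((start, motif))
    return sorted(hits)


print(find_hotspots("AGCT"))
```

AGCT is a convenient test case because it matches both DGYW and its reverse complement WRCH at the same position.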

In addition to the alignment display, details of the mutations that occur within the input sequence are given. These are called the 'Mutation Paths'. The region that the codon lies within is shown in brackets after the codon position. For each codon that contains at least one mutation, the details of that mutation are given. When only a single mutation has occurred within a codon, the output will indicate whether it led to a silent (S) or replacement (R) mutation at the amino acid level. The codon's trinucleotide and amino acid are shown.

When a codon contains more than one mutation, the possible paths that could have led to that mutation are displayed. For example, if a codon contains two mutations, there is no means of knowing the order in which the mutations occurred. The mutation paths therefore consider both possible orders and the consequences of each. Take for instance a double mutation at the codon AGT, where the first position is changed to a G and the second position to a C. The overall result is a replacement mutation of Serine to Alanine. However, the intermediate may have been either Glycine (if the A->G was first, followed by the G->C) or Threonine (if the order was G->C and then A->G). Each path for a double mutation will indicate the type of mutation at each step, so the example of AGT to GCT would give two RR paths. A triple mutation of a codon will result in six paths being displayed, with three changes in each.
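The path enumeration described above can be sketched in Python by trying every order in which the changed positions could have mutated and classifying each step with the standard genetic code. This is an illustrative sketch, not the iHMMune-align implementation; the name `mutation_paths` is hypothetical, and a step to or from a stop codon ('*') is simply treated as a replacement.

```python
from itertools import permutations

# Standard genetic code, built in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def mutation_paths(germline, observed):
    """List every ordering of single-nucleotide steps from germline to observed.

    Each path is a list of (codon, amino_acid, kind) tuples, one per step,
    where kind is 'S' (silent) or 'R' (replacement) relative to the codon
    before that step. Codons must be three uppercase letters from A, C, G, T.
    """
    changed = [i for i in range(3) if germline[i] != observed[i]]
    if not changed:
        return []
    paths = []
    for order in permutations(changed):
        codon = germline
        steps = []
        for pos in order:
            mutated = codon[:pos] + observed[pos] + codon[pos + 1:]
            kind = "S" if CODON_TABLE[mutated] == CODON_TABLE[codon] else "R"
            steps.append((mutated, CODON_TABLE[mutated], kind))
            codon = mutated
        paths.append(steps)
    return paths


# The AGT -> GCT example from the text: two paths, both RR.
paths = mutation_paths("AGT", "GCT")
for path in paths:
    print(" -> ".join(f"{codon} ({aa}, {kind})" for codon, aa, kind in path))
```

For the AGT to GCT example this yields one path through GGT (Glycine) and one through ACT (Threonine), both ending at GCT (Alanine).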

HTML Output

HTML output will display a basic summary of the results for a sequence. This will include the names of the IGHV, IGHD and IGHJ genes that were found to be the best matches to the germline set and the number of mutations within each. The junction region, including the IGHV end, N1, IGHD, N2 and IGHJ, is also displayed. Other data displayed are the number of unknown (either N or X) nucleotides that were included in the input sequence, whether the IGHJ is in-frame and the number of stop codons present. The tabulated HTML-based results will print a table that contains details of the alignment.

Please note that there is currently no facility for the display of mutation information and complete alignments of the input sequence to the germline genes when using the HTML-based output. When multiple IGHV genes are requested for each input sequence, these will be displayed directly after the first result for the input.

Contact

For queries about iHMMune-align:
Andrew Collins
Katherine Jackson