Alignment of the 27 sequences.

 From: HIV-1 and HIV-2 LTR nucleotide sequences: assessment of the alignment by N-block presentation; « retroviral signatures » of overrepeated oligonucleotides, and a probable important role of scrambled stepwise duplications/deletions in molecular evolution

Ivan Laprevotte, Maude Pupin, Eivind Coward, Gilles Didier, Christophe Terzian, Claudine Devauchelle, and Alain Hénaut

Mol. Biol. Evol. 18(7):1231-1245. 2001

The 27 aligned nucleotide sequences are listed in the left column and grouped as in Table 1. The alignment is described in detail in the text. Each sequence is numbered line by line: the numeral to the right of each line gives the number of the last base on that line, the first base of each sequence being numbered +1. The three regions of the LTR are indicated as in Fig. 1. The PPT (polypurine tract) and the PBS (primer binding site) are indicated on the upper line of their corresponding sector of the alignment, at positions CIVCG-11 and CIVCG-682, respectively. The U3 region, indicated in the same way, extends from the beginning of the LTR (position CIVCG-28) to the U3/R junction (CIVCG-500/501), where transcription starts in the 5′ LTR. The R region ends at the R/U5 junction (CIVCG-598/599), where transcription ends in the 3′ LTR. The U5 region ends at the end of the LTR (CIVCG-680). The « HIV-1 » and « HIV-2 » consensus sequences are each given beneath the corresponding group. The HIV-1-2 nucleotide and amino acid alignment (stop codons are printed in italics) was constructed manually (Materials and Methods). Initially, it was based on eight consensus nucleotide elements (or groups of elements), that is, 18 positions identified in primate lentiviral LTR sequences (Frech, K., Brack-Werner, R., and T. Werner. 1996. Common modular structure of lentivirus LTRs. Virology 224:256-267). These 18 positions are shown in boxes on the lower line of their corresponding sector; the published nucleotide segments are underlined, as are the names (left column) of the prototype sequences in which they were delineated (Frech et al., 1996). Ultimately, the alignment is based on the nucleotides printed on the line labelled « common sectors ». These nucleotides cover 643 positions (~58%) of the alignment. In the corresponding columns, letters identical to these nucleotides are printed in bold.
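The line-by-line numbering convention can be sketched as follows. This is a minimal illustration, assuming each aligned row is a string in which gaps are written as '-' or '.' and the alignment is printed in lines of a fixed width; the function name `line_end_numbers` and the gap characters are illustrative assumptions, not part of the original figure.

```python
def line_end_numbers(row: str, width: int, gaps: str = "-.") -> list[int]:
    """Number of the last base on each printed line of one gapped aligned row.

    Bases are counted from 1 at the first non-gap character; gap characters
    (assumed here to be '-' or '.') do not advance the count. A line made
    only of gaps repeats the previous number (0 before the first base).
    """
    if width <= 0:
        raise ValueError("line width must be positive")
    numbers = []
    count = 0
    for start in range(0, len(row), width):
        chunk = row[start:start + width]
        # Only real bases advance the per-sequence numbering.
        count += sum(1 for ch in chunk if ch not in gaps)
        numbers.append(count)
    return numbers
```

For example, `line_end_numbers("TGG--AAG-GGC", 5)` returns `[3, 7, 9]`: gap columns do not advance the count, so each row's numerals depend only on its own bases, which is why different sequences carry different numerals at the end of the same printed line.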
The conserved amino acids are shown in one-letter code immediately below the second nucleotide of the corresponding codon (line « am. acid com. se. »). To assess and locally correct the nucleotide alignment while increasing the signal-to-noise ratio, alphabets of more than four letters are also used: that of the polypeptide alignment in the coding reading frame (as just mentioned), and that obtained by the 8- and 12-ranked N-block presentations of all the sequences (Materials and Methods). As expected, the polypeptide and nucleotide alignments agree well, except at a few locations: for example, the leucine codon at position CIVCG-36 ends at position 38 for the « HIV-1 » group and at position 39 for the « HIV-2 » group; a serine is encoded by TCC or TCA at position CIVCG-45 in the « HIV-1 » group, and by AGT at position 48 in the « HIV-2 » group; and a leucine codon YTN is located at CIVCG-72 for « HIV-1 » but shifted to 73 for « HIV-2 », in order to preserve an N-block presentation match. Nucleotides matching a conserved codon are printed in bold. All the aligned sequences have been coded using both a 12-ranked and an 8-ranked N-block presentation (the latter being less stringent). Nucleotides that match in the same column (the « common sectors » consensus sequence included) and that are, in addition, identically renamed in the 8-ranked N-block presentation are highlighted in the same colour: green for letters common to the « HIV-1 » group, blue for letters common to the « HIV-2 » group, red for letters common to « HIV-1-2 », and yellow for letters common to RESIVSMM, RESIMM251, L07625, and/or X61240. Of the 643 positions of the consensus sequence, 541 are highlighted in red.
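The placement of the « am. acid com. se. » line, with each amino acid under the second nucleotide of its codon, can be illustrated with a minimal sketch. It assumes the standard genetic code, a known 0-based alignment column where the reading frame starts, and '-'/'.' as gap characters; stop codons are written '*' here rather than in italics, and the helper names are hypothetical. Because gap columns are skipped, a codon may span gaps, which is one way the same codon can end in different columns in two groups.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG;
# '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate_codon(codon: str) -> str:
    """One-letter amino acid for an uppercase DNA codon; 'X' if ambiguous."""
    if len(codon) != 3 or any(b not in BASES for b in codon):
        return "X"
    i, j, k = (BASES.index(b) for b in codon)
    return AMINO[16 * i + 4 * j + k]


def amino_acid_line(row: str, frame_start: int, gaps: str = "-.") -> str:
    """Line as long as `row`, with each amino acid placed under the column
    that holds the second base of its codon.

    `frame_start` is the 0-based alignment column of the first base of the
    first codon. Gap columns are skipped, so a codon may span gaps.
    """
    out = [" "] * len(row)
    # Columns (from frame_start on) that carry a base rather than a gap.
    cols = [c for c in range(frame_start, len(row)) if row[c] not in gaps]
    # Walk complete codons only; a trailing partial codon is ignored.
    for n in range(0, len(cols) - 2, 3):
        c1, c2, c3 = cols[n], cols[n + 1], cols[n + 2]
        codon = (row[c1] + row[c2] + row[c3]).upper()
        out[c2] = translate_codon(codon)
    return "".join(out)
```

For instance, `amino_acid_line("TTA--AGT", 0)` returns `" L    S "`: the leucine sits under column 1 and the serine under column 6, the gap columns 3 and 4 being skipped when codons are assembled.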
In the « HIV-1 » group, the first NF-KB site (CIVCG 397-407) is highlighted in red, as is the second, which is aligned with the « HIV-2 » NF-KB site (CIVCG 409-418); this indicates perfect homology among these three copies, even after the sequences are coded with the 8-ranked N-block presentation. The more stringent 12-ranked N-block presentation reveals highly homologous aligned sectors (boxed on the line « common sectors »): the polypurine tract and the 5′ end of the LTR (CIVCG 10-37); the sectors aligned with CIVCG 105-136, 168-181, 215-229, and 284-296; the NF-KB sites (CIVCG 397-407 and 409-418); the polyadenylation signal (CIVCG 570-581); and the primer binding site (CIVCG 682-705). The N-block presentation thus corroborates the homology blocks, and it has also made local corrections of the alignment possible.
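The colour-highlighting rule and the boxed sectors can be approximated in the same spirit. The sketch below assumes that the N-block recoding (Materials and Methods) has already been computed and is supplied as one string per aligned row, with one symbol per alignment column; it does not reproduce the N-block method itself. A column counts as conserved when all rows carry the same non-gap base and the same recoded symbol, and runs of such columns are reported as 1-based inclusive intervals. The function names and the `min_len` threshold are hypothetical.

```python
def conserved_columns(rows: list[str], codes: list[str],
                      gaps: str = "-.") -> list[bool]:
    """True for each column where every row has the same non-gap base AND
    every row's (precomputed, assumed) recoded symbol is identical."""
    if not rows or len(rows) != len(codes):
        raise ValueError("need one recoded string per aligned row")
    length = len(rows[0])
    if any(len(s) != length for s in rows + codes):
        raise ValueError("all rows and codes must have the alignment length")
    mask = []
    for c in range(length):
        bases = {r[c].upper() for r in rows}
        symbols = {s[c] for s in codes}
        mask.append(len(bases) == 1 and len(symbols) == 1
                    and rows[0][c] not in gaps)
    return mask


def conserved_runs(mask: list[bool], min_len: int = 1) -> list[tuple[int, int]]:
    """1-based inclusive (start, end) intervals of consecutive True columns
    that are at least `min_len` columns long."""
    runs = []
    start = None
    for i, ok in enumerate(mask, start=1):
        if ok and start is None:
            start = i
        elif not ok and start is not None:
            if i - start >= min_len:
                runs.append((start, i - 1))
            start = None
    # Close a run that reaches the last column.
    if start is not None and len(mask) - start + 1 >= min_len:
        runs.append((start, len(mask)))
    return runs
```

With three toy rows `"GGGACTTTCC"`, `"GGGACTTTCA"` and `"GGGAC-TTCC"` and matching toy codes, columns 6 (a gap in one row) and 10 (C versus A) break conservation, so `conserved_runs` yields `[(1, 5), (7, 9)]`. Requiring agreement in both alphabets is what makes the recoded comparison stricter than plain base identity.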
