Phylogenetic analysis of the first complete hepatitis E virus (HEV) genome from Africa

Hélène van Cuyck, François Juge, Pierre Roques
DOI: http://dx.doi.org/10.1016/S0928-8244(03)00241-4. Pages 133–139. First published online: 1 November 2003


Hepatitis E virus (HEV) is globally distributed and is transmitted enterically and between humans and animals. Phylogenetic analysis has identified five distinct HEV genotypes. The first full-length sequence of an African strain (Chad) is presented and compared to the 31 complete HEV genomes available, including the fulminant hepatitis strain from India, swine strains and a strain from Morocco. The two African strains are more closely related to genotype 1 than to any other genotype, and together they possibly form a sub-genotype or a sixth genotype. The first evidence for recombination between divergent HEV strains is presented.

  • Hepatitis E virus
  • Hepatitis E virus genome sequence
  • Phylogenetic analysis
  • Recombination

1 Introduction

Hepatitis E virus (HEV) is the major cause of enterically transmitted non-A, non-B, non-C hepatitis and is responsible for significant morbidity and mortality in developing countries [1].

Outbreaks of hepatitis E have been described in Asia, Africa and Mexico [2–4], while sporadic cases have been noted in the United States and Europe [5–7]. HEV infection has been shown to be a zoonosis involving swine [8–10]. An HEV-related virus in chickens (avian HEV) has been described [11]. Avian HEV shares 50–60% nucleotide sequence identity with the human and swine viruses [12]. HEV is endemic in human populations in contact with pigs [13,14], and recently many swine HEV strains have been genetically identified [9,15].

HEV has been removed from the Calicivirus family and reclassified in a separate HEV-like-virus genus. The virus is non-enveloped and has a single-stranded, positive-sense RNA genome. The genome has three open reading frames (ORFs) and is approximately 7200 nucleotides long. The overall length differences between strains are due to variation in the length of the poly(A) tail and in the sequenced segment of the 5′ end of the genome. The largest ORF (ORF1) is 5 kb long and encodes a non-structural polyprotein containing consensus elements of methyltransferase, protease, helicase and RNA-dependent RNA polymerase [16]. The 3′ part of the genome corresponds to ORF2, which encodes the major capsid antigenic proteins. ORF3 is 1 kb long, overlaps the ORF1 and ORF2 junction and encodes a protein of unknown function.

Several classification schemes have been published, but none has been officially accepted. Five genotypes have been described (Roman numerals were proposed by Tsarev et al. [26]; Arabic numbering was proposed by Wang et al. [43]): genotype I (1) includes Asian (India, Burma, Nepal, China-Xinjiang, Pakistan) [17–23] and African strains (Chad, Algeria, Tunisia, Morocco, Egypt, Namibia) [24–27], genotype III (2) includes Mexican and African (Nigeria) strains [28,29], genotype II (3) includes US and Japanese strains [33], genotype IV (4) includes Chinese (Shangaï) and Japanese strains [33], and finally genotype V (5) includes European strains [5,7]. These designations are based on the analysis of partial genome sequences despite the existence of complete sequences from China [20,30], India [17,31], Pakistan [21,22], Nepal [23], Mexico [28], US [6] and from Japanese patients [32,33]. Until recently, no full sequence from Africa had been characterized.

In this study we present the first comparison of complete African genome sequences of HEV: a strain from Chad and a strain from Morocco, and we analyze their relationship to previously described HEV genotypes, including the swine strains.

2 Materials and methods

A fecal sample (Chad T3) collected in 1984 from a patient with hepatitis E was used as the source of RNA, which was extracted as previously described [22]. The RNA was sequenced from RT-PCR (reverse transcription-polymerase chain reaction) fragments using primers derived from the Burma sequence or from the sequence previously obtained. A consensus sequence was created using the Sequencer 5.0 program and submitted to GenBank.

The newly derived Chad strain was compared to the other HEV strains from India, China, Pakistan, Mexico, US, Burma, Japan, Nepal, and Morocco using the complete genome sequences of 31 strains isolated from humans and swine. The following sequences were retrieved from GenBank: M73218-Burma-7194 nucleotides (nt); D10330-Myanmar-7194 nt; M74506-Mexico-7170 nt; M80581-Sar55 Pakistan-7138 nt; D11092-China A Xinjiang-7194 nt; M94177-China B Xinjiang-7194 nt; L25595-China C Xinjiang-7193 nt; D11093-China D Xinjiang-7194 nt; X99441-India Madras-7194 nt; AF051830-Nepal TK15-92-7193 nt; AF185822-Abb-2B Pakistan-7143 nt; AF076239-India Hyderabad-7129 nt; X98292-Fulminant India-7193 nt; AF060668-HEV US-1-7186 nt; AF060669-HEV US-2-7251 nt; HEE272108-Genotype 4 China Shangaï-7232 nt; AB074917-JKK-Sap Japan-7223 nt; AB074915-JAK-Sai Japan-7224 nt; AB074920-JMK-Ham Japan-7228 nt; AB074918-JKN-Sap Japan-7244 nt; AY230202-Morocco-7192 nt; AF0082843-USA-S-7207 nt; AB073912-swJ570 Japan-7225 nt; AB080575-HE-J14 Japan-7171 nt; AB089824-HE-JA10 Japan-7244 nt; AB097812-HE-JA1 Japan-7240 nt; AP003430-JRA1 Japan-7230 nt; AY115488-CanswArkell Canada-7242 nt; AF459438-North India Jameel-7191 nt; L08816-China XingYang-7170 nt; AB097811-swJ13–1-7258 nt. Sequence alignments were constructed using Clustal W version 1.8. The alignment was adjusted according to the reading frame using the alignment editor Se-Al version 2.0 (A. Rambaut; http://evolve.zoo.ox.ac.uk). Ambiguities in the sequences were removed, and all gap-containing sites were stripped from the alignment. Xia's test was used to assess substitution saturation using DAMBE ([34], http://aix1.uottawa.ca/~xxia/software/sofware.htm). Phylogenetic trees based on distance and maximum likelihood methods were built using PAUP 4.0 (D.L. Swofford, Sinauer Associates, Sunderland, MA, USA).
The substitution models were selected by model testing using the Akaike information criterion (AIC) or hierarchical likelihood ratio tests (hLRTs) as implemented in Modeltest version 3.06 [35]. Similarity between HEV strains was examined by similarity plotting using Simplot version 3.2 (S. Ray, Johns Hopkins University School of Medicine, http://sray.med.som.jhmi.edu/RaySoft/Simplot).
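
The column-stripping step described above (removing ambiguities and gap-containing sites) can be illustrated with a minimal Python sketch. This is not the procedure used in Se-Al or PAUP; the function name `strip_columns` and the dictionary-based alignment are assumptions made for illustration only.

```python
def strip_columns(alignment):
    """Keep only the columns in which every sequence has an unambiguous base.

    `alignment` maps a strain name to its aligned sequence (hypothetical
    input format). Gaps ('-') and IUPAC ambiguity codes (N, R, Y, ...)
    cause the whole column to be dropped.
    """
    valid = set("ACGT")
    names = list(alignment)
    # Normalise case and treat RNA 'U' as 'T' so both alphabets work.
    seqs = [alignment[name].upper().replace("U", "T") for name in names]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    keep = [i for i in range(len(seqs[0])) if all(s[i] in valid for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in zip(names, seqs)}
```

Applied to a toy alignment, a column survives only if all strains carry A, C, G or T at that position, which is how an alignment of full genomes can shrink to a smaller number of comparable sites.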

3 Results

3.1 Chad and Moroccan strains belong to genotype 1, and form a sub-genotype

The complete genome of the Chad T3 strain was sequenced and submitted to GenBank under accession number AY204877. The complete genome was 7163 nucleotides long (7150 nt without the poly(A) tail) and lacked 40 nucleotides at the 5′ end compared to the Burmese sequence. The genome included the three ORFs (ORF1, ORF2 and ORF3) encoding the full viral proteins and polyprotein. The genome was compared to 31 full-length HEV genomes; excluding gaps and ambiguities, there were 6939 nucleotide sites in the alignment. No gaps had to be inserted except at the 3′ end of the alignment and in the hypervariable region (bases 1760 to 2330). The saturation test [34], run in DAMBE, gave an index of substitution saturation ISS = 0.2301 against a critical value ISSc = 0.8304, indicating little saturation of the data (ISS < ISSc). The HEV sequences are therefore suitable for phylogenetic analysis.

Similarity plotting between the different full HEV genomes revealed that the Chad strain is more similar to the genotype 1 cluster (Asian genotype) than to the other three genotypes: genotype 2 (Mexico), genotype 4 (China-Shangaï, Japanese), and genotype 3 (US, Japanese) (Fig. 1). Within genotype 1, the Chad strain appears to be divergent from the other genotype 1 strains (Fig. 1A). The RNA polymerase region and ORF2 are highly conserved and represent the stable part of the genome among all strains, as previously stated [26]. Differences in relative identity between strains from different genotypes remain the same across the genome. For example, the identity increases in the polymerase region (4000–5000) and in the ORF2 part of the genome (5000–7000) for all strains and genotypes (Fig. 1B). Approximately the same relationships were found whichever sequence was used as the query.

Figure 1

Test of similarity between the sequences of the different full HEV genomes using SIMPLOT. A: The test of similarity was performed using the 84-Chad T3 sequence as query. Genotype 1 includes the Asian (Burmese)-like strains, genotype 2 includes the Mexican-like strains, genotype 3 includes the US-like strains and genotype 4 includes the Shangaï-like strains. The parameters used were the following: window size of 400 nt, increment of 60 nt, gap-strip on, Kimura 2-parameter correction, T/t of 4.0. The region of the genome corresponding to 1760–2330 nt was deleted, which produces the linear pattern in that zone. B: Bootscan analysis using SIMPLOT and the 84-Chad T3 strain as query.
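
The sliding-window comparison behind a similarity plot can be sketched as follows, using the window size (400 nt) and increment (60 nt) given in the caption. This is an illustrative sketch, not SIMPLOT's implementation: it assumes the closed-form Kimura 2-parameter distance, which estimates transitions and transversions from the data rather than using a fixed T/t ratio of 4.0, and it assumes similarity is reported as 1 minus the distance. The function names are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def k2p_distance(a, b):
    """Closed-form Kimura 2-parameter distance between two aligned sequences."""
    n = ts = tv = 0
    for x, y in zip(a, b):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguity codes
        n += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1  # transition: purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # transversion: purine<->pyrimidine
    if n == 0:
        raise ValueError("no comparable sites")
    p, q = ts / n, tv / n
    # math.log raises ValueError when the sequences are too divergent
    # for the correction to be defined.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)

def sliding_similarity(query, ref, window=400, step=60):
    """Return (window midpoint, 1 - K2P distance) pairs along the alignment."""
    points = []
    for start in range(0, len(query) - window + 1, step):
        d = k2p_distance(query[start:start + window], ref[start:start + window])
        points.append((start + window // 2, 1.0 - d))
    return points
```

Plotting the returned pairs for each reference strain against the same query gives one curve per strain, so regions where two curves cross are easy to spot.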

The phylogenetic tree constructed using PAUP confirmed that the African strains (Chad T3 and Morocco) are closest to genotype 1 and form an independent lineage distinct from the Chinese and Burma/India cluster; they could be clustered within a new genotype (Fig. 2).

Figure 2

Comparison of 31 HEV genomes (6939 nt) using the HKY distance to obtain the neighbor-joining tree, with 100 bootstrap replicates. Only bootstrap values higher than 70 were considered significant.
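
The bootstrap values in the caption come from resampling alignment columns with replacement and rebuilding the tree for each replicate. The sketch below shows only the resampling step. As a simplified stand-in for tree building (neighbor-joining in PAUP is not reproduced here), it counts how often the query's nearest strain by uncorrected p-distance stays the same. All function names and the toy input format are assumptions.

```python
import random

def p_distance(a, b):
    """Proportion of differing sites between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def bootstrap_columns(alignment, rng):
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    names = list(alignment)
    length = len(alignment[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in cols) for name in names}

def nearest_support(alignment, query, replicates=100, seed=0):
    """Percentage of replicates in which each strain is nearest to the query."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    counts = {}
    for _ in range(replicates):
        rep = bootstrap_columns(alignment, rng)
        others = [name for name in rep if name != query]
        best = min(others, key=lambda name: p_distance(rep[query], rep[name]))
        counts[best] = counts.get(best, 0) + 1
    return {name: 100 * c / replicates for name, c in counts.items()}
```

With 100 replicates, a strain reported at 70 or above would meet the significance threshold quoted in the caption, although real bootstrap support refers to clades in a tree rather than to nearest neighbors.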

3.2 Recombination between HEV strains

Bootscan analysis of the Chad T3 sequence against the other sequences was then performed to refine the relationships within the genotype. The regions delimited by the breakpoint locations (61–1000, 1000–3000, 3000–3500, 3500–4000, 3500–5000 and 5000–7000) were used to construct neighbor-joining trees (Fig. 3). For all regions the Chad sequence clusters basal to genotype 1. Several strains cluster differently in the phylogenetic trees from the different genomic regions. In particular, the China D and Indian fulminant strains cluster differently depending on the part of the genome studied, as does the Nepalese strain TK15–92 within the Indian cluster. The changes in the phylogenetic position of other strains within the Asian genotype 1 seem to be a consequence of this. These results indicate possible recombination between divergent HEV viruses. The change in the phylogenetic position of the genotype 4-like cluster of strains (Japanese JKK Sap and JAK Sai) based on the ORF1 part of the genome (61–3500) could also be related to a recombination event. Repeating the bootscan analysis and the phylogenetic reconstruction after excluding the putative HEV recombinants indicates that the Chad sequence is not an obvious recombinant. Further analysis has revealed, for the first time, very strong evidence for recombination between divergent HEV strains (data not shown).
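
The region-by-region comparison can be sketched by cutting the alignment at the coordinates listed above and asking, for each region, which strain is closest to a chosen query. A change of nearest strain between regions is the kind of signal that prompts a closer look for recombination. This is a simplified illustration, not the bootscan or neighbor-joining analysis itself; it assumes the coordinates are 1-based, inclusive alignment positions, and the function names are hypothetical.

```python
# Regions as listed in the text (assumed 1-based, inclusive coordinates).
REGIONS = [(61, 1000), (1000, 3000), (3000, 3500),
           (3500, 4000), (3500, 5000), (5000, 7000)]

def p_distance(a, b):
    """Proportion of differing sites between two equal-length segments."""
    if not a or len(a) != len(b):
        raise ValueError("segments must be non-empty and of equal length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def nearest_by_region(alignment, query, regions=REGIONS):
    """Map each region to the strain closest to the query within it."""
    result = {}
    for start, end in regions:
        q = alignment[query][start - 1:end]  # convert to a 0-based slice
        dists = {name: p_distance(q, seq[start - 1:end])
                 for name, seq in alignment.items() if name != query}
        result[(start, end)] = min(dists, key=dists.get)
    return result
```

For example, with a toy alignment in which the query matches strain `x` over the first half and strain `y` over the second half, `nearest_by_region` returns a different nearest strain for each half.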

Figure 3

Comparison of HEV genome fragments (positions 61–1000; 1000–3000; 3000–3500; 3500–4000; 3500–5000; 5000–7000) using the HKY distance to obtain the neighbor-joining tree, with 100 bootstrap replicates. When two strains were 100% similar (India 90 and Hyderabad; HE North India and Jameel N. India; China 1986 and China B 1993), only one sequence of each pair was included in the analysis (India 90; Jameel N. India; China B 1993). Only bootstrap values higher than 70 were considered significant.

4 Discussion

The Chad strain of HEV was fully sequenced and compared to 31 other complete HEV genomes available in GenBank, including another African strain from Morocco. No special editing signal seems to be involved in the translation of ORF2 and ORF3 [36]. The clustering of the Chad strain with genotype 1, based on the phylogenetic analysis of its complete genome sequence, confirmed the previous analysis that used a partial sequence in more conserved regions of the genome [24]. Phylogenetic analysis with the Chad strain confirmed that HEV strains cluster in four genotypes: genotype 1 (Chad/African; Chinese from Xinjiang; India; Burma; Nepal; Pakistan), genotype 2 (Mexico), genotype 3 (US and Japan) and genotype 4 (Chinese from Shangaï; Japan). However, the comparison using the two complete sequences from Africa reveals that the African strains cluster together, possibly forming a new sub-genotype of genotype I, or a genotype VI (or 6, depending on the numbering of future classification). Interestingly, this new partition is supported by the phylogenetic analysis of partial sequences in conserved regions proposed previously, which thus remains useful for clustering in the identification and epidemiological surveillance of HEV [24,26].

The distribution of strains within these genotypes is not geographically restricted but might be related to the migration of populations within and between continents. Thus, relationships are observed between Chinese and Pakistani or Japanese strains [22,30,37], between Nigerian and Mexican strains [29], and between Japanese and US strains [33].

Inter-genotype recombination was suspected within genotype 1; strain China D, isolated in Xinjiang Province in the northwest of China, appeared to be a recombinant. Moreover, co-infection in the same patient has been described recently [33]. Two different genotypes of HEV were involved, implying that recombination between genotypes is possible. Such recombination events have been described for other plus-strand RNA viruses such as poliovirus, alphaviruses and, more recently, Dengue virus [38–42].

In conclusion, for the first time a strain from Africa, the 84-Chad T3 strain, was fully sequenced and shown to belong to the Asian genotype. The comparison with another African full-length genome indicated the possibility of a new genotype of HEV. Genotype VI, or 6, is proposed to cluster the African-Asian-related strains in two sub-genotypes.


This work was supported by the French Army Health Service. We are grateful to Dr. Jacques Viret and Dr. Dominique Dormont for promoting this work, and to David Robertson and Philippe Lemey for valuable comments on the results. We gratefully acknowledge Dr. Annemie Vandamme for organizing the Eighth European Workshop on Virus Evolution and Molecular Epidemiology.

