Molecular Biology and Human Genetics, MRC Centre for Molecular and Cellular Biology and the DST/NRF Centre for Biomedical TB Research, Faculty of Health Sciences, Stellenbosch University, Tygerberg, South Africa
Erika De Wit
Molecular Biology and Human Genetics, MRC Centre for Molecular and Cellular Biology and the DST/NRF Centre for Biomedical TB Research, Faculty of Health Sciences, Stellenbosch University, Tygerberg, South Africa
Eileen G. Hoal
Molecular Biology and Human Genetics, MRC Centre for Molecular and Cellular Biology and the DST/NRF Centre for Biomedical TB Research, Faculty of Health Sciences, Stellenbosch University, Tygerberg, South Africa
The historical impression that tuberculosis was an inherited disorder has come full circle and substantial evidence now exists for a human genetic contribution to susceptibility to tuberculosis. This evidence has come from several whole-genome linkage scans, and numerous case-control association studies where the candidate genes were derived from the genome screens, animal models and hypotheses pertaining to the disease pathways. Although many of the associated genes have not been validated in all studies, the list of those that have been is growing, and includes NRAMP1, IFNG, NOS2A, MBL, VDR and some TLR genes. Certain of these genes have consistently been associated with tuberculosis in diverse populations. The future investigation of susceptibility to tuberculosis is almost certain to include genome-wide association studies, admixture mapping and the search for rare variants and epigenetic mechanisms. The genetic identification of more vulnerable individuals is expected to inform personalized treatment and perhaps vaccination strategies.
human genetic susceptibility
Tuberculosis is one of the most influential diseases in the history of mankind due to its devastating effect on health and its high mortality rate throughout the world. Although tuberculosis has been studied for centuries, it is still responsible for more human deaths than any other single infectious disease, and was declared a global public health emergency by the World Health Organization in 1993. The causative bacterium in humans is Mycobacterium tuberculosis. Although the development of effective antibiotics, together with the improvement of socioeconomic circumstances during the 19th and 20th centuries, helped reduce mortality rates in developed countries, tuberculosis continues to be a major threat, particularly in the developing world, with an estimated 9.27 million incident cases in 2007 (World Health Organization, 2009).
Although infection with M. tuberculosis is necessary, it is not sufficient to cause tuberculosis in most people. This is substantiated by the observation that only 10% of infected persons who are immunocompetent will ever develop the active disease, while the majority of the population control the bacterium effectively. Human factors governing whether the infected individual will progress to active tuberculosis disease or not are usually assumed to be those governing the immunological state of the host, which are generally determined by the host's genetic makeup. Even the effects of stress and nutrition on the immune response can be influenced by genetics (Ising & Holsboer, 2006; Ferguson et al., 2007). Several studies to date have shown that genetic factors contribute to the outcome of tuberculosis, with an estimated heritability ranging from 36% to 80% (Kallmann & Reisner, 1942; Comstock, 1978; Jepson et al., 2001; Newport et al., 2004b; Kimman et al., 2006). Before this finding was widely accepted, the misconception existed that other diseases, such as cardiovascular disease, cancer and diabetes, were influenced by genetic factors, but that death from infection was due to unfavourable conditions or misfortune (Bellamy, 1998). A classic epidemiological study on the premature death of adoptees in Denmark suggested that the genetic contribution to infectious disease is greater than that for cancer or cardiovascular disease (Sorensen et al., 1988). The elucidation of the genetic control of susceptibility to tuberculosis is expected to provide new and more effective tools for prevention and control of this problematic disease. However, to date, this field has achieved limited success and many of the genes and mechanisms that determine susceptibility to tuberculosis remain unidentified.
Tuberculosis disease in a family does not follow a Mendelian pattern and is polygenic and multifactorial. Genetic susceptibility studies in tuberculosis are additionally complicated by the presence of two different genomes, the bacterium and the host, and the influence their interaction can have on the disease. Although several genes have been identified as susceptibility genes for tuberculosis, other genes, the environment (including socioeconomic conditions) and M. tuberculosis itself also influence the disease. This is why no single major susceptibility gene has been identified for tuberculosis (McShane, 2003), nor do we expect one to be.
The past: evidence for a genetic influence in human susceptibility to M. tuberculosis
The idea that a genetic influence is present in susceptibility or resistance to tuberculosis has long been established and differences in susceptibility to tuberculosis in diverse populations were recognized even 100 years ago (Bellamy, 1998). Before the discovery of M. tuberculosis, it was observed that tuberculosis frequently occurred in several members of the same family, appearing to be hereditary. However, the discovery of the bacterium focussed attention on the importance of the pathogen, while the host genome was largely ignored.
The exposure of naïve populations, such as the inhabitants of the Qu'Appelle Indian Reservation in Canada in the 1890s, illustrates the role of the host genome in tuberculosis. When this population was first exposed to tuberculosis, almost 10% of its members died of the disease per annum. The high prevalence of tuberculosis in this population was, in part, a consequence of the lack of ‘innate’ resistance through the absence of previous historical exposure to the tubercle bacillus. After 40 years, more than half of the families had died out, but the death rate had fallen to 0.2%, a decrease that could be attributed to strong selection against susceptibility genes for tuberculosis (Motulsky, 1960). This is a plausible explanation, because it has also been observed that Europeans have greater resistance to tuberculosis than populations of sub-Saharan African descent, most likely due to the longer time that European populations have been exposed to the bacterium (Dubos & Dubos, 1952). These population differences are not only due to socioeconomic factors, because a study in a nursing home in the United States determined that individuals of African descent were twice as likely as individuals of European descent in the same environment to be infected with M. tuberculosis (Stead et al., 1990).
One of the oldest observations, and arguably the most important, is the wide range of responses seen in individuals exposed to M. tuberculosis. Some individuals are never infected, some are infected but never develop clinical disease and others are infected and present with active tuberculosis. A tragic example of this variable outcome of disease was provided in 1926 by the inadvertent immunization of children with a virulent strain of M. tuberculosis in Lübeck, Germany. The same dose of bacteria was given to 251 infants within the first 10 days of life. Of these children, 77 died, 127 had radiological signs of disease and 47 showed no evidence of tuberculosis (Alcaïs & Abel, 2004).
More convincing evidence that host genes are important in tuberculosis susceptibility was provided by twin studies. Monozygous and dizygous twins were compared with respect to how often both members develop the disease (concordance). It was found that monozygous twins (essentially identical in their genetic makeup) are more likely to show concordance for the development of tuberculosis than dizygous twins (Simonds, 1963; Comstock, 1978; Sorensen et al., 1988). According to Comstock (1978), this indicated that genetic factors are the major contributors to the progression of the disease, because twins share a similar environment. A recent reanalysis of the same data (Van der Eijk et al., 2007) showed that environmental factors contributed more to the development of tuberculosis than hereditary factors, but this study did not preclude a role for host genetic factors in tuberculosis susceptibility. More recent data showing higher concordance in monozygous twins in terms of cellular immune responses to mycobacterial antigens indirectly support the original conclusion (Newport et al., 2004b).
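The concordance comparison described above can be illustrated with a minimal sketch. The function names and all pair counts below are hypothetical and are not taken from any of the cited twin studies. The probandwise formula assumes complete ascertainment, that is, every affected twin is counted as a proband.

```python
def pairwise_concordance(concordant_pairs: int, discordant_pairs: int) -> float:
    """Fraction of pairs with at least one affected twin in which both are affected."""
    total = concordant_pairs + discordant_pairs
    if total == 0:
        raise ValueError("need at least one pair with an affected twin")
    return concordant_pairs / total


def probandwise_concordance(concordant_pairs: int, discordant_pairs: int) -> float:
    """Probability that the co-twin of an affected twin is also affected.

    Assumes complete ascertainment: each concordant pair contributes two probands.
    """
    probands = 2 * concordant_pairs + discordant_pairs
    if probands == 0:
        raise ValueError("need at least one affected twin")
    return 2 * concordant_pairs / probands


# Hypothetical counts, for illustration only.
mz = {"concordant": 30, "discordant": 70}   # monozygous pairs
dz = {"concordant": 15, "discordant": 85}   # dizygous pairs

mz_rate = pairwise_concordance(mz["concordant"], mz["discordant"])
dz_rate = pairwise_concordance(dz["concordant"], dz["discordant"])
print(f"pairwise concordance: MZ={mz_rate:.2f}, DZ={dz_rate:.2f}")
```

A higher concordance in monozygous than in dizygous pairs, as in these made-up numbers, is the pattern that the twin studies interpreted as evidence for a genetic contribution, on the assumption that both kinds of twins share their environment to a similar degree.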
Mendelian susceptibility to mycobacterial infection is a rare human syndrome where affected individuals are exceptionally susceptible to otherwise nonpathogenic mycobacteria, including BCG, and Salmonella. Numerous studies (Al-Muhsen & Casanova, 2008) have identified the mutations in these cases, mostly in genes that are vital for immunity against intracellular pathogens in the interleukin (IL)-12/IL-23/interferon (IFN)-γ axis. The existence of these individuals implied that the normal human genome could contribute to susceptibility to tuberculosis and the findings suggested candidate genes to investigate in the general population.
Several animal models for tuberculosis exist, such as the mouse, rabbit, guinea-pig, fish and nonhuman primates (Flynn, 2006); however, mice and rabbits are most often used in studies assessing the genetic component of tuberculosis susceptibility. Rabbits are relatively resistant to tuberculosis (Gupta & Katoch, 2005), but Lurie and colleagues used the rabbit as an animal model (Allison et al., 1962) and developed inbred resistant and susceptible strains (which were subsequently lost, unfortunately). These strains were infected with aerosols of human and bovine tuberculosis, and more viable bacteria were present in the susceptible rabbits. It was concluded that this was primarily due to the bactericidal effectiveness of the alveolar macrophages of the resistant strain (Gupta & Katoch, 2005). Inbred strains of mice show different patterns of susceptibility after infection with M. tuberculosis: resistant mice can control bacterial replication, have reduced lung injury and survive longer (Chackerian & Behar, 2003). The use of the mouse model led to the identification of the first tuberculosis susceptibility gene, discussed below.
The present: linkage, association and interaction studies
The ideal study design for evaluating genetic susceptibility in tuberculosis might be based on the complete resequencing of the human genome in a large collection of cases and controls, because this would allow genotyping of all the variations, rare or common, in an individual's genome (Hirschhorn & Daly, 2005). However, at this stage, applying such a method would be time consuming, expensive and computationally impractical. Given these constraints, several alternative study designs and approaches have been devised for complex diseases (Fig. 1). These include genome-wide linkage and candidate gene association analyses, which have been used in the majority of genetic susceptibility studies in tuberculosis. No single study design can identify all the susceptibility genes involved in tuberculosis, because every approach has limitations. Genes involved in complex disease can be identified either with or without an a priori hypothesis. Genome-wide approaches, which have no prior hypothesis and include linkage analysis, can identify a region containing an unknown potential susceptibility gene. Alternatively, candidate genes can be selected on the basis of hypotheses about the functions of putative susceptibility genes in animals or humans. This approach utilizes association studies and screens for mutations or polymorphisms in the gene of interest.
Linkage studies are used to trace chromosomal regions containing putative susceptibility genes, either by a genome-wide scan, which ensures that all major genomic regions involved in disease susceptibility are identified, or by concentrating on a candidate region. This provides the opportunity to find new genes and pathways that might not have been suspected previously to contribute to the disease studied. This approach can be very successful in monogenic diseases, because it allows the fine mapping of the gene of interest. However, this is not the case in complex disease studies, where (in general) several regions containing many genes may be identified. Linkage studies assume that chromosomal regions segregate nonrandomly with the disease of interest in large affected families and aim to identify these regions (Alcaïs & Abel, 2004). Model-based or parametric linkage analysis by the logarithm of odds (lod) score method needs a defined model that specifies the relationship between the phenotype and factors that may influence its expression. The model is provided by segregation analysis. Model-free or nonparametric linkage studies are used when little is known about the relationship between the phenotype and the gene, which is generally the case with complex diseases. Once evidence for linkage has been found, fine genetic and physical maps are constructed to narrow down the interval on the chromosome and allow gene identification by candidate gene selection (when the function of the gene is known) or by positional cloning (when the function is unknown) (Abel & Dessein, 1997).
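The lod score mentioned above can be sketched for the simplest textbook case: a two-point, phase-known analysis in which every meiosis is fully informative. Under these assumptions the likelihood of observing r recombinants in n meioses at recombination fraction θ is θ^r (1 − θ)^(n − r), and the lod score compares it with the likelihood under free recombination (θ = 0.5). The function names and the counts used are illustrative only; real parametric linkage analysis also requires a disease model (penetrances, allele frequencies) from segregation analysis, which this sketch omits.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point lod score for phase-known, fully informative meioses."""
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    if not 0 < theta <= 0.5:
        raise ValueError("theta must be in (0, 0.5]")
    non_recombinants = meioses - recombinants
    log_lik_theta = (recombinants * math.log10(theta)
                     + non_recombinants * math.log10(1 - theta))
    log_lik_unlinked = meioses * math.log10(0.5)
    return log_lik_theta - log_lik_unlinked


def max_lod(recombinants: int, meioses: int):
    """Scan theta = 0.01 .. 0.50 and return (best_theta, best_lod)."""
    grid = [i / 100 for i in range(1, 51)]
    best_theta = max(grid, key=lambda t: lod_score(recombinants, meioses, t))
    return best_theta, lod_score(recombinants, meioses, best_theta)


# Hypothetical family data: 2 recombinants among 20 informative meioses.
theta_hat, lod = max_lod(2, 20)
print(f"maximum lod {lod:.2f} at theta = {theta_hat:.2f}")
```

In this idealized setting the lod score is maximized at the observed recombination fraction r/n. Model-free methods, which the text notes are generally preferred for complex diseases, replace this likelihood with statistics based on allele sharing among affected relatives.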
The chromosome 17q11–q21 candidate region, syntenic to mouse chromosome 11, which was previously identified as a susceptibility region for another intracellular pathogen (Roberts et al., 1993), was chosen for a tuberculosis linkage study (Jamieson et al., 2004). The results indicated that the chromosome 17q11–q21 region contained tuberculosis susceptibility genes, and further evaluation of this region suggested that four genes contributed separately to susceptibility (Jamieson et al., 2004). The chromosome 2q35 region was investigated in an extended family of aboriginal Canadians, in which a linkage study genotyped microsatellite markers flanking the natural resistance-associated macrophage protein 1 gene [NRAMP1, renamed solute carrier family 11A member 1 (SLC11A1)] located in this region (Greenwood et al., 2000). The individuals were grouped into different risk classes using various parameters, such as age, vaccination, previous disease and tuberculin skin test (TST) results. These risk classes were critical to obtaining significant evidence for linkage with a microsatellite distal to NRAMP1 as well as with a haplotype consisting of NRAMP1 intragenic markers.
Six genome-wide linkage scans have been performed in human tuberculosis genetics to date (Table 1), and although areas of putative linkage have been found, none of the linkage peaks identified by these studies have reached genome-wide significance. The first of these studies in tuberculosis was performed using sibpairs from Gambia and South Africa (Bellamy et al., 2000) and identified chromosomes 15q and Xq as containing possible susceptibility genes. Fine mapping of chromosome 15q11–q13 suggested that ubiquitin protein ligase E3A (human papilloma virus E6-associated protein, Angelman syndrome) (UBE3A) or a nearby gene may be involved in tuberculosis pathogenesis (Cervino et al., 2002). A study in Brazilians, which examined tuberculosis and leprosy families, suggested three regions (10q26.13, 11q12.3 and 20p12.1) (Miller et al., 2004).
Table 1. Chromosomal regions identified by genome-wide linkage studies of tuberculosis (TB)
Studies listed: Bellamy et al. (2000); Miller et al. (2004); Baghdadi et al. (2006); Cooke et al. (2008); Stein et al. (2008); Mahasirimongkol et al. (2009)
TB, current or previous microbiologically confirmed tuberculosis.
CA, ordered subset analysis with minimum age at onset of disease as covariate.
In Morocco, a genome scan identified chromosome 8q12–q13 as containing a major dominant-acting tuberculosis susceptibility gene (Baghdadi et al., 2006), as yet unidentified. A fourth study, using South African and Malawian populations, identified areas on chromosomes 6p21–q23 and 20q13.31–33 (Cooke et al., 2008) and follow-up association studies implicated the cathepsin Z (CTSZ) and melanocortin 3 receptor (MC3R) genes in tuberculosis susceptibility. The CTSZ protein is mostly expressed in immune cells such as macrophages and monocytes (Journet et al., 2000; Garin et al., 2001), and a role for the protein in the immune response has been hypothesized (Kos et al., 2005). The MC3R protein is expressed in peripheral tissues such as immune cells (Wang et al., 2008) and plays a role in several biological systems, regulating energy homeostasis, fat metabolism, the cardiovascular system and inflammation (Versteeg et al., 1998; Getting et al., 2006; Wang et al., 2008). However, no specific immune functions have been identified for CTSZ or MC3R.
A genome scan performed in Ugandan tuberculosis patients considered a number of distinct phenotypes and found linkage to two of these, namely (1) tuberculosis disease compared with latent or no tuberculosis infection and (2) persistently negative TST (PTST−) individuals compared with latent M. tuberculosis infection (Stein et al., 2008). Chromosomes 2q21–2q24 and 5p13–5q22 showed linkage in PTST− individuals, while chromosomes 7p22–7p21 and 20q13 (a validation of the region found by Cooke et al., 2008) were identified as susceptibility regions in tuberculosis patients. The latest genome-wide linkage study was performed in Thailand and used single-nucleotide polymorphisms (SNPs) instead of microsatellites to perform linkage analysis (Mahasirimongkol et al., 2009). Chromosome 5q23.2–q31.3, which contains numerous candidate genes, such as IL-3, -4, -5 and -13, was shown to be linked to tuberculosis susceptibility. Using an ordered subset analysis, which used age at onset of tuberculosis as a covariate, chromosomes 17p13.3–13.1 and 20p13–12.3 showed significant linkage with earlier onset of the disease.
The minimal overlap between susceptibility regions observed in these studies is probably because linkage of a genotype for a genetic marker with tuberculosis may be unique to a specific family or population and thus impossible to identify in other studies. These studies are also exquisitely sensitive to the phenotype definition. Alternatively, nonreproducibility could be due to false-positive or -negative associations caused by low sample numbers. This could be controlled for by carrying out power calculations using simulations during the planning of linkage studies (Teare & Barrett, 2005).
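A simulation-based power calculation of the kind referred to above can be sketched as follows. This is a deliberately simplified illustration, not the procedure of Teare & Barrett (2005): it assumes a single fully informative marker, affected sib pairs only, and a one-sided "mean test" on the proportion of alleles shared identical by descent (IBD). The function name, the IBD probabilities under the alternative and the pointwise threshold z_crit = 1.645 are all assumptions chosen for illustration; genome-wide significance thresholds are far more stringent.

```python
import math
import random


def simulate_power(n_pairs, ibd_probs, n_sims=2000, z_crit=1.645, seed=1):
    """Estimate the power of the affected sib-pair mean test by simulation.

    ibd_probs: probabilities that a pair shares 0, 1 or 2 alleles IBD.
    Under no linkage these are (0.25, 0.5, 0.25), so the expected sharing
    proportion is 0.5 with variance 0.125 per pair.
    """
    rng = random.Random(seed)
    null_sd = math.sqrt(0.125 / n_pairs)
    rejections = 0
    for _ in range(n_sims):
        counts = rng.choices((0, 1, 2), weights=ibd_probs, k=n_pairs)
        mean_sharing = sum(counts) / (2 * n_pairs)
        z = (mean_sharing - 0.5) / null_sd
        if z > z_crit:
            rejections += 1
    return rejections / n_sims


# Type I error under no linkage, and power under a hypothetical excess sharing.
type_one_error = simulate_power(100, (0.25, 0.50, 0.25))
power = simulate_power(100, (0.15, 0.50, 0.35))
print(f"false-positive rate ~ {type_one_error:.3f}, power ~ {power:.3f}")
```

Rerunning the function over a range of n_pairs values shows how many sib pairs would be needed before the assumed excess sharing is detected reliably, which is the question a planning-stage power calculation is meant to answer.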
Candidate gene association studies
Population-based case-control association studies are the classic association studies where the allele frequency of a specific marker is compared between unrelated cases (affected individuals) and controls (unaffected individuals) when Hardy–Weinberg equilibrium holds (Lander & Schork, 1994). Case-control association studies can detect weak effects and several of these studies replicated or validated associations found in previous studies or detected novel pathways that might be involved in pathogenesis (Maartens & Wilkinson, 2007). However, an association can arise for one of three reasons: (1) when the specific allele is the actual cause of the disease, (2) when the allele does not cause the trait but is in linkage disequilibrium with the actual cause of the disease and (3) as an artefact of population admixture (Lander & Schork, 1994). It is very important, therefore, to address population stratification and to replicate or validate association studies to confirm the results.
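The allele-frequency comparison described above can be written as a 2 × 2 table of allele counts, tested with a Pearson chi-square statistic (1 degree of freedom) and summarized as an odds ratio. The sketch below uses only the Python standard library; the function name and the counts are hypothetical, and the 95% confidence interval uses Woolf's logit approximation. Counting alleles rather than individuals treats the two alleles of a person as independent, which is one reason the comparison presupposes Hardy–Weinberg equilibrium.

```python
import math


def allelic_association(case_minor, case_major, control_minor, control_major):
    """Allelic chi-square test and odds ratio for a case-control study."""
    a, b, c, d = case_minor, case_major, control_minor, control_major
    if min(a, b, c, d) <= 0:
        raise ValueError("all four allele counts must be positive")
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    ci = (math.exp(math.log(odds_ratio) - 1.96 * se_log_or),
          math.exp(math.log(odds_ratio) + 1.96 * se_log_or))
    return {"odds_ratio": odds_ratio, "chi2": chi2, "p_value": p_value, "ci95": ci}


# Hypothetical allele counts: 200 cases and 200 controls (400 alleles each).
result = allelic_association(case_minor=120, case_major=280,
                             control_minor=80, control_major=320)
print(result)
```

A significant result from such a test says nothing about which of the three explanations listed above is responsible; distinguishing causation from linkage disequilibrium or admixture requires replication and control for population stratification.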
The most popular study design for investigating host susceptibility to tuberculosis is the association study, because this design has been shown to have much greater power for detecting genes of small effect than linkage studies (Risch & Merikangas, 1996), provided the sample size is adequate. The genetic component of tuberculosis susceptibility seems to be scattered across many genes and it has been studied extensively in the past few years, resulting in a vast amount of information (Fig. 2). Several genes, among others HLA, NRAMP1, IFNG, NOS2A, SP110, CCL2, MBL, CD209, VDR and TLR, have been associated with tuberculosis, some repeatedly, whereas others were not associated with every population examined or could not be replicated at all (Burgner et al., 2006). This emphasizes the complexities of characterizing host susceptibility in different ethnic populations living in diverse environments (Ardlie et al., 2002). Subtly different definitions of tuberculosis disease (phenotype) could also be a problem in these studies. Certain candidate genes that have consistently been associated with tuberculosis in multiple studies will be discussed here (Table 2).
Genetic involvement in the tuberculosis disease process. A simplified representation of different outcomes after Mycobacterium tuberculosis infection and some of the genes that may be involved at various stages. The bacteria enter the respiratory system of the host via inhaled droplets and are engulfed by macrophages (MΦ) and dendritic cells. There are three potential outcomes after inhalation of M. tuberculosis: (1) infection develops into active tuberculosis, (2) M. tuberculosis is immediately killed by the pulmonary immune system or (3) infection does not progress to active disease, because the bacteria are contained in granulomas. This containment can last for decades, or the lifetime of the individual. Mycobacterium tuberculosis can disseminate from granulomas, causing active tuberculosis (reactivation disease) (reproduced from Kaufmann & McMichael, 2005).
The human leucocyte antigen (HLA) region consists of approximately 200 genes, many of which are involved in antigen presentation. Genes that are involved in protective immunity show greater variance than other genes (Murphy, 1993) and this is also the case for the HLA region, which varies between populations and is highly polymorphic. It has been postulated that this is in fact the effect of different selection pressures, such as infectious disease (Lombard et al., 2006a). The HLA class I and class II genes are involved in antigen presentation to T cells and each protein binds a different range of peptides (Janeway et al., 2001). There are three class I α-chain genes in humans, namely HLA-A, -B and -C. There are four subclasses of genes in the HLA class II region, named HLA-DR, -DP, -DM and -DQ, respectively. HLA genes have been examined in several tuberculosis susceptibility studies and were some of the first genes to be investigated and associated with the disease. HLA-DR2 is the allele most consistently associated with tuberculosis, in several populations including those of India (Meyer et al., 1998; Ravikumar et al., 1999), Thailand (Vejbaesya et al., 2002), Indonesia (Selvaraj et al., 1998a) and Russia (Khomenko et al., 1990). HLA-DQB1*0503 was found to influence the progression of tuberculosis in the Cambodian population (Goldfeld et al., 1998), whereas DQB1*0601 was associated with tuberculosis in the Thai and South Indian populations (Ravikumar et al., 1999; Vejbaesya et al., 2002). DRB1*1302 was associated with tuberculosis susceptibility in a Venda population (Lombard et al., 2006b).
The first susceptibility gene to be identified from a mouse model of mycobacterial disease was NRAMP1 (SLC11A1), which plays a role in macrophage activation. NRAMP1 is a divalent transporter localized to the late endosomal membrane that regulates cytoplasmic cation levels by specifically regulating the iron metabolism in the macrophages, leading to possible containment of early mycobacterial infections (Wyllie et al., 2002). It has also been hypothesized that variants in the NRAMP1 gene may alter the function of this important protein (Gruenheid et al., 1997). Genetic variants of NRAMP1 have been associated with tuberculosis in several studies (Bellamy et al., 1998a; Hoal et al., 2004), leading to the meta-analysis of NRAMP1 polymorphisms, which confirmed their involvement in tuberculosis susceptibility (Li et al., 2006). The 5′ (GT)n variant has been shown to be of functional importance as it influenced NRAMP1 expression in a luciferase reporter assay (Searle & Blackwell, 1999). The 118 allele [(GT)9] has been associated with higher expression and resistance to tuberculosis whereas the 120 allele [(GT)10] has been associated with lower promoter activity and susceptibility to tuberculosis (Awomoyi et al., 2002; Hoal et al., 2004). The 3′UTR variant has also been linked to a significantly increased risk of pulmonary tuberculosis in West Africa (Bellamy et al., 1998a), Asia (Ryu et al., 2000) and South Africa (Hoal et al., 2004).
The IFN-γ pathway is one of the best known in tuberculosis because IFN-γ is a flagship Th1 cytokine and plays a vital role in the protective immune response against M. tuberculosis infection (Vidyarani et al., 2006). Several polymorphisms have been identified in IFNG and in the α and β chains of the IFN-γ receptor (IFNGR) gene that were mapped to chromosomes 12 and 6, respectively (Zimonjic et al., 1995; Papanicolaou et al., 1997). The functional IFN-γ and IFNGR form a vital complex essential for containing M. tuberculosis (Dorman et al., 2004). One of the most studied polymorphisms in IFNG is located in the first intron (+874 T/A) and has been associated with tuberculosis susceptibility in several populations, including Spanish (Lopez-Maderuelo et al., 2003), Chinese (Tso et al., 2005) and South African, where it was found in both a case-control and a transmission disequilibrium test study (Rossouw et al., 2003). A meta-analysis performed on this particular SNP indicated that the T allele confers significant protection against tuberculosis in different population groups (Pacheco et al., 2008). The +874 T allele provides a binding site for nuclear factor-κB, the transcription factor that induces IFN-γ expression, and the +874 AA genotype was predictive of a lower likelihood of sputum conversion in Japanese patients (Shibasaki et al., 2009).
The nitric oxide synthase 2A (inducible, hepatocytes) gene (NOS2A) is induced in response to infections and cytokines and produces the inducible nitric oxide synthase protein (Hobbs et al., 2002). This protein generates nitric oxide (NO) by converting l-arginine to l-citrulline with the help of several cosubstrates (Kun et al., 2001). Among its other biological functions, NO has been recognized as a mediator of immunity to tuberculosis (MacMicking et al., 1997) and other infectious organisms. The specific mechanism by which NO controls M. tuberculosis is not clear, but may involve disruption of bacterial DNA, proteins, signalling and/or induction of apoptosis of macrophages that contain the bacterium (Chan et al., 2001). It may also play a role in the formation of protective granulomas (Facchetti et al., 1999). Two SNPs (rs2779249 and rs2301369) in the promoter of NOS2A were associated with tuberculosis in a family-based study in 92 Brazilian families, where they contributed separate main effects to allelic associations (Jamieson et al., 2004). A study in Mexicans of a different promoter SNP (rs1800482) did not detect an association with disease (Flores-Villanueva et al., 2005), whereas an NOS2A microsatellite promoter polymorphism was associated with tuberculosis in Colombia (Gomez et al., 2007). In the South African coloured population, an association was found with NOS2A haplotypes that consisted of two functional promoter polymorphisms, namely rs9282799 and rs8078340 (Möller et al., 2009), and another recent study in African Americans detected an association with 10 NOS2A SNPs, of which rs2274894 and rs7215373 showed the strongest association (Velez et al., 2009). Together with functional evidence, the genetic associations found between NOS2A and tuberculosis add to the body of data confirming the important role of NO in human disease.
Recently, a mouse strain predisposed to tuberculosis was used to identify a susceptibility region on chromosome 1 of the mouse. This region was named susceptibility to tuberculosis 1 (sst1) and a candidate gene from this region, intracellular pathogen resistance 1 (Ipr1), was found to mediate resistance to tuberculosis in mice (Pan et al., 2005). The closest human homologue of Ipr1 is SP110b, a protein encoded by the SP110 nuclear body protein gene (SP110). The initial association study in West Africa (Tosh et al., 2006) identified three polymorphisms in the gene that appeared to influence genetic susceptibility to tuberculosis. Subsequently, however, three large studies in Ghanaian (Thye et al., 2006), Russian (Szeszko et al., 2006) and South African (Babb et al., 2007a) populations failed to replicate this association. This illustrates a common phenomenon in association studies, where the first report is usually a positive association and subsequent studies are often negative (Healy, 2006). Unfortunately, because of publication bias, negative association studies are seldom if ever published.
Chemokines play an important role in the development of immune responses against tuberculosis. The C-C chemokine ligand-2 gene (CCL2) encodes monocyte chemoattractant protein-1 (MCP-1), which is essential for the recruitment of monocytes, T lymphocytes (Hasan et al., 2005) and natural killer cells (Allavena et al., 1994) to the site of mycobacterial infection. It may also take part in the localization of tuberculosis in the lungs by contributing to granuloma formation (Hasan et al., 2005) and possibly have a role in T cell differentiation (Luther & Cyster, 2001). Mice deficient in CCL2 are more susceptible to tuberculosis during the early stages of the infection (Kipnis et al., 2003). The functional CCL2 promoter polymorphism rs1024611 (Tabara et al., 2003) was associated with increased susceptibility to pulmonary tuberculosis in Mexicans and in Koreans (Flores-Villanueva et al., 2005). Monocytes from individuals with a GG genotype, associated with a higher risk of progression to disease, were stimulated with M. tuberculosis antigens and produced higher concentrations of MCP-1 and lower concentrations of IL-12p40 than monocytes from individuals with the AA genotype (Flores-Villanueva et al., 2005). However, this polymorphism was not associated with tuberculosis in a smaller study of Brazilians (Jamieson et al., 2004) or in a large case-control association study in the South African coloured population (Möller et al., 2009).
The transmembrane C-type lectin, dendritic cell-specific intercellular adhesion molecule (ICAM)-grabbing nonintegrin (DC-SIGN), or CD209, encoded on chromosome 19, is known to be the major M. tuberculosis receptor in human dendritic cells, and as such, functions in the pulmonary innate immune system (Ji et al., 2005). Phagocytes represent the first line of cellular defence in the alveoli, the surface of which is rich in C-type lectin pattern recognition receptors such as DC-SIGN. On the basis that DC-SIGN might mediate intracellular signalling events leading to cytokine secretion, it has previously been proposed that this C-type lectin could be used by pathogens such as M. tuberculosis to their own advantage as part of an immune strategy (Geijtenbeek et al., 2003; Tailleux et al., 2003). Two promoter variants, −871 A/G and −336 A/G, have been studied regarding susceptibility to tuberculosis in South Africa and the −871G/−336A haplotype is significantly more frequent among healthy controls (Barreiro et al., 2006). Interestingly, this allelic combination is also found at a higher frequency in European populations.
The vitamin D receptor, encoded by the VDR gene, mediates the effects of the active metabolite of vitamin D, 1,25-dihydroxyvitamin D3, which suppresses the growth of M. tuberculosis in vitro (Denis, 1991; Rockett et al., 1998) by stimulating cell-mediated immunity and activating monocytes (Rook et al., 1986). Conflicting results have been found in association studies of VDR in tuberculosis (Bellamy et al., 1999; Selvaraj et al., 2000; Wilkinson et al., 2000; Bornman et al., 2004). A meta-analysis of studies indicated that results were inconclusive and that the studies were underpowered (Lewis et al., 2005). A recent study in South Africans determined that the ApaI ‘AA’ genotype and ‘T’-containing TaqI genotypes predicted a faster response to tuberculosis treatment, but did not detect an association with tuberculosis in a case-control analysis (Babb et al., 2007b). This again illustrates the importance of the specific phenotype used in the analysis.
The family of mammalian Toll-like receptors (TLRs) consists of at least 13 proteins, each with a distinct function, which initiate the innate immune response. TLRs are central components of the innate immune response to mycobacterial infection (Akira & Takeda, 2004; Quesniaux et al., 2004) and act as part of the pattern recognition system to signal the presence of M. tuberculosis in the host (Ferwerda et al., 2005). The receptors are expressed on T cells and may modulate T cell activation by TLR ligands (Imanishi et al., 2007).
Two common nonsynonymous polymorphisms in TLR1, namely rs4833095 (N248S) and rs5743618 (I602S), have been implicated in tuberculosis susceptibility in the African–American population (Ma et al., 2007), with the latter SNP occurring in the transmembrane domain of the protein, resulting in impaired receptor trafficking and function. A second family-based TLR1 association analysis showed that the allele encoding the TLR1 248S variant was significantly over-represented among diseased children (Ma et al., 2007). TLR2 knockout mice are highly susceptible to M. tuberculosis infection (Reiling et al., 2002; Drennan et al., 2004) with a shorter survival time and a higher burden of mycobacteria when compared with mice with the functional gene (Heldwein et al., 2003). An association between tuberculosis and TLR2, with an SNP named Arg677Trp (Lorenz et al., 2000) that appeared to be functional in mycobacterial disease (Bochud et al., 2003), proved to be false, after it was established that Arg677Trp was an artefact due to the presence of a pseudogene region (Malhotra et al., 2005). The rs5743708 SNP (Arg753Gln) was investigated in Turkey and it was suggested that it may contribute to the risk of developing tuberculosis (Ogus et al., 2004). There was a slight deviation from Hardy–Weinberg equilibrium in the control population, perhaps due to the high incidence of consanguinity in the Turkish population (Ogus et al., 2004). A microsatellite polymorphism in intron 2 of the TLR2 gene was identified and associated with tuberculosis (Yim et al., 2006). Shorter GT repeats were present more often among tuberculosis patients, and in validation samples from Korea, and were associated with weaker promoter activity and reduced expression of TLR2 on CD14+ peripheral blood monocytes.
Two missense mutations in TLR4 affect the extracellular domain of the TLR4 protein and were associated with hyporesponsiveness to lipopolysaccharide (Arbour et al., 2000). One of these polymorphisms was investigated in Gambian tuberculosis patients, but no influence on lipopolysaccharide responsiveness or susceptibility to the disease was found (Newport et al., 2004a). The second SNP was not present in Gambians (Allen et al., 2003). Four TLR8 polymorphisms were associated with tuberculosis in Indonesian males and the association was validated in Russian males (Davila et al., 2008). Functional analysis showed that TLR8 transcripts were upregulated in tuberculosis patients with acute disease and that protein expression was increased in macrophages after BCG infection.
Gene–gene and gene–strain interaction studies
Because of the complex nature of the immune system and the polygenic nature of complex diseases, it has become increasingly evident that gene–gene interactions play a far more important part in an individual's susceptibility to a complex disease than single polymorphisms would on their own (Williams et al., 2000; Ritchie et al., 2001; Tsai et al., 2003). Methods for studying gene–gene interactions are based on a multilocus and multigene approach, consistent with the nature of complex-trait diseases, and may provide the paradigm for future genetic studies of tuberculosis. This is still a fairly new approach to elucidate susceptibility to complex diseases, and only one study has been published regarding gene–gene interactions in tuberculosis, which suggested that NOS2A SNPs may interact with polymorphisms in TLR4 and IFNGR1 (Velez et al., 2009).
In addition to the host genotype conferring susceptibility, it has been hypothesized that the interaction between the genotypes of the human host, and the bacterial strain, could determine both the progression to disease and perhaps the type of disease seen. Several M. tuberculosis strains, such as Latin American and Mediterranean (LAM), Haarlem, C strain and X strain exist all over the world, and the W-Beijing family of strains has been studied extensively (Malik & Godfrey-Faussett, 2005). An investigation into 875 strains from 80 different countries led to the hypothesis that ancestral M. tuberculosis originated and drifted together with humans ‘out of Africa’ (Gagneux et al., 2006). Remarkably, evidence exists that six lineages of M. tuberculosis have adapted to specific populations (Maartens & Wilkinson, 2007) and Hanekom (2007) showed that strains from a defined sublineage may have been selected by a human population in a defined geographical setting. This argument is supported by the fact that the HLA allele frequencies vary widely between human populations with different historical backgrounds, with some alleles completely absent in certain populations (Lombard et al., 2006a) and that the HLA genotype has been associated with susceptibility to M. tuberculosis. Only one study has looked at the interaction between bacterial strains and host genotypes thus far and an association was found between the C allele of the TLR2 597 T/C SNP of the host and the bacterial genotype in M. tuberculosis disease (Caws et al., 2008).
Genome-wide association studies (GWAS)
GWAS allow the genotyping of the most frequent genetic polymorphisms in the genome without making assumptions about the genomic location of the causal variants. Because most of the genome is surveyed, this approach eliminates the disadvantages of the single polymorphism or candidate gene approach, where only a few polymorphisms are investigated (Hirschhorn & Daly, 2005). The completion of the human genome sequence, the deposition of SNPs into public databases, the rapid improvements in SNP genotyping methods (such as the development of microarray platforms) and the International HapMap project have allowed the genetic association field to progress to the stage that this approach is feasible. Previous investigations (Daly et al., 2001; Patil et al., 2001; Gabriel et al., 2002) and the International HapMap project have shown that most common variation in the genome can be represented by approximately 300 000 SNPs (Balding, 2006) in White populations. African and other populations with greater variation and less linkage disequilibrium need more SNPs (Hirschhorn & Daly, 2005) to ensure coverage of the entire genome.
When this study design was first suggested, one of the major theoretical stumbling blocks was the problem of statistical correction for multiple testing. GWAS have the potential to identify many false associations and therefore replication or validation in independent populations is essential (Burgner et al., 2006). However, these studies are very expensive, and the best approach to adjust for multiple comparisons has not been determined. A possible strategy for the elimination of false-positive results is to use a two-step study design and to adopt strict rules for declaring significant associations. In the first stage, a subset of individuals is genotyped genome-wide. In the second stage, promising SNPs are genotyped in the remainder of the study population (Balding, 2006). Where a significant association is found, the results are verified in another population, preferably using another genotyping method to exclude technical artefacts. These studies need a large sample size to ensure adequate power. An additional challenge for GWAS is that they are currently based on the ‘common disease, common variant’ (CDCV) hypothesis, which means that rare susceptibility variants may not be detected.
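To make the multiple-comparison problem concrete, the following minimal Python sketch applies a Bonferroni correction, one of several possible adjustments and not necessarily the one any cited study used, to the roughly 300 000 SNPs mentioned above, and then mimics a two-step design. The function names, the stage-1 cutoff and the p-values are hypothetical and chosen only for illustration.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test threshold that keeps the family-wise error rate at or below alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def two_stage_hits(stage1_p, stage2_p, stage1_cutoff, alpha=0.05):
    """Return SNP ids that pass a lenient stage-1 cutoff and a corrected stage-2 test.

    stage1_p and stage2_p are dicts mapping SNP id -> p-value (hypothetical inputs).
    Stage 2 is corrected only for the number of SNPs carried forward, which is one
    simple choice among several and is an assumption of this sketch.
    """
    carried = [snp for snp, p in stage1_p.items() if p <= stage1_cutoff]
    if not carried:
        return []
    threshold = bonferroni_threshold(alpha, len(carried))
    # SNPs missing from stage 2 are treated as non-significant (p = 1.0).
    return sorted(snp for snp in carried if stage2_p.get(snp, 1.0) <= threshold)


# Threshold for a genome-wide scan of about 300 000 SNPs at alpha = 0.05.
print(f"{bonferroni_threshold(0.05, 300_000):.2e}")  # prints 1.67e-07
```

With so strict a per-SNP threshold, only strong effects or very large samples reach significance, which is one reason why the second stage restricts testing to a small set of promising SNPs.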
Admixture mapping, a novel approach for disease gene discovery, is an additional methodology for which the resolution is higher than that of linkage analysis, but lower than that of association studies (Chakraborty & Weiss, 1988; Stephens et al., 1994). This technique involves using a population that has arisen from two or more genetically different parent populations, where the frequency of the disease, and therefore presumably of the underlying risk variants as well, is different in the founding populations. The aim is then to localize the parts of the genome inherited from a specific ancestral population in the cases in order to identify the locus responsible for the phenotype of interest. The primary tools are the genetic markers that occur with significantly different allele frequencies in different population groups. When risk alleles vary across populations, genetically mixed individuals with the disease under investigation are likely to have a higher probability of having inherited the loci near the disease loci from the population at higher risk of the disease. The study design has recently been used successfully in a variety of complex diseases, for example, hypertension (Zhu et al., 2005), multiple sclerosis (Reich et al., 2005) and prostate cancer (Freedman et al., 2006) in African Americans. Preparations for studies in admixed populations from Mexico (Bonilla et al., 2005; Martinez-Marignac et al., 2007) and Hispanic/Latino populations (Mao et al., 2007; Price et al., 2007) are also underway, but none of these studies have involved tuberculosis.
There is a disparity in the rates at which European and Black individuals are infected with tuberculosis and progress to disease (Stead et al., 1990). This is not merely a reflection of socioeconomic circumstances; it has been speculated that centuries of exposure to tuberculosis in Europe may have selected for a more resistant population, whereas sub-Saharan Africa was exposed to the disease only relatively recently. Although the environmental and social factors are difficult to control for, it does appear that the underlying susceptibility is different.
Admixture mapping may serve as an excellent approach for an initial genome scan because of several advantages when compared with linkage or association studies (Darvasi & Shifman, 2005). This technique has higher statistical power than linkage analysis to detect genes of modest effect, and if the risk allele frequencies differ greatly between the ancestral populations, the power approaches that of association mapping (Montana & Pritchard, 2004). In addition, admixture mapping studies are not as affected by allelic heterogeneity as are other study designs (Martinez-Marignac et al., 2007).
This approach could yield novel susceptibility genes for tuberculosis. Our group is currently conducting an admixture mapping study of tuberculosis in an admixed South African population.
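The reliance of admixture mapping on markers whose allele frequencies differ between parental populations can be illustrated with a short Python sketch. It ranks markers by the absolute frequency difference between two parental populations and keeps those above a cutoff. The marker names, the frequencies and the cutoff of 0.5 are invented for illustration and are not taken from any study cited here.

```python
def allele_frequency_delta(freq_pop_a: float, freq_pop_b: float) -> float:
    """Absolute allele-frequency difference between two parental populations."""
    return abs(freq_pop_a - freq_pop_b)


def select_informative_markers(freqs, min_delta=0.5):
    """Return marker ids whose frequency difference is at least min_delta.

    freqs maps marker id -> (frequency in population A, frequency in population B).
    The default cutoff is an illustrative assumption, not a published standard.
    """
    return sorted(
        marker
        for marker, (freq_a, freq_b) in freqs.items()
        if allele_frequency_delta(freq_a, freq_b) >= min_delta
    )


# Hypothetical allele frequencies in two parental populations.
example = {
    "marker1": (0.90, 0.20),  # large difference: informative for ancestry
    "marker2": (0.55, 0.50),  # similar frequencies: uninformative
    "marker3": (0.10, 0.75),  # large difference: informative for ancestry
}
print(select_informative_markers(example))  # prints ['marker1', 'marker3']
```

Markers selected in this way allow the ancestry of each chromosomal segment to be inferred, which is the step that lets cases be examined for an excess of ancestry from the higher-risk population.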
A hypothesis that genetic polymorphisms are not the only inherited features of DNA that may influence disease susceptibility has been considered for several years (Richards, 2006). Epigenetic mechanisms, such as aberrant DNA methylation and histone acetylation, regulate the transcription rate and/or tissue-specific expression of certain genes without altering the primary nucleotide sequence of the DNA.
DNA methylation plays an important role in the regulation of some key biological activities such as imprinting and silencing of chromosomal domains (Reik & Walter, 2001; Wang & Leung, 2004). It occurs at cytosine residues within CpG dinucleotides and, in promoter regions, can act as an important modifier of transcription. The relationship between the methylation of CpG islands and gene expression is very complex: some reports suggest that the methylation level of promoter CpG islands is negatively correlated with gene expression levels (Futscher et al., 2002; Song et al., 2005), while others have observed no correlation (Weber et al., 2007; Illingworth et al., 2008).
Histone acetylation is crucial for gene transcription. Histone acetylation sites and histone acetyltransferases are required for chromatin folding and gene activity, as opening of the chromatin conformation allows binding of transcription factors. Similarly, histone deacetylation plays an important role in transcription (Shahbazian & Grunstein, 2007). It was found that signalling by M. tuberculosis, or its 19-kDa lipoprotein, inhibited IFN-γ-induced class II transactivator (CIITA) expression and this process was linked to histone deacetylation at the CIITA gene (Pennini et al., 2006). Acetylation may therefore contribute to tuberculosis susceptibility, but this remains to be investigated.
These DNA or chromatin features have been shown to be transmitted to subsequent generations, but they may also be influenced by the environment (Richards, 2006). However, this inheritance is not as stable as DNA-based inheritance and has not been completely elucidated. It has been shown that M. tuberculosis may interfere with epigenetic regulation (Pennini et al., 2006). Therefore, once this mode of inheritance is understood, it is possible that these epigenetic mechanisms may be found to play a role in host susceptibility to tuberculosis.
Copy number and rare variants
The majority of the methods above are based on the CDCV hypothesis, and are designed to detect common genetic markers. However, a number of sceptics have postulated that this hypothesis is not true, and will therefore not lead us to some of the more meaningful susceptibility-associated genes (Goldstein, 2009).
DNA copy number variations (CNVs), which can be defined as 1 kb and larger stretches of DNA that display copy number differences compared with the normal population, have generated remarkable enthusiasm in the scientific community due to their role in functional variation (Iafrate et al., 2004; Scherer et al., 2007). It was observed that CNV hotspots exist (Lee et al., 2008) and that CNV segments were considerably enriched among sequences with low or moderate SNP content (Cutler et al., 2007). CNVs in individuals have been associated with disease or susceptibility to several diseases (Baldini, 2004; Friedman et al., 2006; Marshall et al., 2008; Stefansson et al., 2008). It has also been shown that the copy number amplification of regions containing the MBL, pfmdr1 and gch1 genes influenced the susceptibility to bacterial infection and malaria in the zebrafish (Jackson et al., 2007).
Individuals with the complex disorder of schizophrenia were recently found to owe their predisposition to many individually rare mutations, including rare deletions and duplications (Walsh et al., 2008). If this scenario is also true for tuberculosis, it will require a more intense search for the rare variants governing susceptibility.
The idea that tuberculosis is not only influenced by the bacterium but also by both genetic and environmental factors was formally stated 57 years ago (Dubos & Dubos, 1952). Dubos was also of the opinion that medical solutions alone would not prevent or cure tuberculosis. Thus far, history has proved this opinion correct. However, the authors did not consider the possible effect of research in the field of human genetic susceptibility to tuberculosis. Even though this research is years from being clinically applied, the era of personalized medicine is already upon us, and it may not be long before the admittedly controversial benefits of personalized genetics can be extended from conditions such as cardiovascular disease to infectious diseases such as tuberculosis. If individuals can be identified as potentially more vulnerable, they may require different vaccination strategies, a higher index of suspicion if exposed to tuberculosis, and prophylactic treatment.
Complex diseases may in all likelihood continue to earn their name, and susceptibility could prove to be due to the interaction between both common and rare genetic variants in the host, variation in the bacterium, and the environment, requiring highly sophisticated algorithms to predict susceptibility.
M.M. and E.d.W. contributed equally to this work.
The authors thank Paul van Helden for critical reading of the manuscript and helpful suggestions.
(2004) Application of genetic epidemiology to dissecting host susceptibility/resistance to infection illustrated with the study of common mycobacterial infections. Susceptibility to Infectious Diseases: The Importance of Host Genetics (Bellamy R, ed), pp. 7–33. Cambridge University Press, Cambridge.
(1962) Host–parasite relationships in natively resistant and susceptible rabbits on quantitative inhalation of tubercle bacilli. Their significance for the nature of genetic resistance. Am Rev Respir Dis 85: 553–569.
(1988) HLA typing in the Hong Kong Chest Service/British Medical Research Council study of factors associated with the breakdown to active tuberculosis of inactive pulmonary lesions. Am Rev Respir Dis 138: 1616–1621.
(2005) Association of HLA-DR and HLA-DQ genes with susceptibility to pulmonary tuberculosis in Koreans: preliminary evidence of associations with drug resistance, disease severity, and disease recurrence. Hum Immunol 66: 1074–1081.
(2002) Genotype frequencies of the +874T→A single nucleotide polymorphism in the first intron of the interferon-gamma gene in a sample of Sicilian patients affected by tuberculosis. Eur J Immunogenet 29: 371–374.
(2006) Relationship between single nucleotide polymorphisms of NRAMP1 gene and susceptibility to pulmonary tuberculosis in workers exposed to silica dusts. Zhonghua Lao Dong Wei Sheng Zhi Ye Bing Za Zhi 24: 531–533.
(1999) Human leukocyte antigen-associated susceptibility to pulmonary tuberculosis: molecular analysis of class II alleles by DNA amplification and oligonucleotide hybridization in Mexican patients. Chest 115: 428–433.