Research Paper
Acta Biochim Biophys Sin 2005, 37: 625-633
doi:10.1111/j.1745-7270.2005.00089.x
Factors Influencing the Synonymous Codon and Amino Acid Usage Bias in AT-rich Pseudomonas aeruginosa Phage PhiKZ
K. SAU³,
¹ Bioinformatics Centre, ² Department of Biochemistry, Bose Institute, P1/12-CIT Scheme VII M,
³ Department of Mathematics,
Received: April 28, 2005; Accepted: June 11, 2005
This work was supported by grants from the Department of Biotechnology, Government of
*Corresponding authors:
S. C. MANDAL: E-mail, [email protected]
T. C. GHOSH: Tel, +91-33-2334 6626; Fax, +91-33-2334 3886; E-mail, [email protected]
Abstract To reveal how the AT-rich genome of bacteriophage PhiKZ has been shaped to support its growth in the GC-rich host Pseudomonas aeruginosa, we investigated the synonymous codon and amino acid usage bias of PhiKZ and compared the data with those of P. aeruginosa. Synonymous codon and amino acid usage of PhiKZ were found to be distinct from those of P. aeruginosa. In contrast to P. aeruginosa, the third position of synonymous codons in PhiKZ carries mostly A or T; codon usage bias in PhiKZ is dictated mainly by mutational bias and, to a lesser extent, by translational selection. A cluster analysis of the relative synonymous codon usage values of 16 myoviruses, including PhiKZ, shows that PhiKZ is evolutionarily close to Escherichia coli phage T4. Further analysis reveals that the three factors of mean molecular weight, aromaticity and cysteine content are mostly responsible for the variation of amino acid usage in PhiKZ proteins, whereas amino acid usage of P. aeruginosa proteins is mainly governed by grand average of hydropathicity, aromaticity and cysteine content. Based on these observations, we suggest that the codons of PhiKZ have evolved to preferentially incorporate smaller amino acid residues into its proteins during translation, thereby economizing the cost of its development in GC-rich P. aeruginosa.
Key words relative synonymous codon usage (RSCU); correspondence analysis; amino acid usage; bacteriophage PhiKZ
Synonymous codon and amino acid usage have been studied in numerous living organisms, and the analyses show that they vary not only inter-genomically but also intra-genomically. Several factors such as directional mutational bias [1-3], translational selection [4-9], secondary structure of proteins [10-15], replicational and transcriptional selection [16,17], and environmental factors [18,19] have been reported to influence the codon usage in various organisms. In contrast, amino acid usage has been shown to be influenced by factors such as hydrophobicity, aromaticity, cysteine residue (Cys) content, and mean molecular weight (MMW) [19-24].
Factors influencing codon and amino acid usage bias have been studied in only a limited number of bacteriophage (phage) genomes, although phages are widespread in nature and were instrumental in the development of molecular biology.
In this study, we examined both the synonymous codon and amino acid usage bias in the AT-rich genome of bacteriophage PhiKZ and compared the data with those of its GC-rich host Pseudomonas aeruginosa [9,25,26] to see what kind of genomic architecture the phage needs in order to grow in this host. Our results show that both synonymous codon and amino acid usage of PhiKZ are distinct from those of its host P. aeruginosa, and that the codons of the PhiKZ protein coding genes have been shaped to preferentially incorporate smaller amino acid residues into its proteins during growth in the GC-rich host.
Materials and Methods
The genome sequence of bacteriophage PhiKZ was downloaded from GenBank (http://www.ncbi.nlm.nih.gov), and its 306 protein coding genes [25] were extracted from the genome with an in-house program. Genomes of other phages of the Myoviridae family such as Bxz1, T4, LP65, BcepB
A3s, T3s, G3s and C3s are the frequencies of A, T, G and C at the synonymous third position of codons, and GC3s is the frequency of G+C at that position. Nc, the effective number of codons used by a gene, is generally used to measure synonymous codon usage bias and is independent of amino acid composition and codon number [28]. Nc values range from 20 (when only one codon is used per amino acid) to 61 (when all synonymous codons are used with equal probability). Nc values were calculated according to the method of Banerjee et al. [29]. Genes with the lowest 10% of Nc values (strongest bias) were categorized as putatively highly expressed, and those with the highest 10% of Nc values as putatively lowly expressed. tRNA genes in the PhiKZ and P. aeruginosa genomes were identified with the program tRNAscan-SE (http://www.genetics.wustl.edu/eddy/tRNAscan-SE). CodonW 1.3 (http://www.molbio.ox.ac.uk/cu) was used to calculate most of the parameters, including correspondence analysis (CA) of relative synonymous codon usage and amino acid usage.
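The indices defined above can be sketched in Python as follows. This is a minimal illustration, not the CodonW code or the method of Banerjee et al. [29]: it uses Wright's original formula Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6, where Fk is the average homozygosity of the k-fold degenerate amino acids, and the helper names (`split_codons`, `gc3s`, `nc_wright`) are hypothetical. Averaging F2 and F4 when Ile is absent is a common workaround, assumed here.

```python
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order (TTT, TTC, TTA, TTG, TCT, ...); '*' = stop
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AA)}

# Synonymous families; Met, Trp and stop codons are excluded (59 codons remain)
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    if aa not in "*MW":
        FAMILIES.setdefault(aa, []).append(codon)
SYN_CODONS = {c for codons in FAMILIES.values() for c in codons}


def split_codons(cds):
    """Split a coding sequence into in-frame triplets (trailing bases ignored)."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]


def gc3s(cds):
    """Fraction of G+C at the third position of synonymous codons."""
    thirds = [c[2] for c in split_codons(cds) if c in SYN_CODONS]
    if not thirds:
        return float("nan")
    return sum(b in "GC" for b in thirds) / len(thirds)


def nc_wright(cds):
    """Effective number of codons (Wright's formula); None if not estimable."""
    counts = Counter(split_codons(cds))
    f_by_class = {2: [], 3: [], 4: [], 6: []}
    for codons in FAMILIES.values():
        n = sum(counts[c] for c in codons)
        if n < 2:
            continue  # homozygosity F is undefined for fewer than 2 codons
        sum_p2 = sum((counts[c] / n) ** 2 for c in codons)
        f_by_class[len(codons)].append((n * sum_p2 - 1) / (n - 1))
    f = {k: sum(v) / len(v) for k, v in f_by_class.items() if v}
    if 3 not in f and 2 in f and 4 in f:
        f[3] = (f[2] + f[4]) / 2  # assumed workaround when Ile is absent
    if set(f) != {2, 3, 4, 6} or min(f.values()) <= 0:
        return None
    nc = 2 + 9 / f[2] + 1 / f[3] + 5 / f[4] + 3 / f[6]
    return min(nc, 61.0)  # Nc is bounded above by 61
```

A gene that uses a single codon for every amino acid gives Nc = 20, and one that uses all synonymous codons equally often gives the upper bound of 61, matching the range stated in the text.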
Results and Discussion
Overall codon usage analysis in bacteriophage PhiKZ
The RSCU values for phage PhiKZ show that A- and/or T-ending codons are predominant (Table 1). The synonymous codon usage pattern of PhiKZ is distinct from that of its host P. aeruginosa [9], even though the phage uses the translational machinery of the host to express both its structural and regulatory proteins. This is expected, as PhiKZ is an AT-rich organism [25], whereas P. aeruginosa is GC-rich [30]. From the overall RSCU values alone, one might assume that compositional constraint is the only factor shaping codon usage variation among PhiKZ genes. However, overall RSCU values may mask heterogeneity of codon usage bias among genes that is superimposed on the extreme base composition of a genome, as observed in other organisms with strongly skewed composition.
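The definition of RSCU is not preserved in this copy of the Methods; the sketch below assumes the usual one, in which the observed count of a codon is divided by the count expected if all synonymous codons of its amino acid were used equally (so RSCU = 1 means no bias). The function name `rscu` is hypothetical.

```python
import math
from collections import Counter
from itertools import product

# Standard genetic code in TCAG order; '*' = stop
AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AA)}

# Synonymous families; Met, Trp and stop codons are excluded
FAMILIES = {}
for codon, aa in CODON_TABLE.items():
    if aa not in "*MW":
        FAMILIES.setdefault(aa, []).append(codon)


def rscu(cds):
    """RSCU of each synonymous codon: observed count divided by the count
    expected if every synonymous codon of that amino acid were used equally."""
    cds = cds.upper()
    counts = Counter(cds[i:i + 3] for i in range(0, len(cds) - 2, 3))
    values = {}
    for codons in FAMILIES.values():
        total = sum(counts[c] for c in codons)
        for c in codons:
            # Amino acids absent from the sequence get NaN (RSCU undefined)
            values[c] = counts[c] * len(codons) / total if total else math.nan
    return values
```

An overall RSCU table can be obtained by pooling all genes, e.g. `rscu("".join(genes))`, provided each gene sequence is an in-frame multiple of three bases.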
To examine codon usage variation among PhiKZ genes, Nc and GC3s were determined. It was observed that in
Evolutionary forces in shaping the synonymous codon usage variation in PhiKZ
Multivariate statistical analysis
CA, a multivariate statistical technique, has been widely used to study codon usage variation among genes in different organisms. In this analysis, the data are plotted in a multidimensional space of 59 axes (one per synonymous codon, excluding Met, Trp and stop codons), and the axes that contribute most to codon usage variation among genes are then identified. In the present study, RSCU values were used for CA in order to minimize the effect of amino acid composition. Fig. 1 shows the distribution of PhiKZ genes on the first two major axes of the CA. The first and second major axes accounted for 11.25% and 6.59% of the total variation, respectively. The position of genes along the first major axis is negatively correlated with A3s (r=-0.756, P<0.01) and T3s (r=-0.363, P<0.01), and positively correlated with Nc (r=0.151, P<0.01), C3s (r=0.780, P<0.01), G3s (r=0.425, P<0.01) and GC3s (r=0.762, P<0.01). Because a lower Nc indicates stronger bias, these results suggest that A- and T-ending codons might be the preferred codons in the putatively highly expressed genes. The position of genes along the second major axis is positively correlated with A3s (r=0.143, P<0.01) and T3s (r=0.540, P<0.01), but negatively correlated with C3s (r=-0.364, P<0.01), G3s (r=-0.538, P<0.01), GC3s (r=-0.487, P<0.01) and Nc (r=-0.159, P<0.01). Taken together, the results indicate that G- and C-ending codons are clustered on the positive side of the first major axis, whereas A- and T-ending codons predominate on its negative side.
Highly biased genes are generally highly expressed [6,31]. As no information is available on the expression levels of PhiKZ genes, we considered highly biased genes to be highly expressed. Moreover, since there is a significant positive correlation between axis 1 and Nc, we putatively divided the genes into highly and lowly expressed groups according to their positions at the two extreme ends of the first major axis. To investigate the differences between these two groups, the codon usage of the 10% of genes located at the extreme right of axis 1 was compared with that of the 10% of genes located at the extreme left. Codon usage variation between the two sets was assessed with chi-squared tests, taking P<0.01 as the significance criterion. Table 2 shows the RSCU values of each codon for the two groups of genes. The asterisk marks codons whose occurrence is significantly higher in genes on the extreme left of axis 1 than in genes on the extreme right. It is important to note that out of 17 codons that are statistically over-represented in genes located on the extreme left side of axis 1, there are
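CodonW performs the CA internally. The sketch below shows one standard formulation of the computation (singular value decomposition of the matrix of standardized residuals), assuming a genes × codons matrix of RSCU values as input; the function name is hypothetical, and it assumes every gene row has a non-zero sum. The sign of each axis returned by an SVD is arbitrary, so the positive and negative sides may be flipped relative to Fig. 1.

```python
import numpy as np


def correspondence_analysis(table, n_axes=2):
    """Principal row coordinates and % of total inertia for the first n_axes axes.

    table: genes x codons matrix of non-negative values (e.g. RSCU).
    """
    X = np.asarray(table, dtype=float)
    X = X[:, X.sum(axis=0) > 0]              # drop codons never observed
    P = X / X.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row (gene) masses
    c = P.sum(axis=0)                        # column (codon) masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    inertia = sv ** 2                        # principal inertia of each axis
    pct = 100 * inertia / inertia.sum()
    rows = U * sv / np.sqrt(r)[:, None]      # principal coordinates of genes
    return rows[:, :n_axes], pct[:n_axes]
```

Correlations such as those reported above between axis 1 and GC3s can then be obtained with `np.corrcoef(coords[:, 0], gc3s_values)[0, 1]`, where `gc3s_values` is a hypothetical array of per-gene GC3s.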
Relationship between Nc and GC3s
Wright suggested that a plot of Nc versus GC3s can be used effectively to explore codon usage variation among genes [28]. As Wright demonstrated, comparing the actual distribution of genes with the distribution expected under no selection indicates whether codon usage bias is influenced by factors other than mutational bias. If codon usage bias is dictated entirely by GC3s, Nc values should fall on the expected curve relating Nc to GC3s; in other words, the difference between observed and expected Nc values should be very small for most genes. To explore the possible influence of natural selection and mutational bias on synonymous codon usage in the PhiKZ genome, we calculated (NcExpected-NcObserved)/NcExpected. The frequency distribution of this quantity (Fig. 2) shows that most genes have a large deviation of NcObserved from NcExpected, suggesting that most PhiKZ genes carry additional codon usage bias that is independent of mutational bias.
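The expected curve is given by Wright's formula NcExpected = 2 + s + 29/(s² + (1 - s)²), where s is GC3s. A minimal sketch of the deviation statistic and its frequency distribution follows; the function names and the 0.1 bin width are illustrative assumptions, not values taken from Fig. 2.

```python
import math
from collections import Counter


def expected_nc(gc3s):
    """Expected Nc if codon bias were caused only by GC3s (Wright's curve)."""
    return 2 + gc3s + 29 / (gc3s ** 2 + (1 - gc3s) ** 2)


def nc_deviation(nc_observed, gc3s):
    """(NcExpected - NcObserved) / NcExpected for one gene."""
    nc_exp = expected_nc(gc3s)
    return (nc_exp - nc_observed) / nc_exp


def deviation_histogram(genes, bin_width=0.1):
    """Count genes per deviation bin; genes = [(nc_observed, gc3s), ...].

    Key k covers the interval [k * bin_width, (k + 1) * bin_width).
    """
    return Counter(math.floor(nc_deviation(nc, s) / bin_width) for nc, s in genes)
```

For example, a gene with GC3s = 0.5 is expected to have Nc = 60.5; a gene lying exactly on the curve has a deviation of 0, and genes well below the curve give large positive deviations.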
The influence of mutational pressure on the evolution of synonymous codon usage has been demonstrated in the bacterial viruses T4 and T7 and in animal viruses of the order Nidovirales [15,32]. More recently, codon usage bias in mycobacteriophages was also reported to be dictated mainly by mutational pressure [33,34].
Effect of translational selection on the synonymous codon usage variation in PhiKZ
Cellular tRNA abundance has been shown to influence the synonymous codon usage of highly expressed genes in several organisms [4,35-39]. To see whether the synonymous codon usage of putatively highly expressed PhiKZ genes is also positively correlated with host tRNA abundance, the over-represented synonymous codons in these genes were identified by comparing their overall RSCU values with those of the putatively lowly expressed PhiKZ genes. As cellular tRNA abundance in some organisms has been shown to be directly proportional to tRNA gene copy number [39,40], the copy numbers of the tRNA species of PhiKZ were compared with those of P. aeruginosa (Table 1). Among the 26 over-represented synonymous codons in highly expressed PhiKZ genes, only 10 could be recognized by abundant tRNA species of P. aeruginosa. In comparison, 11 of the 32 over-represented codons in lowly expressed PhiKZ genes are also recognized by abundant P. aeruginosa tRNAs. Furthermore, PhiKZ-specific tRNAs recognize two more over-represented codons of the highly expressed genes and three more of the lowly expressed genes. Taken together, the data in Table 1 indicate that the abundant host tRNAs, as well as PhiKZ's own tRNAs, favor the putatively highly expressed PhiKZ genes only slightly more than the putatively lowly expressed genes. The influence of abundant P. aeruginosa tRNAs on the synonymous codon usage of highly expressed PhiKZ genes is therefore weaker than that demonstrated for the phage T4-Escherichia coli system [32]. One possible explanation is that in P. aeruginosa, as in other GC-rich bacteria such as Mycobacterium tuberculosis [40], the copy number of tRNAs recognizing the synonymous codons is reduced.
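The comparison above reduces to simple proportions. The sketch below reproduces them from the counts reported in the text; `recognized_share` (for real codon lists) and `shares` (for counts) are hypothetical helpers, and no codon identities from Table 1 are assumed.

```python
def recognized_share(over_represented, decoded_codons):
    """Fraction of over-represented codons decoded by a given set of tRNAs."""
    over = set(over_represented)
    return len(over & set(decoded_codons)) / len(over)


def shares(n_over, n_host, n_phage_extra):
    """(host-only share, host + phage share) from counts of codons."""
    return n_host / n_over, (n_host + n_phage_extra) / n_over


# Counts reported in the text: (over-represented, host-decoded, extra phage-decoded)
REPORTED = {"highly expressed": (26, 10, 2), "lowly expressed": (32, 11, 3)}

for group, counts in REPORTED.items():
    host_only, host_and_phage = shares(*counts)
    print(f"{group}: host tRNAs {host_only:.3f}, host + PhiKZ tRNAs {host_and_phage:.3f}")
```

The shares come to roughly 38% versus 34% for host tRNAs alone, and roughly 46% versus 44% when PhiKZ tRNAs are included, which is why the difference between the two gene groups is described as slight.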
Thus, codon usage bias in PhiKZ is dictated mainly by mutational bias and only to a small extent by translational selection. In contrast, synonymous codon usage of its host P. aeruginosa is influenced by several factors, such as mutational bias, translational selection, gene length and hydrophobicity [9,26]. Taken together, the data indicate that the synonymous codon usage of PhiKZ is distinct from that of P. aeruginosa.
Codon usage of PhiKZ is distinct from that of 15 other phages of the Myoviridae family
Bacteriophage PhiKZ has been suggested to belong to a distinct evolutionary branch of the Myoviridae family, as it shows no notable homology to other myoviruses at either the DNA or the protein level [25]. To test this hypothesis and to understand the relationships among phages of the Myoviridae family, a cluster analysis was carried out on the overall codon usage data of 16 representative myoviruses, including PhiKZ, using the D-squared (D²) statistic. D² is the sum, over all 64 codons, of the squared difference between the frequencies of a codon in two codon usage tables: D² = Σ[Frequency(codon, table 1) - Frequency(codon, table 2)]². A low D² value indicates close similarity in codon usage. The matrix of pairwise D² values was used for clustering by the unweighted pair group method using arithmetic averages (UPGMA) [41], which yields two main branches, "a" and "b", for the 16 phages of the Myoviridae family (Fig. 3). Mycobacteriophage Bxz1 falls in branch "a", whereas the remaining phages cluster in branch "b". Within branch "b", phages T4, PhiKZ and LP65 form a distinct sub-branch "c", and sub-branch "d" contains the remaining 12 phages. This distribution shows that the synonymous codon usage pattern is not identical even among phages of the same branch, and that codon usage differs significantly between phages of different branches and sub-branches. The data also suggest that PhiKZ is evolutionarily closer to E. coli phage T4, whereas mycobacteriophage Bxz1 has a codon usage pattern completely different from those of the other 15 phages of the Myoviridae family (Fig. 3).
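A minimal sketch of the D² distance followed by UPGMA clustering is shown below. It assumes that "Frequency" means a codon's relative frequency within its table (count divided by total codons), which the text does not state explicitly; all function names and the three toy genomes are hypothetical and are not the 16 phages analysed here.

```python
def codon_frequencies(counts):
    """Turn raw codon counts (codon -> count) into relative frequencies."""
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()}


def d_squared(freq1, freq2):
    """Sum over codons of the squared frequency difference between two tables.
    Codons absent from both tables contribute 0, so this equals the 64-codon sum."""
    codons = set(freq1) | set(freq2)
    return sum((freq1.get(c, 0.0) - freq2.get(c, 0.0)) ** 2 for c in codons)


def upgma(names, dmat):
    """UPGMA clustering of a symmetric distance matrix; returns a nested tuple tree."""
    clusters = {i: (name, 1) for i, name in enumerate(names)}  # id -> (tree, size)
    dist = {(i, j): dmat[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        i, j = min(dist, key=dist.get)  # closest pair of clusters (i < j)
        (tree_i, n_i), (tree_j, n_j) = clusters.pop(i), clusters.pop(j)
        del dist[(i, j)]
        for k in clusters:
            # Size-weighted average of the merged clusters' distances to k
            d_ik = dist.pop((min(i, k), max(i, k)))
            d_jk = dist.pop((min(j, k), max(j, k)))
            dist[(k, next_id)] = (n_i * d_ik + n_j * d_jk) / (n_i + n_j)
        clusters[next_id] = ((tree_i, tree_j), n_i + n_j)
        next_id += 1
    return next(iter(clusters.values()))[0]


# Hypothetical toy codon counts for three made-up genomes A, B and C
toy = {
    "A": {"AAA": 50, "AAG": 50},
    "B": {"AAA": 60, "AAG": 40},
    "C": {"AAA": 10, "AAG": 90},
}
names = list(toy)
freqs = [codon_frequencies(toy[n]) for n in names]
dmat = [[d_squared(f, g) for g in freqs] for f in freqs]
tree = upgma(names, dmat)
print(tree)  # ('C', ('A', 'B'))
```

Genomes A and B have the smallest D² and are joined first, mirroring how phages with similar codon usage fall into the same sub-branch in Fig. 3.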
Amino acid usage in PhiKZ
To reveal the factors influencing amino acid composition in PhiKZ, we also carried out CA on the relative amino acid usage of its 306 proteins. The first and second major axes of CA accounted for 16.43% and 11.77% of the total variation in the amino acid composition of PhiKZ proteins, respectively. Next, linear regression analysis was carried out between the positions of the proteins along each of the three axes and their MMW, Cys content and aromaticity.
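The per-protein indices used in this regression can be sketched as below. MMW is assumed here to be the average residue mass of a protein (approximate average residue masses in Da), aromaticity the frequency of Phe, Tyr and Trp, and Cys content the frequency of Cys; the source does not spell out these definitions, and all function names are hypothetical.

```python
import math

# Approximate average residue masses (Da): amino acid mass minus one water
RESIDUE_MASS = {
    "G": 57.05, "A": 71.08, "S": 87.08, "P": 97.12, "V": 99.13,
    "T": 101.11, "C": 103.14, "L": 113.16, "I": 113.16, "N": 114.10,
    "D": 115.09, "Q": 128.13, "K": 128.17, "E": 129.12, "M": 131.19,
    "H": 137.14, "F": 147.18, "R": 156.19, "Y": 163.18, "W": 186.21,
}
AROMATIC = set("FWY")


def _residues(protein):
    """Upper-case residues restricted to the 20 standard amino acids."""
    return [aa for aa in protein.upper() if aa in RESIDUE_MASS]


def mean_residue_mass(protein):
    res = _residues(protein)
    return sum(RESIDUE_MASS[aa] for aa in res) / len(res)


def aromaticity(protein):
    res = _residues(protein)
    return sum(aa in AROMATIC for aa in res) / len(res)


def cys_content(protein):
    res = _residues(protein)
    return res.count("C") / len(res)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

With CA coordinates in hand, a correlation such as the one for axis 1 would be `pearson_r(axis1, [mean_residue_mass(p) for p in proteins])`, where `axis1` and `proteins` are hypothetical per-protein lists in matching order.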
The first axis was significantly correlated (r=-0.478, P<0.01) with the MMW of PhiKZ proteins (Fig. 4). Because this correlation is negative, PhiKZ proteins located on the positive side of the first axis should preferentially carry amino acid residues with the lowest MMW. It was indeed found that the first axis was positively correlated with each of
The second major axis is significantly negatively correlated (r=-0.678, P<0.01) with the aromaticity of each PhiKZ protein (Fig. 5). Amino acid frequency analysis also showed that all aromatic amino acids are rare in PhiKZ proteins (data not shown). Aromatic amino acids are likewise rare in E. coli, T. maritima and G. lamblia proteins, and it has been suggested that they are not preferentially incorporated into proteins because their biosynthesis is energetically expensive [20-22].
Further analysis showed that the second major axis is also negatively correlated (r=-0.462, P<0.01) with the Cys content of PhiKZ proteins (Fig. 6). Interestingly, 45 of the 306 PhiKZ proteins contain no Cys residue, whereas 19 proteins located at the extreme right side of Fig. 6 contain more than 3% Cys. It would be interesting to explore the contribution of these Cys-rich proteins to gene regulation and to the development of PhiKZ in P. aeruginosa.
To see whether the amino acid usage of PhiKZ is similar to that of its host P. aeruginosa, we also carried out CA on the relative amino acid usage of P. aeruginosa proteins (data not shown). It was found that the first and second major axes of CA accounted for 20.49% and 14.04% of the total variation of the amino acid composition of P. aeruginosa proteins, respectively. Further analysis showed that while the first major axis is significantly correlated with Cys content (r=-0.175, P<0.01), the second axis is significantly correlated with grand average of hydropathicity (r=0.898, P<0.01) and the aromaticity (r=0.447, P<0.01) of each P. aeruginosa protein (data not shown). The data suggest that amino acid usage of PhiKZ is also distinct from that of its host P. aeruginosa.
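The grand average of hydropathicity (GRAVY) is commonly computed as the mean Kyte-Doolittle hydropathy value over all residues of a protein. The text does not name the scale used, so the Kyte-Doolittle scale in this sketch is an assumption, and `gravy` is a hypothetical helper.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(protein):
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)
```

Positive GRAVY values indicate an overall hydrophobic protein and negative values a hydrophilic one.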
Bacteriophages, including PhiKZ, are devoid of protein synthesis machinery and depend completely on their hosts for protein synthesis and reproduction. To grow in a genomically distant host, a phage such as PhiKZ must evolve its genome so that its proteins can be synthesized efficiently. The above codon and amino acid usage analyses suggest that the codons of the PhiKZ protein coding genes have been shaped to incorporate predominantly smaller amino acid residues into its proteins during translation in P. aeruginosa. This type of genomic architecture possibly helps PhiKZ to economize the cost of its development in P. aeruginosa.
References
1 Levin DB, Whittome B. Codon usage in nucleopolyhedroviruses. J Gen Virol 2000, 81: 2313-2325
2 Jenkins GM, Pagel M, Gould EA, de A Zanotto PM, Holmes EC. Evolution of base composition and codon usage bias in the genus Flavivirus. J Mol Evol 2001, 52: 383-390
3 Jenkins GM, Holmes EC. The extent of codon usage bias in human RNA viruses and its evolutionary origin. Virus Res 2003, 92: 1-7
4 Grantham R, Gautier C, Gouy M, Jacobzone M, Mercier R. Codon catalog usage is a genome strategy modulated for gene expressivity. Nucleic Acids Res 1981, 9: r43-r74
5 Ikemura T. Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol 1985, 2: 13-34
6 Sharp PM, Cowe E. Synonymous codon usage in Saccharomyces cerevisiae. Yeast 1991, 7: 657-678
7 Lesnik T, Solomovici J, Deana A, Ehrlich R, Reiss C. Ribosome traffic in E. coli and regulation of gene expression. J Theor Biol 2000, 202: 175-185
8 Ghosh TC,
9 Gupta SK, Ghosh TC. Gene expressivity is the main factor in dictating the codon usage variation among the genes in Pseudomonas aeruginosa. Gene 2001, 273: 63-70
10 Oresic M, Shalloway D. Specific correlations between relative synonymous codon usage and protein secondary structure. J Mol Biol 1998, 281: 31-48
11 Xie T, Ding DF. The relationship between synonymous codon usage and protein structure. FEBS Lett 1998, 434: 93-96
12 Chiusano ML, Alvarez-Valin F, di Giulio M, D'Onofrio G, Ammirato G, Colonna G, Bernardi G. Second codon positions of genes and the secondary structures of proteins. Relationships and implications for the origin of the genetic code. Gene 2000, 261: 63-69
13 Gupta SK, Majumdar S, Bhattacharya TK, Ghosh TC. Studies on the relationships between the synonymous codon usage and protein secondary structural units. Biochem Biophys Res Commun 2000, 269: 692-696
14 D'Onofrio G, Ghosh TC, Bernardi G. The base composition of the genes is correlated with the secondary structures of the encoded proteins. Gene 2002, 300: 179-187
15 Gu W, Zhou T, Ma J, Sun X, Lu Z. Analysis of synonymous codon usage in SARS Coronavirus and other viruses in the Nidovirales. Virus Res 2004, 101: 155-161
16 McInerney JO. Replicational and transcriptional selection on codon usage in Borrelia burgdorferi. Proc Natl Acad Sci
17 Romero H, Zavala A, Musto H. Compositional pressure and translational selection determine codon usage in the extremely GC-poor unicellular eukaryote Entamoeba histolytica. Gene 2000, 25: 307-311
18
19 Basak S, Banerjee T, Gupta SK, Ghosh TC. Investigation on the causes of codon and amino acid usages variation between thermophilic Aquifex aeolicus and mesophilic Bacillus subtilis. J Biomol Struct Dyn 2004, 22: 205-214
20 Lobry JR, Gautier C. Hydrophobicity, expressivity and aromaticity are the major trends of amino-acid usage in 999 Escherichia coli chromosome-encoded genes. Nucleic Acids Res 1994, 22: 3174-3180
21 Garat B, Musto H. Trends of amino acid usage in the proteins from the unicellular parasite Giardia lamblia. Biochem Biophys Res Commun 2000, 279: 996-1000
22 Zavala A, Naya H, Romero H, Musto H. Trends in codon and amino acid usage in Thermotoga maritima. J Mol Evol 2002, 54: 563-568
23 Banerjee T, Basak S, Gupta SK, Ghosh TC. Evolutionary forces in shaping the codon and amino acid usages in Blochmannia floridanus. J Biomol Struct Dyn 2004, 22: 13-23
24 Naya H, Zavala A, Romero H, Rodriguez-Maseda H, Musto H. Correspondence analysis of amino acid usage within the family Bacillaceae. Biochem Biophys Res Commun 2004, 325: 1252-1257
25 Mesyanzhinov VV, Robben J, Grymonprez B, Kostyuchenko VA, Bourkaltseva MV, Sykilinda NN, Krylov VN et al. The genome of bacteriophage phiKZ of Pseudomonas aeruginosa. J Mol Biol 2002, 317: 1-19
26 Grocock RJ, Sharp PM. Synonymous codon usage in Pseudomonas aeruginosa PA01. Gene 2002, 289: 131-139
27 Sharp PM, Li WH. The codon adaptation index--a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 1987, 15: 1281-1295
28 Wright F. The 'effective number of codons' used in a gene. Gene 1990, 87: 23-29
29 Banerjee T,
30 Stover CK, Pham XQ, Erwin AL,
31 Hou ZC, Yang N. Factors affecting codon usage in Yersinia pestis. Acta Biochim Biophys Sin 2003, 35: 580-586
32 Kunisawa T. Synonymous codon preferences in bacteriophage T4: A distinctive use of transfer RNAs from T4 and from its host Escherichia coli. J Theor Biol 1992, 159: 287-298
33 Sahu K,
34 Sahu K,
35 Sharp PM, Rogers MS, McConnell DJ. Selection pressures on codon usage in the complete genome of bacteriophage T7. J Mol Evol 21: 150-160
36 Gouy M. Codon contexts in enterobacterial and coliphage genes. Mol Biol Evol 1987, 4: 426-444
37 Ikemura T. Correlation between codon usage and tRNA content in microorganisms. In: Hatfield DL, Lee BJ, Pirtle RM eds. Transfer RNA in Protein Synthesis.
38 Zhou J, Liu WJ, Peng SW, Sun XY, Frazer I. Papillomavirus capsid protein expression level depends on the match between codon usage and tRNA availability. J Virol 1999, 73: 4972-4982
39 Kanaya S, Yamada Y, Kinouchi M, Kudo Y, Ikemura T. Codon usage and tRNA genes in eukaryotes: Correlation of codon usage diversity with translation efficiency and with CG-dinucleotide usage as assessed by multivariate analysis. J Mol Evol 2001, 53: 290-298
40 Kanaya S, Yamada Y, Kudo Y, Ikemura T. Studies of codon usage and tRNA genes of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs: Gene expression level and species-specific diversity of codon usage based on multivariate analysis. Gene 1999, 238: 143-155
41 Sokal RR, Sneath PHA. Principles of Numerical Taxonomy.