
ABBS 2005,37(09): Factors Influencing the Synonymous Codon and Amino Acid Usage Bias in AT-rich Pseudomonas aeruginosa Phage PhiKZ

 


Research Paper


Acta Biochim Biophys Sin 2005,37:625-633

doi:10.1111/j.1745-7270.2005.00089.x

Factors Influencing the Synonymous Codon and Amino Acid Usage Bias
in AT-rich Pseudomonas aeruginosa Phage PhiKZ

 

K. SAU3,  

1 Bioinformatics
Centre,
2 Department of
Biochemistry, Bose Institute, P1/12-CIT Scheme VII M, 3 Department of
Mathematics,  

Received:
April 28, 2005

Accepted:
June 11, 2005

This work was supported by grants from the Department of Biotechnology, Government of
*Corresponding authors:

S. C. MANDAL: E-mail, [email protected]

T. C. GHOSH: Tel, +91-33-2334 6626; Fax, +91-33-2334 3886; E-mail, [email protected]

 

Abstract        To reveal how the AT-rich genome of bacteriophage PhiKZ has been shaped to support its growth in the GC-rich host Pseudomonas aeruginosa, the synonymous codon and amino acid usage bias of PhiKZ was investigated and the data were compared with those of P. aeruginosa. It was found that the synonymous codon and amino acid usage of PhiKZ is distinct from that of P. aeruginosa. In contrast to P. aeruginosa, the third codon position of the synonymous codons of PhiKZ carries mostly an A or T base; codon usage bias in PhiKZ is dictated mainly by mutational bias and, to a lesser extent, by translational selection. A cluster analysis of the relative synonymous codon usage values of 16 myoviruses including PhiKZ shows that PhiKZ is evolutionarily much closer to Escherichia coli phage T4. Further analysis reveals that the three factors of mean molecular weight, aromaticity and cysteine content are mostly responsible for the variation of amino acid usage in PhiKZ proteins, whereas amino acid usage of P. aeruginosa proteins is mainly governed by grand average of hydropathicity, aromaticity and cysteine content. Based on these observations, we suggest that the codons of phages like PhiKZ have evolved to preferentially incorporate the smaller amino acid residues into their proteins during translation, thereby economizing the cost of their development in GC-rich P. aeruginosa.

 

Key words        relative synonymous
codon usage (RSCU); correspondence analysis; amino acid usage; bacteriophage
PhiKZ

 

Synonymous codon and amino acid usage have been studied in numerous living organisms, and the analyses show that they vary not only inter-genomically but also intra-genomically. Several factors such as directional mutational bias [1–3], translational selection [4–9], secondary structure of proteins [10–15], replicational and transcriptional selection [16,17], and environmental factors [18,19] have been reported to influence the codon usage in various organisms. In contrast, amino acid usage has been shown to be influenced by factors such as hydrophobicity, aromaticity, cysteine residue (Cys) content, and mean molecular weight (MMW) [19–24].

Factors influencing the codon and amino acid usage bias have been studied in only a limited number of bacteriophage (or phage) genomes, though phages are widespread in nature and were instrumental in developing the field of molecular biology.

In this study, we have analyzed both the synonymous codon and amino acid usage bias in the AT-rich genome of bacteriophage PhiKZ and compared the data with those of its GC-rich host Pseudomonas aeruginosa [9,25,26] in order to see what kind of genomic architecture is needed by the former to grow in the latter. Our results show that the synonymous codon usage as well as the amino acid usage of PhiKZ is distinct from that of its host P. aeruginosa, and that the codons of the protein coding genes of the phage have been shaped to preferentially incorporate the smaller amino acid residues into its proteins during its growth in the GC-rich host P. aeruginosa.

 

 

Materials and Methods

 

The genome sequence of bacteriophage PhiKZ was downloaded from GenBank (http://www.ncbi.nlm.nih.gov) and its 306 protein coding genes [25] were extracted from the genome by an in-house program. Genomes of
other phages of the Myoviridae family such as Bxz1, T4, LP65, BcepBA3s, T3s, G3s and C3s are the distributions of A, T, G and C
at the synonymous third position of codons. GC3s is the frequency of G+C at the synonymous third codon position. Nc is the effective number of codons used by a gene; it is generally used to measure the bias of synonymous codons and is independent of amino acid composition and codon number [28]. The values of Nc range from 20 (when one codon is used per amino acid) to 61 (when all the codons are used with equal probability). Nc values were calculated according to the method of Banerjee et al. [29]. The putative highly and lowly expressed genes were categorized on the basis of the lowest 10% and highest 10% of the genes, respectively, according to their Nc values. To identify tRNA genes in the PhiKZ and P. aeruginosa genomes, the computer program tRNAscan-SE (http://www.genetics.wustl.edu/eddy/tRNAscan-SE) was used. The program CodonW 1.3 (http://www.molbio.ox.ac.uk/cu) was used for calculating most of the parameters and for correspondence analysis (CA) of the relative synonymous codon and amino acid usages.
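To make the third-position indices concrete, the following minimal Python sketch computes A3s, T3s, G3s, C3s and GC3s for a single coding sequence. It is an illustration only: it assumes the standard genetic code and a simplified definition (the share of each base among the third positions of all codons other than Met, Trp and stop), and the exact normalization used by CodonW may differ. The function name `third_position_composition` is hypothetical.

```python
from collections import Counter

# Standard genetic code, built in TCAG order (assumed; not stated in the paper).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Met and Trp have a single codon each; stop codons are excluded as well,
# leaving the 59 codons used in the synonymous codon analysis.
EXCLUDED = {"M", "W", "*"}


def third_position_composition(cds):
    """Return A3s, T3s, G3s, C3s and GC3s (as fractions) for one coding sequence."""
    cds = cds.upper()
    counts = Counter()
    # Walk the sequence codon by codon, ignoring any trailing partial codon.
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        aa = CODON_TABLE.get(codon)
        if aa is None or aa in EXCLUDED:
            continue  # skip ambiguous codons and non-synonymous codons
        counts[codon[2]] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no synonymous codons found")
    comp = {base + "3s": counts[base] / total for base in "ATGC"}
    comp["GC3s"] = comp["G3s"] + comp["C3s"]
    return comp
```

Applied to every gene in a genome, such values could then be ranked or correlated with other per-gene statistics such as Nc.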

 

 

Results and Discussion

 

Overall codon usage analysis in bacteriophage PhiKZ

 

The RSCU values for phage PhiKZ show that A and/or T-ending codons are predominant (Table 1). Interestingly, the synonymous codon usage pattern of PhiKZ is distinct from that of the host P. aeruginosa [9], though the former uses the translational machinery of the latter for expressing both its structural and regulatory proteins. This is what is expected, as PhiKZ is an AT-rich organism [25], whereas P. aeruginosa is a GC-rich organism [30]. From the overall RSCU values alone, one might assume that compositional constraint is the only factor responsible for shaping the codon usage variation among the genes in PhiKZ. However, overall RSCU values may hide some heterogeneity of codon usage bias among the genes that might be superimposed on the extreme genomic composition, as observed in other extremely skewed organisms.
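As an illustration of how RSCU values of this kind are typically obtained, the sketch below uses the commonly cited definition: the observed count of a codon divided by the mean count of all synonymous codons for the same amino acid. This is an assumption made for illustration (the study itself used CodonW), the standard genetic code is assumed, and the function name `rscu` is hypothetical.

```python
from collections import Counter, defaultdict

# Standard genetic code, built in TCAG order (assumed; not stated in the paper).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def rscu(cds_list):
    """Return {codon: RSCU} pooled over a list of coding sequences."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1

    # Group the 59 synonymous codons by amino acid (Met, Trp, stop excluded).
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in ("M", "W", "*"):
            families[aa].append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from the data; RSCU is undefined
        mean = total / len(codons)
        for c in codons:
            values[c] = counts[c] / mean
    return values
```

Under this definition a value above 1 marks a codon used more often than expected under equal usage of its synonyms, and a value below 1 marks one used less often.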

To decipher the codon usage variation among the PhiKZ genes, Nc and GC3s have been determined. It was observed
that in  

Evolutionary forces in shaping the synonymous codon usage variation
in PhiKZ

 

Multivariate statistical analysis    CA, one of the multivariate statistical techniques, has been widely used to study the codon usage variation between genes in different organisms. In this analysis, the data are plotted in a multidimensional space of 59 axes (excluding Met, Trp and stop codons), and the most prominent axes contributing to the codon usage variation among the genes are then determined. In the present study, RSCU values have been used for CA in order to minimize the effect of amino acid composition. Fig. 1 shows the distribution of PhiKZ genes on the first two major axes of the correspondence analysis. The first major axis accounted for 11.25% of the total variation and the second major axis accounted for 6.59% of the total variation. The position of the genes along the first major axis is negatively correlated with A3s (r=-0.756, P<0.01) and T3s (r=-0.363, P<0.01). It is also interesting to note that the position of the genes along the first major axis is positively correlated with Nc (r=0.151, P<0.01), C3s (r=0.780, P<0.01), G3s (r=0.425, P<0.01) and GC3s (r=0.762, P<0.01). From these results one can reasonably postulate that A- and T-ending codons might be preferred codons in the presumably highly expressed genes. It is also evident that the position of the genes along the second major axis is positively correlated with A3s (r=0.143, P<0.01) and T3s (r=0.540, P<0.01), but negatively correlated with C3s (r=-0.364, P<0.01), G3s (r=-0.538, P<0.01), GC3s (r=-0.487, P<0.01), and Nc (r=-0.159, P<0.01). Taken together, the results clearly indicate that G- and C-ending codons are clustered on the positive side, whereas A- and T-ending codons are predominant on the negative side of the first major axis. Highly biased genes are generally highly expressed [6,31]. As there is no information available regarding the gene expression level of PhiKZ, we have considered highly biased genes as highly expressed. Moreover, since there exists a significant positive correlation between axis 1 and Nc, we putatively categorized the genes into two groups, highly or lowly expressed genes, according to the positions of the genes at the two extreme ends of the first major axis. To investigate the differences between the two clusters of genes distributed along the first axis, the codon usage in the 10% of the genes located at the extreme right of axis 1 has been compared with that of the 10% of the genes located at the extreme left of axis 1. To estimate the codon usage variation between these two sets of genes, we have performed chi-squared tests taking P<0.01 as the significance criterion. Table 2 shows RSCU values for each codon for the two groups of genes. The asterisk marks the codons whose occurrences are significantly higher in the genes situated on the extreme left side of axis 1, compared with the genes present on the extreme right of the first major axis. It is important to note that out of 17 codons that are statistically over-represented in genes located on the extreme left side of axis 1, there are

Relationship between Nc and GC3s        Wright suggested that a plot of Nc versus GC3s could effectively be used to explore the codon usage variation among the genes [28]. As demonstrated by Wright, comparing the actual distribution of genes with the distribution expected under no selection pressure can indicate whether the codon usage bias of genes is influenced by factors other than mutational bias. If the codon usage bias is completely dictated by GC3s, the values of Nc should fall on the expected curve between GC3s and Nc. In other words, if codon usage bias is completely dictated by GC3s composition, the difference between observed and expected Nc values should be very small in the majority of genes. To explore the possible influence of natural selection and mutational bias on synonymous codon usage in the PhiKZ genome, we calculated (NcExpected - NcObserved)/NcExpected. The frequency distribution of (NcExpected - NcObserved)/NcExpected shown in Fig. 2 demonstrates that the majority of genes show a large deviation of NcObserved from NcExpected. This suggests that the majority of genes in PhiKZ have additional codon usage bias, which is independent of mutational bias.
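The expected curve itself is not printed in the text. A minimal sketch, assuming Wright's original approximation Nc = 2 + s + 29/(s^2 + (1 - s)^2) with s = GC3s expressed as a fraction, is given below together with the relative deviation used above. The function names are hypothetical, and the observed Nc values in the study were computed by a different method [29], so this only illustrates the comparison.

```python
def expected_nc(gc3s):
    """Expected Nc under no selection (Wright's approximation); gc3s is a fraction in [0, 1]."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    s = gc3s
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


def relative_nc_deviation(nc_observed, gc3s):
    """Return (NcExpected - NcObserved) / NcExpected for one gene."""
    nc_exp = expected_nc(gc3s)
    return (nc_exp - nc_observed) / nc_exp
```

A gene whose codon usage is explained by GC3s alone would give a deviation close to zero; larger positive values mean the gene is more biased than its third-position composition predicts.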

The influence of mutational pressure on the evolution of synonymous codon usage variation has been demonstrated in bacterial viruses T4 and T7, and in animal viruses belonging to the order Nidovirales [15,32]. Very recently, it was reported that in mycobacteriophages also, codon usage bias is mainly dictated by mutational pressure [33,34].

 

Effect of translational selection on the synonymous codon usage
variation in PhiKZ

 

The cellular tRNA abundance has been demonstrated to influence the synonymous codon usage of highly expressed genes in several organisms [4,35–39]. To see whether the synonymous codon usage of putatively highly expressed genes of PhiKZ is also positively correlated with the host tRNA abundance, the number of over-represented synonymous codons in such genes was determined by comparing their overall RSCU values with those of the putative lowly expressed genes of PhiKZ. As cellular tRNA abundance in some organisms has been shown to be directly proportional to the copy number of tRNA [39,40], the copy number of the tRNA species of PhiKZ was compared with that of P. aeruginosa (Table 1). It was found that among the 26 over-represented synonymous codons in highly expressed genes of PhiKZ, only 10 codons could be recognized by the abundant tRNA species of P. aeruginosa. In contrast, 11 out of the 32 over-represented codons of the lowly expressed genes of PhiKZ are also recognized by the abundant tRNA species of P. aeruginosa. Furthermore, PhiKZ-specific tRNAs also recognize two more over-represented codons of the highly expressed genes and three more over-represented codons of the lowly expressed genes. Taken together, the data in Table 1 indicate that the putative highly expressed genes of PhiKZ are expressed a little more preferentially than the putative lowly expressed genes by the abundant host tRNAs as well as by its own tRNAs. However, the influence of the abundant tRNAs of P. aeruginosa on the synonymous codon usage of the highly expressed genes of PhiKZ is not as strong as that demonstrated for the phage T4-Escherichia coli system [32]. One possible explanation for this observation may be that in P. aeruginosa, the copy number of the tRNAs recognizing the synonymous codons has decreased in a manner similar to that of other GC-rich bacteria such as Mycobacterium tuberculosis [40].

It is interesting to note that codon usage bias in PhiKZ is mainly
dictated by mutational bias and to a small extent by translational selection.
In contrast, synonymous codon usage of P. aeruginosa, which is
incidentally the host of PhiKZ, is influenced by several factors such as
mutational bias, translational selection, gene length and hydrophobicity
[9,26]. Taken together, the data indicate that synonymous codon usage of PhiKZ
is distinct from that of P. aeruginosa.

 

Distinct codon usage in PhiKZ from the other 15 phages of the Myoviridae family

 

Bacteriophage PhiKZ has been suggested to belong to a distinct evolutionary branch of the Myoviridae family, as it does not show notable homology to other myoviruses either at the DNA or protein level [25]. To test this hypothesis and to understand the relationships among the phages of the Myoviridae family, a cluster analysis was carried out on the overall codon usage data of 16 representative myoviruses including PhiKZ by using the simple D-squared statistic method. The D-squared statistic is the sum of the squared differences between the codon frequencies of two codon usage tables; that is, D^2 is the sum over the 64 codons of [Frequency(codon, table 1) - Frequency(codon, table 2)]^2. A low value of D^2 indicates a very close similarity in the codon usage. A matrix containing the D^2 value of each pair has been used to produce a clustering. The clustering produced by the unweighted pair group method using arithmetic averages (UPGMA) [41] shows that there are mainly two branches, “a” and “b”, for the 16 phages of the Myoviridae family (Fig. 3). Mycobacteriophage Bxz1 has been clustered in branch “a”, whereas the rest of the phages have been clustered in branch “b”. The phages T4, PhiKZ and LP65 are clustered in a distinct sub-branch “c” and the sub-branch “d” carries the remaining 12 phages. This type of distribution demonstrates that the synonymous codon usage pattern is not 100% identical even among the phages of each branch and there is a statistically significant difference in the codon usage pattern between the phages of different branches and sub-branches. The data also suggest that PhiKZ is evolutionarily closer to E. coli phage T4, whereas mycobacteriophage Bxz1 has a completely different codon usage pattern from the remaining 15 phages of the Myoviridae family (Fig. 3).
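The D-squared statistic described above can be sketched in a few lines of Python. The example assumes each codon usage table is a dictionary mapping codons to frequencies, with missing codons treated as zero; the helper `closest_pair`, which finds the pair that UPGMA would merge first, and the table names used with it are hypothetical and not taken from the study.

```python
from itertools import combinations


def d_squared(freq1, freq2):
    """Sum over codons of the squared difference between two codon frequency tables."""
    codons = set(freq1) | set(freq2)
    return sum((freq1.get(c, 0.0) - freq2.get(c, 0.0)) ** 2 for c in codons)


def closest_pair(tables):
    """Return the pair of table names with the smallest D-squared value.

    `tables` maps a name to a codon frequency dictionary. The pair returned
    is the one an agglomerative method such as UPGMA would join first.
    """
    return min(
        combinations(sorted(tables), 2),
        key=lambda pair: d_squared(tables[pair[0]], tables[pair[1]]),
    )
```

Computing `d_squared` for every pair of genomes gives the distance matrix on which the clustering is built; a full UPGMA tree would repeat the merge step while averaging distances between clusters.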

 

Amino acid usage in PhiKZ

 

To reveal the factors influencing the amino acid composition in PhiKZ, we also carried out CA on the relative amino acid usage of its 306 proteins. It was found that the first and second major axes of CA accounted for 16.43% and 11.77% of the total variation of the amino acid composition of PhiKZ proteins, respectively. Next, a linear regression analysis was carried out between the positions of the proteins along each of the three axes and their MMW, Cys content and aromaticity.
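Two of these protein properties are simple to compute from a sequence, as the sketch below illustrates. It assumes that aromaticity is the fraction of aromatic residues (Phe, Tyr and Trp) and that Cys content is the fraction of Cys residues, both taken over the full length of a one-letter-code protein sequence; the function names are hypothetical and the study's own values came from CodonW.

```python
AROMATIC = frozenset("FWY")  # Phe, Trp, Tyr in one-letter code


def aromaticity(protein):
    """Fraction of residues that are aromatic (Phe, Trp or Tyr)."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty protein sequence")
    return sum(residue in AROMATIC for residue in protein) / len(protein)


def cys_content(protein):
    """Fraction of residues that are Cys."""
    protein = protein.upper()
    if not protein:
        raise ValueError("empty protein sequence")
    return protein.count("C") / len(protein)
```

Per-protein values of this kind are what would be regressed against the position of each protein along a CA axis.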

It was found that the first axis was significantly correlated (r=0.478, P<0.01) with the MMW of PhiKZ proteins (Fig. 4). This indicates that PhiKZ proteins located on the positive side of the first axis should preferentially carry the amino acid residues with the lowest MMW. It was indeed found that the first axis was positively correlated with each of

The second major axis is significantly negatively correlated (r=-0.678, P<0.01) with the aromaticity of each PhiKZ protein (Fig. 5). From amino acid frequency analysis, it was also found that all the aromatic amino acids were rare in PhiKZ proteins (data not shown). Incidentally, aromatic amino acids were also rare in E. coli, T. maritima and G. lamblia proteins, and it was suggested that these amino acids were not incorporated preferentially in proteins as their biosynthesis was energetically expensive for organisms [20–22].

Further analysis has shown that the second major axis is also negatively correlated (r=-0.462, P<0.01) with the Cys content of the PhiKZ proteins (Fig. 6). Interestingly, among the 306 PhiKZ proteins, 45 proteins do not carry any Cys residue, whereas 19 proteins located at the extreme right side in Fig. 6 are found to contain more than 3% Cys residues. It would be interesting to explore the contribution of these Cys-rich proteins towards gene regulation as well as the development of PhiKZ in P. aeruginosa.

To see whether the amino acid usage of PhiKZ is similar to that of its host P. aeruginosa, we also carried out CA on the relative amino acid usage of P. aeruginosa proteins (data not shown). It was found that the first and second major axes of CA accounted for 20.49% and 14.04% of the total variation of the amino acid composition of P. aeruginosa proteins, respectively. Further analysis showed that while the first major axis is significantly correlated with Cys content (r=0.175, P<0.01), the second axis is significantly correlated with grand average of hydropathicity (r=0.898, P<0.01) and the aromaticity (r=0.447, P<0.01) of each P. aeruginosa protein (data not shown). The data suggest that amino acid usage of PhiKZ is also distinct from that of its host P. aeruginosa.

Bacteriophages including PhiKZ are devoid of any protein synthesis machinery and depend completely on their hosts for protein synthesis and reproduction. To grow in a genomically distant host, a phage like PhiKZ must evolve its genome in such a way that it can synthesize its proteins easily. From the above codon and amino acid usage analyses, it is evident that the codons of the protein coding genes of PhiKZ have been shaped to incorporate predominantly the smaller amino acid residues into its proteins during translation in P. aeruginosa. This type of genomic architecture possibly helps PhiKZ to economize the cost of its development in P. aeruginosa.

 

References

 

1    Levin DB, Whittome B. Codon usage in nucleopolyhedroviruses. J Gen Virol 2000, 81: 2313–2325

2    Jenkins GM, Pagel M, Gould EA, de A Zanotto PM, Holmes EC. Evolution of base composition and codon usage bias in the genus Flavivirus. J Mol Evol 2001, 52: 383–390

3    Jenkins GM, Holmes EC. The extent of codon usage bias in human RNA viruses and its evolutionary origin. Virus Res 2003, 92: 1–7

4    Grantham R, Gautier C, Gouy M, Jacobzone M, Mercier R. Codon catalog usage is a genome strategy modulated for gene expressivity. Nucleic Acids Res 1981, 9: r43–r74

5    Ikemura T. Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol 1985, 2: 13–34

6    Sharp PM, Cowe E. Synonymous codon usage in Saccharomyces cerevisiae. Yeast 1991, 7: 657–678

7    Lesnik T, Solomovici J, Deana A, Ehrlich R, Reiss C. Ribosome traffic in E. coli and regulation of gene expression. J Theor Biol 2000, 202: 175–185

8    Ghosh TC,

9    Gupta SK, Ghosh TC. Gene expressivity is the main factor in dictating the codon usage variation among the genes in Pseudomonas aeruginosa. Gene 2001, 273: 63–70

10  Oresic M, Shalloway D. Specific correlations between relative synonymous codon usage and protein secondary structure. J Mol Biol 1998, 281: 31–48

11  Xie T, Ding DF. The relationship between synonymous codon usage and protein structure. FEBS Lett 1998, 434: 93–96

12  Chiusano ML, Alvarez-Valin F, di Giulio M, D’Onofrio G, Ammirato G, Colonna G, Bernardi G. Second codon positions of genes and the secondary structures of proteins. Relationships and implications for the origin of the genetic code. Gene 2000, 261: 63–69

13  Gupta SK, Majumdar S, Bhattacharya TK, Ghosh TC. Studies on the relationships between the synonymous codon usage and protein secondary structural units. Biochem Biophys Res Commun 2000, 269: 692–696

14  D’Onofrio G, Ghosh TC, Bernardi G. The base composition of the genes is correlated with the secondary structures of the encoded proteins. Gene 2002, 300: 179–187

15  Gu W, Zhou T, Ma J, Sun X, Lu Z. Analysis of synonymous codon usage in SARS Coronavirus and other viruses in the Nidovirales. Virus Res 2004, 101: 155–161

16  McInerney JO. Replicational and transcriptional selection on codon usage in Borrelia burgdorferi. Proc Natl Acad Sci

17  Romero H, Zavala A, Musto H. Compositional pressure and translational selection determine codon usage in the extremely GC-poor unicellular eukaryote Entamoeba histolytica. Gene 2000, 25: 307–311

18

19  Basak S, Banerjee T, Gupta SK, Ghosh TC. Investigation on the causes of codon and amino acid usages variation between thermophilic Aquifex aeolicus and mesophilic Bacillus subtilis. J Biomol Struct Dyn 2004, 22: 205–214

20  Lobry JR, Gautier C. Hydrophobicity, expressivity and aromaticity are the major trends of amino-acid usage in 999 Escherichia coli chromosome-encoded genes. Nucleic Acids Res 1994, 22: 3174–3180

21  Garat B, Musto H. Trends of amino acid usage in the proteins from the unicellular parasite Giardia lamblia. Biochem Biophys Res Commun 2000, 279: 996–1000

22  Zavala A, Naya H, Romero H, Musto H. Trends in codon and amino acid usage in Thermotoga maritima. J Mol Evol 2002, 54: 563–568

23  Banerjee T, Basak S, Gupta SK, Ghosh TC. Evolutionary forces in shaping the codon and amino acid usages in Blochmannia floridanus. J Biomol Struct Dyn 2004, 22: 13–23

24  Naya H, Zavala A, Romero H, Rodriguez-Maseda H, Musto H. Correspondence analysis of amino acid usage within the family Bacillaceae. Biochem Biophys Res Commun 2004, 325: 1252–1257

25  Mesyanzhinov VV, Robben J, Grymonprez B, Kostyuchenko VA, Bourkaltseva MV, Sykilinda NN, Krylov VN et al. The genome of bacteriophage phiKZ of Pseudomonas aeruginosa. J Mol Biol 2002, 317: 1–19

26  Grocock RJ, Sharp PM. Synonymous codon usage in Pseudomonas aeruginosa PA01. Gene 2002, 289: 131–139

27  Sharp PM, Li WH. The codon adaptation index--a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res 1987, 15: 1281–1295

28  Wright F. The ‘effective number of codons’ used in a gene. Gene 1990, 87: 23–29

29  Banerjee T,

30  Stover CK, Pham XQ, Erwin AL,

31  Hou ZC, Yang N. Factors affecting codon usage in Yersinia pestis. Acta Biochim Biophys Sin 2003, 35: 580–586

32  Kunisawa T. Synonymous codon preferences in bacteriophage T4: A distinctive use of transfer RNAs from T4 and from its host Escherichia coli. J Theor Biol 1992, 159: 287–298

33  Sahu K,

34  Sahu K,

35  Sharp PM, Rogers MS, McConnell DJ. Selection pressures on codon usage in the complete genome of bacteriophage T7. J Mol Evol 21: 150–160

36  Gouy M. Codon contexts in enterobacterial and coliphage genes. Mol Biol Evol 1987, 4: 426–444

37  Ikemura T. Correlation between codon usage and tRNA content in microorganisms. In: Hatfield DL, Lee BJ, Pirtle RM eds. Transfer RNA in Protein Synthesis.

38  Zhou J, Liu WJ, Peng SW, Sun XY, Frazer I. Papillomavirus capsid protein expression level depends on the match between codon usage and tRNA availability. J Virol 1999, 73: 4972–4982

39  Kanaya S, Yamada Y, Kinouchi M, Kudo Y, Ikemura T. Codon usage and tRNA genes in eukaryotes: Correlation of codon usage diversity with translation efficiency and with CG-dinucleotide usage as assessed by multivariate analysis. J Mol Evol 2001, 53: 290–298

40  Kanaya S, Yamada Y, Kudo Y, Ikemura T. Studies of codon usage and tRNA genes of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs: Gene expression level and species-specific diversity of codon usage based on multivariate analysis. Gene 1999, 238: 143–155

41  Sokal RR, Sneath PHA. Principles of Numerical Taxonomy.