https://www.abbs.info e-mail:[email protected] ISSN 0582-9879
Short Communication
Factors Affecting Codon Usage in Yersinia pestis
HOU Zhuo-Cheng, YANG Ning*
( College of Animal Science and
Technology, China Agricultural University, Beijing 100094, China )
Abstract The complete genome of Yersinia pestis, the causative agent of the
systemic invasive infectious disease classically referred to as plague, has
recently been sequenced. To gain further insight into the evolution of
synonymous codon usage, the factors shaping the synonymous codon usage pattern
of Y. pestis were analyzed in this paper. Only coding sequences of at least
300 bp were used in the codon usage analysis. Although the “G”+“C” content of
the Y. pestis genome was slightly low (47.64%), highly expressed genes tended
to use “C” or “G” at synonymous sites more often than lowly expressed genes
did. Conversely, lowly expressed genes tended to prefer “A” or “T” at
synonymous positions. Gene expression level was strongly correlated with the
first axis of the correspondence analysis (COA) (R=0.63, P<0.0001). Comparison
of the codon usage patterns of highly and lowly expressed genes confirmed that
gene expression level was partially responsible for the codon usage bias.
GC-skew analysis showed that codon usage was subject to
replication-transcriptional selection. The codon adaptation index (CAI), the
frequency of “C”+“G” at the synonymous third position of codons (GC3s) and the
effective number of codons (Nc) differed somewhat among gene length groups.
The “G”+“C” content of genes was strongly correlated with the first axis of
the COA (R=0.72, P<0.0001). It was concluded that gene expressivity,
replication-transcriptional selection, gene length and gene compositional
constraints were the main factors affecting codon usage variation in Y. pestis.
Key words codon usage;
correspondence analysis; gene expression level; coding sequence length; Yersinia
pestis
The fast-growing body of genome data gives us new opportunities to study
genome evolution at the molecular level. It is well known that codon usage is
nonrandom and species-specific, and that inter-genomic variation in codon
usage pattern is a widespread phenomenon. There are also reports that
different genes within the same organism have different codon usage
patterns[1]. Biased codon usage might be influenced by various factors, such
as translational selection[2], mutation[3], compositional constraints[4],
physical location of the gene on the chromosome[5], replication-translational
selection[6], and hydrophobicity of each gene product[7]. In Y. pestis,
analysis of the codon usage pattern is of great interest because it is
essential for studying the evolution of major codons, predicting ORFs, and
designing primers for PCR.
Yersinia pestis, a Gram-negative bacterium, is the causative agent of the
systemic invasive infectious disease classically referred to as plague, and
has been responsible for three human pandemics: the Justinian plague, the
Black Death, and modern plague. The complete genome of Y. pestis was recently
published[8]. Many genes in the Y. pestis genome seem to have been acquired
from other bacteria and viruses. There is also evidence that Y. pestis has
undergone large-scale genetic flux. Y. pestis thus provides a unique insight
into the ways in which new and highly virulent pathogens evolve.
In this paper, the codon usage pattern of Y. pestis and the main factors that
influence it were analysed using whole-genome datasets. The aim of this study
was to facilitate further work on codon evolution, ORF prediction, and primer
design.
1 Materials and Methods
The complete DNA sequences of the Y. pestis genome were downloaded from the
Sanger Center (ftp://ftp.ebi.ac.uk/pub/databases/embl/genomes/Bact-eria/ypestis―CO92).
Only coding sequences of 300 bp or longer were used. The 149 pseudogenes and
3 plasmids found in the Y. pestis genome were excluded from our datasets. In
total, 3444 genes (coding sequences) were analyzed in this study. The coding
sequences were retrieved from the complete genome with a program developed in
our lab (ftp://202.205.81.236/download/soft/applying software/CDsRead).
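The length and pseudogene filters described above can be sketched in a few lines of Python. This is only an illustration, not the CDsRead program: the function name `filter_cds`, the dictionary input and the `excluded` set of pseudogene names are all hypothetical.

```python
def filter_cds(cds_by_name, min_length=300, excluded=frozenset()):
    """Return the coding sequences that pass the dataset filters.

    cds_by_name: dict mapping a gene name to its coding sequence (string).
    min_length:  minimum sequence length in bp (300 bp in the text).
    excluded:    names to drop outright, e.g. annotated pseudogenes
                 (a hypothetical input; the paper does not say how they
                 were flagged).
    """
    return {
        name: seq
        for name, seq in cds_by_name.items()
        if name not in excluded and len(seq) >= min_length
    }


if __name__ == "__main__":
    toy = {
        "geneA": "ATG" * 100,   # exactly 300 bp, kept
        "geneB": "ATG" * 99,    # 297 bp, dropped
        "geneC": "ATG" * 150,   # 450 bp, but listed as a pseudogene
    }
    kept = filter_cds(toy, excluded={"geneC"})
    print(sorted(kept))
```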
Relative synonymous codon usage (RSCU)[9], the effective number of codons
(Nc)[10], the frequency of “C”+“G” at the synonymous third position of codons
(GC3s) and correspondence analysis (COA) were calculated with the program
CodonW 1.3 [written and provided by Dr. John Peden (Oxford University), see
http://molbiol.ox.ac.uk/cu]. A3s, T3s, G3s and C3s were the frequencies of
“A”, “T”, “G” and “C” at the synonymous third position of codons,
respectively. The codon adaptation index (CAI)[11] was calculated using the
genes encoding ribosomal proteins and elongation factors as the reference
dataset (71 genes in total). CAI has been regarded as the best theoretical
estimate of gene expression level and has been used extensively as a measure
of it[4,6,7,12]. In this study, the CAI value was used as the presumed
expression level. A higher CAI value meant higher codon usage bias and a
higher gene expression level[11].
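To make the two indices concrete: the RSCU of a codon is its observed count divided by the mean count of the codons in its synonymous family, and CAI is the geometric mean, over the codons of a gene, of each codon's relative adaptiveness w (its frequency in the reference set divided by that of the most used synonymous codon). The Python sketch below follows these definitions under stated assumptions. It is not CodonW's code, all function names are hypothetical, it uses the standard genetic code, and it replaces zero reference counts by 0.5 so that w is never zero.

```python
from collections import Counter
from itertools import product
from math import exp, log

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Synonymous families: amino acids encoded by two or more codons
# (this drops Met, Trp and the stop codons).
_by_aa = {}
for _codon, _aa in CODON_TABLE.items():
    if _aa != "*":
        _by_aa.setdefault(_aa, []).append(_codon)
FAMILIES = {aa: codons for aa, codons in _by_aa.items() if len(codons) > 1}


def count_codons(seq):
    """Count in-frame codons; codons with ambiguous bases are skipped."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return Counter(c for c in codons if c in CODON_TABLE)


def rscu(counts):
    """RSCU = observed count / mean count within the synonymous family."""
    values = {}
    for codons in FAMILIES.values():
        total = sum(counts.get(c, 0) for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined for this family
        for c in codons:
            values[c] = counts.get(c, 0) * len(codons) / total
    return values


def relative_adaptiveness(reference_seqs):
    """w = codon count / count of the most used synonym, from a reference set."""
    counts = Counter()
    for seq in reference_seqs:
        counts.update(count_codons(seq))
    w = {}
    for codons in FAMILIES.values():
        # Assumption of this sketch: zero counts become 0.5 to avoid w = 0.
        adjusted = {c: counts[c] or 0.5 for c in codons}
        best = max(adjusted.values())
        for c in codons:
            w[c] = adjusted[c] / best
    return w


def cai(seq, w):
    """Geometric mean of w over the synonymous codons of one gene."""
    counts = count_codons(seq)
    n = sum(counts[c] for c in w)
    if n == 0:
        raise ValueError("sequence contains no synonymous codons")
    log_sum = sum(counts[c] * log(w[c]) for c in w if counts[c])
    return exp(log_sum / n)


if __name__ == "__main__":
    reference = ["CTGCTGCTGCTA"]          # toy stand-in for the 71 reference genes
    w = relative_adaptiveness(reference)
    print(rscu(count_codons(reference[0]))["CTG"])
    print(cai("CTGCTG", w), cai("CTACTA", w))
```

In a real analysis the reference list would hold the ribosomal protein and elongation factor genes, and `cai` would be applied to each of the filtered coding sequences.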
2 Results
2.1 Genome and gene composition constraint analysis
The “G”+“C” content can be one of the most important factors in the evolution
of genomic structures[13]. The genome of Y. pestis was only slightly
compositionally biased, with a “G”+“C” content of 47.64%. The GC3s values of
genes ranged from 16.5% to 69.8%, with a mean of 47.08% and a standard
deviation of 7.6%. The Nc values of genes in Y. pestis ranged from 28.07 to
61, with a mean of 50.82 and a standard deviation of 4.47. Wright[10]
suggested that a plot of Nc against GC3s (the Nc-plot) could be used
effectively to explain codon usage variation among genes. This method has been
used to investigate the evolution of many genomes[6,14]. If the codon usage of
a gene had not been subject to “G”+“C” compositional constraints and natural
selection, its Nc value would fall on the continuous Nc-plot curve. In
Y. pestis, although a small number of genes lay on the Nc-plot curve, the Nc
values of most genes fell below the expected curve (Fig.1), indicating that
compositional constraints had some effect on the codon usage of most genes.
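The expected curve in an Nc-plot is commonly written as Nc = 2 + s + 29/(s^2 + (1-s)^2), where s is GC3s expressed as a fraction. The sketch below computes GC3s for a coding sequence and the expected Nc for a given s, so that an observed Nc can be compared with the curve. It is a minimal illustration with hypothetical function names, not CodonW's implementation, and it assumes the standard genetic code when deciding which codons have no synonyms.

```python
# Codons without synonymous alternatives in the standard genetic code
# (Met, Trp) plus the three stop codons; these are left out of GC3s.
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}
VALID_BASES = set("ACGT")


def gc3s(seq):
    """Fraction of synonymous codons carrying G or C at the third position."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    thirds = []
    for i in range(0, usable, 3):
        codon = seq[i:i + 3]
        if codon in EXCLUDED or not set(codon) <= VALID_BASES:
            continue  # skip Met, Trp, stops and ambiguous codons
        thirds.append(codon[2])
    if not thirds:
        raise ValueError("sequence contains no synonymous codons")
    return sum(base in "GC" for base in thirds) / len(thirds)


def expected_nc(s):
    """Expected Nc when GC3s (s, as a fraction 0..1) is the only constraint."""
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


def nc_deficit(observed_nc, s):
    """Positive when a gene lies below the expected Nc-plot curve."""
    return expected_nc(s) - observed_nc


if __name__ == "__main__":
    # Genome-wide means reported in the text: GC3s 47.08%, Nc 50.82.
    print(expected_nc(0.4708))
    print(nc_deficit(50.82, 0.4708))
```

A gene with a clearly positive `nc_deficit` sits below the curve, which is the pattern described for most Y. pestis genes in Fig.1.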
