Original Paper

Acta Biochim Biophys Sin 2006, 38: 363–371

doi:10.1111/j.1745-7270.2006.00177.x

Local Sequence Information-based
Support Vector Machine to Classify Voltage-gated Potassium Channels

 

Li-Xia LIU, Meng-Long
LI*, Fu-Yuan TAN, Min-Chun LU, Ke-Long WANG, Yan-Zhi GUO, Zhi-Ning WEN, and Lin
JIANG

 

 

Received: January 20, 2006        Accepted: March 17, 2006

This work was
supported by the State Key Laboratory of Chemo/Biosensing and Chemometrics, College
of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, China

*Corresponding author: Tel,
86-28-89005151; Fax, 86-28-85412356; E-mail, [email protected]

Abstract        In our previous work, we developed a computational tool, PreK-ClassK-ClassKv, to predict and classify potassium (K+) channels. For K+ channel prediction (PreK) and classification at the family level (ClassK), this method performs well. However, it does not perform as well in classifying voltage-gated potassium (Kv) channels (ClassKv). In this paper, a new method based on the local sequence information of Kv channels is introduced to classify Kv channels. The six transmembrane domains of a Kv channel protein are used to define the protein, and the dipeptide composition technique is used to transform an amino acid sequence into a numerical sequence. A Kv channel protein is represented by a vector with 2000 elements, and a support vector machine algorithm is applied to classify Kv channels. This method shows good performance, with averages of total accuracy (Acc), sensitivity (SE), specificity (SP), reliability (R) and Matthews correlation coefficient (MCC) of 98.0%, 89.9%, 100%, 0.95 and 0.94, respectively. The results indicate that the local sequence information-based method is better than the global sequence information-based method for classifying Kv channels.

 

Key words        voltage-gated potassium channel;
classification; transmembrane domain; dipeptide composition; support vector
machine

 

Potassium (K+) channels are the most diverse group of the ion channel family [1,2]. All K+ channels discovered so far possess a core of alpha subunits, each comprising either one or two highly conserved pore loop (P-) domains. K+ channel subunits containing one P-domain can be assigned to one of two superfamilies: the 6-transmembrane (TM) domain superfamily or the 2-TM domain superfamily. The 6-TM domain superfamily can be further subdivided into conserved gene families: the voltage-gated potassium (Kv) channels and the calcium-activated potassium channels [3,4]. The 2-TM domain family comprises inward-rectifying potassium channels. Moreover, K+ channel subunits containing two P-domains are usually highly regulated K+-selective leak ("leak" K+) channels. The architectures of K+ channels are shown in Fig. 1.

Kv channels are the largest family of K+ channels. They play important roles in shaping action potentials in neurons and modulating the electrical activity of excitable membranes. Mutations in Kv genes can lead to severe diseases, such as long QT syndrome and epilepsy [5–8]. Thus, Kv channels have been considered as possible targets for drug design. In terms of the diversity of genes encoding homologous subunits, Kv channels are classified into five major subfamilies, Kv1 (Shaker), Kv2 (Shab), Kv3 (Shaw), Kv4 (Shal) and Kv7 (KCNQ) [9,10], and proteins in these subfamilies are functionally different. A Kv channel is a tetramer consisting of four subunits, and only members of the same subfamily can bind to each other to form a functional channel [11]. The results of multiple sequence alignment by ClustalW (http://www.ebi.ac.uk/clustalw/) show that the sequences of Kv channel proteins are similar, and the similarity within each subfamily (similarity score above 50%) is higher than that between subfamilies (similarity score below 31%) (Table 1).

A well-developed
specific database for Kv channels, the voltage-gated potassium channel database
(VKCDB), has been accomplished by Li and Gallin [12]. It is the data source of
our study and it is freely accessible at http://vkcdb.biology.ualberta.ca.

At present, many simple and widely used methods to classify a protein are based mostly on the global sequence. Based on such methods, we developed a computational tool, PreK-ClassK-ClassKv, a global sequence information-based support vector machine method, to predict and classify K+ channels. A flowchart of this tool is shown in Fig. 2. It is also freely available at http://chem.scu.edu.cn/pregapp. For K+ channel prediction (PreK) and classification at the family level (ClassK), this method shows good performance, with averages of total accuracy (Acc), sensitivity (SE), specificity (SP), reliability (R) and Matthews correlation coefficient (MCC) of 97.7%, 92.0%, 98.4%, 0.95 and 0.93, respectively, but it is not suitable for Kv channel classification (ClassKv), where the averages of Acc, SE, SP, R and MCC are only 95.7%, 81.2%, 85.8%, 0.88 and 0.81, respectively. Instead of global sequence information, a method based on local sequence information was put forward by Sadka and Linial to classify membranous proteins [13]. In their work, they used only the amino acid composition of the TM domains to create profiles of membranous protein families, and their method achieved good performance. In this paper, we apply the local sequence information-based method to classify Kv channels. The multiple sequence alignment of Kv channels shows that the six TM domains of Kv channels are structurally more conserved and biologically more important than other regions. Therefore, features extracted from these specific domains should be superior to those extracted from full-length sequences [14,15].

The work of identifying a Kv channel protein comprises three main steps. First, locate the six TM domains of a Kv channel protein with a TM (transmembrane segment) prediction tool. Second, represent the protein by a vector with 2000 elements using the dipeptide composition technique. Finally, put the vector into a support vector machine (SVM) classifier to determine its class.
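The three steps above can be sketched as a minimal pipeline. The function names here are illustrative, not the authors' code: `locate_tm_domains` stands in for an external TM-prediction tool, `to_vector` is a toy placeholder for the dipeptide encoding, and the trained SVMs are mocked as simple scoring functions.

```python
def locate_tm_domains(seq):
    # Step 1: a real TM-prediction tool would return the S1..S4 and
    # S5-pore-S6 regions; here we simply cut the sequence into five parts.
    n = len(seq) // 5
    return [seq[i * n:(i + 1) * n] for i in range(4)] + [seq[4 * n:]]

def to_vector(domains):
    # Step 2: placeholder for the dipeptide-composition encoding
    # (400 elements per domain, 2000 in total in the real method).
    return [len(d) for d in domains]  # toy stand-in

def classify(vec, models):
    # Step 3: one scoring function per subfamily; pick the largest score.
    scores = {family: f(vec) for family, f in models.items()}
    return max(scores, key=scores.get)
```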

A variety of evaluation indices are introduced to evaluate this method, including Acc, SE, SP, R and MCC. Compared with the prediction results of the global sequence, we conclude that the six TM domains of Kv channels indeed contain more classification information.

 

 

Materials and Methods

 

Features for
classification of Kv channels

 

The architecture of Kv channels is shown in Fig. 1(A) [16,17]. All members of the Kv channel family contain six TM segments (S1–S6) per subunit, with four identical subunits surrounding a central ion-conduction pore. Segments S5 and S6 form the pore and determine ion selectivity, whereas segments S1–S4 form the voltage sensors [18–20]. All Kv channels have a highly unusual "S4 sequence" in which lysine or arginine appears at every third or fourth amino acid [3,9]. The P-domain between S5 and S6, with the signature sequence "TVGYG", is conserved, because the alternating glycine residues permit the required dihedral angles, the threonine hydroxyl oxygen atom coordinates a K+ ion, and the side-chains of valine and tyrosine are directed into the protein core surrounding the filter to impose geometric constraints [21].
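The S4 periodicity described above (a basic residue at every third or fourth position) can be checked mechanically. This is an illustrative sketch of our own, not a tool from the paper; the test segment is synthetic.

```python
def s4_like(segment):
    """Return True if the K/R residues in `segment` recur with a
    spacing of exactly 3 or 4 positions, as in an S4 sequence."""
    positions = [i for i, aa in enumerate(segment) if aa in "KR"]
    if len(positions) < 2:
        return False
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return all(g in (3, 4) for g in gaps)
```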

The six TM domains
contain the most significant information of a Kv channel, so these domains with
less redundant information are considered to represent a Kv channel. From S1 to
S4, each TM domain is defined as a feature. However, because segments S5 and
S6, as well as the P-domain between them, are closely related, these three
domains were combined as one feature. The results of multiple sequence
alignment by ClustalW indicated that each domain (S1, S2, S3, S4 or S5-pore-S6)
was well conserved within each subfamily, and it was different from other
subfamilies (Table 2). The similarity score of S1, for example, was 78,
79, 70, 85 and 69 within subfamily Kv1, Kv2, Kv3, Kv4 and Kv7, respectively,
but the score between each subfamily was less than 50; and the results for the
other four domains are similar (Table 2).

To represent a Kv channel protein using these five features, we first transformed the amino acid sequence of each domain into numbers using the dipeptide composition technique, so that each domain was represented by a vector with 400 elements. A Kv channel protein was thus represented by a vector with 2000 elements.

 

Datasets

 

Experimental knowledge about Kv channels is still scarce due to the technical difficulties in producing, purifying and crystallizing TM proteins. Most data in protein databases, even in generally accepted databases such as SwissProt and GenBank, are non-experimental. However, VKCDB is a well-developed database available for Kv channel research, and the datasets used in this work were obtained from it. It provides two kinds of data, one containing the full-length proteins at http://vkcdb.biology.ualberta.ca/vkcindex.html, and the other containing proteins described only by their T1 and six TM domains at http://vkcdb.biology.ualberta.ca/alignment.html. In our previous work, we used the former to construct ClassKv; here we use the latter. This database contains 81, 28, 37, 46 and 37 proteins in the Kv1, Kv2, Kv3, Kv4 and Kv7 families, respectively. Two proteins in the Kv7 family, identified as VKC455 and VKC467, were excluded from the datasets because they lack S1 domains, leaving 35 proteins in the Kv7 family. Each of the proteins contains five parts: S1, S2, S3, S4 and S5-pore-S6. T1 domains were ignored here because they are absent in the Kv7 family.

Redundant sequences were removed with the commonly used sequence clustering software CD-HIT [22,23].

Dipeptide composition

 

Dipeptide composition is a length-fixing technique widely used in bioinformatics [24–27]. The dipeptide composition of each protein is calculated by Equation 1:

 

F_dip(i) = N_dip(i) / N_dip        Eq. 1

where F_dip(i) is the fraction of dip(i), the ith dipeptide out of the 400 possible dipeptides; N_dip(i) is the number of occurrences of dip(i); and N_dip is the total number of all dipeptides.

With the dipeptide composition sequence-coding scheme, a Kv channel protein can be represented by a vector with 2000 elements. The basis and method for transforming a Kv channel protein from an amino acid sequence into a numerical sequence were detailed above. The vectors representing proteins can then be put into SVMs for prediction.
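The encoding described above can be written out directly: Equation 1 per domain (400 fractions), then concatenation of the five features into one 2000-element vector. A minimal sketch; function names are ours.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 dipeptides

def dipeptide_composition(seq):
    """Fraction of each of the 400 dipeptides in `seq` (Equation 1)."""
    total = max(len(seq) - 1, 1)          # number of overlapping dipeptides
    counts = {}
    for i in range(len(seq) - 1):
        dp = seq[i:i + 2]
        counts[dp] = counts.get(dp, 0) + 1
    return [counts.get(dp, 0) / total for dp in DIPEPTIDES]

def encode_kv_protein(domains):
    """Concatenate the compositions of the five features
    (S1, S2, S3, S4, S5-pore-S6) into one 2000-element vector."""
    assert len(domains) == 5
    vec = []
    for d in domains:
        vec.extend(dipeptide_composition(d))
    return vec
```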

 

SVM

 

SVM is a statistical learning theory-based machine learning algorithm developed by Vapnik [28,29], with many successful applications in protein research, such as protein classification [25,30–36], structural prediction [37–40], TM segment prediction [41], and pharmaceutical data analysis [42,43]. The SVM algorithm is essentially a binary classifier, although it can be extended to handle multiple classes. For a two-class problem, samples are described by feature vectors x_i (i = 1, 2, …, N) with corresponding labels y_i ∈ {+1, −1} (i = 1, 2, …, N). In this particular study, a Kv family is defined as one class (labeled +1) and all of the other families are defined as the other class (labeled −1). Mapping the input samples into a high-dimensional space in which the two sample sets are supposed to be linearly separable, the SVM aims to find the maximal margin hyperplane separating the two sets. The hyperplane (determined by coefficients α_i and b) can be obtained by solving the following convex quadratic programming (QP) optimization problem:

 

Maximize

W(α) = Σ_i α_i − (1/2) Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j)        Eq. 2

subject to

0 ≤ α_i ≤ C,  Σ_i α_i y_i = 0        Eq. 3

where C is a parameter that controls the trade-off between margin and classification error. K(x_i, x_j) is a kernel function, which is the inner product of the input samples when they are mapped to a high-dimensional space. In this study, the radial basis function is selected, given as

 

K(x_i, x_j) = exp(−||x_i − x_j||² / (2σ²))        Eq. 4

where σ is the width of the kernel function.
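The radial basis kernel of Equation 4 is a one-liner; this sketch assumes the common form with width parameter σ in the denominator 2σ².

```python
import math

def rbf_kernel(xi, xj, sigma=0.5):
    """Equation 4: K(xi, xj) = exp(-||xi - xj||^2 / (2*sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, xj))
    return math.exp(-sq_dist / (2 * sigma ** 2))
```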

The distance from an unlabeled test sample to the hyperplane can be calculated by the decision function

f(x) = Σ_i α_i y_i K(x_i, x) + b        Eq. 5

where b is a constant used to balance the SVM outputs. The sign of the distance f(x) is used to judge whether or not a protein belongs to a given Kv family.

For the multi-class problem, the "one against the others" method is used [44]. In this algorithm, n hyperplanes are constructed, where n is the number of classes. Each hyperplane separates one class from the other classes. In this way, we get n decision functions

f_1(x) = Σ_i α_i^(1) y_i^(1) K(x_i, x) + b_1        Eq. 6

…

f_n(x) = Σ_i α_i^(n) y_i^(n) K(x_i, x) + b_n        Eq. 7

The class of a test sample x is given by argmax_n f_n(x), that is, the class with the largest decision function.
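The "one against the others" rule above can be sketched directly from Equations 5–7: each model contributes one decision value and the sample is assigned to the class with the largest one. The support vectors and coefficients in the test are illustrative, not trained values.

```python
import math

def rbf(xi, xj, sigma=0.5):
    # Equation 4, assuming K(xi, xj) = exp(-||xi - xj||^2 / (2*sigma^2))
    return math.exp(-sum((a - b) ** 2 for a, b in zip(xi, xj)) / (2 * sigma ** 2))

def decision(x, support, alphas, labels, b, sigma=0.5):
    # Equation 5: f(x) = sum_i alpha_i * y_i * K(x_i, x) + b
    return sum(a * y * rbf(sv, x, sigma)
               for sv, a, y in zip(support, alphas, labels)) + b

def one_vs_rest(x, models):
    # models: {class_name: (support_vectors, alphas, labels, b)}
    scores = {c: decision(x, *m) for c, m in models.items()}
    return max(scores, key=scores.get), scores
```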

In this work, five SVMs were constructed. The radial basis function was selected as the kernel function and the QP method was used to solve the optimization problem. A training process aimed to find the best values of C (regulatory parameter) and σ (kernel width parameter) and to optimize the two parameters α and b, the coefficients determining the classification hyperplane. Parameters α and b were optimized automatically by the QP method. Before optimizing α and b, C and σ must be determined. We divided the training datasets into two parts, one for training models (i.e., optimizing parameters) and the other for validating. First, we put initial values C1 (0.5) and σ1 (0.5) into the SVM and, using the model-training datasets, the QP method gives the optimized α1 and b1. Then we use the validating datasets to evaluate this SVM (determined by C1, σ1, α1 and b1). If it shows good performance, we fix an SVM model with parameters C1, σ1, α1 and b1; if it does not, we adjust the values of C and σ by increasing them in fixed steps of 0.5 until the SVM gives satisfactory results on the validating datasets. Once parameters C, σ, α and b are determined, an SVM model is constructed.
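The C/σ search described above can be sketched as follows. This is a simplified version of our own that increases both parameters together in steps of 0.5; `train` and `evaluate` are placeholders for the QP optimization and the validation run.

```python
def search_c_sigma(train, evaluate, threshold, start=0.5, step=0.5, max_iter=20):
    """Increase C and sigma in fixed steps until the trained model
    reaches `threshold` on the validation split."""
    c = sigma = start
    for _ in range(max_iter):
        model = train(c, sigma)           # QP solves for alpha and b
        if evaluate(model) >= threshold:  # validation-set performance
            return c, sigma, model
        c += step
        sigma += step
    return None  # no satisfactory model within max_iter steps
```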

When an unknown Kv channel is queried, it is run through every SVM model in turn, and each model gives a result. If all the results are negative, the protein is classified as "other Kv channels"; otherwise, it is classified into the family with the maximum result. The program was written in Matlab 7.0.
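The querying rule above reduces to a few lines (our Python sketch of the Matlab logic; the decision values in the test are illustrative):

```python
def assign_family(decision_values):
    """decision_values: {family_name: f(x)}. All negative -> 'other';
    otherwise the family with the maximum decision value."""
    if all(v < 0 for v in decision_values.values()):
        return "other Kv channels"
    return max(decision_values, key=decision_values.get)
```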

 

Evaluation index

 

Five indices were
introduced to evaluate this method: Acc, SE, SP, R and MCC. Details of these
indices are listed in Table 3.

Acc [40] gives only an overall view of performance; it does not show the difference between the detection of positive samples and that of negative samples. Because this difference may be significant, separate indices (SE and SP) [40] are calculated for this purpose. However, SE or SP alone does not indicate how reliable such predictions are. For this reason, two combined indices (R and MCC) [45,46] are also used.
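Four of the five indices follow the standard confusion-matrix definitions (Table 3 itself is not reproduced here, and the reliability index R is omitted because its exact formula depends on that table):

```python
import math

def evaluation_indices(tp, tn, fp, fn):
    """Standard Acc, SE, SP and MCC from confusion-matrix counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)   # total accuracy
    se = tp / (tp + fn)                     # sensitivity
    sp = tn / (tn + fp)                     # specificity
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, se, sp, mcc
```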

 

 

Results

 

To construct the SVM models to classify Kv channels, three main steps were followed: first, we transformed each domain of the Kv channel proteins in the training datasets into a vector with 400 elements using the dipeptide composition technique; second, we represented each Kv channel protein by a vector with 2000 elements; third, we put the vectors representing Kv channel proteins into SVMs to train and determine the parameters C, σ, α and b. Once C, σ, α and b were decided, the SVMs were constructed.

The leave-one-out cross-validation test [47] was used to train and validate the models: one Kv channel protein was selected as the test sample to validate the models, the remaining Kv channel proteins were used to train the SVMs, and all proteins in the training datasets were rotated through this role in turn. This avoided reusing the training sequences for validation.
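The leave-one-out procedure can be sketched generically: each protein in turn is the held-out sample and the rest train the model. `fit` and `predict` are placeholders for the SVM training and scoring steps; the test uses a toy majority-class predictor.

```python
def leave_one_out(samples, labels, fit, predict):
    """Rotate every sample through the held-out role and return accuracy."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        if predict(model, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)
```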

 

Model training and evaluating

 

To predict the subfamily of Kv channels, five binary SVMs were constructed. They were trained and validated by the leave-one-out cross-validation test. The parameters C and σ were 5 and 0.5, respectively, for all five models, and parameters α and b were optimized by the QP method. Table 4 shows the performance of the SVMs in classifying Kv channels under the leave-one-out cross-validation test.

From Table 4, we can see that Acc, SE and SP for Kv7 are all 100%, and R and MCC are both 1.00. These results suggest that the Kv7 family can easily be discriminated from the other Kv families, because its sequence identity to other Kv genes is lower than that between members of the other Kv families [10]. This conclusion is also supported by Table 1, which shows that the average similarity score between Kv7 and the other Kv families is 13%. The 100% specificities mean that no non-Kvx channels are misclassified as Kvx channels. The combined indices R (all above 0.91) and MCC (all above 0.90) confirm the good performance of this method.

As a result, SVM models for Kv channel classification were constructed by deciding parameters α and b based on C (5 for all models) and σ (0.5 for all models), using 37 Kv1 channels and 63 non-Kv1 Kv channels, 16 Kv2 channels and 84 non-Kv2 Kv channels, 18 Kv3 channels and 82 non-Kv3 Kv channels, 15 Kv4 channels and 85 non-Kv4 Kv channels, and 14 Kv7 channels and 86 non-Kv7 Kv channels as training datasets.

 

Performance on a test
dataset

 

Using the leave-one-out
cross-validation test, the models to classify Kv channels were constructed and
showed good performances. However, it was necessary to evaluate these models on
a test dataset to demonstrate the unbiased performance. The sequences of the
test dataset were used neither for training nor for validating during the
process of modeling. In this study, the results evaluated using the test
dataset are given in Table 5. All total accuracies achieve 100%.

 

Comparison with the
global sequence information-based SVM method

 

To compare this method, based on local sequence information, with the method based on global sequence information, the results of the global sequence information-based SVM method are listed in Tables 6 and 7. The global sequence information-based SVM method was carried out in two steps: represent a Kv channel protein by a vector with 400 elements using the dipeptide composition technique, then put the vector into the SVM classifier (ClassKv) to assign it a class. Comparing the results given in Tables 4 and 5 with those given in Tables 6 and 7, we can conclude that the performance of the local sequence information-based SVM method is better than that of the global sequence information-based SVM method.

 

 

Discussion

 

We conclude that a Kv channel protein can be identified using the important regions that play roles in its function, leaving out redundant information. The functions of Kv channels depend closely on the six TM domains and the P-domain between S5 and S6. The results of multiple sequence alignment by ClustalW (Table 2) also indicate that each domain (S1, S2, S3, S4 and S5-pore-S6) is more conserved within each subfamily, but different from the other subfamilies. Therefore, we defined a Kv channel protein using those domains, which carry less redundant information. As a result, the performance of the local sequence information-based SVM method was superior to that of the global sequence information-based SVM method. This local sequence information-based method can also be applied to other TM proteins whose TM domains play important roles and contain rich information.

Due to experimental limitations, the available Kv channels with transmembrane domain information are still sparse. In our work, we could obtain only 81, 28, 37, 46 and 37 proteins in the Kv1, Kv2, Kv3, Kv4 and Kv7 families, respectively. After removing redundant sequences, only 37, 16, 18, 15 and 14 proteins remained in the Kv1, Kv2, Kv3, Kv4 and Kv7 families, respectively, in the training datasets. We therefore introduced the SVM algorithm, which is capable of handling small-sample problems, for this classification task based on a small number of sequences. The results of the leave-one-out cross-validation test show the good performance of the SVM. As more experimental data become available, our method can be further validated.

From Table 5, we see that all total accuracies of the local sequence information-based SVM method on the test dataset are 100%. To some extent, such perfect results were obtained because the sequences in the test dataset share some similarity with the training sequences. Nevertheless, the testing still indicates the performance of this method. In addition to the testing results, the leave-one-out cross-validation test (Table 4) also attained good performance. We are therefore convinced that this method is suitable for Kv channel classification.

Several TM prediction tools are available at present, such as TMHMM [48,49], HMMTOP [50,51], PHDhtm [52], DAS [53], SOSUI [54], PRED-TMR [55], and ConPred elite [56]. All of these tools have proven to be of high quality in general use. However, when applied to Kv channels, they are not perfect. To locate the six TM domains of a Kv channel more exactly, biological knowledge of Kv channels should be incorporated into TM prediction tools. We are working on this problem at present.

 

 

Acknowledgements

 

We thank Prof. Gábor E. Tusnády.

 

References

 

 1   Perney TM, Kaczmarek LK. The molecular biology
of K+ channels. Curr Opin Cell Biol 1991, 3: 663
670

 2   Luneau C, Wiedman R, Smith JS, Williams JB.
Shaw-like rat brain potassium channel cDNAs with divergent 3 ends. FEBS
Lett 1991, 288: 163
167

 3   Miller C. An overview of the potassium
channel family. Genome Biol 2000, 1: reviews 0004.1
0004.5

 4   Ashcroft FM. Voltage-gated K+ channels. In: Conner M ed. Ion Channels and
Disease.  5   Lehmann-Horn F, Jurkat-Rott K. Voltage-gated
ion channels and hereditary disease. Physiol Rev 1999, 79: 1317
1372

 6   Jentsch TJ. Neuronal KCNQ potassium channels:
Physiology and role in disease. Nat Rev Neurosci 2000, 1: 21
30

 7   Kaczorowski GJ, Garcia ML. Pharmacology of
voltage-gated and calcium-activated potassium channels. Curr Opin Chem Biol
1999, 3: 448
458

 8   Pongs O. Voltage-gated potassium channels:
From hyperexcitability to excitement. FEBS Lett 1999, 452: 31
35

 9   Pongs O. Molecular biology of
voltage-dependent potassium channels. Physiol Rev 1992, 72: S69
S87

10  Coetzee WA, Amarillo Y, Chiu J, Chow A, Lau D, McCormack T, Moreno
H et al. Molecular diversity of K+ channels. Ann NY Acad Sci 1999, 868: 233
285

11  MacKinnon R. Potassium channels. FEBS Lett 2003, 555: 6265

12  Li B, Gallin WJ. VKCDB: Voltage-gated potassium channel database.
BMC Bioinformatics 2004, 5: 3

13  Sadka T, Linial M. Families of membranous proteins can be
characterized by the amino acid composition of their transmembrane domains.
Bioinformatics 2005, 21: i378
i386

14  Di Francesco V, Garnier J, Munson PJ. Improving protein secondary
structure prediction with aligned homologous sequences. Protein Sci 1996, 5:
106
113

15  Chakrabarti S, Sowdhamini R. Regions of minimal structural
variation among members of protein domain superfamilies: Application to remote
homology detection and modelling using distant relationships. FEBS Lett 2004,
569: 31
36

16  Yellen G. The bacterial K+ channel structure and its implications for neuronal channels. Curr
Opin Neurobiol 1999, 9: 267
273

17  Patten CD, Caprini M, Planells-Cases R, Montal M. Structural and
functional modularity of voltage-gated potassium channels. FEBS Lett 1999, 463:
375
381

18  Jiang YX, Lee A, Chen JY, Ruta V, Cadene M, Chait BT, MacKinnon R.
X-ray structure of a voltage-dependent K+ channel. Nature 2003, 423: 33
41

19  Haris PI. Structural model of a voltage-gated potassium channel
based on spectroscopic data. Biochem Soc Trans 2001, 29: 589
593

20  Van de Voorde A, Tytgat J. Transmembrane segments critical for potassium
channel function. Biochem Biophys Res Commun 1995, 209: 1094
1101

21  MacKinnon R. Potassium channels and the atomic basis of selective
ion conduction (Nobel Lecture). Angew Chem Int Ed Engl 2004, 43: 4265
4277

22  Li W Z, Jaroszewski L, Godzik A. Clustering of highly homologous
sequences to reduce the size of large protein database. Bioinformatics, 2001,
17: 282
283

23  Li W Z, Jaroszewski L, Godzik A. Tolerating some redundancy
significantly speeds up clustering of large protein databases. Bioinformatics,
2002, 18: 77
82

24  Bhasin M, Raghava GPS. Classification of nuclear receptors based on
amino acid composition and dipeptide composition. J Biol Chem 2004, 279: 23262
23266

25  Bhasin M, Raghava GPS. GPCRpred: An SVM-based method for prediction
of families and subfamilies of G-protein coupled receptors. Nucleic Acids Res
2004, 32: W383
W389

26  Reczko M, Bohr H. The DEF data base of sequence based protein fold
class predictions. Nucleic Acids Res 1994, 22: 3616
3619

27  Huang N, Chen H, Sun ZR. CTKPred: An SVM-based method for the
prediction and classification of the cytokine superfamily. Protein Eng Des Sel
2005, 18: 365
368

28  Vapnic VN. Statistical Learning Theory. 29  Haykin S. Support Vector Machines. In: Prentice-Hall eds. Neural
Networks: A Comprehensive Foundation. 30  Dobson PD, Doig AJ. Predicting enzyme class from protein structure
without alignments. J Mol Biol 2005, 345: 187
199

31 Wang M, Yang J, Liu GP, Xu ZJ, Chou KC. Weighted-support vector
machines for predicting membrane protein types based on pseudo-amino acid
composition. Protein Eng Des Sel 2004, 17: 509
516

32  Cai YD, Zhou GP, Chou KC. Support vector machines for predicting
membrane protein types by using functional domain composition. Biophys J 2003,
84: 3257
3263

33  Yang ZR, Chou KC. Bio-support vector machines for computational
proteomics. Bioinformatics 2004, 20: 735
741

34  Karchin R, Karplus K, Haussler D. Classifying G-protein coupled
receptors with support vector machines. Bioinformatics 2002, 18: 147
159

35  Guo YZ, Li ML, Wang KL, Wen ZN, Lu MC, Liu LX, Jiang L. Fast
fourier transform-based support vector machine for prediction of G-protein
coupled receptor subfamilies. Acta Biochim Biophys Sin 2005, 37: 759
766

36  Guo YZ, Li ML, Lu MC, Wen ZN, Wang KL, Li GB, Wu J. Classifying
GPCRs and NRs based on protein power spectrum from fast Fourier transform.
Amino Acids 2006, in press

37  Cai YD, Liu XJ, Xu XB, Chou KC. Prediction of protein structural
classes by support vector machines. Comput Chem 2002, 26: 293
296

38  Bhasin M, Raghava GPS. ESLpred: SVM-based method for subcellular
localization of eukaryotic proteins using dipeptide composition and PSI-BLAST.
Nucleic Acids Res 2004, 32: W414
W419

39  Hua SJ, Sun ZR. Support vector machine approach for protein
subcellular localization prediction. Bioinformatics 2001, 17: 721
728

40  Garg A, Bhasin M, Raghava GPS. Support vector machine-based method
for subcellular localization of human proteins using amino acid compositions,
their order and similarity search. J Biol Chem 2005, 280: 14427
14432.

41  Yuan Z, Mattick JS, Teasdale RD. SVMtm: Support vector machines to
predict transmembrane segments. J Comput Chem 2004, 25: 632
636

42  Wang ML, 43  Burbidge R, Trotter M, Buxton B, Holden S. Drug design by machine
learning: Support vector machines for pharmaceutical data analysis. Comput Chem
2001, 26: 5
14

44  Chapelle O, Haffner P, Vapnik V. SVMs for Histogram-based Image
Classification. IEEE Transactions on Neural Networks, 1999, special issue on
Support Vectors. IEEE Computational Intelligence Society, 45  Novič M, Zupan J. Investigation of infrared spectra-structure
correlation using kohonen and counterpropagation neural network. J Chem Inf
Comput Sci 1995, 35: 454
466

46  Matthews BW. Comparison of predicted and observed secondary structure
of T4 phage lysozyme. Biochim Biophys Acta 1975, 405: 442
451

47  Chou KC, Zhang CT. Prediction of protein structural classes. Crit
Rev Biochem Mol Biol 1995, 30: 275
349

49  Krogh A, Larsson B, von Heijne G, Sonnhammer EL. Predicting
transmembrane protein topology with a hidden Markov model: Application to
complete genomes. J Mol Biol 2001, 305: 567
580

50  Tusnády GE, Simon I. Principles governing amino acid composition of
integral membrane proteins: Application to topology prediction. J Mol Biol
1998, 283: 489
506

51  Tusnády GE, Simon I. The HMMTOP transmembrane topology prediction
server. Bioinformatics 2001, 17: 849
850

52  Rost B, Fariselli P, Casadio R. Topology prediction for helical
transmembrane proteins at 86% accuracy. Protein Sci 1996, 5: 1704
1718

53  Cserzo M, Wallin E, Simon I, von Heijne G, Elofsson A. Prediction
of transmembrane alpha-helices in prokaryotic membrane proteins: The dense
alignment surface method. Protein Eng 1997, 10: 673
676

54  Hirokawa T, Boon-Chieng S, Mitaku S. SOSUI: Classification and
secondary structure prediction system for membrane proteins. Bioinformatics
1998, 14: 378
379

55  Pasquier C, Promponas VJ, Palaios GA, Hamodrakas JS, Hamodrakas SJ.
A novel method for predicting transmembrane segments in proteins based on a
statistical analysis of the SwissProt database: The PRED-TMR algorithm. Protein
Eng 1999, 12: 381
385

56  Xia JX, Ikeda M, Shimizu T. ConPred elite: A highly reliable
approach to transmembrane topology prediction. Comput Biol Chem 2004, 28: 51
60