Improved prediction of drug-target interactions based on ensemble learning with fuzzy local ternary pattern

1. Abstract 2. Introduction 3. Materials and methods 3.1 Datasets 3.2 Drug substructure characterization 3.3 Position-specific scoring matrix 3.4 Fuzzy local ternary pattern 3.5 Rotation forest 4. Results and discussion 4.1 Evaluation criteria 4.2 Parameter discussion 4.3 Five-fold CV results on four datasets 4.4 Comparison of FLTP and ZMs models 4.5 Comparison with other classifiers 4.6 Comparison with previous methods 5. Conclusions 6. Limitation and future work 7. Author contributions 8. Ethics approval and consent to participate 9. Acknowledgment 10. Funding 11. Conflict of interest 12. Data and software 13. References


Introduction
The identification of DTIs has turned into a focal point of pharmaceutical science to support screening the drug candidates and solving the problems of etiologies. The strikingly improved biochemical technologies have dramatically promoted the process of therapeutic drug discovery. In the last few years, Food and Drug Administration (FDA) has just approved a limited quantity of medicines due to the efficiency issues and harmful side effects [1]. Detecting interacting drug-target pairs is still of great significance to select the promising molecule drugs. The researchers have put much effort into exploring the DTIs based on traditional experiments. Nevertheless, the biochemical methods remain to be expensive and cumbersome. Furthermore, these methods need to face the contingency of serial results. Hence, the novel computer-aided drug development (CADD) models are essential to be constructed for stably and reliably inferring DTIs [2].
With the breakthrough of protein sequencing and drug molecular structure determination technologies, various sorts of databases including PubChem [3], ChEMBL [4], Therapeutic Target Database (TTD) [5], Kyoto Encyclopedia of Genes and Genomes (KEGG) [6], and DrugBank [7] are continuously enriching the public data of target proteins and drug sub-structures. Previously, computational-based prediction models mainly focused on molecular docking, ligand, and data mining [8]. However, there are some limitations in these traditional methods. For instance, the molecular docking method mainly predicts the binding sites by energy and geometry matching, it predicts the affinity of binding sites by computational simulation [9]. This method plays a critical role in determining the mode of drug actions. However, molecular docking requires all proteins in the model to have a complete 3D prediction structure that seriously limits the versatility of the model. The ligand-based method combines the chemical structure and pharmacological activity of a specific object through quantitative-structure activity relationships (QSAR), each model can only predict the relationship of one target [10]. The poor physical interoperability of the single model makes the method is hardly to be widely utilized in large-scale cross prediction. The data mining method collects DTIs by text mining and data matching [11]. The method is limited by mining algorithm and database authority, so it cannot achieve further promotion and application in DTIs prediction. In conclusion, the development of effective and robust models has become the essential requirement of DTIs prediction.
Bolgár et al. [12] proposed Variational Bayesian Multiple Kernel Logistic Matrix Factorization which embedded multiple kernel learning, weighted observation, and graph Laplacian regularization to model DTIs. Shi et al. [13] develop two-layer multiple classifier system (TLMCS) which focuses on fully utilizing heterogeneous features for better predicting DTIs. Xia et al. [14] proposed a novel model namely Self-Paced Learning with Collaborative Matrix Factorization based on weighted low-rank approximation (SPLCMF) to predict DTIs. Specifically, this framework employed regularized least squares to fuse the related networks and reduce the complexity of samples by soft weighting. Yan et al. [15] developed (substructure-drugtarget Kronecker product kernel regularized least squares)

Statistics
Enzyme Ion channel GPCRs Nuclear receptor   Drugs  445  210  223  54   Target proteins  664  204  95  26  Interactions  2926  1467  635  90 SDTRLS model which integrates RLS-Kron model, chemical substructure similarity fusion, and Gaussian Interaction Profile (GIP) kernels to detect interacting drug-target pairs. Cui et al. [16] proposed L2,1-GRMF which is a developed GRMF method to identify the DTIs by combining L2,1norm. Hao et al. [17] construct dual network integrated logistic matrix factorization (DNILMF) for drug structure matrix and target sequence kernel matrix to predict DTIs. We established a novel in silico mothed to infer DTIs within this paper, this method mainly integrates PSSM, FLTP, and RF classifier. Specifically, the target primary sequences are first converted into numerical PSSM metrics which record the frequencies of amino acids that appear in different positions. Then, we employed FLTP approach to excavate the potential characteristics of PSSMs. Subsequently, we merge them and drug fingerprints as entire feature vectors of drug-target pairs. Finally, the full feature describers are fed into rotation forest to detect DTIs. We verified our model on the benchmark data sets, viz. Enzymes, Ion Channels, GPCRs, and Nuclear Receptors by utilizing 5-fold Cross-validation. Furthermore, we compared the established model with another advanced feature describer and various classifiers including LGBM and RF. The different experimental results illustrate that the proposed model has an outstanding effect on predicting DTIs, this model can reliably screen candidates for clinical trials. The flowchart of the established model is depicted in Fig. 1.
Drug and protein interactions were represented as a bipartite graph; drugs and proteins formed the nodes of the graph, and the verified interactions between them were denoted by edges within the graph. In the experiments, all drug-target pairs which are connected by edges are categorized to positive dataset, the other pairs are treated as neg- ative samples. Considering the number of the nodes, the known interactions only take a little account of all relationships of drug-target pairs. Take GPCRs dataset for an example, there are 42,840 (210 × 204) types of relationships exist in the network. However, the interactions which have been certified by biotechnology are 1467 which accounts for 3.42%. Consequently, the down sampling algorithm is employed to extract the same number of negative samples as the positive data set to ensure sample balance. The 1467 positive samples are verified by clinical experiments, and the remaining samples were verified as negative samples. However, it is hardly to ignore the false verification caused by the error in the clinical experiment. Meanwhile, considering the number of negative samples only accounted for 3.55% of the remaining samples, the possibility that positive samples which exist in the remaining samples are assigned to negative data set can be ignored for the huge quantity gap.

Drug substructure characterization
Recently, the molecular fingerprints which contain chemical substructure information can effectively reflect drug structure [20]. It transforms the molecular struc-tures into a series of binary fingerprint sequences by detecting specific fragments in the molecular structure [21]. Although the molecular is divided into several independent parts, it still ensures the integrality of the entire drug structural information [22]. Studies substantiate that the molecular fingerprints inhibit the information loss and accumulated error of screening procedures. Meanwhile, it also reduces the complexity of the calculation in the description process. Specifically, when the fraction matches a molecular substructure, the corresponding position of carrier will be assigned as 1. Mature fingerprint databases provide reliable tools for the generation of molecular fingerprints. We selected the fingerprint map which contains 881 substructures from Pubchem system (https://pubchem.ncbi.nlm.nih.gov/) [23]. Therefore, the describers of drug molecules are completely converted into a series of 881-dimentional Boolean vectors. Fig. 2 gives the transformation of Zanamivir into a fingerprint.

Position-specific scoring matrix
In recent years, various Physico-chemical methods are applied to numerically characterize protein which is composed of 20 types of letters [24]. Position-Specific Scoring Matrix (PSSM) is extensively utilized in protein binding site prediction, protein secondary structure prediction, and protein subcellular localization [25]. In this section, PSSM is employed to excavate the evolutionary information by calculating the probability of an amino acid emerges in a specific location of protein primary sequence. PSSM matrix is showed as follows.
where PSSM is a matrix, where L × 20 denotes the length of the target, and 20 represents the number of amino acids. ℓ i,j represents the evolutionary score that ith residue mutate into jth amino acid during the evolutionary process. After optimizing Position-Specific Iterated Basic Local Alignment Search Tool (PSI-BLAST), parameter e is set to 0.001 and the iteration frequency is set to 3. Fig. 3 gives the example of Lipoprotein Lipase converting into PSSM.

Fuzzy local ternary pattern
Fuzzy Local Ternary Pattern (FLTP) can be utilized to precisely describe the texture feature, and it has a wide application in preventing face spoofing and image tampering areas [26]. For the anti-rotation ability of FLTP, it is also robust to the noise in the image. This method dynamically calculates the threshold based on Weber's law to extract multiple features. Meanwhile, it can be extended to circles and neighborhoods with different radius. In this paper, FLTP is employed to describe the characteristics of PSSMs. The algorithm converts the difference between neighborhood pixels and center pixels into the upper and lower binary codes. The upper binary code can be expressed as F LT P − S upper P,R , the lower one can be expressed as F LT P − S lower P,R . The complete describer is defined as follows: where F LT P − S upper P,R can be calculated as follows.
where F LT P − S lower P,R can be calculated as follows.
where (x c , y c ) represents the circular central pixel, and i i represents the gray value of neighborhood pixel. When calculating the upper binary code, if the gray value of neighborhood pixel is greater than i c + τ , the neighborhood pixel is marked as 1. When calculating the lower binary code, if the gray value of the neighborhood pixel is less than i c − τ , the neighborhood pixel is marked as 0. The gray value of the circular central pixel i i which was generated by the nonlinear interpolation algorithm and dynamic threshold τ are calculated as follows.
Finally, the FLTP feature vector can be obtained as follow.
In this experiment, the radius of the circular domain R = 1, the number of pixels in circular domain P = 8. The upper and lower binary codes are transformed into 256 dimensional vectors respectively. Hence, the entire descriptor of PSSM is a matrix of 1 × 512.

Rotation forest
Rodriguez et al. [27] proposed rotation forest (RF) based on integrated forest [27,28]. This ensemble classifier succeeds in the classification of small-sized data set. Significantly, RF also has good effects on promoting sample difference [29]. Within the experiments, we utilized rotation forest to detect DTIs. Firstly, RF stochastically separates the sample set into L disjoint subsets. Subsequently, Principal Component Analysis (PCA) approaches to convert subsets to generate rotation forest. Finally, send them to different base classifiers for scorning each subtree. The matrix C of n×N is regarded to be the train set containing N features of n samples, and T = (t 1 , t 2 , · · · , t n ) T gathers the labels of different samples. The method has K base classifiers R i . The sequential training steps of the base classifier are as follows.
(I) Follow obtaining the optimized parameter L, dataset P is separated to L disjoint subsets stochastically, each subset has N /L features.
(II) Let P i,j represents jth the subset of P, and C i,j denotes the feature set of P i,j . Then calculate the new training features set C ′ i,j by bootstrap sampling on 75% of C i,j . (III) Execute PCA on C ′ i,j to get the principal com-ponent coefficients a (1) i,j , a (2) i,j , · · · a (mj ) i,j . (IV) These coefficients make up the sparse rotation matrix Q i as: In the process of classification, the possibility that sample x belongs to category t i is d i,j (xQ a i ) yielded by base classifier R i . Afterword, calculate the confidence degrees that x belongs to different classes as follows: Finally, the sample x will be classified in accordance with the degree.

Evaluation criteria
For improving the reliability of the experimental performance, the evaluative indices, viz. accuracy (Acc.), precision (Prec.), sensitivity (Sen.), specificity (Spec.), and Matthews correlation coefficient (MCC) are utilized to analyze the results of 5-fold CV.
Pr ec. = T P T P + F P Sen. = T P T P + F N Spec . = T N T P + F P (15) where true positive (TP) records the aggregate of interacting drug-target pairs which were assigned to positive set; true negative (TN) denotes the sum of non-interacting drugtarget pairs which were assigned to negative set; false positive (FP) is the quantity of non-interacting drug-target pairs which were assigned in positive set; false negative (FN) denotes the count of interacting drug-target pairs which were assigned to negative set. In addition, the receiver operating characteristic (ROC) curves were pictured to visualize the prediction results [30], the area under the curves (AUC) was also attached to ROC for justifying the established model [31]. We also utilized PR curves and AUPR values to indicate the sample balance and model performance.

Parameter discussion
In RF classifier, the main parameters K and L denote the numbers of feature sub-sets and decision trees which affect the classification accuracy. To get the optimal parameters, this paper employs grid-search algorithm to study the influence of parameters on prediction results [32]. When L-value increased from 0 to 38, the experimental results show that the accuracy was increasing, then it decreased sharply. Meanwhile, the accuracy was growing with the increase of K-value. In consideration of the model efficiency, the optimal parameters K and L are set to 18 and 38, respectively. Fig. 4 depicts the prediction accuracy surface with factors of K-value and L-value.

Five-fold CV results on four datasets
To certify the feasibility of the established model and avoid over-fitting, we executed 5-fold CV on four benchmark data sets with the same parameters. Specifically, each data set is separated into 5 equal-sized and disjointed fractions. The independent fractions take turns to be treated as test sets, while the other fractions serve as train sets. Tables 2,3,4,5 display the experimental results of our method on four standard data sets.
The statistics of results has been shown in Ta

Comparison of FLTP and ZMs models
For strictly validating the feature describing ability of fuzzy local ternary pattern (FLTP) method. We constructed the comparative experiment by replacing FLTP descriptors with Zernike Moments (ZMs) descriptors which have strong Rotational Invariance [33,34]. ZMs method is widely utilized in the field of edge detection by extracting global feature information at different scales [35]. Table 7 shows the comparison of ZMs and FLTP with the same classifier. These experimental statistic shows that FLTP method has a significant performance improvement compared with Zernike Moments on benchmarks. The criteria values entirely get promoted on Enzyme, Ion Channel, and GPCRs dataset. Fig. 13 displays the mean ROC curves of FLTP model and ZMs model by an interpolation method. It is noteworthy that the AUC values of FLTP-embedded model are comprehensive greater than ZMs model, and the mean value gaps attain 2.55%, 0.89%, 1.17%, and 4.09%, respectively. The results indicate that our model provides an effective way to characterize PSSM for detecting potential DTIs.

Comparison with other classifiers
Thus far, some machine learning-based classifiers are utilized to identify DTIs. To fairly verify the performance of the proposed model, we embed the state of art support vector machine (SVM) and light gradient boosting machine (LGBM) algorithm into our model with fuzzy lo-cal ternary pattern. Within RF classifier, we set parameters K = 18, L = 38 which was discussed above. The SVM utilized inner product kernel function instead of nonlinear mapping to high dimensional space, it also adopts smallsample learning method to greatly simplify the process of classification and regression. There are 400 experiments with different combinations of parameters c and g were car-  ried out to get the highest accuracy, and we set c-value, g-value to 0.7 and 40, respectively. The kernel of SVM was select as radial basis function (RBF) based on LIBSVM tool. The LGBM method is the improved gradient boosting decision trees (GBDT) algorithm to reduce the time cost and power consumption in industrial applications. After parameter optimizations, the leaves-number, the learning rate, and the training rounds were set to 55, 0.05, and 37, respectively.  Nuclear Receptor data sets. The results indicate that model which embeds RF classifier has higher prediction accuracy. Compared with SVM classifier, the average accuracy promotions of RF are 10.49%, 10.57%, 8.40%, and 15.20%, the accuracy gaps between RF and LGBM are 3.93%, 3.24%, 3.21%, 6.77% on four benchmark dataset. Figs. 15,16 plot the ROC curves of the golden standard datasets based on the rates of 1-specificity against sensitivity. The model which has higher AUC values predict more accurate. As shown in Figs. 15,16, the AUC value gaps of four data sets attain to 0.1051, 0.1162, 0.0944, and 0.2232 between RF and SVM, the value gaps between RF and LGBM attain to 0.1013, 0.1013, 0.0910, and 0.1329, respectively. Therefore, it is considered that the proposed model is more efficient at predicting DTIs.

Comparison with previous methods
So far, numerous advanced models have been established to predict DTIs and assist drug design. In this section, we compared our model with partial state-of-art models for fully evaluating the model performance by adopting 5-fold CV. After experimenting the previous methods such as SIMCOMP [36], DCT [37], Bigram-PSSM [38], LOOP [39] on benchmark datasets. Table 8 gives the comparison of AUC value and AUPR values. It is clearly that the performance of the established model has risen significantly. Although the AUC value of our model is 0.006 lower than LOOP on Ion Channel dataset, the AUC values of Enzyme, GPCRs, and Nuclear Receptors have grown 0.003, 0.004, and 0.034, respectively, and the AUPR values of four benchmark datasets have grown 0.028, 0.014, 0.029, and 0.042, respectively. As a result, the experiments substantiate that the model which combining FLTP descriptors and rotation forest can remarkably enhance the performance of predicting DTIs.

Conclusions
In summary, this paper integrates Position-Specific Scoring Matrix, fuzzy local ternary pattern, and rotation forest as a novel prediction algorithm for identifying the relationships between drugs and targets. Specifically, the fusions which combine FLTP describers of PSSMs and drug molecular fingerprints are fed into RF for inferring DTIs. The mean accuracies of our model were 89.08%, 86.14%, 82.41%, and 78.40% on standard data sets. We

Limitation and future work
Besides achieving more accurate prediction results than previous models, we also noticed the limitations of our model. This section will analyze these limitations from two aspects. On one side, the fuzzy local ternary pattern only describes the local texture characteristics. This feature descriptor is hardly to capture the global information of the sample, which leads to the singleness of the feature of PSSM. To extract more excellent feature vectors, future work will focus on fusion features. We will study a variety of local and global feature extraction methods and combine them to build a prediction model. On the other side, the loss and noise of data samples have a great effect on the accuracy of the model. We will explore two-dimensional data sample filtering algorithms to reduce data noise and improve data robustness. Meanwhile, we will further optimize the parameters to keep the integrity of the samples for accurate prediction. In general, the subsequent work will concentrate on extracting more accurate supervised classifiers and more fusion features which integrate the texture features and contour features of PSSMs. The growth of high  throughput data set will create favorable circumstances and challenges for constructing auxiliary tools to enhance the accuracy of identification.

Author contributions
ZYZ handled the Conceptualization. ZYZ and XKZ performed the methodology, software, and validation. YAH curated the data. WZH, SWZ, and CQY administrated the project. WZH handled the funding acquisition.

Ethics approval and consent to participate
Not applicable.

Acknowledgment
We thank Zhu-Hong You for technical assistance. Thanks to all the peer reviewers for their opinions and suggestions.