Document Type

Article

Publication Date

12-21-2016

Keywords

Computational Biology, Internet, MicroRNAs, Support Vector Machine, User-Computer Interface

Digital Object Identifier (DOI)

https://doi.org/10.1371/journal.pone.0168392

Abstract

MiRNAs are short non-coding RNAs of about 22 nucleotides that play critical roles in regulating gene expression. The biogenesis of miRNAs is largely determined by the sequence and structural features of their parental RNA molecules. Based on these features, multiple computational tools have been developed to predict whether RNA transcripts contain miRNAs. Although very successful, these predictors have begun to face several challenges in recent years. Many predictors were optimized using datasets of only hundreds of miRNA samples, far fewer than the number of known miRNAs. Consequently, the prediction accuracy of these predictors on large datasets is unknown and needs to be re-tested. In addition, many predictors were optimized for either high sensitivity or high specificity, a strategy that may seriously limit their application. Moreover, to meet continuously rising expectations of these computational tools, improving prediction accuracy has become extremely important. In this study, a meta-predictor, mirMeta, was developed by integrating a set of non-linear transformations with a meta-strategy. More specifically, the outputs of five individual predictors were first preprocessed using non-linear transformations and then fed into an artificial neural network to make the meta-prediction. The prediction accuracy of the meta-predictor was validated using both multi-fold cross-validation and an independent dataset. The final accuracy of the meta-predictor on a newly designed large dataset improved by 7% to 93%. The meta-predictor also proved to be less dependent on datasets and to have a better balance between sensitivity and specificity. This study is important in two respects. First, it shows that combining non-linear transformations with artificial neural networks improves on the prediction accuracy of the individual predictors. Second, it provides the community with a new miRNA predictor of significantly improved accuracy for identifying novel miRNAs and the complete set of miRNAs. Source code is available at: https://github.com/xueLab/mirMeta.

Rights Information

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 License.

Was this content written or created while at USF?

Yes

Citation / Publisher Attribution

PLoS ONE, v. 11, issue 12, art. e0168392

Additional Files

2 Fig1.PNG (79 kB)
Infrastructure of the meta-predictor

2 Fig2.PNG (178 kB)
Non-linear transformations change the distribution of ProMiR prediction scores of all the samples in the D1679 dataset

2 Fig3.PNG (249 kB)
Prediction accuracies of meta-predictors made from various combinations of five individual predictors in the (A) D163 and (B) D1679 datasets

2 Table1.PNG (63 kB)
Possible outputs of five individual predictors

2 Table 2.PNG (140 kB)
Prediction accuracies of five individual predictors in the D163 and D1679 datasets

2 Table 3.PNG (191 kB)
Comparison of true predictions between every two individual predictors for all the 163 positive samples in the D163 dataset

2 Table 4.PNG (192 kB)
Comparison of true predictions between every two individual predictors for all the 168 negative samples in the D163 dataset

2 Table 5.PNG (204 kB)
Comparison of true predictions between every two individual predictors for all the 1679 positive samples in the D1679 dataset

2 Table 6.PNG (200 kB)
Comparison of true predictions between every two individual predictors for all the 674 negative samples in the D1679 dataset

2 Table 7.PNG (269 kB)
Performance of meta-predictors using preprocess-I transformation under multi-fold cross validation and in independent dataset

2 Table 8.PNG (264 kB)
Performance of meta-predictors under multi-fold cross validation and in independent dataset under preprocess-II transformation strategy

2 Table 9.PNG (274 kB)
Performance of meta-predictors under multi-fold cross validation and in independent dataset under preprocess-III transformation strategy

2 Table 10.PNG (153 kB)
Comparison between mirMeta and HetroMirPred
