Multivariate Mutual Information based Feature Selection for Predicting Histone Post-Translational Modifications in Epigenetic Datasets
Abstract
Mutual information (MI) has traditionally been employed in many areas, including biology, to identify non-linear relationships between features. This technique is particularly useful in the biological context for identifying informative features such as genes, histone post-translational modifications (PTMs), transcription factors, etc. In this work, instead of considering the conventional pairwise MI between PTM features, we evaluate multivariate mutual information (MMI) between PTM triplets to identify a set of outlier features. This enables us to form a small subset of PTMs that serve as principal features for predicting the values of any histone PTM across the epigenome. We also compare the principal MMI features with those obtained from traditional feature selection techniques such as PCA and Orthogonal Matching Pursuit. We predict all the remaining histone PTM intensities using XGBoost-based regression on the selected features. The accuracy of this technique is demonstrated on ChIP-seq datasets from the yeast and human epigenomes. The results indicate that the proposed MMI-based feature selection technique can serve as a useful method across various biological datasets.
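To make the triplet-level quantity concrete, the following is a minimal sketch of one common definition of three-variable mutual information (the co-information, computed by inclusion-exclusion over joint entropies). It is not the authors' implementation: the abstract does not state which MMI estimator, sign convention, or outlier criterion is used. The function names `entropy`, `mmi`, and `triplet_scores` are hypothetical, and the sketch assumes the PTM intensities have already been discretized into bins.

```python
from collections import Counter
from itertools import combinations
from math import log2


def entropy(*columns):
    """Joint Shannon entropy, in bits, of one or more discrete columns."""
    joint = list(zip(*columns))  # one tuple per sample
    n = len(joint)
    return -sum((c / n) * log2(c / n) for c in Counter(joint).values())


def mmi(x, y, z):
    """Co-information I(X;Y;Z) via inclusion-exclusion over entropies.

    Assumed sign convention: positive values indicate redundancy,
    negative values indicate synergy among the three variables.
    """
    return (entropy(x) + entropy(y) + entropy(z)
            - entropy(x, y) - entropy(x, z) - entropy(y, z)
            + entropy(x, y, z))


def triplet_scores(features):
    """Score every triplet of features.

    `features` is a hypothetical dict mapping a PTM name to its list of
    discretized values; the result maps each name triplet to its MMI.
    """
    return {
        triplet: mmi(*(features[name] for name in triplet))
        for triplet in combinations(sorted(features), 3)
    }
```

Under this convention, three identical binary columns score +1 bit (pure redundancy), while two independent fair bits together with their XOR score -1 bit (pure synergy). Triplets whose scores lie far from the bulk of the distribution could then be flagged as outliers, although the exact rule used in the paper is not given in the abstract.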