Prediction of novel precursor miRNAs using a context-sensitive hidden Markov model (CSHMM)

Introduction

This is the companion website to the paper of the same title by Sumeet Agarwal, Candida Vaz, Alok Bhattacharya and Ashwin Srinivasan (2010). The page provides supplementary information for the paper, along with an applet that lets the user enter an RNA sequence and have it classified as a miRNA precursor or non-precursor. The applet uses a context-sensitive hidden Markov model to construct the most likely secondary structure for the given sequence. Along with the structure, the CSHMM generates a likelihood score for the sequence being a pre-miRNA. The sequence is then classified as positive or negative (i.e., as precursor or non-precursor) by applying a threshold to this likelihood score. A stand-alone Java version of the CSHMM code, which can be used to scan contiguous genomic regions for pre-miRNA candidates, is also available: CSHMMCode_Basic.tar.gz (this requires WEKA, which is included in the tarball; please see readme.txt).

Supplementary Information

Additional_file_1 (.PDF, 107 KB) - Methodological details of CSHMM implementation and computational complexity estimation
Additional_file_2 (.PDF, 26 KB) - Analysis of known miRNAs of Chromosome 19
Additional_file_3 (.XLS, 132 KB) - Novel predicted miRNAs of Chromosome 19
Additional_file_4 (.PDF, 15 KB) - Secondary structures of the novel predicted miRNAs on Chromosome 19
Additional_file_5 (.PDF, 23 KB) - Novel miRNAs from small RNA sequence analysis
Additional_file_6 (.PDF, 10 KB) - Secondary structures of the 5 representative novel miRNAs from the sRNA sequence data

The Applet

Instructions

The applet requires Java. If you don't have the Java Runtime Environment (IE users may not need it, as IE generally comes packaged with the Microsoft JVM), you can get it here. The input to the applet must be an RNA/DNA sequence consisting only of the four nucleotides (A, C, G and U/T). The input is case-insensitive. If the input sequence contains any character other than the nucleotide letters, the applet reports an error. If the input sequence is valid, the applet displays the constructed secondary structure, the likelihood value, and the classification as a precursor or non-precursor.
Note: The input is expected to be an RNA sequence, so a DNA sequence given as input will be treated as a pre-transcription RNA sequence. This means that 'T' will be treated just like 'U', and G-T pairing will be allowed, just as G-U pairing is allowed for RNA sequences.
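
The input rules above can be illustrated with a minimal Java sketch. This is not the applet's actual code: the class name SequenceInput and the methods normalize and canPair are hypothetical, and the set of allowed pairs (A-U, C-G, plus the G-U pair mentioned in the note) is an assumption about how the pairing rule might be expressed.

```java
// Hypothetical helper, not taken from the CSHMM code base.
final class SequenceInput {
    private SequenceInput() {}

    /**
     * Upper-cases each character, maps T to U, and rejects anything
     * that is not one of the nucleotide letters A, C, G, U or T.
     */
    static String normalize(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = Character.toUpperCase(raw.charAt(i));
            switch (c) {
                case 'A': case 'C': case 'G': case 'U':
                    out.append(c);
                    break;
                case 'T':
                    // DNA input: T is treated just like U.
                    out.append('U');
                    break;
                default:
                    throw new IllegalArgumentException(
                        "invalid character '" + raw.charAt(i)
                        + "' at position " + (i + 1));
            }
        }
        return out.toString();
    }

    /**
     * Assumed pairing rule on normalized bases: A-U and C-G,
     * plus G-U (which also covers G-T after normalization).
     */
    static boolean canPair(char a, char b) {
        String pair = "" + a + b;
        return pair.equals("AU") || pair.equals("UA")
            || pair.equals("CG") || pair.equals("GC")
            || pair.equals("GU") || pair.equals("UG");
    }

    public static void main(String[] args) {
        String seq = normalize("acgT");
        System.out.println(seq);
        System.out.println(canPair('G', 'U'));
    }
}
```

Normalizing T to U once, up front, means the pairing check only has to handle the RNA alphabet, so G-T input is covered by the G-U case without a separate rule.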


Please send any feedback to sumeet@iitd.ac.in.