This is the recommended new Java package of "Long Short-Term Memory" for Protein classification (jLSTM_protein). It implements the same LSTM neural network as the C package and performs and behaves identically, but runs faster. jLSTM_protein is multithreaded and therefore makes effective use of multicore and multiprocessor machines; using more than one thread makes computation faster than in the C package. Each thread computes the gradients individually with a local weight matrix and updates a global weight matrix, which amounts to asynchronous stochastic gradient descent. jLSTM_protein uses BioJava 1.6 for reading FASTA sequences. BioJava is included in the package and needs no separate installation.
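The threading scheme described above can be illustrated with a small Java sketch. This is not code from jLSTM_protein: the class name `AsyncSgdSketch`, the logistic-regression model standing in for the LSTM, the round-robin split of training examples across threads and the `synchronized` methods guarding the global weights are all assumptions made for illustration. Each worker copies the global weights into a local array, computes the gradient for one example and applies it to the global weights without waiting for the other workers, so a gradient may be computed from slightly stale weights.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/**
 * Hypothetical sketch of asynchronous SGD (not jLSTM_protein code).
 * A logistic-regression model stands in for the LSTM network.
 */
public class AsyncSgdSketch {
    private final double[] global; // global weights shared by all threads

    public AsyncSgdSketch(int dim) { global = new double[dim]; }

    // Copy the current global weights into a thread-local array.
    private synchronized void readWeights(double[] local) {
        System.arraycopy(global, 0, local, 0, global.length);
    }

    // Apply one thread's gradient step to the global weights.
    private synchronized void applyGradient(double[] grad, double lr) {
        for (int i = 0; i < global.length; i++) global[i] -= lr * grad[i];
    }

    public synchronized double[] weights() { return global.clone(); }

    static double predict(double[] w, double[] x) {
        double z = 0.0;
        for (int i = 0; i < w.length; i++) z += w[i] * x[i];
        return 1.0 / (1.0 + Math.exp(-z)); // logistic sigmoid
    }

    /** Each thread processes its own share of the examples; threads never wait for each other. */
    public void train(double[][] xs, int[] ys, int threads, int epochs, double lr) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> jobs = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int id = t;
                jobs.add(pool.submit(() -> {
                    double[] local = new double[global.length]; // local weight copy
                    double[] grad = new double[global.length];
                    for (int e = 0; e < epochs; e++) {
                        for (int n = id; n < xs.length; n += threads) {
                            readWeights(local); // may be stale by the time we update
                            double err = predict(local, xs[n]) - ys[n];
                            for (int i = 0; i < grad.length; i++) grad[i] = err * xs[n][i];
                            applyGradient(grad, lr); // asynchronous global update
                        }
                    }
                }));
            }
            for (Future<?> f : jobs) {
                try {
                    f.get(); // wait for the worker and surface its failure, if any
                } catch (ExecutionException e) {
                    throw new IllegalStateException("worker failed", e.getCause());
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted", e);
                }
            }
        } finally {
            pool.shutdown();
        }
    }

    static double accuracy(double[] w, double[][] xs, int[] ys) {
        int correct = 0;
        for (int n = 0; n < xs.length; n++)
            if ((predict(w, xs[n]) >= 0.5 ? 1 : 0) == ys[n]) correct++;
        return (double) correct / xs.length;
    }

    /** Trains on a synthetic, linearly separable toy problem and returns training accuracy. */
    static double demo(int threads) {
        Random rnd = new Random(42);
        int n = 400;
        double[][] xs = new double[n][];
        int[] ys = new int[n];
        for (int i = 0; i < n; ) {
            double a = 2 * rnd.nextDouble() - 1, b = 2 * rnd.nextDouble() - 1;
            if (Math.abs(a + b) < 0.2) continue; // keep a margin around the boundary
            xs[i] = new double[] {1.0, a, b};    // first entry acts as a bias input
            ys[i] = a + b > 0 ? 1 : 0;
            i++;
        }
        AsyncSgdSketch model = new AsyncSgdSketch(3);
        model.train(xs, ys, threads, 50, 0.5);
        return accuracy(model.weights(), xs, ys);
    }

    public static void main(String[] args) {
        System.out.println("training accuracy with 4 threads: " + demo(4));
    }
}
```

Locking only around the copy and the update leaves the threads otherwise independent; a fully lock-free variant, where threads write to the shared weights without synchronization, is a common alternative design.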

(j)LSTM as Logistic Regression with the Spectrum Kernel (new)

LSTM logistic regression / spectrum kernel is a stripped-down LSTM that can be interpreted as logistic regression with the spectrum kernel for sequence classification. For a given k, a k-mer string vector is built at each step of the DNA sequence and fed into the network. The LSTM architecture consists of just two memory cells without input or output gates. The memory cells are not connected to each other, and the squashing function h is the identity function. In this version, LSTM weights the k-mers according to their importance for the classification and can therefore be used as an additional k-mer-based pattern recognizer.
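Why such a stripped-down network reduces to logistic regression with the spectrum kernel can be seen in a short Java sketch. It assumes that each k-mer is presented as a one-hot vector and that the cell input, like h, is not squashed; the class and method names, the example weights and the sigmoid output unit are hypothetical, and only a single memory cell is shown. Because the cell has no gates, its state simply sums the weights of the k-mers seen so far, which equals the dot product of the weight vector with the sequence's k-mer count (spectrum) vector. Passing this score through a sigmoid gives a logistic-regression model on spectrum features.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: one gate-less memory cell with identity squashing over k-mer inputs. */
public class SpectrumCellSketch {

    /** Spectrum feature map: counts of every length-k substring (k-mer) of the sequence. */
    static Map<String, Integer> spectrum(String seq, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (int t = 0; t + k <= seq.length(); t++)
            counts.merge(seq.substring(t, t + k), 1, Integer::sum);
        return counts;
    }

    /**
     * Runs the cell step by step. Assumption: the input at step t is the one-hot vector
     * of the k-mer starting at t, so the weighted cell input is just that k-mer's weight.
     */
    static double cellState(String seq, int k, Map<String, Double> w) {
        double c = 0.0; // memory cell state starts at zero
        for (int t = 0; t + k <= seq.length(); t++)
            c += w.getOrDefault(seq.substring(t, t + k), 0.0); // no gates, linear input
        return c; // h is the identity, so the cell output equals its state
    }

    /** The same score computed as the dot product of w with the spectrum vector. */
    static double spectrumScore(String seq, int k, Map<String, Double> w) {
        double s = 0.0;
        for (Map.Entry<String, Integer> e : spectrum(seq, k).entrySet())
            s += w.getOrDefault(e.getKey(), 0.0) * e.getValue();
        return s;
    }

    /** Hypothetical sigmoid output unit turning the score into a class probability. */
    static double sigmoid(double z) { return 1.0 / (1.0 + Math.exp(-z)); }

    public static void main(String[] args) {
        Map<String, Double> w = new HashMap<>();
        w.put("AC", 1.5); // hypothetical learned k-mer weights
        w.put("CG", -0.5);
        String seq = "ACGACG";
        double c = cellState(seq, 2, w);
        System.out.println("cell: " + c + ", spectrum: " + spectrumScore(seq, 2, w)
                + ", p(class 1): " + sigmoid(c));
    }
}
```

For "ACGACG" with k = 2, the k-mers AC and CG each occur twice, so both methods give 2 × 1.5 + 2 × (-0.5) = 2.0; the learned k-mer weights are exactly the coefficients a logistic regression on k-mer counts would learn.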


We also offer the first package of "Long Short-Term Memory" for Protein classification (LSTM_protein), written in C.

Please cite:

Sepp Hochreiter, Martin Heusel, and Klaus Obermayer. "Fast Model-based Protein Homology Detection without Alignment." Bioinformatics, 2007. doi: 10.1093/bioinformatics/btm247.

Updated 08/12/2010

This program is freely available under the GNU General Public License (GPL).
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.