MEDALS METI Life science integrated database portal site

A Method for Extracting Optimal Sequence Related to Biological Activity

Summary	This software extracts the optimal sequence related to biological activity from data composed of sequences and their corresponding activities, regardless of how the characteristic sequence is defined (cited from the original site).
Data type	DNA-motif

Commentary page →

ALN

Summary	Aln is a program for aligning a pair of nucleotide or amino acid sequences or alignments. Aln can even align a nucleotide sequence and a single or a group of protein sequences. This can be used to predict eukaryotic gene structures (protein-coding exons) based on sequence homology with known protein sequences.
Data type	DNA, amino acid sequences

Commentary page →

ALNGG

Summary	ALNGG detects protein-coding genes by comparing the genomes of two species.
Data type	DNA-sequence, Genome

Commentary page →

ASIAN

Summary	ASIAN is a tool for automatically inferring the relationships between objects from data that include redundant information, e.g. expression profiles measured for a large number of genes under various conditions. The tool combines cluster analysis, regression analysis, and graphical Gaussian modeling. From your raw data, it reports several relationships between objects: the correlation, the grouping, the group number, and the network graph.
Data type	Gene expression profile

Commentary page →

ChIP2LAMP

Summary	Discovering combinatorial regulation is key to understanding complex gene regulatory machinery. Combining these scripts (chip2lamp) with the statistical analysis method LAMP makes it possible to find statistically significant combinations by integrating ChIP-seq and RNA-seq data. The scripts accept MACS1/2 results (ChIP-seq peak calling) and Cuffdiff results (RNA-seq).
Data type	ChIP-sequence, RNA-sequence

Commentary page →

COSMOS

Summary	COSMOS detects somatic structural variations (SVs) from whole-genome short-read sequences. It is also applicable to de novo SV detection in a family trio.
Data type	DNA-sequence

Commentary page →

DNemulator

Summary	DNemulator is a package for simulating DNA sequencing errors, polymorphisms, cytosine methylation and bisulfite conversion.
Data type	DNA-sequence

Commentary page →

fastapl

Summary	fastapl (FASTA Perl Loop, pronounced "fast apple") is a tool for processing multi-FASTA data. Its companion program fastqpl (pronounced "fast Q-ple") handles FASTQ-format data.
Data type	DNA-sequence

Commentary page →

GeneDecoder

Summary	GeneDecoder is a gene-finding technology for eukaryotes, based on hidden Markov models (HMMs). Using dynamic programming and statistical models trained on annotated genome sequences, the algorithm divides the input nucleic acid sequence into meaningful segments.
Data type	DNA-sequence, Eukaryote gene

Commentary page →

GUPPY

Summary	GUPPY is a program that visualizes annotation data for genetic sequences in a graphical layout.
Data type	DNA-sequence

Commentary page →

HEAT

Summary	H-InvDB Enrichment Analysis Tool (HEAT) is a data-mining tool for automatically identifying features specific to a given human gene set. HEAT searches for H-InvDB annotations that are significantly enriched in a user-defined gene set, as compared with the entire H-InvDB representative transcripts. This technique is called Gene Set Enrichment Analysis (GSEA), and is popularly used in analyzing results of microarray experiments. Fisher's exact probability is used in statistical tests of HEAT.
Data type	Annotation

Commentary page →
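
The enrichment test that the HEAT summary describes can be illustrated with a minimal sketch of a one-sided Fisher's exact test, computed from the hypergeometric tail. This is not HEAT's own code: the function name, parameter names, and the choice of a one-sided (over-representation) test are assumptions made for illustration only.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided Fisher's exact test for over-representation.

    All parameter names are hypothetical:
    population -- number of background transcripts
    annotated  -- background transcripts carrying the annotation
    sample     -- size of the user-defined gene set
    hits       -- gene-set members carrying the annotation
    """
    # Number of ways to draw the gene set from the background.
    total = comb(population, sample)
    # Sum the probabilities of seeing `hits` or more annotated genes.
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total
```

For example, `enrichment_p_value(20000, 400, 100, 10)` asks how surprising it is to find at least 10 annotated genes in a set of 100 when 400 of 20000 background transcripts carry the annotation. A small p-value suggests the annotation is enriched in the gene set.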

LAMPLINK

Summary	LAMPLINK detects statistically significant epistatic interactions of two or more SNPs from GWAS data. It can be used in the same way as the widely used GWAS analysis software PLINK, but adds options for detecting epistatic interactions with LAMP, a multiple testing procedure for discovering combinatorial effects.
Data type	DNA-sequence

Commentary page →

LAST

Summary	LAST is software for comparing and aligning sequences, typically DNA or protein sequences. LAST is similar to BLAST, but it copes better with very large amounts of sequence data. It can also report probabilities for every pair of aligned letters, indicating the reliability of each pairing.
Data type	DNA-sequence, RNA, protein, user-defined alphabet

Commentary page →

MAFFT

Summary	MAFFT is a multiple sequence alignment program for unix-like operating systems. It offers a range of multiple alignment methods, L-INS-i (accurate; for alignment of
Data type	DNA-sequence alignment

Commentary page →

MDV

Summary	Motif Distribution Viewer (MDV) is a web tool for visualizing the distribution of various motifs around transcription start sites (TSS) on a user-defined set of promoter sequences. The tool can be used on the original site or downloaded for local use (cited from the original site).
Data type	DNA-motif

Commentary page →

NGSFeatGen

Summary	This software is a framework for predicting true reads from next-generation sequencing data. It generates several key features: observed count, estimated true count, log-likelihood with entropy penalty, log-likelihood ratio, expectation matching score, and specific correction coefficient.
Data type	DNA-sequence, NGS

Commentary page →

paraclu

Summary	Paraclu finds clusters in data attached to sequences (cited from the original site).
Data type	DNA-sequence

Commentary page →

PHMMTS

Summary	PHMMTS (Pair Hidden Markov Models on Tree Structures) aligns a sequence of unknown secondary structure to a sequence of known secondary structure (cited from the original site).
Data type	RNA

Commentary page →

PMID-Extractor

Summary	PMID-Extractor obtains PubMed IDs (PMIDs) from PDF or text files of journal papers you have on hand, using the Digital Object Identifiers (DOIs, http://en.wikipedia.org/wiki/Digital_object_identifier) or text information (e.g. titles) on the first page of each file. PubMedScan, a paper recommender, requires PMIDs to specify the user's interests before it can be used. Supplying those PMIDs is the main use of PMID-Extractor.
Data type	Journal

Commentary page →

Prediction program

Summary	The gene prediction program searches the database for candidate deletable regions, i.e. regions that contain no essential genes or synthetic lethal genes and whose deletion is not accompanied by adverse genetic traits such as delayed growth (cited from the project reports).
Data type	DNA-sequence

Commentary page →

PRRN

Summary	PRRN is a multiple sequence alignment program based on a doubly nested randomized iterative method. PRRN accepts either nucleotide or protein sequences. It repeatedly applies pairwise group-to-group alignment to improve the overall weighted sum-of-pairs score at each iterative step, where the pair weights correct for uneven representation of the sequences to be aligned. The strategies of PRRN work most effectively for refining a crude alignment obtained by other, more rapid methods, e.g. progressive alignment. (Summarized from the original site)
Data type	DNA, amino acid sequences

Commentary page →
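
The weighted sum-of-pairs score mentioned in the PRRN summary can be illustrated with a minimal sketch. This is not PRRN's actual scoring: the match, mismatch, and gap values, the linear gap cost, and taking each pair weight as the product of two per-sequence weights are all assumptions chosen to keep the example short.

```python
from itertools import combinations


def weighted_sum_of_pairs(alignment, weights=None,
                          match=1.0, mismatch=-1.0, gap=-2.0):
    """Score an alignment (list of equal-length rows, '-' for gaps).

    The scoring values and the product-of-weights pair weighting are
    illustrative assumptions, not PRRN's real parameters.
    """
    if weights is None:
        weights = [1.0] * len(alignment)
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned rows must have the same length")

    total = 0.0
    # Visit every unordered pair of rows once.
    for (i, row_i), (j, row_j) in combinations(enumerate(alignment), 2):
        pair_score = 0.0
        for a, b in zip(row_i, row_j):
            if a == "-" and b == "-":
                continue  # gap against gap contributes nothing
            if a == "-" or b == "-":
                pair_score += gap
            elif a == b:
                pair_score += match
            else:
                pair_score += mismatch
        # Down-weight pairs of over-represented sequences.
        total += weights[i] * weights[j] * pair_score
    return total
```

An iterative refinement method would re-align groups of rows and keep the new alignment only when this score improves. Lowering the weight of near-duplicate sequences keeps them from dominating the total.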

Recount

Summary	RECOUNT is software for estimating the true count of Solexa reads based on a probabilistic model. RECOUNT takes the reads and the quality scores provided by Solexa as its input. A typical application is transcriptome or metagenomic expression analysis (cited from the original site).
Data type	DNA-sequence, Next Generation Sequencing Data

Commentary page →

SCARNA Local Multiple

Summary	SCARNA_LM (SCARNA Local Multiple) is a local multiple aligner for RNA sequences. It is based on a discriminative pairwise alignment model that incorporates secondary structure features in the form of base pairing probabilities calculated by Rfold, and it uses an efficient local multiple alignment construction procedure proposed by Phuong et al. for local multiple alignment of protein sequences (cited from the original site).
Data type	RNA alignment

Commentary page →

seg-suite

Summary	The seg suite provides tools for manipulating segments and alignments. It uses a format called "seg". This program converts segments or alignments from various formats to seg.
Data type	DNA-sequence, RNA-sequence

Commentary page →

SlideSort

Summary	SlideSort is a fast and exact method that finds all similar pairs in a string pool in terms of edit distance (cited from the original site).
Data type	DNA-sequence, Protein-sequence

Commentary page →
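
The problem that the SlideSort summary describes, finding every pair of strings within a given edit distance, can be stated as a brute-force baseline. The sketch below is not SlideSort's algorithm: it compares all pairs with a standard dynamic-programming edit distance, which is exact but quadratic in the number of strings. The function names are hypothetical.

```python
from itertools import combinations


def edit_distance(s, t):
    """Levenshtein distance using two rolling rows of the DP table."""
    prev = list(range(len(t) + 1))
    for i, a in enumerate(s, start=1):
        curr = [i]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a from s
                curr[j - 1] + 1,         # insert b into s
                prev[j - 1] + (a != b),  # substitute, or match for free
            ))
        prev = curr
    return prev[-1]


def similar_pairs(strings, max_distance):
    """Return index pairs (i, j), i < j, within max_distance edits."""
    return [
        (i, j)
        for (i, s), (j, t) in combinations(enumerate(strings), 2)
        if edit_distance(s, t) <= max_distance
    ]
```

A baseline like this is mainly useful for checking the output of a faster method on a small pool, since its running time grows with the square of the pool size.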

SNP-system

Summary	We developed SNP-system, unified under a common interface, and released it on the web. The website ( http://www.h-invitational.jp/snps/ ) has been closed, and the archive opened in MEDALS Archive on 22 March 2013.
Data type	DNA-sequence

Commentary page →

SPALN

Summary	Spaln is a stand-alone program that maps and aligns a set of cDNA or protein sequences onto a whole genomic sequence in a single job.
Data type	Comparative genomics

Commentary page →

tantan

Summary	tantan is a tool for finding cryptic repeats (low-complexity and short-period tandem repeats) in DNA, RNA, and protein sequences. The aim of tantan is to prevent false predictions when searching for homologous regions between two sequences. You can get it from the archive page (cited from the original site).
Data type	DNA-sequence, RNA, Amino acid

Commentary page →

toxRank

Summary	toxRank is an application that displays a list of compounds whose expression profiles are similar to the expression pattern of the input gene group. Given the signature of a query drug, toxRank computes its similarity to reference expression profiles, which are prepared as rank matrix data. A flexible rank matrix can be generated from genes that users consider important for hepatotoxicity and from molecular panels obtained with toxBridge.
Data type	-

Commentary page →

Web page checker

Summary	A web page checker. This tool can monitor web pages and groups of web pages, highlight changes in a page, and send email reports.
Data type	Web contents

Commentary page →
