= About CONSERV

CONSERV is a program that finds contiguous conserved sequences among nucleic
acid or amino acid sequences. It is fast enough for genome-scale data.
It uses a generalized suffix tree and Ukkonen's linear-time suffix tree
construction algorithm. It is written in C.
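
To make the idea of a contiguous conserved sequence concrete, the following
Python sketch finds, by brute force, every substring of a fixed length that
occurs in at least a given number of input sequences. It is only an
illustration and not CONSERV's algorithm (which uses a generalized suffix
tree); the function name, its parameters, and the fixed-length simplification
are assumptions made for this example.

```python
from collections import defaultdict


def conserved_substrings(seqs, length, min_seqs=2):
    """Return {substring: set of sequence indices} for every substring of
    exactly `length` letters that occurs in at least `min_seqs` sequences.

    Brute force, for illustration only; this is not CONSERV's algorithm.
    """
    hits = defaultdict(set)
    for idx, seq in enumerate(seqs):
        # Slide a window of `length` letters over the sequence.
        for start in range(len(seq) - length + 1):
            hits[seq[start:start + length]].add(idx)
    return {sub: ids for sub, ids in hits.items() if len(ids) >= min_seqs}


if __name__ == "__main__":
    # Toy sequences, invented for this example.
    seqs = ["ACGTACGT", "TTACGTAA", "GGGACGTC"]
    for sub, ids in sorted(conserved_substrings(seqs, 4, min_seqs=3).items()):
        print(sub, sorted(ids))
```

Here `length` and `min_seqs` play roles loosely similar to the -L and -v
options of the conserv program, but the correspondence is only approximate.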

= Install

 % make

Then copy the two executable files, "conserv" and "conserv_sort", to any
directory you like.

In a 64-bit environment, edit the Makefile to specify a 64-bit compiler
and 64-bit compile options.

= Applications

There are two programs:

 conserv      : Finds conserved sequences among sequences.
 conserv_sort : Sorts the results of the conserv program.

The main application is "conserv".
"conserv_sort" is a utility for sorting the results of the conserv program.

= Usage of "conserv" program

 Usage: conserv [option...] input files...

 -L nnn              minimum length (default 100)
 -v nnn              minimum number of sequences or groups (default 2)
 -n nnn              minimum appearance count (default 1)
 -i file.fst         read sequence(s) from file.fst
 -c file.fst         read sequence(s) from file.fst, using the complementary strand
 -b file.fst         read sequence(s) from file.fst, using both strands
 -t FILTER[=option]  add an input filter, or change filter settings
 -o file             specify output file prefix
 -C | -B             use {complementary | both} strands for all input sequences
                     (except files specified with the -i, -c or -b options)
 -g +                group mode enabled, and change group
 -g -                group mode disabled, (1 sequence 1 group)
 -g f                group mode enabled, 1 file 1 group
 
 -x mix              mixed output format (default: separate)
 -x {fopen|popen}    input files are {normal files(default)|popen}
 -a | -F             append | overwrite output file
                     (default: neither append nor overwrite)
 -S nnn              maximum length to show sequence in output file
                     (0 don't show, -1 show all) (default 50)

 Input files:  *.fst
 Output files: file.seqinfo file.position file.fst (normal),
            or file (-x mix)
 
 default loaded input filter: case=upper
 
 Available input filters:
     case: convert case (uppercase or lowercase)
         case={upper|lower|-}
             upper       uppercase
             lower       lowercase
             -           not convert (disable this filter)
     xmask: mask specified letters
         xmask[=[length][,letters]]
             length      minimum length of repeat
             letters     letters to be masked
         default options: 10,XN (xmask=10,XN)
     only: restricts to specified characters
         only[=characters]
             characters   mask characters
         default options: ACGTUacgtu (only=ACGTUacgtu)
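
As a hedged illustration of how the options above combine, the following
Python sketch assembles a conserv command line from a few settings. It uses
only the -L, -v, -B and -o options listed in the usage text; the helper name
`conserv_argv`, its parameters, and the file names in the example are
hypothetical.

```python
def conserv_argv(inputs, min_length=100, min_seqs=2, prefix=None,
                 both_strands=False):
    """Build an argument list for the conserv program.

    Hypothetical helper: options come first and input files last,
    following "conserv [option...] input files...".
    """
    argv = ["conserv", "-L", str(min_length), "-v", str(min_seqs)]
    if both_strands:
        argv.append("-B")        # use both strands for all input sequences
    if prefix is not None:
        argv += ["-o", prefix]   # output file prefix
    return argv + list(inputs)


if __name__ == "__main__":
    # "genomeA.fst", "genomeB.fst" and "result" are made-up names.
    print(" ".join(conserv_argv(["genomeA.fst", "genomeB.fst"],
                                min_length=50, prefix="result")))
```

The resulting list can be passed to `subprocess.run` from the Python standard
library, provided the conserv executable is on the search path.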

= Usage of "conserv_sort" program

 Usage: conserv_sort [option...] [filename]

 Options:
  -h | --help      print help message
  -verbose         verbose message to stderr
  -more-verbose    more verbose
  -force-overwrite overwrite output files
  -out-path=DIR    output files to DIR (default: same as input files)
  -out-basename=X  basename of output filenames
  -out-ext=X       string added to output filenames (default=.sorted)
  -input-ext=X     input file extension (default=.position)
  -mix | -mixed    mixed mode input
  -DNA             DNA mode: delete same position data
  -sort={+|-|0}    sort order of whole data.   +:ascend -:descend 0: don't sort
  -sortkey=XY...   sort keys(X) and orders(Y): X={L|v|n|s|p|c} Y={+|-|0}
  -isort={+|-|0}   sort order of single data.  +:ascend -:descend 0: don't sort
  -isortkey=XY...  sort keys(X) and orders(Y): X={s|p|c} Y={+|-|0}
  --               end of options (use if the first character of the filename is '-')
 
  -out-stdout      output to stdout
  -out-mixed={yes|no|auto} output mixed mode (default=auto)
  -ext-fst=.FST    fst file extension (default=.fst)
  -ext-seqinfo=.SEQINFO seqinfo file extension (default=.seqinfo)
  -out-ext-position=.POSITION output position file extension
                             (default: same as input file)
  -out-ext-fst=.FST output fst file extension (default: same as -ext-fst=)
  -out-ext-seqinfo=.SEQINFO output seqinfo file extension
                            (default: same as -ext-seqinfo=)
  -out-ext-mixed=.MIXED output mixed file extension
                        (default=mixed or same as input file)
 
  sort keys:
    L     L(length)
    v     v(number of hit sequences or sequence groups)
    n     n(number of hits)
    s     seq. No.
    p     position
    c     strand(complement or not)
 
 Default sort options:
  -sort=+ -sortkey=L-v-n-s+p+c+ -isort=+ -isortkey=s+p+c+
 
 Input files:   file.position and file.fst
                or file and file.fst
                or stdin (only for mixed mode)
 Output files:  file.sorted.position and file.sorted.fst
                or file.sorted and file.sorted.fst
                or stdout (only for mixed mode)
                ("file" can be changed with option -out-basename=)
                (".sorted" can be changed with option -out-ext=)
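
The -sortkey= and -isortkey= values are strings of key/order pairs, such as
the default L-v-n-s+p+c+. The Python sketch below shows one way such a string
could be parsed and applied to records. It is not taken from the conserv_sort
source: the function names and the dict-based record layout are hypothetical,
and it assumes that earlier keys take priority over later ones and that an
order of 0 means the key is skipped. The -sort= and -isort= options are not
modelled.

```python
VALID_KEYS = set("Lvnspc")


def parse_sortkey(spec):
    """Split a string such as 'L-v-n-s+p+c+' into (key, order) pairs."""
    if len(spec) % 2:
        raise ValueError("expected key/order pairs: %r" % spec)
    pairs = []
    for i in range(0, len(spec), 2):
        key, order = spec[i], spec[i + 1]
        if key not in VALID_KEYS or order not in "+-0":
            raise ValueError("bad sort key %r" % (key + order))
        pairs.append((key, order))
    return pairs


def sort_records(records, spec):
    """Sort dicts indexed by sort-key letter.

    Assumptions: earlier keys take priority; order '0' skips that key.
    """
    result = list(records)
    # Python's sort is stable, so sorting from the least significant key
    # to the most significant one yields a multi-key ordering.
    for key, order in reversed(parse_sortkey(spec)):
        if order != "0":
            result.sort(key=lambda r: r[key], reverse=(order == "-"))
    return result


if __name__ == "__main__":
    # Toy records, invented for this example.
    records = [{"L": 100, "v": 2, "n": 2},
               {"L": 150, "v": 2, "n": 3},
               {"L": 100, "v": 3, "n": 3}]
    for rec in sort_records(records, "L-v-n-"):
        print(rec)
```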

= Licence

    Copyright (C) 1999-2005 Naohisa Goto <ngoto@gen-info.osaka-u.ac.jp>

   This program is free software; you can redistribute it and/or modify
   it under the terms of the GNU General Public License as published by
   the Free Software Foundation; either version 2 of the License, or
   (at your option) any later version.

   This program is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
   GNU General Public License for more details.

   You should have received a copy of the GNU General Public License
   along with this program; if not, write to the Free Software
   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA

