trish2/trish2 [ Generics ]

NAME

   trish2

SYNOPSIS

   trish2 [OPTIONS] DICT

DESCRIPTION

   trish2 looks up strings in a list of fixed string patterns using PATRICIA
   trees. The patterns are given one per line in the file DICT. trish2 builds
   a PATRICIA tree from this list, then searches for the patterns in the
   queries read from standard input.
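
   As an illustration only, the lookup structure can be pictured as a
   compressed trie (radix tree), the family of structures PATRICIA trees
   belong to. The Python sketch below is not trish2's implementation; the
   names RadixNode, RadixTree, insert and contains are hypothetical.

```python
def common_prefix_length(a, b):
    """Number of leading characters shared by a and b."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


class RadixNode:
    def __init__(self):
        self.edges = {}        # first character of label -> (label, child)
        self.terminal = False  # True when a pattern ends at this node


class RadixTree:
    """Hypothetical compressed trie holding fixed string patterns."""

    def __init__(self, patterns=()):
        self.root = RadixNode()
        for pattern in patterns:
            self.insert(pattern)

    def insert(self, word):
        node = self.root
        while word:
            edge = node.edges.get(word[0])
            if edge is None:
                # No edge starts with this character: hang a new leaf here.
                leaf = RadixNode()
                leaf.terminal = True
                node.edges[word[0]] = (word, leaf)
                return
            label, child = edge
            common = common_prefix_length(label, word)
            if common < len(label):
                # Split the edge at the point where label and word diverge.
                middle = RadixNode()
                middle.edges[label[common]] = (label[common:], child)
                node.edges[word[0]] = (label[:common], middle)
                child = middle
            node = child
            word = word[common:]
        node.terminal = True

    def contains(self, word):
        node = self.root
        while word:
            edge = node.edges.get(word[0])
            if edge is None:
                return False
            label, child = edge
            if not word.startswith(label):
                return False
            word = word[len(label):]
            node = child
        return node.terminal
```

   Each edge carries a whole string rather than a single character, so a
   lookup costs time proportional to the query length, not to the number of
   patterns.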

OPTIONS

   Help options:
     -?, --help
       Display a help text and exit.

     --usage
       Display a very brief help message and exit.

   General options:
     -n, --no-search
       Do not perform any search. Use this option when you just want to save
       the PATRICIA tree or get the optimal tree building parameters.

     -D, --save-as=FILE
       Save the PATRICIA tree in a binary form in FILE. This file can be used
       in future uses of trish2 with the --compiled-dict option.

     -P, --optimal-parameters
       Print the optimal tree building parameters to standard output. This
       output can be copied verbatim into the command line of future runs of
       trish2 on the same dictionary. These options will slightly speed up
       the construction of the PATRICIA tree.

     --version
       Print version and exit.

   Verbosity options:
     -q, --quiet
       Be quiet. trish2 will not display messages about opening files and
       changing the locale.

     -v, --verbose
       Be verbose. trish2 will display additional information about building
       the PATRICIA tree.

     -V, --locace
       Be insanely verbose. trish2 will display anything you do not want to
       know.

     -s, --strict
       Abort at the first warning. Normally trish2 will proceed after a
       warning.

   Search options:
     -i, --input=FILE
       Read queries from file FILE instead of standard input.

     -o, --output=FILE
       Write the result into file FILE instead of standard output.

     -x, --search-max-length=LENGTH
       Set the maximum length of the queries to LENGTH. All characters beyond
       LENGTH will be ignored. This is a wide character count, not a byte
       count. Default value is 1024.
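
       The difference between a character count and a byte count can be
       seen in the short Python sketch below. It assumes that Python code
       points are a fair stand-in for the wide characters trish2 counts;
       truncate_query is a hypothetical helper, not part of trish2.

```python
def truncate_query(query: str, max_length: int = 1024) -> str:
    """Keep at most max_length characters (code points), not bytes."""
    return query[:max_length]


query = "caf\u00e9 au lait"          # the fourth character is a non-ASCII letter
char_count = len(query)               # 12 characters
byte_count = len(query.encode("utf-8"))  # 13 bytes in UTF-8
shortened = truncate_query(query, 4)  # the first four characters
```

       A limit of 4 keeps the accented letter whole, whereas a 4-byte limit
       would cut it in the middle of its UTF-8 encoding.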

     -e, --search-encoding=ENCODING
       Set the encoding of queries and output to ENCODING. Use `locale -a' to
       get the list of encodings supported by your system. Default value is
       `C'.

     -w, --substrings
       Search for queries as well as query substrings. Normally trish2 will
       look for patterns that match whole queries only.

     -W, --words
       When --substrings is set, only look for query substrings immediately
       preceded and immediately followed by a non-word character. Thus trish2
       only looks for substrings forming whole words. This option is ignored if
       --substrings is not present.
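
       The three search modes can be sketched in Python as follows. This is
       an illustration, not trish2's code: find_matches is a hypothetical
       name, the sketch reports every pattern found at every offset, it
       treats letters, digits and underscore as word characters, and it
       counts the edges of the query as word boundaries. All of these are
       assumptions.

```python
def find_matches(query, patterns, substrings=False, words=False):
    """Return sorted (start, length) pairs for patterns found in query."""
    if not substrings:
        # Whole-query mode: the query itself must be a dictionary pattern.
        return [(0, len(query))] if query in patterns else []

    def is_word_char(ch):
        return ch.isalnum() or ch == "_"

    def is_boundary(pos):
        # Assumption: positions outside the query count as non-word.
        return pos < 0 or pos >= len(query) or not is_word_char(query[pos])

    hits = []
    for start in range(len(query)):
        for pattern in sorted(patterns):
            if not pattern or not query.startswith(pattern, start):
                continue
            end = start + len(pattern)
            if words and not (is_boundary(start - 1) and is_boundary(end)):
                continue
            hits.append((start, len(pattern)))
    return hits
```

       With the patterns "cat" and "at" and the query "the cat sat", the
       substring mode finds "cat" once and "at" twice, while the word mode
       keeps only "cat".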

     -C, --capital-insensitive
       Perform a case-insensitive match on the first character of each query.
       If --substrings is set, trish2 will perform a case-insensitive match
       on the first character of each query substring.

     -c, --case-insensitive
       Perform a case-insensitive match on all query characters.
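
       The difference between --capital-insensitive and --case-insensitive
       can be sketched as below. The function compare and its mode names are
       hypothetical, and the use of str.lower() for case folding is an
       assumption about how characters are compared. The returned count
       mirrors the {mismatch} keyword of --format-yes.

```python
def compare(pattern, text, mode="exact"):
    """Return the number of characters matched only by ignoring case,
    or None when pattern and text do not match at all.

    mode is 'exact', 'capital' (first character only) or 'case' (all).
    """
    if len(pattern) != len(text):
        return None
    mismatches = 0
    for index, (p, t) in enumerate(zip(pattern, text)):
        if p == t:
            continue
        relaxed = mode == "case" or (mode == "capital" and index == 0)
        if relaxed and p.lower() == t.lower():
            mismatches += 1
        else:
            return None
    return mismatches
```

       For example, the pattern "paris" matches the query "Paris" in
       'capital' mode with one relaxed character, but matches "PARIS" only
       in 'case' mode.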

     -G, --general-capital-insensitive
       Perform a general case-insensitive (GCI) match on the first character
       of each query (or each query substring if --substrings is set). See
       the --gci-table option for details on GCI. This option is ignored if
       the --gci-table option is not set.

     -g, --general-case-insensitive
       Perform a general case-insensitive (GCI) match on all query
       characters. See the --gci-table option for details on GCI. This option
       is ignored if the --gci-table option is not set.

     -t, --gci-table=PATH
       Read the general case-insensitive (GCI) table from file PATH. A GCI
       table tells trish2 which characters match each other. The table is read
       line by line: the first character of a line will match any of the other
       characters in the line. The encoding of the GCI table must be the same
       as that of the dictionary.
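
       A GCI table can be pictured as a mapping from the first character of
       each line to the set of remaining characters. The Python sketch below
       is an illustration under two assumptions: the first character is the
       one found in the dictionary pattern and the alternatives are accepted
       in the query, and a repeated first character keeps its first line
       only. load_gci_table and gci_equal are hypothetical names.

```python
def load_gci_table(lines):
    """Build {first character: set of alternative characters}."""
    table = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        head, alternatives = line[0], line[1:]
        if head in table:
            # trish2 warns "Character already has alternatives" here.
            # Assumption: this sketch keeps the first line and skips the rest.
            continue
        table[head] = set(alternatives)
    return table


def gci_equal(pattern_char, query_char, table):
    """True when query_char is pattern_char or one of its alternatives."""
    return pattern_char == query_char or query_char in table.get(pattern_char, ())
```

       With a line made of "e" followed by its accented variants, a pattern
       character "e" accepts any of those variants in the query, but an
       accented pattern character does not accept a plain "e".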

     -Y, --format-yes=FORMAT
       Set the output format for a query found in the dictionary. FORMAT is a
       string that interprets the following escapes:
         \t     tabulation character
         \n     newline
         \\     backslash
         \{     opening curly bracket
         \}     closing curly bracket
         \[     opening square bracket
         \]     closing square bracket

       Additionally trish2 will replace keywords between curly brackets by
       information about the match:
         {query}      The query string that matched a dictionary pattern. Note
                      that if --substrings is set this will be the substring
                      of the query from the start of the match to the end of
                      the query.
         {line}       The line number in the query stream where the query was
                      read.
         {start}      The character offset of the start of the query substring.
                      This is a wide character offset, not a byte offset.
                      Always equal to 0 if --substrings is not set.
         {length}     Length of the match.
         {match}      Matched portion of the query. If --substrings is set
                      this will be the substring of the query from the start
                      to the end of the match.
         {entry}      Dictionary pattern matched by the query.
         {mismatch}   The number of characters matched by case-insensitive or
                      GCI matching. Equal to 0 in the case of an exact match.
         {tag N}      N is a digit. Display tag N associated with the pattern.
                      See the --tag-delimiters option for details about tags.
                      Note that the first tag is 0.

       If trish2 encounters anything else between curly brackets, then it will
       issue an error.

       If --multi-dict is set, then anything between square brackets will be
       repeated as many times as there are copies of the matched pattern. Since
       there is a tag set for each repetition, any {tag N} keywords must be
       enclosed in square brackets.

       The default value is "{match}\n".
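
       The expansion of a format string can be sketched in Python as below.
       This is a simplified illustration, not trish2's parser: expand_format
       is a hypothetical name, square-bracket repetition for --multi-dict is
       left out, and raising an error on an unknown backslash escape is an
       assumption.

```python
ESCAPES = {"t": "\t", "n": "\n", "\\": "\\",
           "{": "{", "}": "}", "[": "[", "]": "]"}
KEYWORDS = {"query", "line", "start", "length", "match", "entry", "mismatch"}


def expand_format(fmt, info, tags=()):
    """Expand escapes and {keywords} in fmt using the match details in info."""
    out = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "\\":
            if i + 1 >= len(fmt) or fmt[i + 1] not in ESCAPES:
                raise ValueError(f"bad escape at char {i}")  # assumption
            out.append(ESCAPES[fmt[i + 1]])
            i += 2
        elif ch == "{":
            close = fmt.find("}", i)
            if close < 0:
                raise ValueError("Curly bracket mismatch")
            key = fmt[i + 1:close]
            if key in KEYWORDS:
                out.append(str(info[key]))
            elif len(key) == 5 and key.startswith("tag ") and key[4] in "0123456789":
                number = int(key[4])
                if number >= len(tags):
                    raise ValueError(f"Illegal tag number '{number}'")
                out.append(tags[number])
            else:
                raise ValueError(f"Unknown '{key}'")
            i = close + 1
        elif ch == "}":
            raise ValueError("Curly bracket mismatch")
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

       Given a match on line 3 whose matched text is "cat", the format
       {line}\t{match}\n expands to "3", a tabulation, "cat" and a newline.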

     -N, --format-no=FORMAT
       Set the output format for queries that do not match any dictionary
       pattern. FORMAT has the same syntax as for --format-yes, but only the
       `{query}' keyword is recognized. The default value is an empty string,
       so trish2 will not display anything.

     -h, --show-headers
       Display the dictionary tag headers before performing matches using the
       format set by the --format-yes option.

   Dictionary options:
     -d, --compiled-dict
       Tell trish2 that the dictionary file DICT is a PATRICIA tree in binary
       form. To create such files, use the --save-as option. If DICT is not a
       binary PATRICIA tree then the behaviour is undefined.

     -X, --dict-max-length=LENGTH
       Set the maximum length of a dictionary entry to LENGTH. This is a wide
       character count, not a byte count. All characters beyond LENGTH will be
       ignored. If the last tag delimiter has not been read by that point,
       trish2 will issue an ill formed entry warning. Default value is 1024.

     -E, --dict-encoding=ENCODING
       Set the encoding of dictionary to ENCODING. Use `locale -a' to get the
       list of encodings supported by your system. Default value is `C'.

     -a, --tag-delimiters=DELIMITERS
       Associate tag strings with each dictionary pattern. You can associate 1
       to 9 tags with each entry; if you do so, every entry must have the same
       number of tags. DELIMITERS is the string of characters introducing each
       tag. Everything between the start of the line and the first occurrence
       of the first character of DELIMITERS will be considered as the pattern
       by trish2. Everything between the first occurrence of the first
       character of DELIMITERS and the first occurrence of the second
       character of DELIMITERS will be considered as the first tag. And so on.
       The default value is "", so there are no tags at all.
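
       The splitting rule can be sketched in Python as below. split_entry is
       a hypothetical name. The sketch assumes that each delimiter is
       searched for after the previous one and that the last tag runs to the
       end of the line; neither point is stated above.

```python
def split_entry(line, delimiters):
    """Split a dictionary line into (pattern, [tag 0, tag 1, ...])."""
    line = line.rstrip("\n")
    fields = []
    pos = 0
    for delimiter in delimiters:
        cut = line.find(delimiter, pos)
        if cut < 0:
            # Corresponds to the "Ill formed entry" warning.
            raise ValueError("Ill formed entry")
        fields.append(line[pos:cut])
        pos = cut + 1
    # Assumption: whatever follows the last delimiter is the last tag.
    fields.append(line[pos:])
    return fields[0], fields[1:]
```

       With DELIMITERS set to ":;", the line "cat:noun;animal" yields the
       pattern "cat" and the tags "noun" (tag 0) and "animal" (tag 1). With
       an empty DELIMITERS string the whole line is the pattern.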

     -m, --multi-dict
       Allow multiple identical patterns. Normally trish2 will issue a
       warning and ignore a pattern if it is already present in the dictionary.
       If you use tags along with this option, trish2 may store a different
       tag set for each repetition of the pattern. If you use tags without this
       option, only the tags associated with the first occurrence of the
       pattern in the dictionary will be stored.
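
       The effect of --multi-dict on duplicate patterns can be sketched as
       below. A plain Python dict stands in for the PATRICIA tree, and
       build_dictionary is a hypothetical name.

```python
def build_dictionary(entries, multi=False):
    """entries is an iterable of (pattern, tags) pairs in file order.

    Returns ({pattern: [tag set, ...]}, [warning message, ...]).
    """
    dictionary = {}
    warnings = []
    for line_number, (pattern, tags) in enumerate(entries, start=1):
        if pattern not in dictionary:
            dictionary[pattern] = [tags]
        elif multi:
            # Multi dictionary: keep one tag set per repetition.
            dictionary[pattern].append(tags)
        else:
            # Single dictionary: warn and keep the first occurrence only.
            warnings.append(f"Duplicate entry, line {line_number}")
    return dictionary, warnings
```

       Loading "bank" twice with different tags keeps both tag sets when
       multi is true, and keeps the first one with a warning otherwise.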

     -H, --headers
       Tell trish2 that the first line of DICT indicates the pattern and tag
       headers. The first line will be split with the delimiters set by
       --tag-delimiters, the first substring will be the pattern header name,
       the second will be the first tag header name, and so on.

   Optimal tree building options. These options tell trish2 the amount of
   memory to allocate for the PATRICIA tree. If these parameters are set to
   the right values, trish2 will avoid multiple time-consuming allocations.
     --opt-mnb=INT
       Number of nodes in the pattern tree.

     --opt-msb=INT
       Total length of tail strings in the pattern tree.

     --opt-tnb=INT
       Number of patterns.

     --opt-tsb=INT
       Number of distinct tags.

     --opt-entries=INT
       Number of nodes in the tag tree.

     --opt-tagsize=INT
       Total length of tail strings in the tag tree.

DIAGNOSTICS

   trish2 will return a non-zero value if there was an error (or a warning
   with --strict option). Otherwise it will return 0 whether there was a
   successful match or not.

WARNINGS

 Informative messages (INFO):

   Computing optimal parameters
     Counting memory usage for the --optimal-parameters option.

   Creating dictionary (single, tags: ...)
   Creating dictionary (multi, tags: ...)
     Building the PATRICIA tree. The message indicates whether the dictionary
     is a multi dictionary and the number of tags.

   Entry header: ...
     Name of the pattern entry header.

   Locale set to '...'
     trish2 has switched to another encoding.

   Opening file '...' for writing
     Opening the file set by the --output option.

   Opening input file '...'
     Opening the file set by the --input option.

   Reading dictionary file '...'
     Reading pattern dictionary file.

   Reading input
     Reading queries from the input file or standard input.

   Reading header line
     Reading the first line of the pattern dictionary as header names.

   Saving dictionary into file '...'
     Saving the tree in the binary file set by --save-as.

   Tag header ...: ...
     Name of the Nth tag.

 Warnings (WARNING):

   Character already has alternatives, line ...
     In the GCI table file, several lines begin with the same character.

   Duplicate entry, line ...
   **** THIS ONE ---> ...
     An entry is already present and the --multi-dict option is not set.

   Illegal character sequence, line ...
   **** THIS ONE ---> ...
     Found an illegal character sequence under the current locale.

   Ill formed entry, line ...
   **** THIS ONE ---> ...
     A line in the dictionary file does not contain all the tag delimiters.

   No GCI table provided, disabling GCI search
     One of --general-capital-insensitive or --general-case-insensitive options
     is set but --gci-table is not set.

ERRORS

 Error messages (ERROR):

   Could not open dictionary file '...'
     The pattern dictionary file could not be opened (the file does not exist
     or the user does not have permission to read it).

   Could not open '...' for writing
     The file set by --output could not be opened for writing, probably
     because the user does not have permission to write to it.

   Could not open file '...'
     The file set by --input could not be opened (the file does not exist or
     the user does not have permission to read it).

   Empty file?
     trish2 could not read the header line in the pattern dictionary file.
     This could mean that the file is empty.

   Format yes, char ... : ...
   Format no, char ... : ...
     Generic error in a format string, indicating the character offset and
     the reason for the error.

   Ill formed header line
     The header line in the pattern dictionary file does not contain all the
     tag delimiters set by --tag-delimiters.

   Unhandled locale '...'
     The locale set by --search-encoding or --dict-encoding is not supported
     by your system.

   There are ... errors in the yes format
   There are ... errors in the no format
     Summary of errors in format strings. See Format string errors.

   You must provide a dictionary file name
     Pattern dictionary file name was omitted.

 Format string errors:
   Curly bracket mismatch
     An opening curly bracket is not closed, or conversely there is a closing
     curly bracket with no opening one.

   Illegal '...'
     This curly bracket keyword is not allowed in the no format.

   Illegal tag number '...'
     In a {tag N}, N is greater than or equal to the number of tags. Remember
     that the first tag is 0, so the last one is (number of tags - 1).

   In multi dictionaries, tag display must be between square brackets
     --multi-dict is set so {tag N} must be inside a matching pair of square
     brackets.

   Square bracket mismatch
     Each opening square bracket must have a matching closing square bracket.
     For each closing square bracket, there must be a preceding opening one.
     Also, you cannot open a square bracket until the previous one is closed
     (no nesting).
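
     The rule can be expressed as a small check, sketched here in Python.
     check_square_brackets is a hypothetical name and the code is an
     illustration of the rule above, not trish2's own validation.

```python
def check_square_brackets(fmt):
    """True when unescaped square brackets are paired and never nested."""
    is_open = False
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "\\":
            i += 2  # skip the escaped character, for example \[ or \]
            continue
        if ch == "[":
            if is_open:
                return False  # a second opening bracket before a closing one
            is_open = True
        elif ch == "]":
            if not is_open:
                return False  # closing bracket with no opening one
            is_open = False
        i += 1
    return not is_open
```

     For instance "[{tag 0}]" passes, while "[[{tag 0}]]", "]" and "[" on
     their own are rejected.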

   Square brackets not allowed in format for unmatched queries
     --format-no contains square brackets. If you want to print square
     brackets, escape them with a backslash.

   This dictionary is not multi
     When --multi-dict is not set, the yes format must not contain square
     brackets.

   Unknown '...'
     This curly bracket keyword is unknown to trish2. See --format-yes for a
     list of known curly bracket keywords.

COPYRIGHT

AUTHOR

   Robert Bossy <Robert.Bossy@jouy.inra.fr>

SEE ALSO

   fgrep(1), locale(1), locale(7)