NAME
trish2
SYNOPSIS
trish2 [OPTIONS] DICT
DESCRIPTION
trish2 searches for strings in a list of fixed string patterns using PATRICIA trees. The patterns are given one per line in the file DICT. trish2 builds a PATRICIA tree from this list and matches the queries read from standard input against it.
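As an illustration of the data structure, here is a minimal sketch of a character-level compressed trie, in the spirit of a PATRICIA tree. It is not trish2's implementation, which is not described in this page. The names Node, insert and contains are hypothetical.

```python
class Node:
    """One node of a compressed trie; each edge carries a whole string label."""

    def __init__(self):
        # Maps the first character of an edge label to (label, child node).
        self.children = {}
        # True when a pattern ends exactly at this node.
        self.terminal = False


def insert(root, pattern):
    """Add pattern to the trie, splitting an edge when only a prefix is shared."""
    node, rest = root, pattern
    while rest:
        entry = node.children.get(rest[0])
        if entry is None:
            leaf = Node()
            leaf.terminal = True
            node.children[rest[0]] = (rest, leaf)
            return
        label, child = entry
        # Length of the common prefix of the edge label and the remaining pattern.
        k = 0
        while k < len(label) and k < len(rest) and label[k] == rest[k]:
            k += 1
        if k < len(label):
            # Split the edge: keep the shared prefix, push the remainder down.
            mid = Node()
            mid.children[label[k]] = (label[k:], child)
            node.children[rest[0]] = (label[:k], mid)
            child = mid
        node, rest = child, rest[k:]
    node.terminal = True


def contains(root, query):
    """Return True when the whole query equals one of the inserted patterns."""
    node, rest = root, query
    while rest:
        entry = node.children.get(rest[0])
        if entry is None:
            return False
        label, child = entry
        if not rest.startswith(label):
            return False
        node, rest = child, rest[len(label):]
    return node.terminal
```

Whole-query lookup, as sketched here, corresponds to the default behaviour without --substrings.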
OPTIONS
Help options:
-?, --help
Display a help text and exit.
--usage
Display a very brief help message and exit.
General options:
-n, --no-search
Do not perform any search. Use this option when you just want to save the
PATRICIA tree or get the optimal tree building parameters.
-D, --save-as=FILE
Save the PATRICIA tree in a binary form in FILE. This file can be used
in future uses of trish2 with the --compiled-dict option.
-P, --optimal-parameters
Print the optimal tree building parameters to standard output. This
output can be copied verbatim onto the command line of future runs of
trish2 on the same dictionary. These options will slightly speed up
the construction of the PATRICIA tree.
--version
Print version and exit.
Verbosity options:
-q, --quiet
Be quiet. trish2 will not display messages about opening files and
changing the locale.
-v, --verbose
Be verbose. trish2 will display additional information about building
the PATRICIA tree.
-V, --locace
Be insanely verbose. trish2 will display anything you do not want to
know.
-s, --strict
Abort at the first warning. Normally trish2 will proceed after a
warning.
Search options:
-i, --input=FILE
Read queries from file FILE instead of standard input.
-o, --output=FILE
Write the result into file FILE instead of standard output.
-x, --search-max-length=LENGTH
Set the maximum length of the queries to LENGTH. All characters beyond
length will be ignored. This is a wide character count, not a byte count.
Default value is 1024.
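The difference between a character count and a byte count can be sketched as follows. This is an illustration only: it assumes a UTF-8 encoding and treats Python code points as wide characters; truncate_query is a hypothetical helper, not part of trish2.

```python
def truncate_query(query, max_length=1024):
    """Keep at most max_length characters (not bytes) of the query."""
    return query[:max_length]


# "naive cafe" written with two accented letters (U+00EF and U+00E9).
query = "na\u00efve caf\u00e9"

chars = len(query)                  # counts characters
octets = len(query.encode("utf-8"))  # counts bytes in UTF-8

# Cutting at 5 characters keeps the whole first word,
# whereas cutting at 5 bytes would stop one letter earlier.
by_chars = truncate_query(query, 5)
by_bytes = query.encode("utf-8")[:5].decode("utf-8")
```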
-e, --search-encoding=ENCODING
Set the encoding of queries and output to ENCODING. Use `locale -a' to
get the list of encodings supported by your system. Default value is
`C'.
-w, --substrings
Search for queries as well as query substrings. Normally trish2 will
look for patterns that match whole queries only.
-W, --words
When --substrings is set, only look for query substrings immediately
preceded and immediately followed by a non-word character. Thus trish2
only looks for substrings forming whole words. This option is ignored if
--substrings is not present.
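The effect of --substrings and --words can be sketched with a brute-force search over a plain set of patterns. This is only a model of the described behaviour: trish2 walks a PATRICIA tree instead, and the definition of a word character used here (alphanumeric or underscore), as well as treating the start and end of the query as word boundaries, are assumptions. find_matches and is_word_char are hypothetical names.

```python
def is_word_char(ch):
    # Assumption: letters, digits and underscore are word characters.
    return ch.isalnum() or ch == "_"


def find_matches(query, patterns, substrings=False, words=False):
    """Return (start, length) pairs of dictionary patterns found in query."""
    if not substrings:
        # Default behaviour: the whole query must equal a pattern.
        return [(0, len(query))] if query in patterns else []
    hits = []
    for start in range(len(query)):
        for end in range(start + 1, len(query) + 1):
            if query[start:end] not in patterns:
                continue
            if words:
                # Assumption: the query boundaries count as non-word characters.
                before_ok = start == 0 or not is_word_char(query[start - 1])
                after_ok = end == len(query) or not is_word_char(query[end])
                if not (before_ok and after_ok):
                    continue
            hits.append((start, end - start))
    return hits
```

The start and length values correspond to what the {start} and {length} keywords of --format-yes report.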
-C, --capital-insensitive
Perform a case-insensitive match on the first character of each query.
If --substrings is set, trish2 will perform a case-insensitive match
for the first character of each query substring.
-c, --case-insensitive
Perform a case-insensitive match on all query characters.
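The difference between --capital-insensitive and --case-insensitive can be sketched as below. The sketch uses Python's str.lower for case folding, which is an assumption: trish2 follows the rules of the selected locale. The function name matches is hypothetical.

```python
def matches(query, pattern, capital_insensitive=False, case_insensitive=False):
    """Compare a query with a pattern under the two case options."""
    if case_insensitive:
        # Every character is compared without regard to case.
        return query.lower() == pattern.lower()
    if capital_insensitive and query and pattern:
        # Only the first character is compared without regard to case.
        return (query[0].lower() == pattern[0].lower()
                and query[1:] == pattern[1:])
    return query == pattern
```

With this model, "Paris" matches the pattern "paris" under either option, while "PARIS" matches it only under --case-insensitive.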
-G, --general-capital-insensitive
Perform a general case-insensitive (GCI) match on the first character
of each query (or each query substring if --substrings is set). See
--gci-table option for details on GCI. This option is ignored if the
--gci-table option is not set.
-g, --general-case-insensitive
Perform a general case-insensitive (GCI) match on all query
characters. See --gci-table option for details on GCI. This option is
ignored if the --gci-table option is not set.
-t, --gci-table=PATH
Read the general case-insensitive (GCI) table from file PATH. A GCI table
tells trish2 which characters match each other. The table is read line
by line: the first character of a line will match any of the other
characters in the line. The encoding of the GCI table must be the same
as that of the dictionary.
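Reading a GCI table can be sketched as follows. The sketch makes several assumptions that this page does not settle: the match is modelled in one direction only (first character of a line towards the others), empty lines are skipped, and when a character starts several lines only the first line is kept (trish2 issues the "Character already has alternatives" warning in that case; what it does next is not documented). parse_gci_table and gci_equal are hypothetical names.

```python
def parse_gci_table(text):
    """Map the first character of each line to the set of its alternatives."""
    table = {}
    for line in text.splitlines():
        if not line:
            continue  # assumption: empty lines carry no information
        key, alternatives = line[0], line[1:]
        if key in table:
            # Assumption: keep the first definition of a repeated character.
            continue
        table[key] = set(alternatives)
    return table


def gci_equal(a, b, table):
    """True when a equals b, or b is listed as an alternative of a."""
    return a == b or b in table.get(a, ())
```

For instance, a table with the line `0Oo` makes the digit zero match both forms of the letter O.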
-Y, --format-yes=FORMAT
Set the output format for a query found in the dictionary. FORMAT is a
string that interprets the following escapes:
\t tabulation character
\n newline
\\ backslash
\{ opening curly bracket
\} closing curly bracket
\[ opening square bracket
\] closing square bracket
Additionally trish2 will replace keywords between curly brackets by
information about the match:
{query} The query string that matched a dictionary pattern. Note
that if --substrings is set this will be the substring
of the query from the start of the match to the end of
the query.
{line} The line number in the query stream where the query was
read.
{start} The character offset of the start of the query substring.
This is a wide character offset, not a byte offset.
Always equal to 0 if --substrings is not set.
{length} Length of the match.
{match} Matched portion of the query. If --substrings is set
this will be the substring of the query from the start
to the end of the match.
{entry} Dictionary pattern matched by the query.
{mismatch} The number of characters matched by case-insensitive or
GCI. Equal to 0 in the case of an exact match.
{tag N} N is a digit. Display the tag N associated with the pattern.
See --tag-delimiters option for details about tags. Note
that the first tag is 0.
If trish2 encounters anything else between curly brackets, then it will
issue an error.
If --multi-dict is set, then anything between square brackets will be
repeated as many times as there are copies of the matched pattern. Since
there is a tag set for each repetition, any {tag N} keywords must be
enclosed in square brackets.
The default value is "{match}\n".
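The expansion of a format string can be sketched as follows, for the case where --multi-dict is not set. This is a model of the syntax described above, not trish2's code. The function name expand_format and the fields mapping are hypothetical, and rejecting unknown backslash escapes is an assumption, since this page does not say how they are handled.

```python
# Escapes listed for --format-yes.
ESCAPES = {"t": "\t", "n": "\n", "\\": "\\",
           "{": "{", "}": "}", "[": "[", "]": "]"}


def expand_format(fmt, fields):
    """Expand escapes and {keyword} items of fmt using the fields mapping."""
    out = []
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "\\":
            # Assumption: an unknown or dangling escape is an error.
            if i + 1 >= len(fmt) or fmt[i + 1] not in ESCAPES:
                raise ValueError(f"char {i}: unknown escape")
            out.append(ESCAPES[fmt[i + 1]])
            i += 2
        elif ch == "{":
            close = fmt.find("}", i + 1)
            if close == -1:
                raise ValueError(f"char {i}: Curly bracket mismatch")
            keyword = fmt[i + 1:close]
            if keyword not in fields:
                raise ValueError(f"char {i}: Unknown '{keyword}'")
            out.append(str(fields[keyword]))
            i = close + 1
        elif ch == "}":
            raise ValueError(f"char {i}: Curly bracket mismatch")
        elif ch in "[]":
            # Square brackets only make sense with --multi-dict (not modelled).
            raise ValueError(f"char {i}: This dictionary is not multi")
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Hypothetical match information for one query.
fields = {"query": "cat sat", "line": 3, "start": 4, "length": 3,
          "match": "cat", "entry": "cat", "mismatch": 0, "tag 0": "noun"}
line = expand_format(r"{line}\t{start}\t{match}\t{tag 0}\n", fields)
```

A tab-separated format such as the one above is convenient when the output is fed to other line-oriented tools.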
-N, --format-no=FORMAT
Set the output format for queries that do not match any dictionary
pattern. FORMAT has the same syntax as --format-yes formats, however it
will only recognize the `{query}' keyword. The default value is an
empty string so trish2 will not display anything.
-h, --show-headers
Display the dictionary tag headers before performing matches using the
format set by the --format-yes option.
Dictionary options:
-d, --compiled-dict
Tell trish2 that the dictionary file DICT is a PATRICIA tree in binary
form. To create such files, use the --save-as option. If DICT is not a
binary PATRICIA tree then the behaviour is undefined.
-X, --dict-max-length=LENGTH
Set the maximum length of a dictionary entry to LENGTH. This is a wide
character count, not a byte count. All characters beyond length will be
ignored. If the last tag character was not already read, trish2 will
issue a malformed entry line warning. Default value is 1024.
-E, --dict-encoding=ENCODING
Set the encoding of dictionary to ENCODING. Use `locale -a' to get the
list of encodings supported by your system. Default value is `C'.
-a, --tag-delimiters=DELIMITERS
Associate tag strings to each dictionary pattern. You can associate 1
to 9 tags to each entry; if you choose to do so, every entry must have the same
number of tags. DELIMITERS is the string of characters introducing each
tag. Everything between the start of the line and the first occurrence
of the first character of DELIMITERS will be considered as the pattern
by trish2. Everything between the first occurrence of the first
character of DELIMITERS and the first occurrence of the second
character of DELIMITERS will be considered as the first tag. And so on.
The default value is "", so there are no tags at all.
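Splitting a dictionary line into a pattern and its tags can be sketched as below. Two points are assumptions, because the description above leaves them open: each delimiter is searched for after the previous one, and the last tag runs to the end of the line. split_entry is a hypothetical name.

```python
def split_entry(line, delimiters):
    """Split a dictionary line into (pattern, [tag 0, tag 1, ...])."""
    fields = []
    pos = 0
    for delim in delimiters:
        # Assumption: look for each delimiter after the previous one.
        cut = line.find(delim, pos)
        if cut == -1:
            # Corresponds to the "Ill formed entry" warning.
            raise ValueError("ill formed entry: missing delimiter %r" % delim)
        fields.append(line[pos:cut])
        pos = cut + 1
    # Assumption: the last tag extends to the end of the line.
    fields.append(line[pos:])
    return fields[0], fields[1:]


pattern, tags = split_entry("cat:noun;animal", ":;")
```

With an empty DELIMITERS string the whole line is the pattern and the tag list is empty, which matches the default.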
-m, --multi-dict
Allows for multiple identical patterns. Normally trish2 will issue a
warning and ignore a pattern if it is already present in the dictionary.
If you use tags along with this option, trish2 may store a different
tag set for each repetition of the pattern. If you use tags without this
option, only the tags associated with the first occurrence of the
pattern in the dictionary will be stored.
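The handling of repeated patterns with and without --multi-dict can be modelled with an ordinary mapping from pattern to a list of tag sets. This is a sketch of the documented behaviour only; build_dictionary is a hypothetical name and trish2 stores its data in PATRICIA trees rather than in a hash table.

```python
def build_dictionary(entries, multi=False):
    """Store tag sets per pattern; entries is a list of (pattern, tags) pairs."""
    dictionary = {}
    warnings = []
    for line_no, (pattern, tags) in enumerate(entries, start=1):
        if pattern not in dictionary:
            dictionary[pattern] = [tags]
        elif multi:
            # One tag set is kept for each repetition of the pattern.
            dictionary[pattern].append(tags)
        else:
            # Only the first occurrence is stored; later ones are reported.
            warnings.append(f"Duplicate entry, line {line_no}")
    return dictionary, warnings


entries = [("bank", ["river"]), ("bank", ["finance"]), ("cat", ["animal"])]
single, single_warnings = build_dictionary(entries)
multi, multi_warnings = build_dictionary(entries, multi=True)
```

In the multi case, the list stored for "bank" has two tag sets, which is what the square-bracket repetition of --format-yes iterates over.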
-H, --headers
Tell trish2 that the first line of DICT gives the pattern and tag
headers. The first line will be split with the delimiters set by
--tag-delimiters, the first substring will be the pattern header name,
the second will be the first tag header name, and so on.
Optimal tree building options. These options tell trish2 the amount
of memory to allocate for the PATRICIA tree. If these parameters are set to
the right values, trish2 will avoid repeated, time-consuming allocations.
--opt-mnb=INT
Number of nodes in the pattern tree.
--opt-msb=INT
Total length of tail strings in the pattern tree.
--opt-tnb=INT
Number of patterns.
--opt-tsb=INT
Number of distinct tags.
--opt-entries=INT
Number of nodes in the tag tree.
--opt-tagsize=INT
Total length of tail strings in the tag tree.
DIAGNOSTICS
trish2 will return a non-zero value if there was an error (or a warning with --strict option). Otherwise it will return 0 whether there was a successful match or not.
WARNINGS
Informative messages (INFO):
Computing optimal parameters
Counting memory usage for the --optimal-parameters option.
Creating dictionary (single, tags: ...)
Creating dictionary (multi, tags: ...)
Building the PATRICIA tree. The message indicates whether it is a multi
dictionary and gives the number of tags.
Entry header: ...
Name of the pattern entry header.
Locale set to '...'
trish2 has switched to another encoding.
Opening file '...' for writing
Opening the file set by the --output option.
Opening input file '...'
Opening the file set by the --input option.
Reading dictionary file '...'
Reading pattern dictionary file.
Reading input
Reading queries from the input file or standard input.
Reading header line
Reading the first line of the pattern dictionary as header names.
Saving dictionary into file '...'
Saving the tree in the binary file set by --save-as.
Tag header ...: ...
Name of the Nth tag.
Warnings (WARNING):
Character already has alternatives, line ...
In the GCI table file, several lines begin with the same character.
Duplicate entry, line ...
**** THIS ONE ---> ...
An entry is already present and the --multi-dict option is not set.
Illegal character sequence, line ...
**** THIS ONE ---> ...
Found an illegal character sequence under the current locale.
Ill formed entry, line ...
**** THIS ONE ---> ...
A line in the dictionary file does not contain all the tag delimiters.
No GCI table provided, disabling GCI search
One of --general-capital-insensitive or --general-case-insensitive options
is set but --gci-table is not set.
ERRORS
Error messages (ERROR):
Could not open dictionary file '...'
Opening the pattern dictionary file was impossible (file does not exist
or the user is not allowed to).
Could not open '...' for writing
Opening the file set by --output in writing mode was impossible, probably
because the user is not allowed to.
Could not open file '...'
Opening the file set by --input was impossible (file does not exist or
the user is not allowed to).
Empty file?
trish2 could not read the header line in the pattern dictionary file.
This could mean that the file is empty.
Format yes, char ... : ...
Format no, char ... : ...
Generic error in format string indicating the character offset and the
reason of the error.
Ill formed header line
The header line in the pattern dictionary file does not contain all the
tag delimiters set by --tag-delimiters.
Unhandled locale '...'
The locale set by --search-encoding or --dict-encoding is not supported
by your system.
There are ... errors in the yes format
There are ... errors in the no format
Summary of errors in format strings. See Format errors.
You must provide a dictionary file name
Pattern dictionary file name was omitted.
Format error messages:
Format string errors:
Curly bracket mismatch
An opening curly bracket is not closed, or conversely there is a closing
curly bracket with no opening one.
Illegal '...'
This curly bracket keyword is not allowed in the no format.
Illegal tag number '...'
In a {tag N}, N is greater than or equal to the number of tags. Remember that
the first tag is 0 so the last one is (nb tags - 1).
In multi dictionaries, tag display must be between square brackets
--multi-dict is set so {tag N} must be inside a matching pair of square
brackets.
Square bracket mismatch
Each opening square bracket must have a matching closing square bracket.
For each closing square bracket, there must be a preceding opening one.
Also, you cannot open a square bracket until the previous one is
closed (no nested brackets).
Square brackets not allowed in format for unmatched queries
--format-no contains square brackets. If you want to print square
brackets, escape them with a backslash.
This dictionary is not multi
When --multi-dict is not set, the yes format must not contain square
brackets.
Unknown '...'
This curly bracket keyword is unknown to trish2. See --format-yes for a
list of known curly bracket keywords.
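The square bracket rules listed above (every bracket is paired, no nesting, escaped brackets are literal) can be sketched as a small checker. It is an illustration of the rules, not trish2's parser; square_brackets_ok is a hypothetical name.

```python
def square_brackets_ok(fmt):
    """Return True when unescaped square brackets are paired and not nested."""
    is_open = False
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "\\":
            # An escaped character, including \[ and \], is literal text.
            i += 2
            continue
        if ch == "[":
            if is_open:
                return False  # a bracket is opened before the previous one is closed
            is_open = True
        elif ch == "]":
            if not is_open:
                return False  # closing bracket with no opening one
            is_open = False
        i += 1
    # A bracket left open at the end is also a mismatch.
    return not is_open
```

A format such as `{match}[\t{tag 0}]\n` passes this check, whereas `[[{tag 0}]]` does not.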
COPYRIGHT
AUTHOR
Robert Bossy <Robert.Bossy@jouy.inra.fr>
SEE ALSO
fgrep(1), locale(1), locale(7)