NAME
trish2
SYNOPSIS
trish2 [OPTIONS] DICT
DESCRIPTION
trish2 searches for strings in a list of fixed string patterns using PATRICIA trees. The patterns are given one per line in the file DICT. trish2 builds a PATRICIA tree from this list and searches for the patterns in the queries read from standard input.
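As a rough illustration of the dictionary lookup described above, the sketch below uses a plain character trie as a simpler stand-in for a PATRICIA tree (which additionally compresses chains of single-child nodes). The names build_trie and contains are hypothetical and are not part of trish2.

```python
# Sketch only: a plain character trie standing in for the PATRICIA tree.
# build_trie and contains are hypothetical names, not part of trish2.

def build_trie(patterns):
    """Build a nested-dict trie; the key None marks the end of a pattern."""
    root = {}
    for pattern in patterns:
        node = root
        for ch in pattern:
            node = node.setdefault(ch, {})
        node[None] = True
    return root


def contains(trie, query):
    """Return True only if the whole query equals one of the patterns."""
    node = trie
    for ch in query:
        node = node.get(ch)
        if node is None:
            return False
    return None in node


# One pattern per line in DICT, one query per line on standard input.
dictionary = build_trie(["cat", "car", "cart"])
for query in ["car", "ca", "carts"]:
    print(query, contains(dictionary, query))
```

Only "car" is reported as found here, because the default behaviour matches whole queries only.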
OPTIONS
Help options:

-?, --help
    Display a help text and exit.

--usage
    Display a very brief help message and exit.

General options:

-n, --no-search
    Do not perform any search. Use this option when you just want to save the PATRICIA tree or get the optimal tree building parameters.

-D, --save-as=FILE
    Save the PATRICIA tree in binary form in FILE. This file can be reused in later runs of trish2 with the --compiled-dict option.

-P, --optimal-parameters
    Print the optimal tree building parameters to standard output. This output can be copied verbatim into the command line of later runs of trish2 on the same dictionary. These options slightly speed up the construction of the PATRICIA tree.

--version
    Print version and exit.

Verbosity options:

-q, --quiet
    Be quiet. trish2 will not display messages about opening files and changing the locale.

-v, --verbose
    Be verbose. trish2 will display additional information about building the PATRICIA tree.

-V, --locace
    Be insanely verbose. trish2 will display anything you do not want to know.

-s, --strict
    Abort at the first warning. Normally trish2 proceeds after a warning.

Search options:

-i, --input=FILE
    Read queries from file FILE instead of standard input.

-o, --output=FILE
    Write the result into file FILE instead of standard output.

-x, --search-max-length=LENGTH
    Set the maximum length of the queries to LENGTH. All characters beyond LENGTH are ignored. This is a wide character count, not a byte count. Default value is 1024.

-e, --search-encoding=ENCODING
    Set the encoding of queries and output to ENCODING. Use `locale -a' to get the list of encodings supported by your system. Default value is `C'.

-w, --substrings
    Search for queries as well as query substrings. Normally trish2 looks for patterns that match whole queries only.

-W, --words
    When --substrings is set, only look for query substrings immediately preceded and immediately followed by a non-word character. Thus trish2 only looks for substrings forming whole words. This option is ignored if --substrings is not present.

-C, --capital-insensitive
    Perform a case-insensitive match on the first character of each query. If --substrings is set, trish2 performs a case-insensitive match on the first character of each query substring.

-c, --case-insensitive
    Perform a case-insensitive match on all query characters.

-G, --general-capital-insensitive
    Perform a generic case insensitive (GCI) match on the first character of each query (or each query substring if --substrings is set). See the --gci-table option for details on GCI. This option is ignored if the --gci-table option is not set.

-g, --general-case-insensitive
    Perform a generic case insensitive (GCI) match on all query characters. See the --gci-table option for details on GCI. This option is ignored if the --gci-table option is not set.

-t, --gci-table=PATH
    Read the general case insensitive (GCI) table from file PATH. A GCI table tells trish2 which characters match each other. The table is read line by line: the first character of a line matches any of the other characters in the line. The encoding of the GCI table must be the same as that of the dictionary.

-Y, --format-yes=FORMAT
    Set the output format for a query found in the dictionary. FORMAT is a string that interprets the following escapes:

        \t    tabulation character
        \n    newline
        \\    backslash
        \{    opening curly bracket
        \}    closing curly bracket
        \[    opening square bracket
        \]    closing square bracket

    Additionally, trish2 replaces keywords between curly brackets with information about the match:

        {query}
            The query string that matched a dictionary pattern. Note that if --substrings is set this is the substring of the query from the start of the match to the end of the query.
        {line}
            The line number in the query stream where the query was read.
        {start}
            The character offset of the start of the query substring. This is a wide character offset, not a byte offset. Always equal to 0 if --substrings is not set.
        {length}
            Length of the match.
        {match}
            Matched portion of the query. If --substrings is set this is the substring of the query from the start to the end of the match.
        {entry}
            Dictionary pattern matched by the query.
        {mismatch}
            The number of characters matched through case-insensitive or GCI matching. Equal to 0 for an exact match.
        {tag N}
            N is a digit. Display the tag N associated with the pattern. See the --tag-delimiters option for details about tags. Note that the first tag is 0.

    If trish2 encounters anything else between curly brackets, it issues an error.

    If --multi-dict is set, anything between square brackets is repeated as many times as there are copies of the matched pattern. Since there is a tag set for each repetition, any {tag N} keyword must be enclosed in square brackets.

    The default value is "{match}\n".

-N, --format-no=FORMAT
    Set the output format for queries that do not match any dictionary pattern. FORMAT has the same syntax as --format-yes formats, but only the `{query}' keyword is recognized. The default value is an empty string, so trish2 does not display anything.

-h, --show-headers
    Display the dictionary tag headers before performing matches, using the format set by the --format-yes option.

Dictionary options:

-d, --compiled-dict
    Tell trish2 that the dictionary file DICT is a PATRICIA tree in binary form. To create such files, use the --save-as option. If DICT is not a binary PATRICIA tree, the behaviour is undefined.

-X, --dict-max-length=LENGTH
    Set the maximum length of a dictionary entry to LENGTH. This is a wide character count, not a byte count. All characters beyond LENGTH are ignored. If the last tag character has not been read by then, trish2 issues a malformed entry line warning. Default value is 1024.

-E, --dict-encoding=ENCODING
    Set the encoding of the dictionary to ENCODING. Use `locale -a' to get the list of encodings supported by your system. Default value is `C'.

-a, --tag-delimiters=DELIMITERS
    Associate tag strings with each dictionary pattern. You can associate 1 to 9 tags with each entry; if you do, every entry must have the same number of tags. DELIMITERS is the string of characters introducing each tag. Everything between the start of the line and the first occurrence of the first character of DELIMITERS is considered the pattern. Everything between the first occurrence of the first character of DELIMITERS and the first occurrence of the second character of DELIMITERS is considered the first tag. And so on. The default value is "", so there are no tags at all.

-m, --multi-dict
    Allow multiple identical patterns. Normally trish2 issues a warning and ignores a pattern if it is already present in the dictionary. If you use tags along with this option, trish2 may store a different tag set for each repetition of the pattern. If you use tags without this option, only the tags associated with the first occurrence of the pattern in the dictionary are stored.

-H, --headers
    Tell trish2 that the first line of DICT gives the pattern and tag headers. The first line is split with the delimiters set by --tag-delimiters: the first substring is the pattern header name, the second is the first tag header name, and so on.

Optimal tree building options:

These options tell trish2 the amount of memory to allocate for the PATRICIA tree. If these parameters are set to the right values, trish2 avoids multiple time-costly allocations.

--opt-mnb=INT
    Number of nodes in the pattern tree.

--opt-msb=INT
    Total length of tail strings in the pattern tree.

--opt-tnb=INT
    Number of patterns.

--opt-tsb=INT
    Number of distinct tags.

--opt-entries=INT
    Number of nodes in the tag tree.

--opt-tagsize=INT
    Total length of tail strings in the tag tree.
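The way --tag-delimiters splits a dictionary line can be sketched as follows. This is an illustration, not trish2's own code: the function name split_entry is hypothetical, and two points the text leaves open are treated as assumptions, namely that each delimiter is searched for after the previous one and that the last tag runs to the end of the line.

```python
# Sketch only: split_entry is a hypothetical helper, not part of trish2.

def split_entry(line, delimiters):
    """Split a dictionary line into (pattern, tags).

    Assumptions: each delimiter is looked up after the previous one, and
    the last tag extends to the end of the line. Returns None when a
    delimiter is missing, which corresponds to an ill-formed entry.
    """
    if not delimiters:
        return line, []            # default "": no tags at all
    fields = []
    rest = line
    for delim in delimiters:
        head, sep, rest = rest.partition(delim)
        if not sep:
            return None            # delimiter not found
        fields.append(head)
    fields.append(rest)            # text after the last delimiter
    return fields[0], fields[1:]


print(split_entry("cat:noun/animal", ":/"))
print(split_entry("cat:noun", ":/"))
```

With DELIMITERS set to ":/", the first line yields the pattern "cat" with tag 0 "noun" and tag 1 "animal"; the second line lacks the "/" delimiter and is rejected.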
DIAGNOSTICS
trish2 returns a non-zero value if there was an error (or a warning when the --strict option is set). Otherwise it returns 0, whether or not any query matched.
WARNINGS
Informative messages (INFO):

Computing optimal parameters
    Counting memory usage for the --optimal-parameters option.

Creating dictionary (single, tags: ...)
Creating dictionary (multi, tags: ...)
    Building the PATRICIA tree. The message indicates whether the dictionary is a multi dictionary and gives the number of tags.

Entry header: ...
    Name of the pattern entry header.

Locale set to '...'
    trish2 has switched to another encoding.

Opening file '...' for writing
    Opening the file set by the --output option.

Opening input file '...'
    Opening the file set by the --input option.

Reading dictionary file '...'
    Reading the pattern dictionary file.

Reading input
    Reading queries from the input file or standard input.

Reading header line
    Reading the first line of the pattern dictionary as header names.

Saving dictionary into file '...'
    Saving the tree in the binary file set by --save-as.

Tag header ...: ...
    Name of the Nth tag.

Warnings (WARNING):

Character already has alternatives, line ...
    In the GCI table file, several lines begin with the same character.

Duplicate entry, line ...
**** THIS ONE ---> ...
    An entry is already present and the --multi-dict option is not set.

Illegal character sequence, line ...
**** THIS ONE ---> ...
    Found a character sequence that is illegal under the current locale.

Ill formed entry, line ...
**** THIS ONE ---> ...
    A line in the dictionary file does not contain all the tag delimiters.

No GCI table provided, disabling GCI search
    One of the --general-capital-insensitive or --general-case-insensitive options is set but --gci-table is not.
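The GCI table rules (the first character of a line matches any of the other characters on that line, and a repeated leading character triggers a warning) can be modelled roughly as below. The names parse_gci_table and gci_match are hypothetical. Keeping the first line when a leading character is repeated, and treating the match as one-directional, are assumptions made for the sketch rather than documented trish2 behaviour.

```python
# Sketch only: parse_gci_table and gci_match are hypothetical names.

def parse_gci_table(lines):
    """Return (table, warnings) for the lines of a GCI table file."""
    table = {}
    warnings = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue
        key, alternatives = line[0], set(line[1:])
        if key in table:
            # Assumption: the first line wins and later ones are ignored.
            warnings.append(
                f"Character already has alternatives, line {number}")
            continue
        table[key] = alternatives
    return table, warnings


def gci_match(table, first, other):
    """True if the characters are equal or `other` is listed after `first`."""
    return first == other or other in table.get(first, ())


table, warnings = parse_gci_table(["eéèê", "aàâ", "eë"])
print(gci_match(table, "e", "é"), gci_match(table, "e", "ë"))
print(warnings)
```

In this example the third line repeats the leading character "e", so it produces the warning and "ë" is not registered as an alternative.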
ERRORS
Error messages (ERROR):

Could not open dictionary file '...'
    Opening the pattern dictionary file was impossible (the file does not exist or the user is not allowed to read it).

Could not open '...' for writing
    Opening the file set by --output in writing mode was impossible, probably because the user is not allowed to write to it.

Could not open file '...'
    Opening the file set by --input was impossible (the file does not exist or the user is not allowed to read it).

Empty file?
    trish2 could not read the header line in the pattern dictionary file. This could mean that the file is empty.

Format yes, char ... : ...
Format no, char ... : ...
    Generic error in a format string, indicating the character offset and the reason for the error.

Ill formed header line
    The header line in the pattern dictionary file does not contain all the tag delimiters set by --tag-delimiters.

Unhandled locale '...'
    The locale set by --search-encoding or --dict-encoding is not supported by your system.

There are ... errors in the yes format
There are ... errors in the no format
    Summary of errors in format strings. See Format string errors.

You must provide a dictionary file name
    The pattern dictionary file name was omitted.

Format string errors:

Curly bracket mismatch
    An opening curly bracket is not closed, or conversely there is a closing curly bracket with no opening one.

Illegal '...'
    This curly bracket keyword is not allowed in the no format.

Illegal tag number '...'
    In a {tag N}, N is greater than or equal to the number of tags. Remember that the first tag is 0, so the last one is (nb tags - 1).

In multi dictionaries, tag display must be between square brackets
    --multi-dict is set, so {tag N} must be inside a matching pair of square brackets.

Square bracket mismatch
    Each opening square bracket must have a matching closing square bracket. For each closing square bracket, there must be a preceding opening one. Also, you cannot open a square bracket until the previous one is closed (no nested brackets).

Square brackets not allowed in format for unmatched queries
    --format-no contains square brackets. If you want to print square brackets, escape them with a backslash.

This dictionary is not multi
    When --multi-dict is not set, the yes format must not contain square brackets.

Unknown '...'
    This curly bracket keyword is unknown to trish2. See --format-yes for a list of known curly bracket keywords.
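The rules behind the format string errors can be illustrated with a small checker for --format-yes strings. It is a simplified sketch under stated assumptions, not trish2's parser: check_yes_format is a hypothetical name, the text substituted for '...' in each message is a guess, and a backslash is simply taken to escape the next character.

```python
import re

# Sketch only: check_yes_format is a hypothetical helper, not part of trish2.
KEYWORDS = {"query", "line", "start", "length", "match", "entry", "mismatch"}
TAG = re.compile(r"tag (\d)$")


def check_yes_format(fmt, multi=False, ntags=0):
    """Return the list of format errors found in a yes format string."""
    errors = []
    in_square = False
    i = 0
    while i < len(fmt):
        ch = fmt[i]
        if ch == "\\":
            i += 2                      # skip the escaped character
            continue
        if ch == "{":
            end = fmt.find("}", i)
            if end == -1:
                errors.append("Curly bracket mismatch")
                break
            word = fmt[i + 1:end]
            tag = TAG.match(word)
            if tag:
                if int(tag.group(1)) >= ntags:
                    errors.append(f"Illegal tag number '{word}'")
                elif multi and not in_square:
                    errors.append("In multi dictionaries, tag display "
                                  "must be between square brackets")
            elif word not in KEYWORDS:
                errors.append(f"Unknown '{word}'")
            i = end + 1
            continue
        if ch == "}":
            errors.append("Curly bracket mismatch")
        elif ch in "[]":
            if not multi:
                errors.append("This dictionary is not multi")
            elif (ch == "[") == in_square:
                errors.append("Square bracket mismatch")  # nested or unopened
            else:
                in_square = not in_square
        i += 1
    if in_square:
        errors.append("Square bracket mismatch")          # never closed
    return errors


print(check_yes_format(r"{entry}[ {tag 0}]\n", multi=True, ntags=1))
print(check_yes_format("{tag 0}", multi=True, ntags=1))
```

The first call reports no errors; the second reports that the tag keyword must be placed between square brackets because the dictionary is multi.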
COPYRIGHT
AUTHOR
Robert Bossy <Robert.Bossy@jouy.inra.fr>
SEE ALSO
fgrep(1), locale(1), locale(7)