The Mime command enables parsing of documents with mime types other than text/plain, text/html or text/xml, which have built-in parsers.
Documents with other mime types can be processed with the help of external parsers: external programs which convert documents of arbitrary types to one of the above types natively supported by mnoGoSearch.
The from_mime and to_mime parameters are standard mime types.
to_mime should be one of the natively supported types (listed above) and can optionally have the charset= part.
If the charset= part is omitted, the parser output is considered to be in LocalCharset.
By default, when executing a parser, indexer sends data to the parser's STDIN and reads results from the parser's STDOUT.
Some parsers cannot operate on STDIN and need a file.
The command line parameter can contain a $1 reference, which stands for a temporary file name.
If $1 is specified, indexer creates a temporary file, writes the input data to it, and substitutes the temporary file name for the $1 reference in the parser command line.
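The two ways of passing data to a parser can be illustrated with a short Python sketch. This is not indexer's actual implementation; run_parser is a hypothetical helper, and running the command through a shell is an assumption made only to keep the example small.

```python
import os
import shlex
import subprocess
import tempfile


def run_parser(command: str, content: bytes) -> bytes:
    """Run an external parser command and return what it writes to STDOUT.

    Hypothetical helper for illustration; not part of mnoGoSearch.
    """
    if "$1" not in command:
        # Default mode: feed the document to the parser through STDIN.
        result = subprocess.run(command, shell=True, input=content,
                                stdout=subprocess.PIPE, check=True)
        return result.stdout

    # File mode: write the document to a temporary file and put the
    # file name where $1 appears in the command line.
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(content)
        result = subprocess.run(command.replace("$1", shlex.quote(path)),
                                shell=True, stdout=subprocess.PIPE, check=True)
        return result.stdout
    finally:
        # Remove the temporary file once the parser has finished.
        os.unlink(path)
```

With a POSIX shell, run_parser("cat", data) exercises the STDIN mode and run_parser("cat $1", data) exercises the temporary file mode; both return the data unchanged.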
The command line can also use variables, for example ${URL} or ${Content-Type}.
See the list of all available variables in the indexer -v6 output, in the lines having the "Response." prefix.
The fourth parameter, source, is optional.
It specifies what kind of data is sent to the parser.
By default, indexer sends raw document content.
With the help of the source parameter you can mix document content with other kinds of data, for example its URL or some HTTP header, using the same notation as in the command line parameter.
Raw content is available as ${HTTP.Content}.
Note: To make ${HTTP.Content} available, use the Section HTTP.Content 0 0 command.
Mime application/msword "text/plain; charset=cp1251" "catdoc $1"
Mime application/x-troff-man text/plain "deroff"
Mime text/x-postscript text/plain "ps2ascii"
Mime application/pdf text/plain "pdftotext $1 -"
Mime application/vnd.ms-excel text/plain "xls2csv $1"
Mime "text/rtf*" text/html "rthc --use-stdout $1 2>/dev/null"

# A parser example with variables in its command line
Mime application/mytype text/html "myparser -u ${URL} -t ${Content-Type} $1"

# Mixing content with URL and HTTP headers
Section HTTP.Content 0 0
Mime application/mytype2 text/html "myparser2" "${URL} # ${Content-Type} # ${HTTP.Content}"
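
To show what a source template such as the one in the last example might produce, here is a minimal Python sketch of ${Name} expansion. It only approximates the behaviour described above: expand_source is a hypothetical name, the variable values are made up, and replacing unknown variables with an empty string is an assumption rather than documented indexer behaviour.

```python
import re


def expand_source(template: str, variables: dict) -> str:
    """Replace every ${Name} reference in template with its value.

    Hypothetical helper for illustration; names missing from
    `variables` are replaced with an empty string (an assumption).
    """
    return re.sub(r"\$\{([^}]+)\}",
                  lambda match: variables.get(match.group(1), ""),
                  template)


# Made-up values standing in for what indexer knows about a document.
example_variables = {
    "URL": "http://example.com/doc",
    "Content-Type": "application/mytype2",
    "HTTP.Content": "raw document bytes",
}

parser_input = expand_source(
    "${URL} # ${Content-Type} # ${HTTP.Content}", example_variables)
```

Here parser_input is the text that would be sent to the parser in place of the raw document content alone.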