mnoGoSearch 3.3.14 reference manual: Full-featured search engine software

Chapter 4. Supported file formats and mime types

Table of Contents
Built-in parsers
mnoGoSearch HTML parser

Built-in parsers

mnoGoSearch has built-in parsers for text, HTML, XML, DOCX, RTF, message (*.eml, *.mht) and MP3 file formats, and understands the following mime types in the Content-Type HTTP header (or in the AddType command when indexing local files):

For text/plain, text/tab-separated-values, text/css - the built-in text parser is invoked.
For text/html - the built-in HTML parser is invoked.
For text/xml, application/xml, as well as for all mime types that contain the sub-strings "+xml" or "rss" (e.g. application/rss+xml, application/vnd.wap.xhtml+xml, etc.) - the built-in XML parser is invoked.
For application/vnd.openxmlformats-officedocument.wordprocessingml.document - the built-in DOCX parser is invoked.
For text/rtf, application/rtf and application/x-rtf - the built-in RTF parser is invoked.
For message/rfc822 - the built-in message parser is invoked.
For audio/mpeg - the built-in MP3 parser is invoked.
For the mime types application/http and message/http, the document is treated as a full HTTP response consisting of headers (including the status line, e.g. HTTP/1.0 200 OK) followed by content. The headers are separated from the content and parsed, then one of the known parsers is recursively invoked on the content (without headers) according to the Content-Type header value.
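
The selection rules above can be sketched in Python as a small dispatcher. This is an illustration only and not mnoGoSearch code: the function names (select_parser, split_http_response, resolve_parser) and the parser labels are hypothetical, and the sketch assumes that mime types are compared case-insensitively, that parameters such as "; charset=..." are ignored, and that nesting of HTTP responses is capped at a small depth.

```python
import re
from typing import Dict, Optional, Tuple

# Exact mime types listed in this chapter, mapped to a hypothetical parser label.
EXACT_TYPES: Dict[str, str] = {
    "text/plain": "text",
    "text/tab-separated-values": "text",
    "text/css": "text",
    "text/html": "html",
    "text/xml": "xml",
    "application/xml": "xml",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "docx",
    "text/rtf": "rtf",
    "application/rtf": "rtf",
    "application/x-rtf": "rtf",
    "message/rfc822": "message",
    "audio/mpeg": "mp3",
}
HTTP_TYPES = {"application/http", "message/http"}


def normalize(mime_type: str) -> str:
    # Assumption: drop parameters and compare in lower case.
    return mime_type.split(";", 1)[0].strip().lower()


def select_parser(mime_type: str) -> Optional[str]:
    """Return a parser label for a mime type, or None if no rule matches."""
    mime = normalize(mime_type)
    if mime in EXACT_TYPES:
        return EXACT_TYPES[mime]
    # Any other type containing "+xml" or "rss" goes to the XML parser.
    if "+xml" in mime or "rss" in mime:
        return "xml"
    return None


def split_http_response(raw: bytes) -> Tuple[str, Dict[str, str], bytes]:
    """Split a full HTTP response into status line, headers and content."""
    parts = re.split(rb"\r?\n\r?\n", raw, maxsplit=1)
    body = parts[1] if len(parts) > 1 else b""
    lines = parts[0].decode("iso-8859-1").splitlines()
    status_line = lines[0] if lines else ""
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        name, colon, value = line.partition(":")
        if colon:
            headers[name.strip().lower()] = value.strip()
    return status_line, headers, body


def resolve_parser(mime_type: str, content: bytes, depth: int = 0,
                   max_depth: int = 4) -> Tuple[Optional[str], bytes]:
    """Unwrap application/http and message/http, then pick a parser."""
    if normalize(mime_type) in HTTP_TYPES:
        if depth >= max_depth:  # assumption: guard against deep nesting
            return None, content
        _status, headers, body = split_http_response(content)
        inner_type = headers.get("content-type")
        if inner_type is None:
            return None, body
        return resolve_parser(inner_type, body, depth + 1, max_depth)
    return select_parser(mime_type), content
```

For example, calling resolve_parser("message/http", raw) on a stored response whose headers contain "Content-Type: text/html; charset=utf-8" strips the headers and selects the HTML rule for the remaining content. Exact matches are checked before the sub-string rule so that a type such as text/html is never captured by the XML rule.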
