SkipIf

Name

SkipIf -- skip revisiting documents that have a section matching the given pattern

indexer.conf

Synopsis

SkipIf [Match | NoMatch] [Case | NoCase] [String | Regexp] {Section...} {Pattern...}

Description

indexer skips downloading and parsing documents whose Section matches the given Pattern.

Every time a matching document expires and appears in the crawler queue, indexer simply marks the document as fresh again by updating its next_index_time value. The word and section information stored for the document remains untouched.

Note: SkipIf can be useful for excluding sites from being revisited when, for example, the sites are temporarily unavailable.
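
As an illustration of this case, here is a minimal sketch. It assumes that the URL section can be named directly, the same way Title is named in the Examples section, and down.example.com is a placeholder host, not a real one:

# Do not revisit documents from a site that is temporarily unavailable
# (down.example.com is a hypothetical host)
SkipIf URL "http://down.example.com/*"

Once the site is reachable again, removing the line should let its documents be revisited the next time they expire.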

The meaning of the first three optional parameters is exactly the same as in the Allow and IndexIf commands.
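
For illustration, a sketch that combines these optional parameters. It assumes they behave here as described for Allow, and the patterns are made up for the example:

# Case-sensitive regular expression: skip documents whose title begins with "Archive"
SkipIf Case Regexp Title "^Archive"

# Inverted match: skip every document whose URL does not match the wildcard pattern
SkipIf NoMatch URL "http://site/news/*"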

The Section parameter specifies which section is checked against the Pattern. It can also be a concatenation of multiple sections, composed using the ${SECTION} syntax.

Multiple patterns can be given in the same SkipIf command.
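
A sketch with two patterns in one command. Both patterns are hypothetical, and it is assumed here that a document is skipped when its section matches any one of them:

# Skip revisiting documents whose title starts with Archive or Obsolete
SkipIf Title Archive* Obsolete*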

Scope

SkipIf applies globally to the entire configuration file and can be used multiple times.

Examples


# Skip revisiting documents whose title starts with the word Archive
SkipIf Title Archive*

# Skip revisiting text/plain documents, but only those from the given site.
SkipIf "${URL}#${Content-Type}" "http://site/*#text/plain"
      

See also

Allow, CheckMP3, CheckMP3Only, Disallow, HrefOnly, Skip.