
Spider Exclusions
ISYS:spider is a powerful tool which extends the functionality of ISYS
to allow indexing of websites which can then be searched by ISYS:desktop
or ISYS:web.
Rather than spidering a whole website, you may wish to exclude certain
parts of the site which are not relevant and may take up valuable disk
space or indexing time. This is handled in the spider configuration under
"URL Exclusions" and "URL Patterns".
For example, suppose you wish to index the website at http://www.theworldsnews.com
so that you can query the information held on this site. You would set this URL
as the starting URL in your Spider configuration.

There may be a section of the website, such as one presented in another language,
that you don't need to index, or that you don't need to traverse at all.
Let's say one section of the site duplicates all the news
in Chinese. By looking at the site you identify that all the Chinese-language
pages contain the URL path http://www.theworldsnews.com/chinese/. If you
don't want to index the pages in that section but still want Spider
to traverse them looking for links to other parts of the site, you would
add */chinese/* as a URL Exclusion in your Spider configuration.

If you don't want Spider to traverse the Chinese section at all,
you would specify the same string under "URL Patterns - Ignore".
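
The difference between the two settings can be sketched in a few lines of
Python. This is only an illustration of the behaviour described above, not how
ISYS:spider is implemented. The function name classify_url is hypothetical, and
the sketch assumes the * wildcard behaves like a shell-style wildcard, as in
Python's fnmatch module.

```python
from fnmatch import fnmatchcase


def classify_url(url, exclusions=(), ignore_patterns=()):
    """Return (traverse, index) for a URL.

    Hypothetical model of the two settings:
    - a URL matching an ignore pattern is neither traversed nor indexed;
    - a URL matching an exclusion is traversed for links but not indexed;
    - any other URL is both traversed and indexed.
    """
    if any(fnmatchcase(url, pattern) for pattern in ignore_patterns):
        return (False, False)
    if any(fnmatchcase(url, pattern) for pattern in exclusions):
        return (True, False)
    return (True, True)


# Example URLs; the page paths are made up for illustration.
chinese_page = "http://www.theworldsnews.com/chinese/index.html"
other_page = "http://www.theworldsnews.com/sport/index.html"

# As a URL Exclusion: links are still followed, the page is not indexed.
print(classify_url(chinese_page, exclusions=["*/chinese/*"]))

# As a URL Patterns - Ignore entry: the page is skipped entirely.
print(classify_url(chinese_page, ignore_patterns=["*/chinese/*"]))

# Pages outside the Chinese section are unaffected by either setting.
print(classify_url(other_page, exclusions=["*/chinese/*"]))
```

In this model, pages that can only be reached through links in the Chinese
section would still be found when an exclusion is used, but would be missed
when the section is ignored.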

Don't forget, if you have any questions regarding Tech Tips, you
can e-mail support@isys-search.com