A website is structured like a tree. The root file (Start URL) contains
links to other pages, which in turn contain links to further pages, and
so on. Since on commercial sites this can go on almost endlessly, the
Webcrawler has to stop at some point (Max. Depth). But even with a
Max. Depth of only 1 or 2, a single HTML file can contain so many links
that you still end up downloading too many files. To prevent this you
can specify a Max. Node#, which tells the Crawler the maximum number of
files to download.
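
In other words, the crawl is a breadth-first walk over this tree that
stops as soon as either limit is reached. The following Java sketch only
illustrates that idea, it is not the code of this Crawler: the class name
CrawlSketch, the regex-based link extraction, the use of java.net.http
(http URLs only) and the example limits (Max. Depth 2, Max. Node# 50)
are assumptions made for the example.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.ArrayDeque;
    import java.util.HashSet;
    import java.util.Queue;
    import java.util.Set;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class CrawlSketch {

        // Very naive href extractor; a real crawler would use an HTML parser.
        private static final Pattern HREF =
                Pattern.compile("href\\s*=\\s*\"(http[^\"]+)\"", Pattern.CASE_INSENSITIVE);

        public static void crawl(String startUrl, int maxDepth, int maxNodes) {
            HttpClient client = HttpClient.newHttpClient();
            Queue<String> queue = new ArrayDeque<>();   // URLs waiting to be fetched
            Queue<Integer> depths = new ArrayDeque<>(); // depth of each queued URL
            Set<String> seen = new HashSet<>();         // avoid fetching a page twice
            int downloaded = 0;

            queue.add(startUrl);
            depths.add(0);
            seen.add(startUrl);

            while (!queue.isEmpty() && downloaded < maxNodes) {   // Max. Node# limit
                String url = queue.poll();
                int depth = depths.poll();

                String body;
                try {
                    body = client.send(
                            HttpRequest.newBuilder(URI.create(url)).build(),
                            HttpResponse.BodyHandlers.ofString()).body();
                } catch (Exception ex) {
                    System.out.println("skipped " + url + ": " + ex.getMessage());
                    continue;
                }
                downloaded++;
                System.out.println("fetched (depth " + depth + "): " + url);

                if (depth >= maxDepth) continue;        // Max. Depth reached: do not follow links

                Matcher m = HREF.matcher(body);
                while (m.find()) {
                    String link = m.group(1);
                    if (seen.add(link)) {               // enqueue each new link one level deeper
                        queue.add(link);
                        depths.add(depth + 1);
                    }
                }
            }
        }

        public static void main(String[] args) {
            crawl("http://www.sun.com", 2, 50);         // Start URL, Max. Depth, Max. Node#
        }
    }
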
Start URL ..... Specify the URL where the Webcrawler should start.
                This can be either an http or a file URL (a short
                example follows this list).
e.g: http://www.sun.com
e.g: file:///c:/html/index.html
Max. Depth ..... With the Max. Depth you can determine how "deep" into
                 the tree the Webcrawler should dive. Of course, the
                 higher you set this value, the longer you have to wait.
Max. Node# ..... Determines the maximum number of nodes (files) you wish
                 to download from the net.
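
Both forms of Start URL can be read the same way in Java, which is
presumably why the Crawler accepts either. The small sketch below is
only an illustration of that point, not part of the Crawler itself; the
class name StartUrlSketch is invented for the example, and the two URL
strings are the ones from the list above.

    import java.io.InputStream;
    import java.net.URI;

    public class StartUrlSketch {
        public static void main(String[] args) throws Exception {
            // Both forms from the help text are valid URL strings:
            String httpUrl = "http://www.sun.com";
            String fileUrl = "file:///c:/html/index.html";

            // openStream() works for http: as well as file: URLs, so a
            // crawler can treat both kinds of start point the same way.
            // (The same call on fileUrl would work if that file exists.)
            try (InputStream in = URI.create(httpUrl).toURL().openStream()) {
                System.out.println("first byte of start page: " + in.read());
            }
        }
    }
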
To start the Crawler with the specified settings, click the Start button.
To stop it, press the Stop button.
In the information field below the buttons you can always see what the
Crawler is currently doing.
Below that, a progress bar indicates whether the Crawler is working or not.
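
The following Swing sketch shows how such a Start/Stop pair, information
field and progress bar could work together: Start launches a background
thread, Stop clears a flag, and the label and bar are updated on the
event thread. It is only an illustration under those assumptions, not
the Crawler's actual source; the class name CrawlerPanelSketch and the
simulated download loop are invented for the example.

    import javax.swing.*;

    public class CrawlerPanelSketch {
        private volatile boolean running = false;   // cleared by the Stop button

        public static void main(String[] args) {
            SwingUtilities.invokeLater(() -> new CrawlerPanelSketch().buildUi());
        }

        private void buildUi() {
            JButton start = new JButton("Start");
            JButton stop  = new JButton("Stop");
            JLabel  info  = new JLabel("idle");          // the "information field"
            JProgressBar progress = new JProgressBar();  // shows whether work is going on

            start.addActionListener(e -> {
                if (running) return;
                running = true;
                progress.setIndeterminate(true);         // "working" indication
                new Thread(() -> {
                    int node = 0;
                    while (running && node < 50) {       // 50 stands in for Max. Node#
                        final int n = ++node;
                        SwingUtilities.invokeLater(
                                () -> info.setText("downloading node " + n));
                        try { Thread.sleep(200); }       // placeholder for a real download
                        catch (InterruptedException ex) { break; }
                    }
                    running = false;
                    SwingUtilities.invokeLater(() -> {
                        info.setText("stopped");
                        progress.setIndeterminate(false);
                    });
                }).start();
            });

            stop.addActionListener(e -> running = false); // Stop just clears the flag

            JPanel panel = new JPanel();
            panel.add(start);
            panel.add(stop);
            panel.add(info);
            panel.add(progress);

            JFrame frame = new JFrame("Webcrawler sketch");
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.add(panel);
            frame.pack();
            frame.setVisible(true);
        }
    }
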