Spider
A spider is the basic unit of a web crawler program in Crawlab. You can think of it as a web crawling software project consisting of code and files, e.g. a Scrapy project. Note that the term "project" used here is not the same as the basic concept Project in Crawlab.
Note
The concept of a spider is so important in Crawlab that we strongly recommend you read through this section.
Typical Process
Below is a typical process for working with spiders in Crawlab.
Create Spider
- Navigate to the Spiders page, and click the New Spider button on the top left.
- Enter relevant info including Name and Execute Command.
- Click Confirm.
Execute Command is the base bash/shell command that will be executed when running the spider, e.g. scrapy crawl myspider.
Incremental Sync determines whether files are synced incrementally when running the spider, instead of downloading all files every time. It can save time when downloading files.
Auto Install Dependencies determines whether dependencies are installed automatically when running the spider, e.g. the dependencies in requirements.txt. (This feature is only available in Crawlab Pro Edition.)
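The idea behind Incremental Sync can be illustrated with a minimal sketch. It is not Crawlab's actual implementation, which this page does not describe; it assumes that files are compared by content hash, and the helper names `snapshot` and `files_to_sync` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its content digest."""
    return {
        p.relative_to(root).as_posix(): file_digest(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def files_to_sync(source: dict[str, str], local: dict[str, str]) -> list[str]:
    """List files that are new or changed in source compared with local.

    Assumption: a file needs downloading only if it is missing locally
    or its digest differs. Deleted files are ignored in this sketch.
    """
    return sorted(
        name for name, digest in source.items() if local.get(name) != digest
    )
```

With a full sync every file is downloaded on each run; with a comparison like the one above, only the files returned by `files_to_sync` would need to be transferred.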
Upload Spider
There are several ways to upload spider files.
Upload Folder
- Navigate to the spider detail page.
- Click the Files tab.
- Click the Upload button in the nav bar.
- Choose Folder.
- Click "Click to Select Folder to Upload".
- Choose the folder where spider files are located.
- Click Confirm.
Upload Files
- Navigate to the spider detail page.
- Click the Files tab.
- Click the Upload button in the nav bar.
- Choose Files.
- Drag and drop spider files into the drop zone, or click the drop zone and select files.
- Click Confirm.
Upload Files (Drag & Drop)
- Navigate to the spider detail page.
- Click the Files tab.
- Drag and drop spider files or folders into folders in the file navigator on the left.
Run Spider
You can follow the steps below to run a spider.
- If you are on the spider detail page, click the Run button with the play icon in the nav bar.
- If you are on the Spiders page, click the Run button with the play icon on the right.
- Select appropriate settings for running the spider.
- Click Confirm.
The settings for running a spider are explained below.
- Command: The actual cmd/bash/shell base command that will be executed.
- Param: The actual parameters/arguments passed to Command.
- Mode: The task running mode. Defaults to Random Node.
- Priority: The task priority. Defaults to 5.
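To make the relationship between Command and Param concrete, here is a minimal sketch of how the two settings might be combined into one command line. It assumes that Param is simply appended after Command; this page does not state exactly how Crawlab joins them. The function `build_argv` and the spider argument `category=books` are hypothetical, while `-a` is Scrapy's option for passing spider arguments.

```python
import shlex


def build_argv(command: str, param: str = "") -> list[str]:
    """Split Command and Param shell-style and join them into one argument list.

    Assumption: Param is appended after Command, separated by whitespace.
    """
    return shlex.split(command) + shlex.split(param)


# Command is the base command; Param carries the per-run arguments.
argv = build_argv("scrapy crawl myspider", "-a category=books")
print(shlex.join(argv))  # prints: scrapy crawl myspider -a category=books
```

Keeping the base command in Command and the varying arguments in Param lets the same spider be run with different inputs without editing its Execute Command.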