Data Integration
You can integrate your spiders with the Crawlab SDK, which allows you to view scraped results visually in Crawlab. The Crawlab SDK supports integration with web crawler frameworks such as Scrapy, as well as with several programming languages, including Python, Node.js, and Go.
NOTE
By default, the Crawlab SDK is installed in the base image of Crawlab. You can also install it manually if you are not using the Crawlab Docker image.
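If you are installing manually, the commands below are a sketch of the typical per-language installation. The npm package name and Go module path match the code snippets later on this page; the PyPI package name `crawlab-sdk` is an assumption, so check the Crawlab SDK repository if it differs:

```bash
# Python: assumed PyPI package name (the module itself is imported as `crawlab`)
pip install crawlab-sdk

# Node.js: package name matching the require('crawlab-sdk') call below
npm install crawlab-sdk

# Go: module path matching the import in the Go section below
go get github.com/crawlab-team/crawlab-sdk-go
```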
Scrapy
- Make sure you have created a Scrapy spider on Crawlab.
- Add `crawlab.CrawlabPipeline` to `ITEM_PIPELINES` in the `settings.py` file:

```python
ITEM_PIPELINES = {
    'crawlab.CrawlabPipeline': 888,
}
```

- That's it! You can now run your spider on Crawlab; see the spider sketch after this list.
Python
- Make sure you have created a Python spider on Crawlab.
- Import the `save_item` method in your spider code:

```python
from crawlab import save_item
```

- Call the `save_item` method to save a scraped item (a complete sketch follows this list):

```python
save_item({'title': 'example', 'url': 'https://example.com'})
```
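Putting both steps together, a complete spider script might look like the sketch below. Only `save_item` comes from the SDK; the hard-coded results list stands in for your own fetching and parsing logic:

```python
from crawlab import save_item

# Illustrative results; in a real spider these would come from your
# own fetching and parsing logic.
results = [
    {'title': 'example', 'url': 'https://example.com'},
    {'title': 'another page', 'url': 'https://example.com/page'},
]

for item in results:
    # Each call stores one item, which then appears in the
    # spider's results on Crawlab.
    save_item(item)
```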
Node.js
- Make sure you have created a Node.js spider on Crawlab.
- Import the `saveItem` method in your spider code:

```js
const { saveItem } = require('crawlab-sdk');
```

- Call the `saveItem` method to save a scraped item:

```js
saveItem({ title: 'example', url: 'https://example.com' });
```
Go
- Make sure you have created a Go spider on Crawlab.
- Import the SDK package, which provides the `SaveItem` method, in your spider code:

```go
import "github.com/crawlab-team/crawlab-sdk-go"
```

- Call the `SaveItem` method to save a scraped item:

```go
crawlab.SaveItem(map[string]interface{}{
    "title": "example",
    "url":   "https://example.com",
})
```