Data Sources

NOTE

This functionality is available in the Pro Edition only.

Crawlab Pro supports data source integration, which means you can use Crawlab Pro to manage your data sources, such as MongoDB, MySQL, PostgreSQL, SQL Server, etc.

The Community Edition only supports storing results in the default MongoDB database, which also stores Crawlab's operational data.

Supported Data Sources

| Category       | Data Source   | Supported |
|----------------|---------------|-----------|
| Non-Relational | MongoDB       | ✅        |
| Non-Relational | ElasticSearch | ✅        |
| Relational     | MySQL         | ✅        |
| Relational     | PostgreSQL    | ✅        |
| Relational     | SQL Server    | ✅        |
| Relational     | CockroachDB   | ✅        |
| Relational     | SQLite        | ✅        |
| Streaming      | Kafka         | ✅        |

Add Data Source

  1. Go to the Data Sources page.
  2. Click the New Data Source button.
  3. Select the data source type in the Type field, and enter the Name and connection fields (an illustrative example follows below).
  4. Click the Confirm button to save the data source.

Now you should be able to see the data source in the Data Sources page.
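
For reference, a MongoDB data source typically needs a host, port, database name, and credentials. The snippet below is only an illustrative sketch of such settings; the field names here are examples, and the actual labels are those shown in the form:

```py
# Illustrative MongoDB connection settings; field names are examples,
# not the exact labels used by the Crawlab Pro form.
mongo_data_source = {
    'name': 'My MongoDB',   # display name in the Data Sources list
    'type': 'mongo',        # data source type selected in the form
    'host': 'localhost',
    'port': 27017,
    'database': 'results',  # database where results are written
    'username': 'crawlab',
    'password': '<password>',
}
```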

Use Data Source

  1. Go to the Spider Detail page.
  2. Select the data source in the Data Source field.
  3. Click the Save button to save the spider.
  4. Add the integration code where your spider saves results data (refer to the Spider Code Examples section below).
  5. Run the spider, and you should see the results in the Data tab.

Spider Code Examples

General Python Spider

The save_item method in crawlab-sdk can be used to save results to the designated data source.


```py
from crawlab import save_item

# ... scraping logic that produces a result dict ...
result_item = {'title': 'Example', 'url': 'https://example.com'}

# Save the result to the data source configured for this spider
save_item(result_item)
```
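
For context, a minimal end-to-end spider could look like the sketch below. It assumes the requests library and an example target URL; only save_item comes from the Crawlab SDK:

```py
import requests
from crawlab import save_item

def main():
    # Example target URL, for illustration only
    resp = requests.get('https://quotes.toscrape.com')
    resp.raise_for_status()

    # Parse the response however your spider requires;
    # here we simply save a single summary item
    save_item({
        'url': resp.url,
        'status': resp.status_code,
        'length': len(resp.text),
    })

if __name__ == '__main__':
    main()
```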

Scrapy Spider

Add crawlab.CrawlabPipeline to ITEM_PIPELINES in settings.py:

```py
ITEM_PIPELINES = {
    'crawlab.CrawlabPipeline': 300,
}
```
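
Once the pipeline is enabled, items yielded by your spiders are saved to the selected data source automatically. Below is a minimal sketch of such a spider; the spider name, URL, and CSS selectors are hypothetical:

```py
import scrapy

class QuotesSpider(scrapy.Spider):
    # Spider name and start URL are illustrative assumptions
    name = 'quotes'
    start_urls = ['https://quotes.toscrape.com']

    def parse(self, response):
        for quote in response.css('div.quote'):
            # Each yielded item passes through crawlab.CrawlabPipeline,
            # which writes it to the spider's configured data source
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.css('small.author::text').get(),
            }
```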