Data Sources

Crawlab supports data source integration, which means you can use Crawlab to manage your data sources, such as MongoDB, MySQL, PostgreSQL, and SQL Server.

Supported Data Sources

Category          Data Source
Non-Relational    MongoDB
Non-Relational    Elasticsearch
Relational        MySQL
Relational        PostgreSQL
Relational        SQL Server
Relational        CockroachDB
Relational        SQLite
Streaming         Kafka

Add Data Source

  1. Go to the Data Sources page
  2. Click the New Data Source button
  3. Select the data source type in the Type field, then enter the Name and the connection fields
  4. Click the Confirm button to save the data source

The new data source should now appear on the Data Sources page.
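
Before entering the connection fields, it can help to confirm that the data source's host and port are reachable from the machine running Crawlab. The following is a minimal sketch using only the Python standard library. The helper name is_reachable and the host and port values are assumptions for illustration, and a successful TCP connection says nothing about whether the credentials are valid.

```python
import socket


def is_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder values: 27017 is MongoDB's default port; substitute your own host and port.
    print(is_reachable("localhost", 27017))
```

Run it from the same host (or container) as Crawlab, since reachability from your own workstation may differ.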

Use Data Source

  1. Go to the Spider Detail page
  2. Select the data source in the Data Source field
  3. Click the Save button to save the spider
  4. Add the integration code at the point where the spider saves its results (refer to the Spider Code Examples section below)
  5. Run the spider, and you should see the results in the Data tab

Spider Code Examples

General Python Spider

The save_item method in crawlab-sdk can be used to save data to the designated data source.

from crawlab import save_item
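
As a rough sketch of how the import above might be used, the example below passes each scraped record, represented as a plain dict, to save_item. The helper save_results, the field names, and the sample records are hypothetical. The print-based fallback exists only so the sketch can run where crawlab-sdk is not installed and is not part of the SDK.

```python
from typing import Callable, Dict, Iterable

try:
    # Provided by crawlab-sdk (pip install crawlab-sdk).
    from crawlab import save_item
except ImportError:
    # Fallback for running this sketch outside Crawlab: just print the item.
    def save_item(item: Dict) -> None:
        print(item)


def save_results(items: Iterable[Dict], save: Callable[[Dict], None] = save_item) -> int:
    """Pass each scraped item (a plain dict) to `save`; return how many were passed."""
    count = 0
    for item in items:
        save(item)
        count += 1
    return count


if __name__ == "__main__":
    # Hypothetical scraped records; the field names are illustrative only.
    results = [
        {"title": "Example page", "url": "https://example.com"},
        {"title": "Another page", "url": "https://example.com/other"},
    ]
    save_results(results)
```

Taking the save function as a parameter keeps the scraping logic testable without a running Crawlab instance; inside a Crawlab task the default save_item is used.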


Scrapy Spider

Add crawlab.CrawlabPipeline to the ITEM_PIPELINES setting of your Scrapy project (typically in settings.py):

  'crawlab.CrawlabPipeline': 300,
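
For context, the entry above might sit in a Scrapy settings file roughly as follows. This is a sketch of a hypothetical project: the commented-out pipeline name is invented for illustration, and 300 is simply the priority used in the line above.

```python
# settings.py (excerpt) of a hypothetical Scrapy project

# Scrapy runs item pipelines in ascending order of their priority value,
# which is customarily chosen in the 0-1000 range.
ITEM_PIPELINES = {
    # Hypothetical project pipeline that would run before items are saved:
    # "myproject.pipelines.CleanItemPipeline": 200,
    "crawlab.CrawlabPipeline": 300,
}
```

With this in place, items yielded by the spider pass through the pipeline, so the spider code itself does not need to call save_item.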