Data Sources

NOTE

This functionality is available in the Pro Edition only.

Crawlab Pro supports data source integration, which means you can use Crawlab Pro to manage your data sources, such as MongoDB, MySQL, PostgreSQL, SQL Server, etc.

The Community Edition only supports storing results in the default MongoDB database, which also stores Crawlab's operational data.

Supported Data Sources

| Category       | Data Source   | Supported |
|----------------|---------------|-----------|
| Non-Relational | MongoDB       | ✅        |
| Non-Relational | ElasticSearch | ✅        |
| Relational     | MySQL         | ✅        |
| Relational     | PostgreSQL    | ✅        |
| Relational     | SQL Server    | ✅        |
| Relational     | CockroachDB   | ✅        |
| Relational     | SQLite        | ✅        |
| Streaming      | Kafka         | ✅        |

Add Data Source

  1. Go to the Data Sources page.
  2. Click the New Data Source button.
  3. Select the data source type in the Type field, and enter the Name and connection fields (an illustrative example follows below).
  4. Click the Confirm button to save the data source.

Now you should be able to see the data source in the Data Sources page.
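
For reference, a MongoDB data source typically needs a host, port, database name, and credentials. The snippet below is only an illustrative sketch of such settings; the field names here are examples, and the actual labels are those shown in the form:

```py
# Illustrative MongoDB connection settings; field names are examples,
# not the exact labels used by the Crawlab Pro form.
mongo_data_source = {
    'name': 'My MongoDB',   # display name in the Data Sources list
    'type': 'mongo',        # data source type selected in the form
    'host': 'localhost',
    'port': 27017,
    'database': 'results',  # database where results are written
    'username': 'crawlab',
    'password': '<password>',
}
```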

Use Data Source

  1. Go to the Spider Detail page.
  2. Select the data source in the Data Source field.
  3. Click the Save button to save the spider.
  4. Add the integration code where your spider saves results data (refer to the Spider Code Examples section below).
  5. Run the spider, and you should see the results in the Data tab.

Spider Code Examples

General Python Spider

The save_item method in crawlab-sdk can be used to save results to the designated data source.


```py
from crawlab import save_item

# ... scraping logic that produces a result dict ...
result_item = {'title': 'Example', 'url': 'https://example.com'}

# Save the result to the data source configured for this spider
save_item(result_item)
```
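
For context, a minimal end-to-end spider could look like the sketch below. It assumes the requests library and an example target URL; only save_item comes from the Crawlab SDK:

```py
import requests
from crawlab import save_item

def main():
    # Example target URL, for illustration only
    resp = requests.get('https://quotes.toscrape.com')
    resp.raise_for_status()

    # Parse the response however your spider requires;
    # here we simply save a single summary item
    save_item({
        'url': resp.url,
        'status': resp.status_code,
        'length': len(resp.text),
    })

if __name__ == '__main__':
    main()
```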

Scrapy Spider

Add crawlab.CrawlabPipeline to ITEM_PIPELINES in settings.py:

```py
ITEM_PIPELINES = {
    'crawlab.CrawlabPipeline': 300,
}
```
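
Once the pipeline is enabled, items yielded by your spiders are saved to the selected data source automatically. Below is a minimal sketch of such a spider; the spider name, URL, and CSS selectors are hypothetical:

```py
import scrapy

class QuotesSpider(scrapy.Spider):
    # Spider name and start URL are illustrative assumptions
    name = 'quotes'
    start_urls = ['https://quotes.toscrape.com']

    def parse(self, response):
        for quote in response.css('div.quote'):
            # Each yielded item passes through crawlab.CrawlabPipeline,
            # which writes it to the spider's configured data source
            yield {
                'text': quote.css('span.text::text').get(),
                'author': quote.css('small.author::text').get(),
            }
```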