Introduction
If you already know what Crawlab is and what it is used for, you can head straight to Quick Start or Installation to install and start using Crawlab.
If you are not familiar with Crawlab, read the sections below to learn more about it.
What is Crawlab?
Crawlab is a powerful Web Crawler Management Platform (WCMP) that can run web crawlers and spiders developed in various programming languages, including Python, Go, Node.js, Java and C#, as well as with frameworks such as Scrapy, Colly, Selenium and Puppeteer. It is used for running, managing and monitoring web crawlers, particularly in production environments where traceability, scalability and stability are the main concerns.
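To give a concrete sense of the kind of program Crawlab manages, below is a minimal Scrapy spider sketch. The spider name, start URL and parsed fields are purely illustrative; a spider written like this can be uploaded to Crawlab and executed as a task.

```python
import scrapy


class QuotesSpider(scrapy.Spider):
    """A minimal example spider of the kind Crawlab can deploy and run."""

    name = "quotes"  # illustrative spider name
    start_urls = ["https://quotes.toscrape.com"]  # illustrative start URL

    def parse(self, response):
        # Yield one item per quote block; Crawlab runs the spider as a task
        # and can persist yielded items through its data storage integration.
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
```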
Background and History
The Crawlab project has been under continuous development since it was first released in March 2019, and it has gone through a number of major releases. It was initially designed to solve the management problems that arise when a large number of spiders need to be coordinated and executed. With many improvements and newly added features, Crawlab has become increasingly popular in developer communities, particularly among web crawler engineers.
Who can use Crawlab?
- Web Crawler Engineers. By integrating web crawler programs into Crawlab, you can focus on the crawling and parsing logic instead of spending time writing common modules such as task queues, storage, logging and notification.
- Operation Engineers. The main benefit of Crawlab for operation engineers is convenient deployment, for both crawler programs and Crawlab itself. Crawlab supports easy installation with Docker and Kubernetes.
- Data Analysts. Data analysts who can code (e.g. in Python) can develop web crawler programs (e.g. with Scrapy) and upload them to Crawlab, then leave the rest of the dirty work to Crawlab, which will automatically collect the data.
- Others. Technically, everyone can enjoy the convenience and ease of automation provided by Crawlab. Although Crawlab excels at running web crawler tasks, it can also be used for other types of tasks such as data processing and automation, as sketched below.
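As a sketch of such a non-crawler task, the hypothetical data-processing script below could be scheduled by Crawlab like any spider; its console output would be captured as the task log (see Task Logging under Main Features). The file path and column name are assumptions for illustration only.

```python
"""Hypothetical data-processing task: count rows per category in a CSV file."""

import csv
import sys


def main(path: str) -> None:
    # Aggregate a simple count per category from a CSV file produced elsewhere.
    counts: dict[str, int] = {}
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            counts[row["category"]] = counts.get(row["category"], 0) + 1
    for category, count in sorted(counts.items()):
        # Printed lines appear in Crawlab's task log view.
        print(f"{category}: {count}")


if __name__ == "__main__":
    main(sys.argv[1] if len(sys.argv) > 1 else "data.csv")
```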
Main Features
Category | Feature | Description |
---|---|---|
Node | Node Management | Register, manage and control multiple nodes in the distributed system |
Spider | Spider Deployment | Auto-deploy spiders to multiple nodes and auto-sync spider files, including scripts and programs |
 | Spider Code Editing | Update and edit spider code with the online editor on the go |
 | Spider Stats | Spider crawling statistics such as average running time and results count |
 | Framework Integration | Integrate spider frameworks such as Scrapy |
 | Data Storage Integration | Automatically save results data to the database without additional configuration |
 | Git Integration | Version control through embedded or external remote Git repos |
Task | Task Scheduling | Assign and schedule crawling tasks to multiple nodes in the distributed system |
 | Task Logging | Automatically save task logs, which can be viewed in the frontend UI |
 | Task Stats | Visually display task stats, including results count and running time |
User | User Management | Create, update and delete user accounts |
Other | Dependency Management | Search and install Python and Node.js package dependencies |
 | Notification | Automatic email or mobile notifications when tasks are triggered or completed |