Node

A node is a Crawlab instance that runs crawling tasks or provides other functionalities. You can basically regard a node as a server.

There are two types of nodes, each of which serves different functionalities.

  1. Master Node
  2. Worker Node

Note

Of course you can set up multiple Crawlab instances (nodes) on a single server, but that is NOT recommended, as one instance (node) per server normally suffices.

Master Node

Master Node is the control center of the whole distributed system in Crawlab. It acts like the brain of a human body. Master Node assigns tasks to Worker Nodes (or itself) and manages them. It also deploys and distributes spider files to other nodes. Furthermore, it provides APIs to the frontend application and handles communication between nodes.

Note

There is only ONE Master Node in Crawlab.

Worker Node

Worker Node is a Crawlab instance dedicated to running crawling tasks. A single node or server is normally limited by its computing power and resources, including CPU, memory and network I/O. Therefore, the number of Worker Nodes can be increased to scale up the throughput of data collection and improve the overall crawling performance of the distributed system.

Tips

There can be no Worker Nodes (Single-Node Deployment, SND) or multiple Worker Nodes (Multi-Node Deployment, MND) in Crawlab.

Topology
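
In a typical deployment, a single Master Node sits at the center of the topology: it assigns tasks, distributes spider files and communicates with every Worker Node, while Worker Nodes execute the crawling tasks assigned to them.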

Check Node Status

On the Nodes page, you can view the status of a node, i.e. whether it is online or offline.

Enable/Disable

You can enable or disable a node for running tasks by toggling the Enabled switch on the Nodes page or the node detail page.

Set Max Runners

A node can run multiple tasks at the same time. The number of concurrent tasks is controlled by the node's Max Runners setting, which can be configured on the node detail page.
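For example, if Max Runners is set to 4, the node executes at most 4 tasks in parallel; additional tasks assigned to it are queued until a runner becomes free.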

Set Basic Info

Basic info such as the node name, IP address and MAC address can be set on the node detail page.

Add Node

To add new nodes, refer to Set up Worker Nodes in Multi-Node Deployment (MND) of Docker Installation.
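
As a quick reference, below is a minimal docker-compose sketch of an MND setup with one Master Node and one Worker Node. The image name and environment variable names (CRAWLAB_NODE_MASTER, CRAWLAB_MONGO_HOST, CRAWLAB_GRPC_ADDRESS, CRAWLAB_FS_FILER_URL) follow a typical Crawlab Docker deployment and should be verified against the Docker Installation guide for your Crawlab version.

```yaml
version: '3.3'
services:
  master:
    image: crawlabteam/crawlab
    environment:
      CRAWLAB_NODE_MASTER: "Y"      # this instance acts as the Master Node
      CRAWLAB_MONGO_HOST: "mongo"   # MongoDB used by the Master Node
    ports:
      - "8080:8080"                 # web UI / API
    depends_on:
      - mongo

  worker01:
    image: crawlabteam/crawlab
    environment:
      CRAWLAB_NODE_MASTER: "N"                              # this instance acts as a Worker Node
      CRAWLAB_GRPC_ADDRESS: "master"                        # address of the Master Node's gRPC service
      CRAWLAB_FS_FILER_URL: "http://master:8080/api/filer"  # spider file sync with the Master Node
    depends_on:
      - master

  mongo:
    image: mongo:4.2
```

Additional Worker Nodes can be added by duplicating the worker01 service under a new name (e.g. worker02); once started, each worker registers itself with the Master Node and should appear on the Nodes page.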