Kubernetes

Kubernetesopen in new window is an open-source container orchestration platform that automates the deployment, scaling, and management of containerized applications. It is widely used in the industry and has a large community. If you are going to deploy a large-scale Crawlab cluster, Kubernetes is a good choice.

Main Process

If you are familiar with Crawlab Docker installation, you will find that the main process of Kubernetes installation is similar. The only difference is that you need to create Kubernetes resources instead of Docker containers.

Install Kubernetesopen in new window.
Create Kubernetes resources including Deployment, Service, PersistentVolumeClaim, etc.
Start Crawlab.

Note

Before following below guidance, we will assume you have installed Kubernetes and already created the Kubernetes cluster.

Multi-Node Deployment

Kubernetes is designed for large-scale deployment, and it is easy to scale Crawlab to multiple nodes. In the diagram below, we have a Master Node and three Worker Nodes. The Master Node is connected to a MongoDB instance.

Master Node PersistentVolumeClaim (PVC)

In Kubernetes, you need to create PersistentVolume and PersistentVolumeClaim to store Crawlab nodes data. Below is an example for Master Node.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: crawlab-master-pvc
  namespace: crawlab
spec:
  resources:
    requests:
      storage: 10Gi

The above PersistentVolumeClaim requests 10Gi storage for the Master Node. Normally you only need to create PersistentVolumeClaim for Master Node only, because Worker Nodes can be scaled up and down at any time.

Master Node Deployment

Below is an example of Deployment for Master Node. Here we assume you are connecting to an external MongoDB instance.

kind: Deployment
apiVersion: apps/v1
metadata:
  name: crawlab-master
  namespace: crawlab
spec:
  replicas: 1
  selector:
    matchLabels:
      app: crawlab-master
  template:
    metadata:
      labels:
        app: crawlab-master
    spec:
      volumes:
        - name: crawlab-master-storage
          persistentVolumeClaim:
            claimName: crawlab-master-pvc
      containers:
        - name: master
          image: crawlabteam/crawlab:latest
          env:
            - name: CRAWLAB_NODE_MASTER
              value: 'Y'
            - name: CRAWLAB_MONGO_URI
              value: '<mongo_uri>'
            - name: CRAWLAB_MONGO_HOST
              value: '<mongo_host>'
            - name: CRAWLAB_MONGO_PORT
              value: '<mongo_port>'
            - name: CRAWLAB_MONGO_DB
              value: '<mongo_db>'
            - name: CRAWLAB_MONGO_USERNAME
              value: '<mongo_username>'
            - name: CRAWLAB_MONGO_PASSWORD
              value: '<mongo_password>'
            - name: CRAWLAB_MONGO_AUTHSOURCE
              value: '<mongo_auth_source>'
            - name: CRAWLAB_MONGO_AUTHMECHANISM
              value: '<mongo_auth_mechanism>'
            - name: CRAWLAB_MONGO_AUTHMECHANISMPROPERTIES
              value: '<mongo_auth_mechanism_properties>'
          volumeMounts:
            - name: crawlab-master-storage
              mountPath: /root/.crawlab
            - name: crawlab-master-storage
              mountPath: /data
            - name: crawlab-master-storage
              mountPath: /var/log/crawlab
      restartPolicy: Always

Master Node Service

The Master Node Service is used to expose the API and GRPC ports to the both internal and external environments. This is essential for Worker Nodes to connect to the Master Node. Also, it is necessary for the Crawlab Web UI to connect to the Master Node, so that you can visit the Crawlab Web UI from your browser.

Below is an example of Service for Master Node.

kind: Service
apiVersion: v1
metadata:
  name: crawlab-master-service
  namespace: crawlab
spec:
  ports:
    - name: api
      port: 8080
      targetPort: 8080
    - name: grpc
      port: 9666
      targetPort: 9666
  selector:
    app: crawlab-master

Tips

Normally a Service has a ClusterIP type, which means it is only accessible within the Kubernetes cluster. If you want to expose the Master Node to the external environment, you can use NodePort or LoadBalancer type. For more information, please refer to the Kubernetes Serviceopen in new window.

After creating the Service, you can access Crawlab Web UI by visiting http://<service_ip>:8080 in your browser.

Worker Node Deployment

The Deployment for Worker Node is similar to the Master Node. The only difference is that you need to set CRAWLAB_NODE_MASTER to N and CRAWLAB_GRPC_ADDRESS to the Master Node Service address, which in this case is crawlab-master-service:9666.

Below is an example of Deployment for Worker Node.

kind: Deployment
apiVersion: apps/v1
metadata:
  name: crawlab-worker
  namespace: crawlab
spec:
  replicas: 3
  selector:
    matchLabels:
      app: crawlab-worker
  template:
    metadata:
      labels:
        app: crawlab-worker
    spec:
      containers:
        - name: worker
          image: crawlabteam/crawlab:latest
          env:
            - name: CRAWLAB_NODE_MASTER
              value: 'N'
            - name: CRAWLAB_GRPC_ADDRESS
              value: 'crawlab-master-service:9666'
            - name: CRAWLAB_FS_FILER_URL
              value: 'http://crawlab-master-service:8080/api/filer'
      restartPolicy: Always

In the above example, we will create 3 Worker Nodes. You can scale up or down the Worker Nodes by changing the replicas field.

Kubernetes

# Kubernetes

# Main Process

# Multi-Node Deployment

# Master Node PersistentVolumeClaim (PVC)

# Master Node Deployment

# Master Node Service

# Worker Node Deployment

Kubernetes

Main Process

Multi-Node Deployment

Master Node PersistentVolumeClaim (PVC)

Master Node Deployment

Master Node Service

Worker Node Deployment