Set Up SeaTunnel with Kubernetes

This post provides a quick guide to using SeaTunnel with Kubernetes.

Prerequisites

We assume that you have local installations of Docker, Kubernetes, and Helm, so that the docker, kubectl, and helm commands are available on your local system.
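
Before going further, it can help to confirm that the required commands are actually on your PATH. The following is a minimal sketch, not part of SeaTunnel: the function name check_prereqs is hypothetical, and the tool list is an assumption based on the commands used later in this guide.

```shell
#!/bin/sh
# Report every given command that is not on PATH.
# Returns 0 if all are present, 1 otherwise.
check_prereqs() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Assumed tool list: the commands this guide invokes.
if check_prereqs docker kubectl helm minikube; then
  echo "all prerequisites found"
else
  echo "install the missing tools before continuing"
fi
```

The check only looks for the executables; it does not verify versions or that a cluster is running.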

For Kubernetes, minikube is our choice; at the time of writing we are using Kubernetes version v1.23.3. You can start a cluster with the following command:

minikube start --kubernetes-version=v1.23.3

Installation

SeaTunnel docker image

To run the image with SeaTunnel, first create a Dockerfile:

FROM flink:1.13

ENV SEATUNNEL_VERSION="2.1.0"

RUN wget https://archive.apache.org/dist/incubator/seatunnel/${SEATUNNEL_VERSION}/apache-seatunnel-incubating-${SEATUNNEL_VERSION}-bin.tar.gz
RUN tar -xzvf apache-seatunnel-incubating-${SEATUNNEL_VERSION}-bin.tar.gz

RUN mkdir -p $FLINK_HOME/usrlib
RUN cp apache-seatunnel-incubating-${SEATUNNEL_VERSION}/lib/seatunnel-core-flink.jar $FLINK_HOME/usrlib/seatunnel-core-flink.jar

RUN rm -fr apache-seatunnel-incubating-${SEATUNNEL_VERSION}*

Then run the following commands to build the image:

docker build -t seatunnel:2.1.0-flink-1.13 -f Dockerfile .

The image seatunnel:2.1.0-flink-1.13 needs to be present on the host (minikube) so that the deployment can take place.

Load the image into minikube with:

minikube image load seatunnel:2.1.0-flink-1.13
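
The same image tag appears in the build command, the load command, and later in the deployment manifest, so it is easy for them to drift apart. One way to keep them consistent is to derive the tag once from the two versions. This is only a sketch: the image_tag helper and the FLINK_VERSION variable are hypothetical names, and the docker and minikube commands are left as comments so the snippet has no side effects.

```shell
#!/bin/sh
# Versions taken from the Dockerfile above.
SEATUNNEL_VERSION="2.1.0"
FLINK_VERSION="1.13"

# Build the image tag from the two versions so every command uses the same name.
image_tag() {
  printf 'seatunnel:%s-flink-%s\n' "$1" "$2"
}

IMAGE="$(image_tag "$SEATUNNEL_VERSION" "$FLINK_VERSION")"
echo "$IMAGE"   # prints seatunnel:2.1.0-flink-1.13

# Build and load using the derived tag (same commands as above):
#   docker build -t "$IMAGE" -f Dockerfile .
#   minikube image load "$IMAGE"
```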

The steps below provide a quick walk-through on setting up the Flink Kubernetes Operator.

Install the certificate manager on your Kubernetes cluster to enable adding the webhook component (only needed once per Kubernetes cluster):

kubectl create -f https://github.com/jetstack/cert-manager/releases/download/v1.7.1/cert-manager.yaml

Now you can deploy the Flink Kubernetes Operator using its Helm chart (the repository URL below pins version 0.1.0):

helm repo add flink-operator-repo https://downloads.apache.org/flink/flink-kubernetes-operator-0.1.0/

helm install flink-kubernetes-operator flink-operator-repo/flink-kubernetes-operator

You may verify your installation via kubectl:

kubectl get pods
NAME                                                   READY   STATUS    RESTARTS      AGE
flink-kubernetes-operator-5f466b8549-mgchb             1/1     Running   3 (23h ago)   16d
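
If you want to script this check instead of reading the table by eye, the output of kubectl get pods can be piped through a small filter. The sketch below assumes the default column layout shown above (NAME, READY, STATUS); the pods_ready function is a hypothetical helper, not a kubectl feature.

```shell
#!/bin/sh
# pods_ready PREFIX
# Reads `kubectl get pods` output on stdin and succeeds only if at least one
# pod name starts with PREFIX and every such pod is Running with all
# containers ready (READY column of the form n/n).
pods_ready() {
  awk -v prefix="$1" '
    NR > 1 && index($1, prefix) == 1 {
      seen = 1
      split($2, ready, "/")
      if (ready[1] != ready[2] || $3 != "Running") bad = 1
    }
    END { exit ((seen && !bad) ? 0 : 1) }
  '
}

# Example usage against a live cluster:
#   kubectl get pods | pods_ready flink-kubernetes-operator && echo "operator is ready"
```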

Run SeaTunnel Application

SeaTunnel already provides out-of-the-box configurations.

In this guide we are going to use flink.streaming.conf:

env {
  execution.parallelism = 1
}

source {
    FakeSourceStream {
      result_table_name = "fake"
      field_name = "name,age"
    }
}

transform {
    sql {
      sql = "select name,age from fake"
    }
}

sink {
  ConsoleSink {}
}
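
If you edit this file by hand, a quick sanity check that the four top-level blocks are still present can catch accidental deletions before you deploy. The snippet below is a rough sketch and not a SeaTunnel validator: the has_config_blocks function is a hypothetical helper, and it only looks for the block names followed by an opening brace.

```shell
#!/bin/sh
# has_config_blocks FILE
# Succeeds if FILE contains the env, source, transform, and sink blocks
# shown above; otherwise names the first missing block on stderr.
has_config_blocks() {
  for block in env source transform sink; do
    grep -Eq "^[[:space:]]*${block}[[:space:]]*\{" "$1" || {
      echo "missing block: $block" >&2
      return 1
    }
  done
}

# Example usage:
#   has_config_blocks flink.streaming.conf && echo "config looks complete"
```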

This configuration needs to be available to the application (SeaTunnel) when we deploy it to the Flink cluster on Kubernetes, so we also need to give the Pod a volume for storage. In this guide the files live in a directory on the Node, which the deployment manifest mounts into the container.

  • Create /mnt/data on your Node. Open a shell to the single Node in your cluster; how you do this depends on how you set up your cluster. Since we are using minikube, you can open a shell to the Node by running minikube ssh. In that shell, create the /mnt/data directory:
    minikube ssh
    
    # This assumes that your Node uses "sudo" to run commands as the superuser
    sudo mkdir /mnt/data
    
  • Copy the application (SeaTunnel) configuration file to your Node:
    minikube cp flink.streaming.conf /mnt/data/flink.streaming.conf
    

Once the Flink Kubernetes Operator is running, as seen in the previous steps, you are ready to submit a Flink (SeaTunnel) job:

  • Create seatunnel-flink.yaml FlinkDeployment manifest:
    apiVersion: flink.apache.org/v1alpha1
    kind: FlinkDeployment
    metadata:
      namespace: default
      name: seatunnel-flink-streaming-example
    spec:
      image: seatunnel:2.1.0-flink-1.13
      flinkVersion: v1_14
      flinkConfiguration:
        taskmanager.numberOfTaskSlots: "2"
      serviceAccount: flink
      jobManager:
        replicas: 1
        resource:
          memory: "2048m"
          cpu: 1
      taskManager:
        resource:
          memory: "2048m"
          cpu: 2
      podTemplate:  
        spec:
          containers:
            - name: flink-main-container
              volumeMounts:
                - mountPath: /data
                  name: config-volume
          volumes:
            - name: config-volume
              hostPath:
                path: "/mnt/data"
                type: Directory
    
      job:
        jarURI: local:///opt/flink/usrlib/seatunnel-core-flink.jar
        entryClass: org.apache.seatunnel.SeatunnelFlink
        args: ["--config", "/data/flink.streaming.conf"]
        parallelism: 2
        upgradeMode: stateless
    
    
  • Run the example application:
    kubectl apply -f seatunnel-flink.yaml
    

See The Output

After a successful startup (which can take on the order of a minute in a fresh environment, and seconds afterwards), you can follow the logs of your job:

kubectl logs -f deploy/seatunnel-flink-streaming-example
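
Because startup can take around a minute in a fresh environment, commands issued right after kubectl apply may fail simply because the resources do not exist yet. A small retry wrapper is one way to handle that in scripts. This is a generic sketch: the retry function, the attempt count, and the delay are assumptions, not something provided by kubectl or the operator.

```shell
#!/bin/sh
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS runs have failed.
retry() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    # Sleep between attempts, but not after the last one.
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Example usage: wait up to about a minute for the REST service to appear.
#   retry 12 5 kubectl get svc seatunnel-flink-streaming-example-rest
```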

To expose the Flink Dashboard you may add a port-forward rule:

kubectl port-forward svc/seatunnel-flink-streaming-example-rest 8081

Now the Flink Dashboard is accessible at localhost:8081.

Alternatively, launch minikube dashboard for a web-based Kubernetes user interface.

To view the content printed to the TaskManager stdout log, run:

kubectl logs \
-l 'app in (seatunnel-flink-streaming-example), component in (taskmanager)' \
--tail=-1 \
-f

The output looks like the following (your content may differ, since FakeSourceStream generates random stream data):

+I[Kid Xiong, 1650316786086]
+I[Ricky Huo, 1650316787089]
+I[Ricky Huo, 1650316788089]
+I[Ricky Huo, 1650316789090]
+I[Kid Xiong, 1650316790090]
+I[Kid Xiong, 1650316791091]
+I[Kid Xiong, 1650316792092]
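
To get a quick summary of such output rather than scrolling through it, you can count how many rows were emitted per name. The following sketch assumes each row has exactly the +I[name, value] shape shown above; count_names is a hypothetical helper name.

```shell
#!/bin/sh
# count_names
# Reads log lines on stdin and prints "name: count" for every line of the
# form "+I[name, value]", sorted by name. Other lines are ignored.
count_names() {
  awk '
    /^\+I\[/ {
      name = $0
      sub(/^\+I\[/, "", name)   # drop the "+I[" prefix
      sub(/,.*$/, "", name)     # drop everything from the first comma on
      counts[name]++
    }
    END { for (k in counts) print k ": " counts[k] }
  ' | sort
}

# Example usage (without -f, so the command terminates):
#   kubectl logs -l 'app in (seatunnel-flink-streaming-example), component in (taskmanager)' --tail=-1 | count_names
```

For the seven sample lines above, this would report four rows for Kid Xiong and three for Ricky Huo.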

To stop your job and delete your FlinkDeployment, simply run:

kubectl delete -f seatunnel-flink.yaml

A simplified version of this post has already been contributed to SeaTunnel.

Happy SeaTunneling!