<h1 id="set-up-seatunnel-with-kubernetes">Set Up SeaTunnel with Kubernetes</h1>
<p>Dr. Gezim Sejdiu (sejdiu@cs.uni-bonn.de), Tech Lead Data Engineer at Deutsche Post DHL Group &amp; Assistant Professor of Computer Science at Universum College, PhD from University of Bonn. Published 2022-04-24.</p>
<p><em>This post provides a quick guide to using SeaTunnel with Kubernetes.</em></p>
<ul id="markdown-toc">
<li><a href="#prerequisites" id="markdown-toc-prerequisites">Prerequisites</a></li>
<li><a href="#installation" id="markdown-toc-installation">Installation</a> <ul>
<li><a href="#seatunnel-docker-image" id="markdown-toc-seatunnel-docker-image">SeaTunnel docker image</a></li>
<li><a href="#deploying-flink-operator" id="markdown-toc-deploying-flink-operator">Deploying Flink operator</a></li>
</ul>
</li>
<li><a href="#run-seatunnel-application" id="markdown-toc-run-seatunnel-application">Run SeaTunnel Application</a></li>
</ul>
<h2 id="prerequisites">Prerequisites</h2>
<p>We assume that you have local installations of the following tools, so that the <code class="language-plaintext highlighter-rouge">kubectl</code> and <code class="language-plaintext highlighter-rouge">helm</code> commands are available on your system:</p>
<ul>
<li><a href="https://docs.docker.com/">Docker</a></li>
<li><a href="https://kubernetes.io/">Kubernetes</a></li>
<li><a href="https://helm.sh/docs/intro/quickstart/">Helm</a></li>
</ul>
<p>For Kubernetes we use <a href="https://minikube.sigs.k8s.io/docs/start/">minikube</a>; at the time of writing we run Kubernetes version v1.23.3. You can start a cluster with the following command:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>minikube start <span class="nt">--kubernetes-version</span><span class="o">=</span>v1.23.3
</code></pre></div></div>
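<p>Before continuing, it can help to confirm that the command-line tools used in this guide are on your <code class="language-plaintext highlighter-rouge">PATH</code>. The following is a minimal sketch; the function name <code class="language-plaintext highlighter-rouge">require_tools</code> is hypothetical and relies only on the shell builtin <code class="language-plaintext highlighter-rouge">command -v</code>.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: print one line for every listed tool that is not on PATH.
# Returns 0 when all tools are found, 1 otherwise.
require_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" > /dev/null; then
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}
</code></pre></div></div>
<p>For example, <code class="language-plaintext highlighter-rouge">require_tools docker kubectl helm minikube</code> prints nothing and succeeds when everything is installed.</p>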
<h2 id="installation">Installation</h2>
<h3 id="seatunnel-docker-image">SeaTunnel docker image</h3>
<p>To build an image that bundles SeaTunnel with Flink, first create a <code class="language-plaintext highlighter-rouge">Dockerfile</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>FROM flink:1.13
ENV SEATUNNEL_VERSION="2.1.0"
# Download and unpack SeaTunnel, copy its Flink jar into $FLINK_HOME/usrlib and
# clean up in the same RUN step so the archive does not remain in an image layer.
RUN wget https://archive.apache.org/dist/incubator/seatunnel/${SEATUNNEL_VERSION}/apache-seatunnel-incubating-${SEATUNNEL_VERSION}-bin.tar.gz \
    &amp;&amp; tar -xzvf apache-seatunnel-incubating-${SEATUNNEL_VERSION}-bin.tar.gz \
    &amp;&amp; mkdir -p $FLINK_HOME/usrlib \
    &amp;&amp; cp apache-seatunnel-incubating-${SEATUNNEL_VERSION}/lib/seatunnel-core-flink.jar $FLINK_HOME/usrlib/seatunnel-core-flink.jar \
    &amp;&amp; rm -fr apache-seatunnel-incubating-${SEATUNNEL_VERSION}*
</code></pre></div></div>
<p>Then build the image with the following command:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker build <span class="nt">-t</span> seatunnel:2.1.0-flink-1.13 <span class="nt">-f</span> Dockerfile <span class="nb">.</span>
</code></pre></div></div>
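<p>To double-check that the build placed the SeaTunnel jar where the FlinkDeployment later expects it (<code class="language-plaintext highlighter-rouge">/opt/flink/usrlib</code>, which is <code class="language-plaintext highlighter-rouge">$FLINK_HOME/usrlib</code> in the official Flink image), you can list it from a throwaway container. This is a sketch: the function name is hypothetical, and the <code class="language-plaintext highlighter-rouge">DOCKER</code> variable only exists so the command can be dry-run with <code class="language-plaintext highlighter-rouge">DOCKER=echo</code>.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: list the SeaTunnel jar inside the image.
# --entrypoint ls bypasses the Flink entrypoint script; --rm removes the container afterwards.
verify_seatunnel_jar() {
  local image="${1:-seatunnel:2.1.0-flink-1.13}"
  "${DOCKER:-docker}" run --rm --entrypoint ls "$image" -l /opt/flink/usrlib/seatunnel-core-flink.jar
}
</code></pre></div></div>
<p>Against the real image, <code class="language-plaintext highlighter-rouge">verify_seatunnel_jar</code> should print a single <code class="language-plaintext highlighter-rouge">ls -l</code> line; an error suggests the copy step in the Dockerfile did not run as intended.</p>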
<p>The image <code class="language-plaintext highlighter-rouge">seatunnel:2.1.0-flink-1.13</code> needs to be present on the host (minikube) so that the deployment can take place. Load it into minikube with:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>minikube image load seatunnel:2.1.0-flink-1.13
</code></pre></div></div>
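<p>To confirm the load succeeded, you can search the images known to minikube for the tag. A minimal sketch, assuming <code class="language-plaintext highlighter-rouge">minikube image ls</code> prints one image reference per line (possibly with a registry prefix such as <code class="language-plaintext highlighter-rouge">docker.io/library/</code>); <code class="language-plaintext highlighter-rouge">image_present</code> is a hypothetical name.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: read image references on stdin (one per line) and
# succeed if any of them contains the given tag as a fixed string.
image_present() {
  grep -qF -- "$1"
}
</code></pre></div></div>
<p>Usage: <code class="language-plaintext highlighter-rouge">minikube image ls | image_present seatunnel:2.1.0-flink-1.13 || echo "image not loaded"</code>.</p>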
<h3 id="deploying-flink-operator">Deploying Flink operator</h3>
<p>The steps below provide a quick walk-through on setting up the Flink Kubernetes Operator.</p>
<p>Install cert-manager on your Kubernetes cluster; the operator's webhook component depends on it (this is only needed once per Kubernetes cluster):</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl create <span class="nt">-f</span> https://github.com/jetstack/cert-manager/releases/download/v1.7.1/cert-manager.yaml
</code></pre></div></div>
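<p>The cert-manager webhook needs a little time to start, and installing the operator before it is ready can fail. A sketch that blocks until all cert-manager Deployments report <code class="language-plaintext highlighter-rouge">Available</code>, assuming the manifest above installs them into its default <code class="language-plaintext highlighter-rouge">cert-manager</code> namespace; the function name and the <code class="language-plaintext highlighter-rouge">KUBECTL</code> override (for dry runs) are hypothetical.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: wait until every Deployment in the cert-manager namespace
# has the Available condition, or until the timeout (default 120s) expires.
wait_for_cert_manager() {
  "${KUBECTL:-kubectl}" wait --namespace cert-manager \
    --for=condition=Available deployment --all --timeout="${1:-120s}"
}
</code></pre></div></div>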
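<p>Now you can deploy the Flink Kubernetes Operator with its Helm chart. This guide uses operator version 0.1.0, which provides the <code class="language-plaintext highlighter-rouge">flink.apache.org/v1alpha1</code> API used in the manifest below:</p>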
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>
helm repo add flink-operator-repo https://archive.apache.org/dist/flink/flink-kubernetes-operator-0.1.0/
helm <span class="nb">install </span>flink-kubernetes-operator flink-operator-repo/flink-kubernetes-operator
</code></pre></div></div>
<p>You may verify your installation via <code class="language-plaintext highlighter-rouge">kubectl</code>:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl get pods
NAME                                         READY   STATUS    RESTARTS      AGE
flink-kubernetes-operator-5f466b8549-mgchb   1/1     Running   3 (23h ago)   16d
</code></pre></div></div>
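<p>If you want to script this check instead of reading the table by eye, a small filter over the <code class="language-plaintext highlighter-rouge">kubectl get pods</code> output is enough. This sketch assumes the default column layout shown above (NAME first, STATUS third); <code class="language-plaintext highlighter-rouge">operator_running</code> is a hypothetical name.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: read `kubectl get pods` output on stdin and succeed if a pod
# whose name starts with "flink-kubernetes-operator-" has STATUS Running.
operator_running() {
  awk '
    index($1, "flink-kubernetes-operator-") == 1 { if ($3 == "Running") found = 1 }
    END { exit !found }
  '
}
</code></pre></div></div>
<p>For example: <code class="language-plaintext highlighter-rouge">kubectl get pods | operator_running || echo "operator not running yet"</code>.</p>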
<h2 id="run-seatunnel-application">Run SeaTunnel Application</h2>
<p><strong>Run Application:</strong> SeaTunnel already provides out-of-the-box <a href="https://github.com/apache/incubator-seatunnel/tree/dev/config">configurations</a>.</p>
<p>In this guide we are going to use <a href="https://github.com/apache/incubator-seatunnel/blob/dev/config/flink.streaming.conf.template">flink.streaming.conf</a>:</p>
<div class="language-conf highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">env</span> {
  <span class="n">execution</span>.<span class="n">parallelism</span> = <span class="m">1</span>
}
<span class="n">source</span> {
  <span class="n">FakeSourceStream</span> {
    <span class="n">result_table_name</span> = <span class="s2">"fake"</span>
    <span class="n">field_name</span> = <span class="s2">"name,age"</span>
  }
}
<span class="n">transform</span> {
  <span class="n">sql</span> {
    <span class="n">sql</span> = <span class="s2">"select name,age from fake"</span>
  }
}
<span class="n">sink</span> {
  <span class="n">ConsoleSink</span> {}
}
</code></pre></div></div>
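<p>The later <code class="language-plaintext highlighter-rouge">minikube cp</code> step expects this configuration in a local file named <code class="language-plaintext highlighter-rouge">flink.streaming.conf</code>. One way to create it from the shell is a single <code class="language-plaintext highlighter-rouge">printf</code> call; this is only a convenience sketch, and copying the template from the repository works equally well.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Write the example job configuration to ./flink.streaming.conf,
# one printf argument per line of the file.
printf '%s\n' \
  'env {' \
  '  execution.parallelism = 1' \
  '}' \
  'source {' \
  '  FakeSourceStream {' \
  '    result_table_name = "fake"' \
  '    field_name = "name,age"' \
  '  }' \
  '}' \
  'transform {' \
  '  sql {' \
  '    sql = "select name,age from fake"' \
  '  }' \
  '}' \
  'sink {' \
  '  ConsoleSink {}' \
  '}' > flink.streaming.conf
</code></pre></div></div>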
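<p>This configuration needs to be available when we deploy the application (SeaTunnel) to the Flink cluster on Kubernetes. We make it available to the Flink pods through a <code class="language-plaintext highlighter-rouge">hostPath</code> volume, following the first steps of the Kubernetes guide "Configure a Pod to Use a PersistentVolume for Storage".</p>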
<ul>
<li>Create <code class="language-plaintext highlighter-rouge">/mnt/data</code> on your Node. Open a shell to the single Node in your cluster; how you do this depends on how you set up the cluster. Since we are using minikube, you can open a shell to the Node by entering <code class="language-plaintext highlighter-rouge">minikube ssh</code>.
In that shell, create the <code class="language-plaintext highlighter-rouge">/mnt/data</code> directory:
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>minikube ssh
<span class="c"># This assumes that your Node uses "sudo" to run commands as the superuser</span>
<span class="nb">sudo mkdir</span> /mnt/data
</code></pre></div> </div>
</li>
<li>Copy the application (SeaTunnel) configuration file to your Node:
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>minikube <span class="nb">cp </span>flink.streaming.conf /mnt/data/flink.streaming.conf
</code></pre></div> </div>
</li>
</ul>
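<p>The two steps above can also be run non-interactively, which is handy when scripting the setup. A sketch under the same minikube assumptions; <code class="language-plaintext highlighter-rouge">stage_seatunnel_config</code> and the <code class="language-plaintext highlighter-rouge">MINIKUBE</code> override (for dry runs) are hypothetical, and <code class="language-plaintext highlighter-rouge">mkdir -p</code> makes the directory step safe to repeat.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: create /mnt/data on the minikube node and copy a SeaTunnel
# config file into it without opening an interactive shell.
stage_seatunnel_config() {
  local conf="${1:-flink.streaming.conf}"
  local mk="${MINIKUBE:-minikube}"
  # -p: no error if /mnt/data already exists
  "$mk" ssh "sudo mkdir -p /mnt/data"
  "$mk" cp "$conf" "/mnt/data/$(basename "$conf")"
}
</code></pre></div></div>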
<p>Once the Flink Kubernetes Operator is running, as shown in the previous steps, you are ready to submit a Flink (SeaTunnel) job:</p>
<ul>
<li>Create a FlinkDeployment manifest named <code class="language-plaintext highlighter-rouge">seatunnel-flink.yaml</code>:
<div class="language-yaml highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="na">apiVersion</span><span class="pi">:</span> <span class="s">flink.apache.org/v1alpha1</span>
<span class="na">kind</span><span class="pi">:</span> <span class="s">FlinkDeployment</span>
<span class="na">metadata</span><span class="pi">:</span>
  <span class="na">namespace</span><span class="pi">:</span> <span class="s">default</span>
  <span class="na">name</span><span class="pi">:</span> <span class="s">seatunnel-flink-streaming-example</span>
<span class="na">spec</span><span class="pi">:</span>
  <span class="na">image</span><span class="pi">:</span> <span class="s">seatunnel:2.1.0-flink-1.13</span>
  <span class="na">flinkVersion</span><span class="pi">:</span> <span class="s">v1_13</span>
  <span class="na">flinkConfiguration</span><span class="pi">:</span>
    <span class="s">taskmanager.numberOfTaskSlots</span><span class="pi">:</span> <span class="s2">"</span><span class="s">2"</span>
  <span class="na">serviceAccount</span><span class="pi">:</span> <span class="s">flink</span>
  <span class="na">jobManager</span><span class="pi">:</span>
    <span class="na">replicas</span><span class="pi">:</span> <span class="m">1</span>
    <span class="na">resource</span><span class="pi">:</span>
      <span class="na">memory</span><span class="pi">:</span> <span class="s2">"</span><span class="s">2048m"</span>
      <span class="na">cpu</span><span class="pi">:</span> <span class="m">1</span>
  <span class="na">taskManager</span><span class="pi">:</span>
    <span class="na">resource</span><span class="pi">:</span>
      <span class="na">memory</span><span class="pi">:</span> <span class="s2">"</span><span class="s">2048m"</span>
      <span class="na">cpu</span><span class="pi">:</span> <span class="m">2</span>
  <span class="na">podTemplate</span><span class="pi">:</span>
    <span class="na">spec</span><span class="pi">:</span>
      <span class="na">containers</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">flink-main-container</span>
          <span class="na">volumeMounts</span><span class="pi">:</span>
            <span class="pi">-</span> <span class="na">mountPath</span><span class="pi">:</span> <span class="s">/data</span>
              <span class="na">name</span><span class="pi">:</span> <span class="s">config-volume</span>
      <span class="na">volumes</span><span class="pi">:</span>
        <span class="pi">-</span> <span class="na">name</span><span class="pi">:</span> <span class="s">config-volume</span>
          <span class="na">hostPath</span><span class="pi">:</span>
            <span class="na">path</span><span class="pi">:</span> <span class="s2">"</span><span class="s">/mnt/data"</span>
            <span class="na">type</span><span class="pi">:</span> <span class="s">Directory</span>
  <span class="na">job</span><span class="pi">:</span>
    <span class="na">jarURI</span><span class="pi">:</span> <span class="s">local:///opt/flink/usrlib/seatunnel-core-flink.jar</span>
    <span class="na">entryClass</span><span class="pi">:</span> <span class="s">org.apache.seatunnel.SeatunnelFlink</span>
    <span class="na">args</span><span class="pi">:</span> <span class="pi">[</span><span class="s2">"</span><span class="s">--config"</span><span class="pi">,</span> <span class="s2">"</span><span class="s">/data/flink.streaming.conf"</span><span class="pi">]</span>
    <span class="na">parallelism</span><span class="pi">:</span> <span class="m">2</span>
    <span class="na">upgradeMode</span><span class="pi">:</span> <span class="s">stateless</span>
</code></pre></div> </div>
</li>
<li>Run the example application:
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl apply <span class="nt">-f</span> seatunnel-flink.yaml
</code></pre></div> </div>
</li>
</ul>
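<p>Because the image is built <code class="language-plaintext highlighter-rouge">FROM flink:1.13</code>, the manifest's <code class="language-plaintext highlighter-rouge">flinkVersion</code> field should name the same Flink release, in the operator's underscore spelling (for example <code class="language-plaintext highlighter-rouge">v1_13</code>). A hypothetical pre-flight check, assuming the image tag follows the <code class="language-plaintext highlighter-rouge">seatunnel:VERSION-flink-MAJOR.MINOR</code> scheme used in this post and that both values are written unquoted:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: compare the Flink release in the image tag with flinkVersion.
# Assumes the tag scheme seatunnel:VERSION-flink-MAJOR.MINOR and unquoted YAML values.
check_flink_version() {
  local manifest="$1" image tag_flink expected actual
  # First "image:" value, e.g. seatunnel:2.1.0-flink-1.13
  image=$(awk '$1 == "image:" { print $2; exit }' "$manifest")
  # Text after the last "-flink-", e.g. 1.13
  tag_flink=${image##*-flink-}
  # Operator spelling, e.g. v1_13
  expected="v${tag_flink//./_}"
  actual=$(awk '$1 == "flinkVersion:" { print $2; exit }' "$manifest")
  if [ "$expected" != "$actual" ]; then
    echo "mismatch: $image implies flinkVersion $expected, manifest has $actual"
    return 1
  fi
  echo "ok: $image matches flinkVersion $actual"
}
</code></pre></div></div>
<p>Run <code class="language-plaintext highlighter-rouge">check_flink_version seatunnel-flink.yaml</code> before <code class="language-plaintext highlighter-rouge">kubectl apply</code>; it prints the mismatch and returns 1 if the two versions disagree.</p>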
<p><strong>See The Output</strong></p>
<p>After a successful startup (which can take around a minute in a fresh environment and only seconds afterwards), you can follow the logs of your job:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl logs <span class="nt">-f</span> deploy/seatunnel-flink-streaming-example
</code></pre></div></div>
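<p>Since the JobManager needs some time to start, tailing its logs immediately may fail. A sketch that first waits for the JobManager Deployment (the <code class="language-plaintext highlighter-rouge">deploy/seatunnel-flink-streaming-example</code> resource used above) to become <code class="language-plaintext highlighter-rouge">Available</code> and then follows its logs; the function name, the 180-second default timeout and the <code class="language-plaintext highlighter-rouge">KUBECTL</code> override are assumptions. If the operator has not created the Deployment yet, <code class="language-plaintext highlighter-rouge">kubectl wait</code> exits with a not-found error, so rerun it after a few seconds.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: wait for the JobManager Deployment created for the
# FlinkDeployment, then follow its logs.
follow_jobmanager_logs() {
  local name="${1:-seatunnel-flink-streaming-example}"
  "${KUBECTL:-kubectl}" wait --for=condition=Available "deployment/$name" --timeout="${2:-180s}" || return 1
  "${KUBECTL:-kubectl}" logs -f "deploy/$name"
}
</code></pre></div></div>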
<p>To expose the Flink Dashboard you may add a port-forward rule:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl port-forward svc/seatunnel-flink-streaming-example-rest 8081
</code></pre></div></div>
<p>Now the Flink Dashboard is accessible at <a href="http://localhost:8081">localhost:8081</a>.</p>
<p>Or launch <code class="language-plaintext highlighter-rouge">minikube dashboard</code> for a web-based Kubernetes user interface.</p>
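<p>With the port-forward in place, the job information shown in the Flink Dashboard is also available from Flink's REST API. The sketch below extracts the job states from the JSON returned by the <code class="language-plaintext highlighter-rouge">/jobs/overview</code> endpoint using only <code class="language-plaintext highlighter-rouge">grep</code> and <code class="language-plaintext highlighter-rouge">cut</code>. It assumes compact JSON without spaces around the colons, and <code class="language-plaintext highlighter-rouge">job_states</code> is a hypothetical name; a dedicated JSON tool would be more robust.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: read a /jobs/overview JSON response on stdin and print
# the value of every "state" field, one per line (e.g. RUNNING).
job_states() {
  grep -o '"state":"[A-Z_]*"' | cut -d'"' -f4
}
</code></pre></div></div>
<p>Usage: <code class="language-plaintext highlighter-rouge">curl -s http://localhost:8081/jobs/overview | job_states</code>.</p>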
<p>To see the content printed to the TaskManager stdout log, run:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl logs <span class="se">\</span>
<span class="nt">-l</span> <span class="s1">'app in (seatunnel-flink-streaming-example), component in (taskmanager)'</span> <span class="se">\</span>
<span class="nt">--tail</span><span class="o">=</span><span class="nt">-1</span> <span class="se">\</span>
<span class="nt">-f</span>
</code></pre></div></div>
<p>The output looks like the following (your content may differ, since <code class="language-plaintext highlighter-rouge">FakeSourceStream</code> generates random stream data):</p>
<div class="language-shell highlighter-rouge"><div class="highlight"><pre class="highlight"><code>+I[Kid Xiong, 1650316786086]
+I[Ricky Huo, 1650316787089]
+I[Ricky Huo, 1650316788089]
+I[Ricky Huo, 1650316789090]
+I[Kid Xiong, 1650316790090]
+I[Kid Xiong, 1650316791091]
+I[Kid Xiong, 1650316792092]
</code></pre></div></div>
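<p>Each line is a Flink row printed by the console sink, where the <code class="language-plaintext highlighter-rouge">+I</code> prefix marks an insert. For a quick per-name count from the TaskManager log you can filter these lines. This sketch assumes the row format shown above (name first, then a comma) and tolerates any prefix before <code class="language-plaintext highlighter-rouge">+I[</code>; the function name is hypothetical.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code># Hypothetical helper: read TaskManager stdout on stdin, keep only "+I[...]" rows,
# and count how many rows were printed for each name.
count_rows_by_name() {
  sed -n 's/.*+I\[\([^,]*\),.*/\1/p' | sort | uniq -c
}
</code></pre></div></div>
<p>Pipe the log command above into it without <code class="language-plaintext highlighter-rouge">-f</code>, so that the input ends and the counts are printed.</p>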
<p>To stop your job and delete your FlinkDeployment, run:</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>kubectl delete <span class="nt">-f</span> seatunnel-flink.yaml
</code></pre></div></div>
<p><em>A simplified version of this post has already been <a href="https://seatunnel.apache.org/docs/start/kubernetes">contributed to SeaTunnel</a>.</em></p>
<p>Happy SeaTunneling!</p>