Posted on 17 May 2026

Senior Site Reliability Engineer

Diligent Corporation

Munich, Bavaria, Germany, Full-time

Reference: 102_716120_5797512004

You're a seasoned Site Reliability Engineer with years spent running production Kubernetes at scale, and you're the kind of engineer who takes the initiative when something can be better - observability, resilience, a tricky upgrade, or the way the team thinks about security. You're looking for a role where that initiative has room to turn into real improvements on a platform that customers trust with their most confidential data.

In this role you'll join our operations team for our MeetingSuite product in Munich - a flat and diverse SRE team of four engineers. It's a team where influence comes from example rather than authority. Your day-to-day is keeping our Kubernetes platforms observable, resilient and boring-to-upgrade: GitOps with Flux, multi-AZ design, zero-downtime releases, and a centralised observability story every service owner can use without calling SRE. Alongside that, you'll partner closely with our Application Security Engineer on Kubernetes and container security - with room to grow into our security champion over time - to keep the bar high for the DAX 30 and other DACH customers we serve.

If multi-cluster Kubernetes, GitOps, logging, monitoring and NoSQL database management on Kubernetes are in your vocabulary, read on.

Here's a breakdown of what you'll do (not all of it, just the important stuff):

Operate and continuously improve our Kubernetes production platforms, contributing to zero-downtime upgrades and multi-AZ resilience as team-wide goals.

Grow into the team's expert on our ELK-based log platform - centralised cross-cluster monitoring and anomaly detection - so every service owner can see, alert on and debug their workload without SRE hand-holding. Maintain and evolve our Prometheus alerting rules and Grafana dashboards alongside the team.

Partner with our Application Security Engineer on Kubernetes and container security - admission control, workload identity, secrets management, network segmentation and runtime threat detection - with an interest in growing into our security champion over time.

Love automation. Chip away at operational toil - deployments, monitoring setup, internal reporting - building on the baseline the team already has, and ship reliably through our GitOps workflow (Flux, GitLab CI).

Participate in our Standby and Daily Business rotation, lead incident response, run blameless post-mortems and drive the resulting action items to completion.

These are the essentials you'll need to get an interview:

Several years of hands-on experience in SRE, DevOps or Platform Engineering, including meaningful time running production Kubernetes at scale.

Strong Kubernetes expertise with deep hands-on experience in at least one area - cluster lifecycle and upgrades, workload identity and RBAC, admission control, network policies, or custom resources and operators - and working familiarity with the rest.

Solid grasp of Kubernetes and container security - secrets management, network segmentation and runtime protection - and an interest in growing into our security champion alongside our Application Security Engineer.

Proven depth in the ELK stack (or a very similar log platform) - pipelines, indexing, dashboards, alerting - with an interest in growing into the team's observability expert. Working knowledge of Prometheus and Grafana.

Comfortable with GitOps and CI/CD as a daily way of working (we run Flux and GitLab CI; equivalents like Argo CD, GitHub Actions or Jenkins are fine), and hands-on experience with Helm and Kustomize for managing manifests. Solid coding in Go, Python or Bash, with a love for automating away repetitive work.

Comfortable being on-call and leading incidents calmly under pressure.

Professional fluency in German and excellent English; at home working in a diverse team.

It would be great if you had these too, but we'll support you if you don't:

Experience in regulated industries (financial services, legal, healthcare, defence) or under compliance frameworks such as ISO 27001 or C5.

Track record of designing or contributing to custom Kubernetes Operators.

Service-mesh experience (Istio, Linkerd, Cilium).

A demonstrated interest in working shoulder-to-shoulder with AppSec engineers to raise platform security posture.

Experience operating Couchbase (Couchbase Operator, server groups, XDCR) or another stateful data platform on Kubernetes.

Experience migrating ingress controllers or other cluster-wide components with zero customer downtime.

Experience with anomaly detection on platform telemetry.

#LIHybrid
