Posted on 17 May 2026

Senior Site Reliability Engineer

Diligent Corporation
Munich, Bavaria, Germany · Full-time
Reference: 102_716120_5797512004

You're a seasoned Site Reliability Engineer with years spent running production Kubernetes at scale, and you're the kind of engineer who takes the initiative when something can be better - observability, resilience, a tricky upgrade, or the way the team thinks about security. You're looking for a role where that initiative has room to turn into real improvements on a platform that customers trust with their most confidential data.

In this role you'll join the operations team for our MeetingSuite product in Munich, a flat and diverse SRE team of four engineers. It's a team where influence comes from example rather than authority. Day to day, you'll keep our Kubernetes platforms observable, resilient and boring to upgrade: GitOps with Flux, multi-AZ design, zero-downtime releases, and a centralised observability story that every service owner can use without calling SRE. Alongside that, you'll partner closely with our Application Security Engineer on Kubernetes and container security, with room to grow into our security champion over time, to keep the bar high for the DAX 40 and other DACH customers we serve.

If multi-cluster Kubernetes, GitOps, logging, monitoring and NoSQL database management on Kubernetes are in your vocabulary, read on.

Here's a breakdown of what you'll do (not all of it, just the important stuff):

  • Operate and continuously improve our Kubernetes production platforms, contributing to zero-downtime upgrades and multi-AZ resilience as team-wide goals.
  • Grow into the team's expert on our ELK-based log platform - centralised cross-cluster monitoring and anomaly detection - so every service owner can see, alert on and debug their workload without SRE hand-holding. Maintain and evolve our Prometheus alerting rules and Grafana dashboards alongside the team.
  • Partner with our Application Security Engineer on Kubernetes and container security - admission control, workload identity, secrets management, network segmentation and runtime threat detection - with an interest in growing into our security champion over time.
  • Love automation. Chip away at operational toil - deployments, monitoring setup, internal reporting - building on the baseline the team already has, and ship reliably through our GitOps workflow (Flux, GitLab CI).
  • Take part in our standby (on-call) and daily-business rotation, lead incident response, run blameless post-mortems and drive the resulting action items to completion.

These are the essentials you'll need to get an interview:

  • Several years of hands-on experience in SRE, DevOps or Platform Engineering, including meaningful time running production Kubernetes at scale.
  • Strong Kubernetes expertise with deep hands-on experience in at least one area - cluster lifecycle and upgrades, workload identity and RBAC, admission control, network policies, or custom resources and operators - and working familiarity with the rest.
  • Solid grasp of Kubernetes and container security - secrets management, network segmentation and runtime protection - and an interest in growing into our security champion alongside our Application Security Engineer.
  • Proven depth in the ELK stack (or a very similar log platform) - pipelines, indexing, dashboards, alerting - with an interest in growing into the team's observability expert. Working knowledge of Prometheus and Grafana.
  • Comfortable with GitOps and CI/CD as a daily way of working (we run Flux and GitLab CI; equivalents like Argo CD, GitHub Actions or Jenkins are fine), and hands-on experience with Helm and Kustomize for managing manifests. Solid coding in Go, Python or Bash, with a love for automating away repetitive work.
  • Comfortable being on-call and leading incidents calmly under pressure.
  • Professional fluency in German and excellent English; comfortable working in a diverse team.

It would be great if you had these too, but we'll support you if you don't:

  • Experience in regulated industries (financial services, legal, healthcare, defence) or under compliance frameworks such as ISO 27001 or C5.
  • Track record of designing or contributing to custom Kubernetes Operators.
  • Service-mesh experience (Istio, Linkerd, Cilium).
  • A demonstrated interest in working shoulder-to-shoulder with AppSec engineers to raise platform security posture.
  • Experience operating Couchbase (Couchbase Operator, server groups, XDCR) or another stateful data platform on Kubernetes.
  • Experience migrating ingress controllers or other cluster-wide components with zero customer downtime.
  • Experience with anomaly detection on platform telemetry.

    #LIHybrid