Senior Site Reliability Engineer

Deimos

Today
Experience Level: Senior level
Experience Length: 5 years

Job descriptions & requirements

ABOUT THE COMPANY

Deimos’ purpose is to guide companies on the journey of adopting the cloud to improve service to their customers. Our services aim to help you avoid costly mistakes and benefit from the scalable, performant, and reliable systems that lie at the end of a cloud native transformation. Whether it be Developer and Security Operations, Cloud Native Transformation Strategy, or Software Engineering & Architecture, we focus intensely on engineering fundamentals. This allows us to plan and build a solid foundation for your company, resulting in simplified workflows, stronger systems, and true future-proofing.

JOB SUMMARY

Requirements

  • Bachelor's degree in Computer Science, Information Technology, or a related field.
  • 5+ years of experience in Software Engineering, SRE, DevOps, or Platform Engineering, with demonstrable ownership of reliability standards at a team or company level.
  • Strong coding fluency: Proficiency in Python (or similar) with the ability to read, understand, reason about, and write production-grade automation code.
  • Cloud & IaC: Hands-on experience with AWS, and a solid understanding of Infrastructure as Code (Terraform or CloudFormation).
  • Deep Observability Knowledge: Demonstrable experience with monitoring tools (DataDog, Prometheus, ELK stack). Strong understanding of SRE concepts including Golden Signals, high-cardinality data handling, and error budget mathematics.
  • Systems Thinking: Strong grasp of designing for scale and resilience, including graceful failure, circuit breaking, connection pooling, and multi-AZ deployments.
  • Proven ability to define and drive reliability standards across multiple teams and drive a blameless post-mortem culture.

RESPONSIBILITIES

Enablement & RelOps Culture

  • Implement the Observability Ladder: Guide teams from basic monitoring to high-signal metric tracking. Work with product teams to define SLAs, SLIs, and SLOs, and build dashboards that track specific error budgets.
  • Empower Product Teams: Build frameworks and deployment tooling (e.g., CI/CD, internal tooling integrations) that allow teams to make data-driven decisions on deployment safety and automate rollbacks when error budgets are depleted.
  • Champion Reliability: Drive a blameless post-mortem culture focused on actionable takeaways, system improvements, and measurable metrics (MTBF, MTTR).

Frameworks & Automation

  • Standardised Alerting & On-Call: Continuously improve company-wide alerting and on-call frameworks to reduce alert fatigue, ensuring alerts are highly actionable and symptom-based.
  • Disaster Recovery: Drive evolution of DR strategies from manual processes into fully automated runbooks-as-code, allowing teams to prove and improve service recoverability through autonomous, evidence-based testing.
  • Eliminate Toil: Develop systems, automations, and tooling for pre- and post-deployment verification, ensuring our hands-off reliability vision becomes a production reality, via Python (or similar).
  • Reliability-as-Code: Lead the drive to manage our entire reliability suite through IaC. Use Terraform to architect, deploy, and configure our observability stack including ELK, Grafana, Loki, Prometheus, and Tracing.
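
For candidates unfamiliar with the term, the error-budget gating mentioned above (rolling back when the budget is depleted) can be sketched roughly as follows. This is only an illustrative Python sketch, not Deimos' actual tooling: the `SloWindow` type, the function names, and the `min_remaining` threshold are all hypothetical, and a real system would read these counts from a monitoring backend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SloWindow:
    """Request counts for one SLO evaluation window (hypothetical shape)."""
    slo_target: float      # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int
    failed_requests: int


def error_budget_remaining(window: SloWindow) -> float:
    """Return the unspent fraction of the error budget (negative if overspent)."""
    # The budget is the number of failures the SLO tolerates in this window.
    allowed_failures = (1.0 - window.slo_target) * window.total_requests
    if allowed_failures == 0:
        # No traffic, or a 100% target: any failure exhausts the budget.
        return 0.0 if window.failed_requests else 1.0
    return 1.0 - window.failed_requests / allowed_failures


def should_roll_back(window: SloWindow, min_remaining: float = 0.0) -> bool:
    """Assumed policy: roll back once the remaining budget hits the threshold."""
    return error_budget_remaining(window) <= min_remaining
```

With a 99.9% target and 1,000,000 requests, roughly 1,000 failures are tolerated, so 250 failures leaves about 75% of the budget and no rollback is triggered under this assumed policy.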

REQUIRED SKILLS

Project implementation, IT training, IT support, Troubleshooting, Web service and application development, Cloud architectures and services, Data models and architectures, Java, Programming, .Net

REQUIRED EDUCATION

Bachelor's degree

Important safety tips

  • Do not make any payment without confirming with the BrighterMonday Customer Support Team.
  • If you think this advert is not genuine, please report it via the Report Job link below.
