Site Reliability Engineer | Codersbrain

contractual

Posted on 08-09-2025

Job Description

Site Reliability Engineer (SRE)

Company Overview

Information not provided.

Job Summary

The Site Reliability Engineer (SRE) will focus on improving the reliability, availability, and performance of applications. This role is integral to implementing SRE practices that ensure efficient incident management and continuous service improvement. The SRE will collaborate closely with cross-functional teams to design and automate infrastructure so that applications run reliably in cloud environments.

Responsibilities

Implement and maintain Site Reliability Engineering (SRE) practices, including Service Level Objectives (SLOs), Service Level Indicators (SLIs), error budgets, incident management, and post-mortem analyses.
Enhance application reliability, availability, and performance through tuning and optimization.
Develop and manage observability platforms using tools such as Datadog and New Relic, and implement MELT (Metrics, Events, Logs, Traces) telemetry.
Leverage cloud-native design patterns, particularly with Microsoft Azure or Google Cloud, to streamline platform engineering and infrastructure automation.
Utilize Infrastructure as Code (IaC) practices with tools like Terraform and Ansible to support deployment and management of applications.
Design and execute CI/CD pipelines, incorporate Test-Driven Development (TDD), and promote a DevSecOps culture to ensure secure coding practices.

Qualifications

Proficiency in Site Reliability Engineering principles and practices.
Experience with observability platforms, specifically Datadog and New Relic.
Strong knowledge of cloud platforms, preferably Microsoft Azure or Google Cloud.
Familiarity with Infrastructure as Code methodologies (Terraform, Ansible).
Understanding of modern application architectures including SOA, API-first, Twelve-Factor, and Microservices.
Experience in API ecosystem design and integration patterns.
Skills in developing and maintaining CI/CD pipelines and promoting Test-Driven Development.
Understanding of DevSecOps principles to incorporate security into the development lifecycle.

Preferred Skills

Experience with OpenTelemetry for enhanced observability.
Knowledge of DNS, traffic routing, and Azure Front Door.
Familiarity with chaos engineering, performance testing, and disaster recovery (DR) testing.

Experience

The ideal candidate has significant experience in Site Reliability Engineering, cloud infrastructure, and modern application design patterns. No specific number of years of experience is stated.

Environment

No information is provided about work setting, location, or other working conditions.

Salary

Information not provided.

Growth Opportunities

Information not provided.

Benefits

Information not provided.