HumanBit Logo

Site Reliability Engineer | Codersbrain

contractual
Posted on August 6, 2025

Job Description

SRE

Company Overview

(No specific details provided)

Job Summary

The Site Reliability Engineer (SRE) will play a crucial role in designing and implementing scalable, resilient cloud-native infrastructure. This position focuses on maintaining high availability, performance, and capacity planning to ensure the smooth operation of infrastructure services. The SRE will collaborate with engineering and product teams to enhance system reliability and drive improvements across the organization's cloud operations.

Responsibilities

  • Design and implement scalable, resilient cloud-native infrastructure on AWS.
  • Own the SRE function, focusing on availability, latency, performance, monitoring, emergency response, and capacity planning.
  • Collaborate with engineering and product teams to improve system reliability, speed, and performance.
  • Set up, maintain, and improve Continuous Integration/Continuous Deployment (CI/CD) pipelines using tools like Jenkins, GitHub Actions, or CodePipeline.
  • Perform load and stress testing, analyze performance bottlenecks, and provide remediation strategies.

Qualifications

  • Must have hands-on experience building secure, scalable cloud architectures.
  • AWS Certified Solutions Architect – Associate or Professional certification is highly preferred.
  • Proven SRE and DevOps experience with a strong problem-solving mindset.
  • Proficiency in performance testing tools like JMeter, Gatling, k6, or Locust.
  • Experience in containerization and orchestration tools such as Docker and Kubernetes is a plus.
  • Strong scripting or programming skills in languages like Python, Go, or Java.
  • Good understanding of networking concepts, cloud security best practices, and monitoring tools.

Preferred Skills

  • Knowledge of Infrastructure as Code (IaC) tools like Terraform or CloudFormation.
  • Experience with distributed systems and microservices architecture.
  • Familiarity with incident management tools and frameworks like PagerDuty or Opsgenie.

Experience

  • Minimum of 6 years of relevant experience in Site Reliability Engineering or a similar role.