Site Reliability Engineer | Codersbrain

Full-time

Posted on 05-09-2025

Job Description

Site Reliability Engineer

Company Overview

Company details are not specified.

Job Summary

The Site Reliability Engineer (SRE) plays a critical role in ensuring the reliability, availability, and performance of our systems. This position requires a deep understanding of cloud infrastructure, automation, and monitoring to effectively manage large-scale systems. The SRE will collaborate with development and operations teams to troubleshoot issues, implement automation solutions, and support cloud-based services.

Responsibilities

Monitor and maintain the reliability, performance, and availability of large-scale systems.
Implement automation for repetitive tasks and workflows to enhance operational efficiency.
Conduct incident response, root cause analysis, and post-incident reviews to ensure system reliability.
Collaborate with development teams to support continuous integration and delivery pipelines.
Optimize system performance, scalability, and availability using industry best practices.
Document processes and procedures, including creating runbooks for operational tasks.

Qualifications

Education: Bachelor’s degree in Computer Science, Engineering, or a related field.
Technical Skills:
- Proficiency in programming languages, particularly Python.
- Deep understanding of Linux/Windows operating systems and networking concepts.
- Experience with Azure, including services, architecture, and best practices.
- Hands-on experience with containerization and orchestration tools like Docker and Kubernetes.
- Familiarity with Infrastructure as Code (IaC) tools such as Terraform, CloudFormation, or Azure CLI.
- Experience with monitoring tools like Splunk, New Relic, or Azure Monitor.
- Knowledge of CI/CD practices and tools like GitHub and GitHub Actions.
- Expertise in supporting Azure Machine Learning, Databricks, and other related SaaS tools.
Soft Skills:
- Strong problem-solving ability to troubleshoot complex distributed systems independently.
- Excellent written and verbal communication skills to facilitate collaboration across teams.

Preferred Skills

Experience with specific cloud platforms, particularly Azure.
Relevant certifications in cloud engineering or DevOps.
Familiarity with microservices architecture and supporting AI/ML solutions.
Previous experience in large-scale system management and configuration.

Experience

Minimum of 10 years of relevant experience in site reliability engineering, cloud engineering, or a related field.

Environment

Position based in Andheri, Mumbai. Details about work setting and flexibility (remote, in-office, hybrid) are not specified.

Salary

Salary details are not specified.

Growth Opportunities

Information on career advancement opportunities within the company is not specified.

Benefits

Benefits offered are not specified.