Site Reliability Engineer | Codersbrain
Contractual, Full-Time
Posted on April 2, 2025
Job Description
Site Reliability Engineer
Job Summary
The Site Reliability Engineer is responsible for designing, implementing, and maintaining scalable and reliable infrastructure on AWS. This role focuses on enhancing operational efficiency through automation and collaboration with development teams to integrate reliability into the software development lifecycle. The position plays a critical part in ensuring system resilience through proactive chaos engineering initiatives.
Responsibilities
- Design, implement, and maintain scalable and reliable infrastructure on AWS.
- Utilize Dynatrace for monitoring, performance tuning, and troubleshooting of applications and services.
- Develop automation scripts to streamline deployment processes and enhance operational efficiency.
- Lead chaos engineering initiatives to identify weaknesses and improve system resilience.
- Collaborate with development teams to integrate reliability into the software development lifecycle.
- Automate operational processes to reduce manual intervention.
- Participate in on-call rotations to support incident response and resolution.
Qualifications
- Education: Bachelor's or Master's degree in Computer Science, Engineering, or related field.
- Experience: 5+ years in Site Reliability Engineering or related roles.
- Strong proficiency in AWS services (EC2, S3, RDS, Lambda, etc.) and cloud architecture best practices.
- Experience with Dynatrace or similar monitoring tools for application performance management.
- Familiarity with chaos engineering principles and tools.
- Solid understanding of load testing methodologies and tools.
- Proficient in scripting languages and configuration management tools.
- Excellent problem-solving skills and the ability to work under pressure.
Preferred Skills
- Experience with additional AWS services such as EKS and DynamoDB.
- Experience with other performance monitoring tools similar to Dynatrace.
- Knowledge of configuration management tools and scripting languages beyond those required for the role.
Experience
- Minimum of 5 years of experience in Site Reliability Engineering or related fields.
Environment
- Location: Kolkata, Mumbai, or Remote
- Work Type: Contractual, Full-Time
- Start Date: Immediately
- Deadline for Applications: April 9, 2025
Salary
- Not specified
Growth Opportunities
- Potential for career advancement through leadership roles within the SRE team or broader engineering teams.
Benefits
- Not specified. Candidates may ask about salary and benefits during the interview process.