HumanBit Logo

Site Reliability Engineer | Scrabble

Full-time
Posted on July 1, 2025

Job Description

Leadership – Site Reliability & Platform Architect

Job Summary

This role is pivotal in scaling a high-performance SaaS platform that powers logistics automation for over 500 global enterprises. As the Leadership – Site Reliability & Platform Architect, you will lead the evolution of our cloud infrastructure, DevOps maturity, and backend platform architecture. You will blend deep DevOps/SRE expertise with backend architectural thinking to build resilient, observable, and scalable systems from the ground up.

Responsibilities

  • Own Infrastructure Architecture: Design and evolve cloud-native systems to ensure scalability, high availability, cost efficiency, and security.
  • Lead Backend Platform Design: Collaborate with product and engineering teams to design performant, modular, and reliable backend systems.
  • CI/CD & Deployment Strategy: Build and scale deployment pipelines, optimize rollouts with blue-green/canary deployments, and ensure smooth delivery processes.
  • Orchestrate Systems: Manage containerized workloads using orchestration platforms such as Kubernetes (EKS/GKE) or ECS.
  • Observability & Performance: Standardize monitoring, tracing, and logging across systems; lead capacity planning and performance tuning.
  • Infrastructure as Code (IaC): Define and maintain scalable infrastructure using tools such as Terraform and Helm.
  • Mentor & Lead: Guide engineering teams in cloud architecture, system design, and operational excellence.
  • Champion Reliability & Security: Define SLOs, SLIs, and incident response processes while enforcing best practices for infrastructure and application security.

Qualifications

  • 4+ years of experience in backend or infrastructure roles working with high-scale, production-grade systems.
  • 3+ years of hands-on backend development experience with languages and runtimes such as Ruby, Node.js, Python, or Java.
  • 3+ years of experience in system design, API development, and performance optimization.
  • 1+ years in a technical leadership role focusing on DevOps/SRE/platform engineering.
  • Proven experience architecting and running infrastructure on AWS (preferred), GCP, or Azure.
  • Deep understanding of cloud-native architecture, microservices, and distributed systems.
  • Hands-on experience with Docker, Kubernetes, Terraform, and observability tools (e.g., Prometheus, Grafana, ELK, OpenTelemetry).
  • Strong programming/scripting skills in Python, Go, or Bash, and the ability to review production backend code (Ruby/Node.js).
  • Experience with relational and NoSQL databases such as Postgres, MongoDB, and Redis.

Preferred Skills

  • Experience with service meshes, multi-region high-availability (HA) systems, or event-driven architectures.
  • Background in security, compliance, or cost optimization.
  • Prior experience leading backend engineering teams, with deep involvement in designing and scaling core systems.
  • A comprehensive grasp of backend fundamentals (data modeling, API design, asynchronous jobs, caching) and a passion for building fast, resilient, and observable systems.
  • A builder, architect, and operator mindset with strong business context awareness.

Experience

  • 7–14 years of total industry experience with significant exposure to backend, infrastructure, and technical leadership roles in high-scale production environments.

Environment

  • Location: Bangalore
  • Type: Full-time

You will work within a dynamic, modern engineering environment, engaging with cross-functional teams to drive continuous improvement in cloud infrastructure, deployment methodologies, and system performance.