Datadog | Codersbrain
Job Description
Datadog Engineer
Company Overview
Company details were not provided.
Job Summary
The Datadog Engineer ensures the high availability and performance of Datadog’s production systems. The role involves operating and scaling infrastructure across hundreds of clusters and multiple cloud providers, and automating operational tasks to reduce manual intervention. The engineer also works with development teams to improve system reliability and observability, and keeps technical direction aligned with business goals.
Responsibilities
- Ensure high availability and performance of Datadog's production systems.
- Operate and scale infrastructure across hundreds of clusters and multiple cloud providers.
- Automate operational tasks to minimize manual intervention.
- Collaborate with development teams to enhance system reliability and observability.
- Design and evolve the architecture of large-scale, distributed systems.
- Guide engineering teams on monitoring best practices for scalability, reliability, and performance.
- Evaluate and integrate new technologies and frameworks.
- Maintain services that are reliable, available, fast, and cost-efficient.
- Investigate and resolve service issues across the stack, from OS kernel to application code.
- Build tools and frameworks to enhance engineering productivity and system observability.
Qualifications
- 5+ years of hands-on experience with Datadog’s stack in multi-cloud or hybrid cloud environments.
- Strong background in systems engineering or software development.
- Experience with Kubernetes and cloud platforms (AWS, GCP, Azure).
- Strong proficiency in programming and scripting languages such as Go, Python, or Java.
- Familiarity with monitoring, alerting, and incident response practices.
- Deep understanding of cloud-native architectures and microservices.
- Experience with high-throughput, low-latency systems.
- Strong communication skills.
- Experience with CI/CD pipelines and monitoring tools.
- Deep understanding of Windows and Linux systems, networking, and operating system internals.
- Experience with distributed systems and high-availability architectures.
- Strong experience with Docker, Kubernetes, and service mesh technologies.
- Familiarity with tools like Terraform, Ansible, or Pulumi (optional).
Preferred Skills
- Experience building dashboards, monitors, and alerting configurations.
- Familiarity with Jenkins, GitHub Actions, CircleCI, or similar.
- Skills in automating deployments, rollbacks, and testing pipelines.
Experience
- Minimum of 5 years of experience in relevant roles focused on Datadog and related technologies.
Environment
Details regarding the work environment and location were not provided; the role may be remote, in-office, or hybrid.
Salary
Salary information was not provided.
Growth Opportunities
Details on potential career advancement opportunities were not provided.
Benefits
Benefits information was not provided.