Software Engineer - Compute Platform - 11030020 | withRemote

full-time
Posted on March 15, 2025

Job Description

Senior Software Engineer - Compute Platform

Company Overview

This startup is building the foundation for decentralized AI development at scale. By combining powerful distributed training infrastructure with an intuitive developer experience, the platform enables researchers and engineers to collaboratively train state-of-the-art models. Fresh off a $5.5M seed round backed by industry leaders, the company is expanding its team to accelerate open-source AGI development.

Job Summary

This role spans both the developer-facing platform and the underlying infrastructure layers. Your contributions will directly impact AI workload management and distributed training systems, ensuring a seamless and scalable experience for users. You'll build robust backend APIs, intuitive web interfaces, and high-performance distributed components to empower decentralized AI development.

Responsibilities

Platform Development

  • Build intuitive web interfaces for AI workload management and monitoring.
  • Develop REST APIs and backend services in Python.
  • Create real-time monitoring and debugging tools.
  • Implement user-facing features for resource management and job control.

Infrastructure Development

  • Design and implement distributed training infrastructure using Rust.
  • Build high-performance networking and coordination components.
  • Develop infrastructure automation pipelines with Ansible.
  • Manage cloud resources and container orchestration.
  • Implement scheduling systems for heterogeneous hardware (CPU, GPU, TPU).

Qualifications

Platform Skills

  • Strong Python backend development, with experience in frameworks such as FastAPI and asynchronous programming.
  • Proficiency in modern frontend development using TypeScript, React/Next.js, and Tailwind CSS.
  • Experience designing and implementing RESTful APIs and developer dashboards.

Infrastructure Skills

  • Systems programming experience with Rust.
  • Expertise in infrastructure automation using tools like Ansible and Terraform.
  • Proficiency in container orchestration (e.g., Kubernetes) and cloud platform management (GCP preferred).
  • Familiarity with observability tools such as Prometheus and Grafana.

Additional Requirements

  • Ability to work effectively in a fast-paced, evolving environment.
  • Excellent written and verbal communication skills, with an aptitude for translating technical challenges into clear business impacts.

Preferred Skills

  • Experience with GPU computing and ML infrastructure.
  • Knowledge of AI/ML model architecture and training processes.
  • Background in high-performance networking implementation.
  • Contributions to open-source infrastructure projects.
  • Experience with WebSocket or real-time systems.

Environment

  • Flexible work arrangement: option to work remotely or from the San Francisco office.
  • A collaborative and innovative team environment with experienced engineers and researchers.
  • Regular team off-sites and retreats that foster camaraderie and shared learning.
  • A culture that values open development, innovation, and continuous improvement.

Compensation & Benefits

  • Competitive compensation with significant equity and token incentives.
  • Full visa sponsorship and relocation support.
  • Professional development budget for courses and conferences.
  • Comprehensive benefits package, including full healthcare coverage.
  • Opportunities to attend industry conferences.

Growth Opportunities

Join a team that's shaping the future of decentralized AI development. You'll work on cutting-edge challenges in both platform and infrastructure engineering while contributing to the broader AI community through research and open-source work. If you're passionate about democratizing AI development and thrive on solving complex problems, this is the opportunity for you.
