Research Engineer | Smallest.ai

full-time
Posted on December 20, 2025

Job Description

Research Engineer

Company Overview

Smallest.ai is an AI research lab pioneering the future of compact, powerful models. We build low-latency, high-accuracy STT, TTS, S2S, and SLM models that power voice and multimodal AI applications across 100+ industries.

Our platform runs with enterprise-grade security, supports on-prem and private cloud deployments, and is fully SOC2, GDPR, HIPAA, and PCI compliant, making it suitable for regulated and high-trust environments.

Job Summary

This role focuses on transforming messy, real-world data into high-quality inputs that machine learning models can learn from. You will work extensively with speech, language, and real-time systems across multiple languages, with an emphasis on how data quality and data systems improve model performance.

Responsibilities

  • Data Pipelines: Build high-throughput pipelines for processing audio, text, and multimodal data in both real-time and batch modes.
  • Data Quality & Curation: Clean, filter, deduplicate, and normalize data across various formats (e.g., numbers, emails, code-mixed text).
  • Multilingual Data Systems: Handle data from 50+ languages and accents, focusing on language-aware normalization and segmentation.
  • Training Data Engine: Develop pipelines that continuously generate improved training data from production data via active learning loops and smart data selection strategies.
  • Evaluation & Benchmarking Pipelines: Create scalable evaluation datasets and automate quality tracking for systems including Automatic Speech Recognition (ASR) and Text-to-Speech (TTS).
  • Data Infrastructure for Research: Collaborate closely with the research team to enable rapid experimentation and significantly reduce iteration time.

Qualifications

  • Strong fundamentals in data structures, systems, and pipelines.
  • Experience with large-scale data processing, with a preference for audio and text data.
  • Ability to work with messy, unstructured real-world data.
  • Strong coding skills in Python; systems experience is a plus.
  • Understanding of machine learning/data pipelines, including training and evaluation processes.
  • Excellent data curation skills.

Preferred Skills

  • Experience with speech/audio data, such as ASR or TTS data.
  • Familiarity with multilingual datasets.
  • Experience with streaming systems, such as Kafka.
  • Exposure to data-centric AI and data quality frameworks.

Experience

Minimum experience details are not specified.

Environment

Work setting and location details are not specified.

Salary

Salary information is not specified.

Growth Opportunities

Career advancement opportunities are not specified.

Benefits

Details on offered benefits are not specified.
