Web Developer | HumanBit main
Job Description
VoiceBot Web Implementation Doc: Server-Client Event Handling

This document explains how the VoiceBot component communicates with the server over a WebSocket using structured, event-based messages. It covers:
- Client-sent events (start, media)
- Server-sent events (media, mark, clear)
- The audio processing flow

Event Flow Summary

[Client]
  - Connect to the WebSocket
  - Send the 'start' event
  - Start recording from the microphone
  - Encode the audio as 16-bit mono PCM (S16LE), then as Base64
  - Send 'media' chunks

[Server]
  - Receive 'media' chunks
  - Process them and respond with 'media' events (Base64)
  - Optionally send 'mark' or 'clear' events

[Client]
  - Receive events
  - 'media': decode and play
  - 'clear': stop playback

Client-Sent Events

1. start Event
Sent once after the WebSocket connection is established. It tells the server to begin the voice session.

    {
      "event": "start",
      "start": {
        "call_sid": "test",
        "stream_sid": "test"
      }
    }

- call_sid: placeholder or session identifier (for the server)
- stream_sid: stream identifier (for the client)

Sent from: initializeWebSocket
Sent after: a 2-second delay once the connection opens

2. media Event
Sent continuously while the microphone is recording: one event every 200 ms, each carrying 200 ms of audio.

    {
      "event": "media",
      "media": {
        "chunk": "inbound",
        "payload": "<base64_PCM>",
        "timestamp": 1721047849.123
      }
    }

- chunk: direction of the audio ("inbound")
- payload: the audio chunk, 16-bit mono PCM encoded as Base64
- timestamp: Unix timestamp in seconds, with a millisecond fraction

Server-Sent Events

1. media Event
Sent by the server as a response audio chunk, to be played by the client.

    {
      "event": "media",
      "media": {
        "payload": "<base64_PCM>"
      }
    }

Handled in: ws.onmessage
Decoded and played in: playAudioResponse()

2. clear Event
Instructs the client to stop audio playback and clear any buffered audio. The server sends this when it detects a voice-based interruption, and it expects the client to stop playing immediately and discard any older audio still in its buffer.

    { "event": "clear" }

3. mark Event (Optional)
Typically used by the server as metadata or checkpoints.
The server sends a mark after the audio it wants to track. The client should send the mark back to the server once it has finished playing all the audio received before that mark.

    {
      "event": "mark",
      "mark": {
        "type": "info",
        "value": "partial_result"
      }
    }

Developer Notes
- The client times out if the WebSocket connection does not open within 10 seconds.
- Make sure the server expects start and media messages.
- The client sends media events only after it has sent the start event. Any media event sent before start will be ignored or may fail.
- Audio must be 16-bit linear PCM (mono, little-endian), sampled at 8000 Hz.
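The connection rules above are a 10-second limit for the WebSocket to open and a start event sent 2 seconds after it opens. A minimal TypeScript sketch of that sequence follows. `withTimeout`, `delay`, `openSocket` and `startSession` are hypothetical names, not the actual `initializeWebSocket` code, and only `withTimeout` and `delay` can run outside a browser.

```typescript
// Sketch only: these helpers are hypothetical, not from the VoiceBot codebase.

// Reject if `promise` has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number, label: string): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  return Promise.race([promise, timeout]).then(
    (value) => {
      clearTimeout(timer); // settled in time: cancel the pending timer
      return value;
    },
    (err: unknown) => {
      clearTimeout(timer);
      throw err;
    },
  );
}

// Resolve after `ms` milliseconds.
function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Browser-only: open a WebSocket, failing after `timeoutMs` (10 s per the notes).
function openSocket(url: string, timeoutMs = 10_000): Promise<WebSocket> {
  const ws = new WebSocket(url);
  const opened = new Promise<WebSocket>((resolve, reject) => {
    ws.addEventListener("open", () => resolve(ws));
    ws.addEventListener("error", () => reject(new Error("WebSocket failed before opening")));
  });
  return withTimeout(opened, timeoutMs, "WebSocket open").catch((err: unknown) => {
    ws.close(); // abandon the pending connection
    throw err;
  });
}

// Browser-only: open the socket, wait 2 s, then send the start event.
async function startSession(url: string, callSid = "test", streamSid = "test"): Promise<WebSocket> {
  const ws = await openSocket(url);
  await delay(2000);
  ws.send(JSON.stringify({ event: "start", start: { call_sid: callSid, stream_sid: streamSid } }));
  return ws;
}
```

Racing the open promise against a timer keeps the timeout separate from the socket logic, so the same helper could guard other waits in the client.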
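The client can also enforce the Developer Notes rule that media events must only follow the start event. In the sketch below, a hypothetical `OrderedSender` class wraps any send function, such as `(m) => ws.send(m)`. It refuses to send media until start has gone out and reports the refusal instead of silently sending an event the server would ignore.

```typescript
// Hypothetical guard (not from the VoiceBot code): media only after start.
class OrderedSender {
  private started = false;
  private readonly send: (message: string) => void;

  constructor(send: (message: string) => void) {
    this.send = send;
  }

  // Send the start event and unlock media sending.
  sendStart(callSid: string, streamSid: string): void {
    this.send(JSON.stringify({ event: "start", start: { call_sid: callSid, stream_sid: streamSid } }));
    this.started = true;
  }

  // Returns false (and sends nothing) if start has not been sent yet.
  sendMedia(payloadBase64: string, timestampSeconds: number): boolean {
    if (!this.started) return false;
    this.send(
      JSON.stringify({
        event: "media",
        media: { chunk: "inbound", payload: payloadBase64, timestamp: timestampSeconds },
      }),
    );
    return true;
  }
}
```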
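The event flow encodes microphone audio as 16-bit mono PCM (S16LE) and then as Base64 before sending it in media events. The sketch below shows one way to build a media event from a 200 ms block of float samples. It assumes the samples are already mono at 8000 Hz, so 200 ms is 1600 samples (3200 bytes), and it leaves capturing or resampling to that rate out of scope. `floatToS16LE`, `bytesToBase64` and `buildMediaEvent` are hypothetical helpers.

```typescript
// Hypothetical helpers (not from the VoiceBot code). Assumes mono input at 8000 Hz.
const SAMPLE_RATE = 8000; // Hz, required by the server
const CHUNK_MS = 200; // one media event per 200 ms of audio
const SAMPLES_PER_CHUNK = (SAMPLE_RATE * CHUNK_MS) / 1000; // 1600 samples

interface MediaEvent {
  event: "media";
  media: { chunk: "inbound"; payload: string; timestamp: number };
}

// Convert float samples in [-1, 1] to 16-bit signed little-endian bytes.
function floatToS16LE(samples: Float32Array): Uint8Array {
  const buffer = new ArrayBuffer(samples.length * 2);
  const view = new DataView(buffer);
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clip out-of-range input
    // Scale so -1 maps to -32768 and 1 maps to 32767; `true` = little-endian.
    view.setInt16(i * 2, Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), true);
  }
  return new Uint8Array(buffer);
}

// Base64-encode raw bytes (btoa expects a "binary string", one char per byte).
function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  return btoa(binary);
}

// Build a client media event; the timestamp is Unix time in seconds.
function buildMediaEvent(samples: Float32Array, nowMs: number = Date.now()): MediaEvent {
  return {
    event: "media",
    media: {
      chunk: "inbound",
      payload: bytesToBase64(floatToS16LE(samples)),
      timestamp: nowMs / 1000,
    },
  };
}
```

Building the binary string in a loop, rather than spreading the array into `String.fromCharCode`, avoids argument-count limits if chunks grow larger than 200 ms.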
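A server media payload has to be decoded in reverse before playback: Base64 to bytes, then 16-bit little-endian samples to floats. This is a sketch of that step with hypothetical helper names. It does not reproduce `playAudioResponse()`.

```typescript
// Hypothetical helpers (not from the VoiceBot code) for server media payloads.

// Decode a Base64 string into raw bytes.
function base64ToBytes(b64: string): Uint8Array {
  const binary = atob(b64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  return bytes;
}

// Interpret bytes as 16-bit signed little-endian samples, scaled to [-1, 1).
function s16leToFloat(bytes: Uint8Array): Float32Array {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const samples = new Float32Array(Math.floor(bytes.byteLength / 2)); // ignores a trailing odd byte
  for (let i = 0; i < samples.length; i++) {
    samples[i] = view.getInt16(i * 2, true) / 0x8000;
  }
  return samples;
}

// Full decode of a "media" payload from the server.
function decodeMediaPayload(payload: string): Float32Array {
  return s16leToFloat(base64ToBytes(payload));
}
```

In a browser, one option is to copy the samples into an `AudioBuffer` created with `audioContext.createBuffer(1, samples.length, 8000)`, using `getChannelData(0).set(samples)`, and play it through an `AudioBufferSourceNode`.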
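The clear and mark events imply that the client keeps a queue of pending audio. The sketch below is one possible design, not the VoiceBot implementation. A hypothetical `PlaybackQueue` holds audio chunks and marks in arrival order, echoes each mark back once the audio queued before it has finished playing, and empties itself on clear. Two details are assumptions to confirm with the server: the echoed message reuses the received mark unchanged, and clear also drops marks that are still queued.

```typescript
// Hypothetical playback queue (not from the VoiceBot code). Assumptions:
// marks are echoed back unchanged, and "clear" drops queued marks too.
type Mark = { type: string; value: string };

type ServerEvent =
  | { event: "media"; media: { payload: string } }
  | { event: "clear" }
  | { event: "mark"; mark: Mark };

type QueueItem = { kind: "audio"; payload: string } | { kind: "mark"; mark: Mark };

class PlaybackQueue {
  private items: QueueItem[] = [];
  private playing = false;
  private readonly send: (message: string) => void;
  private readonly stopAudio: () => void;

  constructor(send: (message: string) => void, stopAudio: () => void) {
    this.send = send;
    this.stopAudio = stopAudio;
  }

  // Feed each ws.onmessage payload (as a string) through here.
  handleMessage(raw: string): void {
    const msg = JSON.parse(raw) as ServerEvent;
    switch (msg.event) {
      case "media":
        this.items.push({ kind: "audio", payload: msg.media.payload });
        break;
      case "mark":
        this.items.push({ kind: "mark", mark: msg.mark });
        this.flushMarks();
        break;
      case "clear":
        this.items = []; // discard older audio (and, by assumption, pending marks)
        this.playing = false;
        this.stopAudio(); // stop whatever is currently playing
        break;
    }
  }

  // Called by the player when idle; returns the next Base64 chunk, if any.
  takeNextAudio(): string | undefined {
    if (this.playing) return undefined;
    this.flushMarks();
    const head = this.items[0];
    if (head === undefined || head.kind !== "audio") return undefined;
    this.items.shift();
    this.playing = true;
    return head.payload;
  }

  // Called by the player when the current chunk has finished playing.
  onAudioFinished(): void {
    this.playing = false;
    this.flushMarks();
  }

  // Echo marks whose preceding audio has all been played.
  private flushMarks(): void {
    while (!this.playing) {
      const head = this.items[0];
      if (head === undefined || head.kind !== "mark") return;
      this.items.shift();
      this.send(JSON.stringify({ event: "mark", mark: head.mark }));
    }
  }
}
```

Keeping marks in the same queue as the audio is what lets the client tell when "the audio before this mark" has actually finished, rather than when it merely arrived.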