Hack #2: Cloudflare · Cloudflare
2 Apr, 16:00
Sayd: Your Voice, Everywhere on Your Desktop

What We Built

Sayd is a macOS desktop app that lets you dictate into any application (email, Slack, code editors, browsers) with a single hotkey. Press the key, speak naturally, release, and polished text appears right where your cursor is. No copy-paste. No switching windows. It just works, everywhere.

How We Used ElevenLabs + Cloudflare

The entire backend runs on Cloudflare Workers, with Durable Objects acting as the WebSocket server. Each voice session is handled by its own Durable Object instance. Thanks to Cloudflare's global edge network, the backend runs close to the user wherever they are, which keeps the round trip short.

Inside each Durable Object, we run TEN VAD (Voice Activity Detection), compiled to WebAssembly (WASM), to detect and trim silence from the audio in real time, right at the edge. Only the meaningful speech segments are sent downstream for transcription, which shortens the audio by 30-50% and directly cuts recognition latency.

The trimmed audio is then transcribed by ElevenLabs Scribe V2, which powers all of Sayd's speech recognition. Scribe V2's built-in language detection automatically identifies the language being spoken (English, Mandarin, Traditional Chinese, Japanese, or Korean), so users can switch languages mid-conversation without changing any settings.

Finally, an LLM polishes the transcript, removing filler words, fixing punctuation, and formatting the text naturally, before the final version is injected at the user's cursor in whatever app they're using.

What Makes It Special

Fast by architecture. Edge-based VAD trimming, Cloudflare's global network, and ElevenLabs Scribe V2 combine into a pipeline in which every stage is optimized for latency. Users feel the result: speak, and polished text appears in under two seconds.

Truly universal text input. Sayd injects text directly where your cursor is, in any macOS app, via the system Accessibility API.
It doesn't matter which app you're in; if there's a cursor, Sayd can type there.

Global hotkey that works everywhere. Sayd registers a system-level hotkey that works across all apps and contexts. Hold the Fn key (or any custom key combination you set), speak, and release. Dictation is always one key press away, no matter what you're doing.

Multilingual out of the box. Powered by ElevenLabs Scribe V2's language detection, Sayd supports five languages with zero configuration. Speak in any supported language and it is recognized and polished correctly, including mixed-language dictation.

Resilient. Every recording is saved locally, so if anything goes wrong, you can retry transcription with one click.
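A minimal sketch of the per-session state a Durable Object could hold, assuming a hypothetical protocol in which binary WebSocket frames carry audio and text frames carry JSON control messages ("start" and "end"). The class name VoiceSession and the message shapes are illustrative, not Sayd's actual protocol, and the Cloudflare wiring is omitted. With the WebSocket Hibernation API, for example, a Durable Object's webSocketMessage handler receives either a string or an ArrayBuffer, and the latter would be wrapped in a Uint8Array before being passed in.

```typescript
// Hypothetical per-session protocol: binary frames carry audio chunks,
// text frames carry JSON control messages. In Sayd this state would live
// inside one Durable Object per voice session; that wiring is not shown.

type Control = { type: "start"; sampleRate: number } | { type: "end" };

interface Utterance {
  audio: Uint8Array;
  sampleRate: number;
}

class VoiceSession {
  private chunks: Uint8Array[] = [];
  private sampleRate = 0;
  private open = false;

  // Returns the complete utterance when the client sends "end", otherwise null.
  onMessage(data: string | Uint8Array): Utterance | null {
    if (typeof data !== "string") {
      // Audio is only buffered between "start" and "end".
      if (this.open) this.chunks.push(data);
      return null;
    }
    const msg = JSON.parse(data) as Control;
    if (msg.type === "start") {
      this.open = true;
      this.sampleRate = msg.sampleRate;
      this.chunks = [];
      return null;
    }
    // "end" without a matching "start" is ignored.
    if (!this.open) return null;
    this.open = false;
    // Concatenate the buffered chunks into a single contiguous buffer.
    const total = this.chunks.reduce((n, c) => n + c.length, 0);
    const audio = new Uint8Array(total);
    let offset = 0;
    for (const c of this.chunks) {
      audio.set(c, offset);
      offset += c.length;
    }
    this.chunks = [];
    return { audio, sampleRate: this.sampleRate };
  }
}
```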
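The silence-trimming step at the edge can be sketched independently of the VAD model itself. This assumes the VAD (TEN VAD in Sayd) has already produced one speech probability per fixed-size frame; it does not use TEN VAD's actual API. The threshold, the padding that keeps word onsets and endings from being clipped, and the function names speechSegments and trimSilence are all hypothetical.

```typescript
// Hypothetical trimming pass over per-frame VAD output.
// probs[i] is the assumed speech probability for frame i.

interface Segment {
  start: number; // first frame index (inclusive)
  end: number; // last frame index (exclusive)
}

function speechSegments(probs: number[], threshold = 0.5, padFrames = 2): Segment[] {
  const segments: Segment[] = [];
  let i = 0;
  while (i < probs.length) {
    if (probs[i] < threshold) {
      i++;
      continue;
    }
    // Walk to the end of this contiguous run of speech frames.
    let j = i;
    while (j < probs.length && probs[j] >= threshold) j++;
    // Pad both sides so speech boundaries are not clipped.
    const start = Math.max(0, i - padFrames);
    const end = Math.min(probs.length, j + padFrames);
    const last = segments[segments.length - 1];
    if (last && start <= last.end) {
      // Padded segments overlap: merge instead of emitting a new one.
      last.end = Math.max(last.end, end);
    } else {
      segments.push({ start, end });
    }
    i = j;
  }
  return segments;
}

// Copies only the speech segments of `samples` into a new, shorter buffer.
function trimSilence(
  samples: Float32Array,
  probs: number[],
  frameSize: number,
  threshold = 0.5,
  padFrames = 2,
): Float32Array {
  const segments = speechSegments(probs, threshold, padFrames);
  const total = segments.reduce((n, s) => n + (s.end - s.start) * frameSize, 0);
  const out = new Float32Array(total);
  let offset = 0;
  for (const s of segments) {
    const chunk = samples.subarray(s.start * frameSize, s.end * frameSize);
    out.set(chunk, offset);
    offset += chunk.length;
  }
  // Guard against a final partial frame shorter than frameSize.
  return out.subarray(0, offset);
}
```

For 16 kHz audio split into 16 ms frames, for example, frameSize would be 256 samples. Comparing the trimmed length with the original length gives the fraction of audio removed before it reaches the transcription service.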
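The hold-to-talk interaction behind the global hotkey can be modeled as a small state machine, separate from how key events are actually captured on macOS (not shown). This sketch assumes key-down and key-up events arrive with millisecond timestamps; the class name HoldToTalk and the rule that very short taps are discarded rather than transcribed are assumptions, not documented Sayd behavior.

```typescript
// Hypothetical hold-to-talk controller. Key events are assumed to come from a
// platform hook (not shown); this class only decides what each event means.

type State = "idle" | "recording";

type Action =
  | { kind: "startRecording" }
  | { kind: "stopAndSubmit"; heldMs: number }
  | { kind: "discard"; heldMs: number }
  | { kind: "none" };

class HoldToTalk {
  private state: State = "idle";
  private pressedAt = 0;
  // Presses shorter than this are treated as accidental taps (assumed value).
  private readonly minHoldMs: number;

  constructor(minHoldMs = 250) {
    this.minHoldMs = minHoldMs;
  }

  keyDown(nowMs: number): Action {
    // Auto-repeat key-down events while the key is held are ignored.
    if (this.state === "recording") return { kind: "none" };
    this.state = "recording";
    this.pressedAt = nowMs;
    return { kind: "startRecording" };
  }

  keyUp(nowMs: number): Action {
    if (this.state !== "recording") return { kind: "none" };
    this.state = "idle";
    const heldMs = nowMs - this.pressedAt;
    return heldMs < this.minHoldMs
      ? { kind: "discard", heldMs }
      : { kind: "stopAndSubmit", heldMs };
  }
}
```

Keeping the decision logic free of platform calls makes it easy to test, and the same controller works whether the trigger is the Fn key or a custom key combination.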
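One way to support one-click retry is to keep every recording in a local log until its transcription succeeds. In this sketch a Map stands in for on-disk storage, and the names RecordingLog, fail, and retry are hypothetical; the caller is assumed to resubmit the returned audio to the backend.

```typescript
// Hypothetical local recording log. Audio is kept until transcription
// succeeds, so a failed attempt can be retried without re-speaking.

type Status = "pending" | "done" | "failed";

interface Recording {
  id: string;
  audio: Uint8Array; // audio as captured on the client
  status: Status;
  text?: string;
  error?: string;
  attempts: number;
}

class RecordingLog {
  // Stand-in for persistent local storage.
  private readonly items = new Map<string, Recording>();

  save(id: string, audio: Uint8Array): Recording {
    const rec: Recording = { id, audio, status: "pending", attempts: 1 };
    this.items.set(id, rec);
    return rec;
  }

  succeed(id: string, text: string): void {
    const rec = this.mustGet(id);
    rec.status = "done";
    rec.text = text;
    delete rec.error;
  }

  fail(id: string, error: string): void {
    const rec = this.mustGet(id);
    rec.status = "failed";
    rec.error = error;
  }

  // Recordings for which a "Retry" button could be shown.
  failed(): Recording[] {
    return Array.from(this.items.values()).filter((r) => r.status === "failed");
  }

  // One-click retry: marks the recording pending again and returns its audio.
  retry(id: string): Uint8Array {
    const rec = this.mustGet(id);
    if (rec.status !== "failed") throw new Error(`recording ${id} is not retryable`);
    rec.status = "pending";
    rec.attempts += 1;
    return rec.audio;
  }

  private mustGet(id: string): Recording {
    const rec = this.items.get(id);
    if (!rec) throw new Error(`unknown recording ${id}`);
    return rec;
  }
}
```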
