300 points · 2 submissions
with Cloudflare
CallWiz is a voice scheduling assistant that finds meeting times across different booking systems like Calendly, Cal.com, Google Calendar, and Workmate, then books the multi-way meeting through a single conversation. It solves the painful back-and-forth of group scheduling by letting someone talk naturally, ask for different dates, and keep refining options without restarting the workflow. On the Cloudflare side, I use Workers and Durable Objects as my agent orchestration layer to maintain session state, coordinate tool calls across the conversation, paginate and refresh availability over time, and keep the scheduling agent reliable across both web and phone voice channels. On the ElevenLabs side, I have a custom agent with initiation webhooks, a cloned voice (my own), client and server tools, and agent settings overrides. Tons of fun building this!
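A rough sketch of the "paginate and refresh availability" idea, not the actual CallWiz code: it intersects each participant's free intervals and hands them out a page at a time, the way a per-conversation session object might. Everything here is a hypothetical illustration (`Slot` as minutes since midnight, `common_slots`, `SchedulingSession`, the page size); a plain in-memory Python class stands in for the state a Durable Object would hold, and no Cloudflare or booking-system API is used.

```python
from dataclasses import dataclass, field

# Hypothetical representation: (start_minute, end_minute) since midnight.
Slot = tuple[int, int]


def common_slots(calendars: list[list[Slot]]) -> list[Slot]:
    """Intersect sorted, non-overlapping free-interval lists, one per participant."""
    if not calendars:
        return []
    common = calendars[0]
    for cal in calendars[1:]:
        merged, i, j = [], 0, 0
        while i < len(common) and j < len(cal):
            start = max(common[i][0], cal[j][0])
            end = min(common[i][1], cal[j][1])
            if start < end:
                merged.append((start, end))
            # Advance whichever interval ends first.
            if common[i][1] < cal[j][1]:
                i += 1
            else:
                j += 1
        common = merged
    return common


@dataclass
class SchedulingSession:
    """In-memory stand-in for per-conversation state (assumed design)."""
    options: list[Slot] = field(default_factory=list)
    cursor: int = 0
    page_size: int = 2

    def refresh(self, calendars: list[list[Slot]]) -> None:
        # Recompute shared availability and restart pagination.
        self.options = common_slots(calendars)
        self.cursor = 0

    def next_page(self) -> list[Slot]:
        # Return the next few options; an empty list means nothing is left.
        page = self.options[self.cursor:self.cursor + self.page_size]
        self.cursor += len(page)
        return page


session = SchedulingSession()
session.refresh([
    [(540, 600), (660, 720), (780, 900)],  # participant A
    [(570, 690), (700, 840)],              # participant B
])
first_page = session.next_page()
second_page = session.next_page()
```

Keeping the cursor in the session is what lets a caller ask for "more options" or "a different day" without restarting: a new request either calls `next_page()` again or calls `refresh()` with updated calendars.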
Submitted 29 Mar 2026
with Firecrawl
Quantext is like TV with a commentator who actually did the homework.

The problem: YouTube is full of confident claims and hot takes. Watching alone means you either swallow it or pause and Google forever. I wanted watch-together energy: someone in the room who can fact-check, rebut, or roast at the right beat, then get out of the way.

What I built: Paste any YouTube URL. The app pulls the transcript (or transcribes the audio), uses an LLM to find high-signal moments (stats, bold claims, missing context), then Firecrawl runs parallel search + scrape across those moments so every interjection ships with real sources, not vibes. At playback, an ElevenLabs Conversational AI agent fires on timestamp (pause → speak → auto-resume) so it feels like the video was edited to include commentary. Ask a question mid-watch and the agent can search again live via a custom Firecrawl client-side tool, so ad-hoc rabbit holes don’t break the illusion.

Clever Firecrawl use: Research isn’t one big dump. It’s per-claim and pre-orchestrated so interjections are instant at runtime, plus on-demand when the user interrupts. That’s the hack: pre-compute the expensive part, keep Firecrawl in the loop when the human goes off-script.

Clever ElevenLabs use: This isn’t TTS reading a script. It’s a full agent session with tooling (pause/resume the player, live search) and persona-conditioned behavior. I cloned my voice and fed roughly a decade of my stand-up comedy writing into one of the agent’s world models so “me” sounds like my timing and voice, not a generic narrator. On top of that: 23 distinct commentator personas (e.g. Grandma Gloria, Rage Rick, Sir David Attenburro, Ackshually Alex, Conspiracy Carl) and two modes, The Full Picture (sourced fact-checking) and The Fool Picture (MST3K-style roasting), so the same pipeline feels like a roster of shows, not one bot.

Engineering that matters: Caching (transcripts, prep artifacts, session payloads) so repeat watches and shares don’t re-pay the full prep tax. Share links so you can send a URL that opens the same video, persona, and mode: demoable, viral-friendly, and actually usable after the hackathon.

Why it’s a flex: It chains Firecrawl’s research graph with ElevenLabs’ realtime voice + tools in a loop built around real video time, not a chat window dressed up as a product. The “wait, it stopped the video and cited that?” moment is the whole thesis.
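To make the "fires on timestamp" part concrete, here is a minimal sketch of one way pre-computed interjections could be released as playback crosses their timestamps. It is an illustration under assumptions, not Quantext's actual code: `Interjection`, `InterjectionScheduler`, `due`, and `seek` are hypothetical names, the source URLs are placeholders, and the real player, Firecrawl, and ElevenLabs calls are left out entirely.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Interjection:
    at_seconds: float          # video timestamp that triggers the commentary
    text: str                  # line prepared ahead of playback
    sources: tuple[str, ...]   # citations gathered during prep (placeholders here)


class InterjectionScheduler:
    """Releases each prepared interjection once playback reaches its timestamp."""

    def __init__(self, interjections: list[Interjection]) -> None:
        self._items = sorted(interjections, key=lambda x: x.at_seconds)
        self._times = [x.at_seconds for x in self._items]
        self._next = 0  # index of the first interjection not yet fired

    def due(self, playback_seconds: float) -> list[Interjection]:
        # Everything with a timestamp <= the current position that has not fired.
        end = bisect.bisect_right(self._times, playback_seconds)
        fired = self._items[self._next:end]
        self._next = max(self._next, end)
        return fired

    def seek(self, playback_seconds: float) -> None:
        # After the viewer scrubs, re-arm from the new position.
        self._next = bisect.bisect_left(self._times, playback_seconds)


scheduler = InterjectionScheduler([
    Interjection(45.5, "That stat is from 2014.", ("https://example.com/a",)),
    Interjection(12.0, "Missing context here.", ("https://example.com/b",)),
    Interjection(90.0, "Bold claim, weak source.", ()),
])
early = scheduler.due(10.0)        # nothing reached yet
first = scheduler.due(12.0)        # the 12.0 s interjection fires
second = scheduler.due(50.0)       # the 45.5 s interjection fires
repeat = scheduler.due(50.0)       # already fired, so nothing new
scheduler.seek(0.0)                # viewer rewinds to the start
replayed = scheduler.due(13.0)     # the 12.0 s interjection fires again
```

In a loop like this, the caller would poll `due()` with the player's current time and, for each returned item, pause the video, let the agent speak, then resume. Because the research is attached to each `Interjection` ahead of time, nothing slow has to happen at the moment of the pause.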
Submitted 25 Mar 2026