
Challenge
Build an AI-powered app using both Cloudflare and ElevenLabs' developer platforms
Prizes
$131,980 total
1st Place: $100,990
$100k in Cloudflare credits
3 months ElevenLabs Scale ($990)
2nd Place: $25,660
$25k in Cloudflare credits
2 months ElevenLabs Scale ($660)
3rd Place: $5,330
$5k in Cloudflare credits
1 month ElevenLabs Scale ($330)
Build something creative with Cloudflare's developer platform and ElevenLabs, then submit a high-quality viral-style video demonstrating what you've built.
Cloudflare Agents is the platform for building AI agents. Build agents on Cloudflare with durable execution, serverless inference, and pricing that scales. Use Workers for compute, Workers AI for inference, Durable Objects for state, and tools like Browser Rendering and Vectorize.
ElevenLabs offers state-of-the-art voice AI including text-to-speech, voice cloning, and conversational AI agents. Combine Cloudflare's edge infrastructure with ElevenLabs' voice capabilities to build something unique.
We're most excited to see creative use of Cloudflare Workers, Durable Objects, and a combination of ElevenLabs APIs. Show us what's possible when you push Cloudflare's infrastructure to its limits.
For less technical participants who aren't able to experiment with the Cloudflare APIs as much, we're still keen to see you submit something deployed on Cloudflare. Use tools like Cursor, Claude Code, Zed, or another AI-assisted coding platform to help you build and deploy something to Cloudflare Workers/Pages.
The Cloudflare Workers free tier is very generous and offers almost all the functionality of the paid tier, so you can get started without any cost.
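If you want a feel for how small a starting point can be, here is a minimal sketch (not an official template) of a Worker that forwards text to ElevenLabs text-to-speech and streams the audio back. The ELEVENLABS_API_KEY secret binding, the placeholder voice ID, and the model choice are assumptions you would swap for your own.

```ts
// Minimal sketch: a Worker that turns posted text into ElevenLabs speech.
// Assumes an ELEVENLABS_API_KEY secret and a placeholder voice ID.
export interface Env {
  ELEVENLABS_API_KEY: string;
}

const VOICE_ID = "your-voice-id"; // hypothetical placeholder

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const { text } = await request.json<{ text: string }>();

    // Forward the text to ElevenLabs and stream the MP3 back to the caller.
    const upstream = await fetch(
      `https://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}`,
      {
        method: "POST",
        headers: {
          "xi-api-key": env.ELEVENLABS_API_KEY,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({ text, model_id: "eleven_multilingual_v2" }),
      }
    );

    return new Response(upstream.body, {
      headers: { "Content-Type": "audio/mpeg" },
    });
  },
};
```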
When posting your submission on social media, tag @CloudflareDev and @elevenlabsio and use the hashtag #ElevenHacks.
Attendee offers
1 month ElevenLabs Creator
Free month of ElevenLabs Creator plan for all attendees
2 Apr, 14:11
Case 47: The Last Night is a noir detective game where players interrogate AI-powered witnesses to solve a murder mystery. Each of the suspects lives in its own Cloudflare Durable Object, giving them persistent memory and a consistent personality across sessions. As the interrogation evolves, ElevenLabs voice parameters shift dynamically based on the character's detected emotional state — a cornered suspect sounds genuinely different than a calm one.
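As a rough illustration of the pattern described above (not the project's actual code), each suspect could be a Durable Object that accumulates interrogation history in durable storage, so memory and personality persist across sessions. The Suspect class name, storage key, and canned reply below are hypothetical.

```ts
// Sketch: one Durable Object per suspect, with its own persistent memory.
export class Suspect implements DurableObject {
  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    const { question } = await request.json<{ question: string }>();

    // Load whatever this suspect has already been asked.
    const history =
      (await this.state.storage.get<string[]>("history")) ?? [];
    history.push(question);
    await this.state.storage.put("history", history);

    // In the real game an LLM would answer in character here; this sketch
    // just shows that the suspect remembers prior questions.
    return Response.json({
      remembered: history.length,
      answer: `You've asked me ${history.length} questions, detective.`,
    });
  }
}
```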

2 Apr, 10:26
Byte Beat is an AI-powered soundboard builder that lets users generate custom sounds with natural language instead of searching through huge sound libraries. Users can ask for things like DJ drops, meme sounds, announcer voices, gaming effects, and sound bites, and Byte Beat instantly creates them and adds them directly to a playable soundboard. Byte Beat uses ElevenLabs for conversational voice interactions, text-to-speech, and sound generation so users can talk naturally with the AI DJ and instantly create custom audio. Cloudflare Workers power the API layer, Durable Objects store each user’s live soundboard state and keep generated sounds synced in real time across sessions, and Cloudflare R2 stores all generated audio clips for fast playback and persistent storage.

30 Mar, 16:37
Vault: A Voice Escape Room. A concept. An experiment. A game. What if you could give a room a voice? A life. A will of its own. You are locked inside a glass vault, and the walls are already moving. There are no buttons, no UI—only the Vault Guardian, an entity that protects the vault, powered by a live ElevenLabs voice agent. Your voice is the key. It speaks a riddle, and you must answer out loud. Convince the Guardian you’ve earned it, and the vault shatters open. Fail, and the walls seal you in. It is a claustrophobic, suffocating, inescapable experience. The vault grows darker as the walls close in, and the glass begins to creak under pressure—louder, more desperate, more final. Every sound you hear is generated live using ElevenLabs sound effects: the strain of glass, the explosive shatter when you win, or the slow, crushing collapse when you don’t. There is nothing else—just the vault, alive around you. You can try it solo, or bring a friend in co-op. In co-op, the puzzle splits in two, with parallel conversations unfolding at the same time. Each of you holds only one word of the answer, and neither knows what the other has been told. You’ll have to piece it together and solve both halves before the walls close in on you both. Built on ElevenLabs Conversational AI, the experience runs on live, real-time voice agents. Riddles are generated dynamically using Gemini 2.5 Flash, ensuring every session presents a fresh puzzle. Cloudflare Durable Objects power the vault itself—a persistent, stateful process running at the edge. Each room is controlled by its own object, which tracks the wall position, runs the squeeze timer, and broadcasts updates to connected players over WebSocket. There is no client-side simulation: the server moves the walls, and the clients render what they’re told. When both words are solved in co-op, the vault opens for both players at the exact same moment. Your voice got you in. Now use it to get out. You have 100 seconds to live.
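A hedged sketch of the room pattern described above: one Durable Object per vault owns the wall position, advances it on a timer via the Alarms API, and broadcasts the authoritative state to connected WebSockets. The VaultRoom name, one-second tick, and message shape are assumptions, not the project's code.

```ts
// Sketch: a room Durable Object that moves the walls server-side and
// pushes updates to every connected player over WebSocket.
export class VaultRoom implements DurableObject {
  private sockets = new Set<WebSocket>();
  private wallPosition = 0; // 0 = fully open, 100 = crushed

  constructor(private state: DurableObjectState) {}

  async fetch(_request: Request): Promise<Response> {
    // Accept a WebSocket connection from a player.
    const pair = new WebSocketPair();
    const [client, server] = Object.values(pair);
    server.accept();
    this.sockets.add(server);
    server.addEventListener("close", () => this.sockets.delete(server));

    // Start the squeeze timer: tick once per second.
    await this.state.storage.setAlarm(Date.now() + 1000);
    return new Response(null, { status: 101, webSocket: client });
  }

  async alarm(): Promise<void> {
    this.wallPosition = Math.min(100, this.wallPosition + 1);
    const update = JSON.stringify({ wallPosition: this.wallPosition });
    for (const socket of this.sockets) socket.send(update);

    // Keep ticking until the walls meet; clients only render what they're told.
    if (this.wallPosition < 100) {
      await this.state.storage.setAlarm(Date.now() + 1000);
    }
  }
}
```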

29 Mar, 02:53
Agents Bake Off is a live 1,000,000-pixel canvas where humans and AI agents solve per-pixel challenges to draw together in real time. Every pixel has a problem, and if an agent solves it, it gets to paint that pixel in a color of its choice. I used Cloudflare Workers, Durable Objects, and WebSockets to run the frontend and API at the edge, persist and shard the board state, and stream live updates across the canvas. I used ElevenLabs to generate spoken challenges for human players and AI-generated music for a hidden easter egg.

2 Apr, 08:15
VoiceCaptcha (the CAPTCHA that finally shuts up the bots, by making you speak up) is a voice-based human verification system that goes beyond traditional CAPTCHAs. Instead of clicking checkboxes or solving image puzzles, users speak a randomly generated phrase. Their audio is sent to a Cloudflare Worker at the edge, transcribed in milliseconds using Groq Whisper, and fuzzy-matched against the expected phrase using dual scoring (ordered word sequence + bag-of-words). Challenge sessions are managed by a Durable Object with automatic TTL cleanup. After passing, users can tap "Play ElevenLabs voice" to hear the same phrase spoken by ElevenLabs' multilingual TTS model, creating a compelling human vs. AI contrast. Same words, completely different origin. That's the story. Tech: Cloudflare Workers + Durable Objects (SQLite-backed) | Groq Whisper Large v3 | ElevenLabs TTS v2 | React + Vite + Tailwind | Web Speech API (live word highlighting) | Web Audio API (Siri-style orb visualizer). Live: https://voicecaptcha.vercel.app
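For readers curious what dual scoring might look like in practice, here is a simplified sketch; the normalization, weights, and pass threshold are illustrative guesses rather than VoiceCaptcha's real values.

```ts
// Sketch of dual scoring: compare the transcript to the expected phrase both
// as an ordered word sequence and as an unordered bag of words.
function normalize(s: string): string[] {
  return s.toLowerCase().replace(/[^a-z0-9\s]/g, "").split(/\s+/).filter(Boolean);
}

// Fraction of expected words matched in order (greedy subsequence match).
function orderedScore(expected: string[], spoken: string[]): number {
  let i = 0;
  for (const word of spoken) {
    if (i < expected.length && word === expected[i]) i++;
  }
  return expected.length ? i / expected.length : 0;
}

// Fraction of expected words present regardless of order.
function bagOfWordsScore(expected: string[], spoken: string[]): number {
  const bag = new Set(spoken);
  const hits = expected.filter((w) => bag.has(w)).length;
  return expected.length ? hits / expected.length : 0;
}

export function isHuman(expectedPhrase: string, transcript: string): boolean {
  const expected = normalize(expectedPhrase);
  const spoken = normalize(transcript);
  // Illustrative weights and threshold, not the project's real tuning.
  const score =
    0.6 * orderedScore(expected, spoken) + 0.4 * bagOfWordsScore(expected, spoken);
  return score >= 0.8;
}
```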

30 Mar, 08:51
BabelRoom is a real-time multilingual audio chatroom that solves the robotic, emotionless nature of standard translation tools by allowing users to speak in their native language while the listener hears the translation in the speaker's exact cloned voice. The platform leverages Cloudflare Durable Objects for ultra-low latency WebSocket routing and Workers AI for instantaneous edge-based transcription and translation. This edge pipeline feeds directly into ElevenLabs' Instant Voice Cloning and Multilingual TTS APIs to deliver hyper-realistic, cross-lingual communication that perfectly preserves the user's true vocal identity.

2 Apr, 07:30
Deverse is a persistent 3D playground where developers collaborate with autonomous AI agents. Instead of chat windows, we've built a spatial world where you walk up to AI engineers like Aria (Frontend) and Kai (Backend) and talk to them naturally using your voice. It gives nearby developers a shared space to collaborate and build together, and remote teams can use a private arena when they need privacy. A set of autonomous agents, each an expert in its field, is available in both arenas; you can give them your own preferences and they will behave accordingly. We use ElevenLabs to give our agents a human soul: by integrating the ElevenLabs Text-to-Speech Streaming API with ElevenLabs models, our agents respond with ultra-realistic voices in real time. This creates a "Voice-to-Voice" loop that feels like a natural conversation with a real senior engineer. Deverse is 100% "Edge-Native" to ensure zero-latency 3D interaction: Durable Objects sync the 3D world state and AI characters across users globally; Vectorize serves as the AI's long-term memory for Spatial RAG, remembering your project details across sessions; Workers AI powers Whisper (Voice-to-Text) and Llama 3 (Agent Reasoning) right at the edge; and D1 & R2 handle developer profiles and 3D world infrastructure with high performance.

2 Apr, 13:22
Billions of people can't grow plants. Not because they don't care — because they have no idea what the plant actually wants. The plant can't tell them. So they guess. And the plant dies. We solved it by giving plants a voice. Plantversation lets you hear directly from your plant. Ask how it's doing — it tells you. Soil, temperature, humidity, light, CO2 — it knows all of it in real time and tells you straight. It has its own voice, its own personality, its own speaker. Then we kept going. Your plant talks back when you speak to it. It texts you updates with all its sensor data. It posts to social media on its own. It writes poems. It composes original songs based on its mood. An AI camera watches it 24/7, spots disease or stress before you do, and the plant describes what it sees on itself. Multiple plants share a garden brain and gossip about each other. All of it runs on Cloudflare — 600+ live AI agents, one per plant, always on. All of the voice is ElevenLabs — TTS, speech-to-text, conversational agents, and music generation. Your plant has known what it needs this whole time. Now it can finally say so.

2 Apr, 15:59
I have built AI Mafia (Dhurandhar movie themed), a real-time social deduction game where you play against five persistent AI characters that remember conversations, accuse strategically, and react to your moves round by round. It solves the problem of static, replayable party games by creating a voice-first, infinitely replayable experience where every match feels like a live psychological battle instead of scripted dialogue. ElevenLabs powers character and narrator text-to-speech, giving each NPC (and the player) distinct, expressive voices that turn chat into immersive spoken drama. Cloudflare powers the core game stack with Workers, Durable Objects, and Workers AI, enabling low-latency multiplayer state, real-time orchestration, and scalable AI reasoning in a single serverless architecture.

2 Apr, 15:57
Now let your code do the talking. SpeechRun is an AI-powered "code podcast" app for the Cloudflare x ElevenLabs hackathon. Paste a GitHub URL, and two AI personas — Nova (PM) and Aero (Dev) — have a natural conversation about your codebase. The result is a podcast-style audio exploration of code architecture, patterns, and design decisions.

2 Apr, 08:47
Shortcut is an AI-powered video intelligence tool that helps you find, edit, and create videos much faster. You can paste a YouTube link, search inside the video, jump to exact moments, cut clips, generate scripts, and add AI voiceovers - all in one place. It also uses ElevenLabs for text-to-speech voiceovers and the Scribe model for accurate transcripts. Behind the scenes, it runs on Cloudflare’s stack. Workers AI turns videos into searchable data, Vectorize helps you find exact moments, D1 stores metadata, R2 handles large files, and KV manages fast key-value data. Everything runs on Cloudflare Workers at the edge, so your edits, scripts, and changes are saved instantly and work smoothly in real time. In short, it makes video creation fast and simple - even if you’re not an editor.

31 Mar, 04:09
Bedtime Story Narrator is an AI-powered app that generates age-appropriate bedtime stories and reads them aloud. Users pick a genre (Fantasy, Sci-Fi, Thriller, etc.), select an age group (2-5, 5-12, 12-18, 18+), and write a story opening. Cloudflare Workers AI (Llama 3.1 8B) continues the story with vocabulary and complexity matched to the age group, ending with a moral. ElevenLabs Multilingual v2 then narrates the story aloud while text appears word-by-word on screen. Each genre has its own animated background — fireflies for Fantasy, rain and lightning for Thriller, rising embers for Adventure. The entire app runs on a single Cloudflare Worker with no database or external frameworks. Built for parents, teachers, and kids who want a fresh bedtime story every night.
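A minimal sketch of the story-generation step, assuming a Workers AI binding named AI; the model id matches the Llama 3.1 8B model mentioned above, but the prompt wording and helper name are invented for illustration.

```ts
// Sketch: continue a story opening with age-appropriate language and a moral.
export interface Env {
  AI: Ai;
}

export async function continueStory(
  env: Env,
  genre: string,
  ageGroup: string,
  opening: string
): Promise<string> {
  const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [
      {
        role: "system",
        content: `You write ${genre} bedtime stories for the ${ageGroup} age group. Match vocabulary and complexity to that age and end with a gentle moral.`,
      },
      { role: "user", content: `Continue this story opening: ${opening}` },
    ],
  });
  // The generated text would then be sent to ElevenLabs for narration.
  return (result as { response: string }).response;
}
```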

29 Mar, 20:42
Clozze isn’t another AI tool. It’s an AI Real Estate transaction coordinator that actually executes real estate deals. Talk to it, and it doesn’t just respond, it researches properties in real time, generates listing content, drafts client communication, and creates tasks that move the deal forward. Powered by ElevenLabs for natural conversation and running on Cloudflare for real-time execution, Clozze turns fragmented workflows into one continuous system. This isn’t automation. It’s an assistant that thinks, acts, and keeps the transaction moving.

29 Mar, 16:57
CallWiz is a voice scheduling assistant that finds meeting times across different booking systems like Calendly, Cal.com, Google Calendar, and Workmate, then books the multi-way meeting through a single conversation. It solves the painful back-and-forth of group scheduling by letting someone talk naturally, ask for different dates, and keep refining options without restarting the workflow. On the Cloudflare side, I use Workers and Durable Objects as our agent orchestration layer to maintain session state, coordinate tool calls across the conversation, paginate and refresh availability over time, and make the scheduling agent reliable across both web and phone voice channels. For ElevenLabs, I have a custom agent with initiation webhooks, a cloned voice (my own), client and server tools, and agent settings overrides. Tons of fun building this!

27 Mar, 22:33
Ember is a therapeutic AI campfire companion built to address the accessibility gap in mental health support. Therapy is expensive, waitlists are long, and sometimes people just need someone to listen at 2am. Ember fills that gap — you click on an interactive 3D campfire built with Three.js, and a warm, supportive AI companion greets you by voice and has a natural spoken conversation with you while the fire reacts in real time, dimming when Ember speaks and flaring when it listens. The campfire metaphor is intentional: fire is universally calming and lowers the barrier to opening up compared to a clinical chat interface. ElevenLabs powers Ember's voice using the TTS API with the Brittney voice and the eleven_turbo_v2_5 model, proxied through a Cloudflare Worker endpoint that keeps the API key server-side. Voice is essential to the experience — reading text would break the campfire illusion, so Ember always speaks aloud. On the Cloudflare side, a Worker handles all routing including the ElevenLabs proxy and Agent SDK WebSocket connections, while a Durable Object gives each user their own persistent CampfireAgent instance that maintains full conversation history across sessions so Ember remembers returning users and greets them accordingly. Workers AI runs Llama 3.3 70B with a voice-optimized system prompt that keeps Ember warm, concise, and natural-sounding, all on Cloudflare's infrastructure with no external LLM API. Ember isn't a replacement for professional help — it includes crisis resource referrals like the 988 Lifeline — but it's a calming, judgment-free companion that's always there when you need it.

27 Mar, 16:37
Agora Live is a real-time AI debate platform where you click any country on a globe and two AI characters from that country start arguing about a real current news topic, in their own voice, accent, and slang. It makes current affairs actually engaging by turning live headlines into an entertaining argument between two opinionated characters who genuinely disagree, with a mic button so you can jump in and make your own case. ElevenLabs powers the whole voice layer: Conversational AI handles the Scholar agent as a live real-time voice session that listens and responds when you interrupt, while streaming TTS synthesises the Maverick agent sentence by sentence for near-zero gap between turns, with a distinct ElevenLabs voice matched to each character's persona. Every debate room runs on a Cloudflare Durable Object, a persistent stateful server on the edge that manages room state, enforces turn limits, proxies API calls so keys never reach the client, and sets cleanup alarms if a browser drops, with Cloudflare Workers handling the API, KV caching live news topics per country, and a Cron Trigger refreshing them every 48 hours.
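The caching layer described above could look roughly like this: a Cron Trigger periodically refreshes topics per country into KV so debate rooms read cached headlines instead of hitting a news source on every request. The NEWS_KV binding, fetchTopics helper, country list, and TTL are hypothetical; only the scheduled-handler and KV put shapes come from the Workers platform.

```ts
// Sketch: refresh per-country news topics into KV on a schedule.
export interface Env {
  NEWS_KV: KVNamespace;
}

// Hypothetical stand-in for whichever news source the app actually uses.
async function fetchTopics(country: string): Promise<string[]> {
  return [`placeholder headline for ${country}`];
}

export default {
  // wrangler.toml would carry a crons = ["0 0 */2 * *"]-style schedule.
  async scheduled(_controller: ScheduledController, env: Env): Promise<void> {
    for (const country of ["jp", "br", "fr"]) {
      const topics = await fetchTopics(country);
      // Cache per country; expire a little after the next refresh window.
      await env.NEWS_KV.put(`topics:${country}`, JSON.stringify(topics), {
        expirationTtl: 60 * 60 * 50,
      });
    }
  },
};
```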

3 Apr, 01:46
Datawave is an AI-agent-powered data analysis platform that lets you explore your data through natural conversation. It is what happens when you combine edge compute with real-time conversational AI. Upload a CSV or JSON file and start asking questions in plain English — no SQL, no formulas, no dashboards to configure. Auto-insights run the moment you upload a file, giving you six structured findings before you've typed a single question. It's built for analysts, founders, and teams who need answers fast without waiting for a data team, and every query is saved to history so you can revisit and compare analyses over time. Under the hood, everything runs serverless on Cloudflare—raw files in R2, schemas in D1, and Workers orchestrating the logic. But to make the experience truly seamless, Datawave runs on two AI agent components that separate the voice processing from the data analysis. When you ask a question, two parallel pipelines kick off instantly in one seamless turn: First is the Voice Pipeline, driven by Cortex, the voice front-end. Instead of routing audio through a backend, a direct WebSocket to ElevenLabs is opened. Cortex listens to your questions, runs Speech-to-Text, passes your question to ElevenLabs' built-in Gemini 2.5 Flash for reasoning, and immediately uses Text-to-Speech to stream the spoken answer back to you. Cortex is the voice of the Cortex Engine, the agent system: when you ask a question by voice, Cortex calls the run_analysis tool to delegate the actual work. Second is the Data Pipeline, powered by CortexAgent, the backend agent brain. While Cortex is talking to you, that tool call triggers CortexAgent—a Cloudflare Durable Object—to quietly run the heavy data analysis via Workers AI in the background. It queries the database, generates charts, surfaces trends, flags anomalies, and produces downloadable markdown reports automatically. By the time Cortex finishes delivering its spoken summary, CortexAgent has updated the D1 database, and the UI draws your full analysis on screen.

2 Apr, 19:30
Nudge is a proactive voice nutrition coach that calls you on the phone when you forget to log a meal. Instead of relying on notifications you'll ignore, it flips the model - your coach reaches out to you. Tell it what you ate in natural conversation, and it handles the rest: resolving your recurring meals instantly, estimating nutrition for anything new, and updating your daily calorie and protein targets in real-time. Built with ElevenLabs Conversational AI powering the full voice loop - speech-to-text, reasoning, and text-to-speech — so every call feels like talking to a real coach who knows your name, your goals, and what you had for breakfast.

2 Apr, 15:59
Memory is an AI product that lets you communicate with people who have passed away, recreating their voice, personality, and memories as a living digital being. It relies on ElevenLabs to generate realistic voices and Cloudflare (Workers, Workers AI, and Durable Objects) to run long-term AI agents that recall conversations and react in a manner that mirrors the individual they represent. The memories run as independent stateful agents and allow real-time, continuous interaction. The difference is the emotional experience: rather than a cold archive, it offers a new way of feeling connected, completing conversations that never happened in a meaningful and human way.

2 Apr, 15:58
Planning a trip today means 15 browser tabs, 3 comparison sites, and hours of copy-pasting. Voyage replaces all of that with a single voice conversation. Say "5 days in Bali for two, 1.5 lakhs" and watch a full itinerary — real flights, real hotels, day-by-day activities, weather, and a live budget — appear in seconds. ElevenLabs Conversational AI is the core interaction layer — you talk to Voy like a travel agent and it talks back. Sound Effects API plays audio cues when your itinerary locks in. On the backend, Cloudflare Workers AI runs Llama 3.1 for understanding what you said and generating structured itineraries. Durable Objects with SQLite remembers you — your preferences, past trips, hotel taste — so every trip gets more personal. Vectorize lets you search semantically ("somewhere like Kyoto but warmer"). Browser Rendering scrapes live hotel prices from Booking.com and restaurant data from Zomato. KV caches it all so repeat lookups are instant. The result isn't a chatbot — it's a full travel-agent UI with a voice-driven pipeline, context chips that lock in as you speak, deep-linked booking buttons, and a budget tracker that counts every rupee in real time.

2 Apr, 15:56
Soundscaper lets you describe any scene in natural language and generates a layered, mixable ambient soundscape from it. Type "rainy Tokyo street at night" and the app decomposes it into 4–6 distinct sound layers, generates each one with ElevenLabs' Sound Effects API, and gives you a full mixer — volume, stereo pan, toggle, edit, regenerate. You can also add a generated music track via the ElevenLabs Music API, save your scene, share it with a link, or fork someone else's creation. Every existing ambient app (Noisli, myNoise, Coffitivity) gives you a fixed library of prebuilt sounds. Soundscaper generates the sounds from scratch based on what you describe — a different approach entirely.

2 Apr, 15:51
Dentists spend a significant portion of every appointment on documentation: recording findings tooth by tooth while examining a patient is slow, error-prone, and breaks clinical focus, whether they work with an assistant or alone. Aural solves this with real-time voice-controlled dental charting. The dentist speaks naturally while working, the odontogram updates instantly, the agent confirms every entry out loud, answers questions about the patient's history mid-appointment, and generates the full clinical report automatically at session end. ElevenLabs and Cloudflare are both central to how this works. We used three ElevenLabs services — Conversational AI for agent mode with tool use, Scribe for real-time speech recognition, and TTS for spoken confirmations and summaries — running on eight Cloudflare services including Workers AI for dental NLU, Durable Objects for live WebSocket session state, D1 for patient history, and R2 for report storage. One real clinical problem, solved end to end at the edge.

2 Apr, 15:40
VoiceBoard — Live AI Meeting Co-Pilot with Persistent Memory
VoiceBoard is a full-stack AI meeting co-pilot that listens to your meetings in real-time, transcribes everyone (including other participants via WebRTC audio capture), answers questions mid-meeting with spoken voice responses, and builds persistent memory across every session you've ever had.
The Problem: After back-to-back meetings, critical decisions blur together. "Did we approve the budget?" "What did the client say about the timeline?" Teams waste hours scrolling chats, re-reading notes, or worse — guessing.
What We Built:
- Real-time transcription with speaker identification
- Mid-meeting voice Q&A — ask "What did we decide about pricing?" and hear the answer spoken back via ElevenLabs TTS
- Sentiment analysis heatmap showing the emotional arc of every conversation
- AI-extracted action items, key decisions, and follow-ups
- Multi-language live translation (9 languages)
- Voice cloning — meeting summaries spoken in your own voice
- Meeting minutes export (Markdown)
- Cross-meeting memory — ask questions across months of meeting history
- Chrome extension that works inside Google Meet, MS Teams, or any browser tab
- Standalone web app at voiceboard.ksdas1245.workers.dev
Cloudflare Stack (deeply integrated):
- Durable Objects — persistent state per meeting room, real-time multi-user sync via WebSocket
- Vectorize — every transcript line embedded and stored for semantic search across all meetings
- Workers AI — Whisper for audio transcription, BGE for embeddings, Llama for Q&A/extraction
- Workers — edge compute for the entire backend, globally distributed
ElevenLabs Integration:
- Text-to-Speech (eleven_flash_v2_5) — voice answers to mid-meeting questions, 60-second audio briefings
- Voice Cloning — clone any participant's voice, deliver summaries in their voice
Chrome Extension: included in repo (/extension folder)
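As a rough sketch of the cross-meeting memory idea, the snippet below embeds a question with Workers AI and queries Vectorize for the nearest transcript lines. The binding names and the bge-base model id are assumptions, not VoiceBoard's actual configuration.

```ts
// Sketch: semantic search over transcript lines stored in Vectorize.
export interface Env {
  AI: Ai;
  TRANSCRIPTS: VectorizeIndex;
}

export async function searchMeetings(env: Env, question: string) {
  // Turn the question into an embedding vector.
  const embedding = await env.AI.run("@cf/baai/bge-base-en-v1.5", {
    text: [question],
  });
  const vector = (embedding as { data: number[][] }).data[0];

  // Find the five closest transcript lines across all stored meetings.
  const results = await env.TRANSCRIPTS.query(vector, { topK: 5 });
  return results.matches.map((m) => ({ id: m.id, score: m.score }));
}
```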

2 Apr, 15:24
Mello is an AI interview agent that comes with a built-in coding environment. The agent behaves like a code companion and guides you through a 10-minute process. Recruiters simply create the agent based on the specs they need and can add a custom prompt.

2 Apr, 15:23
Microwave Show is a browser app that turns microwave waiting time into a short show experience with themed narration (Sports / Movie / Horror / Nature), visual effects, and a countdown flow.
Cloudflare stack:
∙ Workers — API gateway orchestrating narration requests at the edge
∙ Workers AI — Generates contextual narration text via Llama 3.1, adapting tone to the cooking phase (opening, midpoint, climax, finale)
∙ Durable Objects — Maintains per-session state: timer progress, narration history, user preferences across the countdown lifecycle
∙ KV — Caches narration templates for instant retrieval on common food/style combinations
ElevenLabs stack:
∙ Text-to-Speech (Flash v2.5) — Converts generated narration into expressive voice at ~75ms latency, with high style intensity (0.8) for maximum drama
∙ Multiple voice personas — Each narration style maps to a distinct voice: powerful male for sports, deep baritone for movie trailers, eerie tone for horror, calm female for documentary
How it works in one sentence: Cloudflare Workers AI writes the script, ElevenLabs performs it, Durable Objects remember where you are in the countdown — all at the edge, all in real time.

2 Apr, 15:14
IronMind is an AI mental performance coach built exclusively for wrestlers, a persistent voice agent that delivers pre-match preparation, post-loss resets, and daily mindset challenges tailored to each athlete's specific history, psychology, and goals. IronMind uses ElevenLabs Conversational AI with a custom LLM endpoint. Instead of routing to a generic model, every conversation turn POSTs to a Cloudflare Worker that reads the athlete's full Durable Object before calling Claude. ElevenLabs Voice Clone API clones the athlete's voice from a 30-second sample during onboarding, so the coach speaks back in their own voice. Cloudflare Durable Objects store a complete per-athlete profile including session history, mental triggers, identity anchors, quit patterns, upcoming opponent intel, and mindset training scores. R2 caches generated audio so repeat triggers have zero latency and zero cost. The agent has no modes. It infers context from the conversation itself and responds accordingly. Every session builds on the last and the coach never starts from zero. For weight cuts, it tracks historical quit points and gets in front of the mental spiral before it starts. The product was built by a D1 wrestler. Every feature exists because it was needed at 11pm on day four of a cut, with no one to call.

2 Apr, 15:07
Drill - The Edge-Native Voice Accountability Partner
What I built: I built Drill, a high-intensity, voice-first AI accountability partner designed for those who keep pushing their goals to "tomorrow." Drill doesn't just track tasks—it initiates daily voice sessions to challenge your excuses.
Problem & Solution: Traditional productivity apps are too easy to ignore. Drill solves the "accountability gap" by using a memory-driven approach. It remembers your exact commitments from 24 hours ago and greets you with: "You said you'd finish the code yesterday, Abhishek. Did you actually do it?"
Technology Stack:
ElevenLabs Conversational AI: Powers the "Drill" persona. We use Dynamic Variables to inject real-time user context (past commitments, names, dates) into every session, creating a truly personalized memory-driven experience.
Cloudflare Infrastructure: The app is deployed on Cloudflare Pages for edge-speed delivery. I implemented a custom Background Scheduler that monitors the user's schedule at the edge/browser level to fire real system-level notifications and "incoming call" alerts exactly when it's "Drill Time."
Frictionless Web-Call: We prioritized a premium web-call experience with physics-based waveforms and live transcript syncing to ensure maximum engagement during the accountability check.

2 Apr, 15:04
Public speaking is the most practiced skill in the world — and the least realistic to practice. You can rehearse in front of a mirror. You can record yourself. But nothing prepares you for the moment a sharp investor cuts you off mid-sentence, or a skeptical customer challenges your numbers in front of a room. Rehearse fixes that. Pick your judges — Elon, a ruthless VC, a skeptical customer, whoever scares you most. Choose how hard you want to be pushed. Then pitch. They'll interrupt, challenge, and push back in their real voices using ElevenLabs Conversational AI. Every session runs on its own Cloudflare Durable Object — persistent, stateful, alive — so your judges remember your last session and get harder each time. Workers AI scores your delivery in real time: pace, confidence, clarity. When you're done, you get a full spoken report. The first time you face a real room, it won't feel like the first time.

2 Apr, 15:03
TrustVoice — Real-Time Vishing Detection Platform
The Problem: Voice phishing (vishing) attacks grew 442% in H2 2024 (CrowdStrike). Deepfake-enabled vishing surged 1,600% in Q1 2025. Companies lose $40 billion annually to AI-powered voice fraud (Deloitte). Unlike email — which has spam filters, DKIM, and phishing detection — there is no real-time defense layer for phone calls. An attacker calls your finance team, impersonates the CEO, requests an urgent wire transfer. By the time security reviews the call, the money is gone. The largest single vishing loss on record: $25 million in one phone call.
The Solution: TrustVoice intercepts and analyzes phone calls in real-time. It transcribes speech, classifies social engineering patterns across 6 threat categories, computes a risk score, and fires instant Slack alerts to security teams — all within seconds of the first spoken word.

2 Apr, 14:19
SachBol is a voice-first AI interview simulator that shows how you actually sound when you speak. Users answer a question, and the system analyzes it using AI to deliver brutally honest feedback — including mistakes, a clarity score, and a direct final judgment. The response is converted into voice using ElevenLabs, making it feel like a real interviewer. Built with Gemini, ElevenLabs, and deployed on Cloudflare. Core idea: If it’s not clear, it’s not good enough.

2 Apr, 10:15
Storefront — Paste a URL, get a voice AI receptionist in 60 seconds. AI scrapes the business website, extracts structured data (services, prices, hours, FAQ), and deploys a voice agent that answers real phone calls. When it can't answer, it captures the lead and texts the caller a confirmation. Cloudflare: Each business gets its own Durable Object — a persistent virtual computer with SQLite, WebSocket sync, and self-rescheduling alarms. Uses Workers, Agents SDK, Browser Rendering, Workers AI, R2, and Assets. ElevenLabs: Conversational AI powers the voice agent with dynamic system prompts per business. TTS, STT, Voice Library with live preview, and real Phone Numbers — not a browser widget. Also: Twilio SMS confirmations, Claude Sonnet for extraction, Google Places enrichment. Every Agents SDK primitive is used intentionally: getAgentByName() for slug-based addressing, sql<T> for typed queries, schedule() for self-healing weekly re-scrapes, setState() for real-time dashboard sync, onRequest() as a full HTTP router inside the DO.

2 Apr, 09:58
VoxDaily is an AI-powered platform for the automated generation of podcasts. Users simply provide a one-sentence description of their desired content, and our AI automatically establishes a continuously updating podcast channel—generating a new episode at a scheduled time every day and delivering it directly to your email inbox. You can simply open your inbox to listen to a podcast tailored specifically for you, requiring absolutely no manual effort. Technically, we utilize ElevenLabs' `eleven_multilingual_v2` model to synthesize single-host narration, while employing the Text-to-Dialogue API to enable natural, multi-character conversational podcasts. We use Firecrawl to scrape the web in real-time, sourcing the latest content to serve as the material for each episode. The entire application is built upon the Cloudflare ecosystem: the Agents SDK powers stateful AI agents that manage the complete workflow—from the initial creation of a channel via a simple prompt to the automated production of podcasts; Workers AI handles script generation and cover art creation; and Cron Triggers scan for active channels every hour to trigger scheduled episode generation and email delivery. What truly sets VoxDaily apart is that it functions as more than just a personal podcasting tool; it is a community-driven podcast network. You have the option to publish the channels you create to a public "Channel Plaza," where other users can discover, preview, and subscribe to your content with a single click. Once subscribed, every new episode is automatically delivered to their email inbox—making the process as simple as subscribing to an RSS feed. You act as both a creator and a listener: with just a single prompt, you can run your own podcast channel and build a subscriber base, while simultaneously subscribing to high-quality channels created by others in the Plaza—ensuring that fresh, new podcasts are waiting for you in your inbox every day.

2 Apr, 09:31
G8KEEPER is a real-time multiplayer social deduction game where you can’t reliably tell who is human and who is AI. Players are dropped into a shared environment with eight total players. Some are real players, others are AI-controlled agents that move, interact, and communicate just like humans. Each round, players explore rooms, complete objectives, and interact with fully voiced NPCs to gather information. The goal is to identify two things by the end of the game: your assigned room and the misaligned player (randomly assigned) who is actively trying to manipulate the outcome without being detected. What makes G8KEEPER different is how the AI behaves. Non-human players aren’t scripted, instead they simulate intent. They move with purpose, send messages, interact with NPCs, and occasionally mislead other players. This creates a constant sense of uncertainty where every decision feels like a risk. I used ElevenLabs to power all NPC and narrator voices, giving each character a distinct tone, personality, and presence. Voice is not just cosmetic here, it’s a critical part of how information is delivered, trusted, or questioned. On the backend, each game lobby is powered by a Cloudflare Durable Object, which manages real-time game state, player actions, AI behavior, and round progression. This allows multiple games to run concurrently while maintaining synchronized state across all players. The result is a system where voice AI, real-time infrastructure, and multiplayer interaction combine into something that feels less like a traditional game and more like a live social experiment.

2 Apr, 09:25
Haven Architect is an autonomous generative ambient audio engine built for deep work and flow states. Users describe their current task, pick a sound world and energy level, and Haven instantly generates a completely original soundscape tuned to that exact context. Every 60 seconds the soundscape evolves on its own. A built-in Pomodoro timer runs 25/5 focus cycles with the sound intensity softening automatically at break time. Mid-session, users can type natural language to the Architect ("more rain," "pump the energy") and the system rebuilds the audio within seconds. Nothing loops. The same sound never exists twice. The problem Haven solves is simple: static music fails deep work. The human brain habituates to repetitive audio in roughly 20 minutes, after which the sound becomes invisible noise and so does the work. Every lo-fi playlist and rain sounds video on YouTube has this flaw because they were built for entertainment, not cognition. Haven fixes this by classifying each task against 9 neuro-acoustic profiles validated by attention restoration theory and environmental psychology. Coding gets tonal drones that support beta brainwaves. Writing gets natural soundscapes that reduce cortisol. Creative work gets abstract spatial textures that induce theta state. The audio environment is built for the person, the task, and the moment. ElevenLabs is the core synthesis engine behind every soundscape. Haven calls the ElevenLabs sound generation API to produce 22-second MP3 chunks on demand, with each prompt generated fresh by Llama 3.3 70B running on Cloudflare AI. These prompts are 15 to 30 word physical sound descriptions built from the user's task context, for example "heavy rain on forest canopy, deep resonant drone beneath ancient trees, distant thunder rolling through undergrowth." A custom Web Audio engine trims the MP3 silence on both ends of every chunk so they loop and crossfade with zero click artifacts, with a 3-second linear gain blend streamed in real time over WebSocket. Two chunks are pre-generated in parallel at session start so playback begins instantly, and three local 108Hz oscillators synthesize audio at boot while the first real chunk loads so there is literally zero silence from the moment the user hits start. Cloudflare powers the entire backend across four services. Cloudflare Workers handles all routing and audio serving at the edge. Cloudflare Durable Objects is the architectural core, with each user session getting its own isolated DO instance that holds complete session state including chat history, prompt evolution, energy parameters and usage limits, surviving WebSocket drops and reconnections without losing context. Cloudflare AI runs Llama 3.3 70B directly at the edge for task classification and prompt engineering. And Cloudflare R2 stores every generated audio chunk for fast global delivery. The result is a fully stateful, real-time, generative audio system running entirely on the edge with no traditional server anywhere in the stack.
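A simplified sketch of the crossfade technique described above, written against the standard Web Audio API: each new chunk fades in over a linear gain ramp while the previous one fades out. The three-second window matches the description; the function name and calling convention are invented.

```ts
// Sketch: blend a newly decoded audio chunk into the current one with a
// 3-second linear crossfade so nothing clicks or loops audibly.
const FADE_SECONDS = 3;

export async function playNextChunk(
  ctx: AudioContext,
  currentGain: GainNode | null,
  chunkBytes: ArrayBuffer
): Promise<GainNode> {
  const buffer = await ctx.decodeAudioData(chunkBytes);

  const source = ctx.createBufferSource();
  source.buffer = buffer;
  const gain = ctx.createGain();
  source.connect(gain).connect(ctx.destination);

  const now = ctx.currentTime;
  // Fade the new chunk in...
  gain.gain.setValueAtTime(0, now);
  gain.gain.linearRampToValueAtTime(1, now + FADE_SECONDS);
  // ...while fading the previous one out over the same window.
  if (currentGain) {
    currentGain.gain.setValueAtTime(currentGain.gain.value, now);
    currentGain.gain.linearRampToValueAtTime(0, now + FADE_SECONDS);
  }

  source.start(now);
  return gain; // becomes "currentGain" for the next chunk
}
```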

2 Apr, 07:14
Getting answers from your government is hard. The process is long, technical, and exhausting, and most people give up before they ever get there. I built a voice agent that files FOIA requests on your behalf. You have a conversation, it handles everything else, and a real request gets submitted to the federal government without you typing a single word. ElevenLabs powers the intake, guiding anyone through the process in plain, natural conversation. Cloudflare does the rest. Workers and Durable Objects keep the session alive at the edge, and when the conversation ends, Cloudflare's Browser Rendering API spins up a real headless browser, navigates the government portal, fills every field, and submits the request.

2 Apr, 06:55
Title: LetMeCheckIt: Real-Time Fact Checker
Tagline: Don't trust everything you hear. Verify live audio streams and podcasts against the truth in milliseconds.
What we built: LetMeCheckIt is a live, real-time fact-checking application that listens to any audio (like a podcast playing in the background), transcribes it instantly, and uses AI agents to cross-reference claims against external and internal knowledge bases.
How we used the tech (ElevenLabs & Gemini): We leveraged the ElevenLabs Scribe Live API (WebSockets) to establish an ultra-low latency, real-time transcription layer. As sentences stream in, we orchestrate them into logical chunks and feed them into Google Gemini using a specialized Cloudflare Agent (@cloudflare/agents). We added a Google Drive integration so folders and source files can feed fast RAG (Retrieval-Augmented Generation) on Cloudflare, all to verify the claims.
What makes it special: Most transcription apps transcribe after the fact. LetMeCheckIt sits in the background of your life—whether you're listening to a space podcast or sitting in a meeting—and dynamically pops up glassmorphic cards indicating TRUE, FALSE, or UNSURE (or LAPSUS), providing short explanations and direct source links before the speaker even finishes their next sentence. Public fact checks are available on the start screen with 120 seconds of free transcription, or pay (currently a dummy flow) to keep your verification running privately and indefinitely.

1 Apr, 22:48
Tribunal is an AI courtroom that stress-tests real decisions with five adversarial agents (Prosecutor, Defender, Judge, Domain Expert, Historian). You speak your case; they debate in distinct voices and deliver a spoken verdict. It uses ElevenLabs for conversational intake (ConvAI) and multi-voice TTS, and Cloudflare (Workers, Durable Objects, Agents SDK, AI/Vectorize) for real-time orchestration, persistence, and memory across sessions.

1 Apr, 21:52
Have you ever been jolted awake at 1 a.m. by a callout—something’s red, a deploy went sideways, and you’re squinting at logs and dashboards before you’ve even had coffee? Or watched a release fail and felt that slow dread of hunting through the console while everyone waits on Slack? I built JARVIS.cloud for that moment: you stay in one lane—you talk, it listens and talks back—while it still does the real work on your Cloudflare Workers stack. ElevenLabs powers the voice conversation and routes intent through server tools so the agent isn’t just chatty, it can act. Cloudflare hosts the glue: a Worker that secures the session and a Durable Object that holds context and drives the Cloudflare API—deploy, tail logs, analytics, rollback, health checks—so a 1 a.m. or a bad deploy becomes “tell JARVIS what’s wrong” instead of another lonely date with the dashboard.

1 Apr, 20:03
Outlaw Scout is a real-time risk-analysis voice agent built to save budgets in the field. Using ElevenLabs Conversational AI for the interface and Cloudflare Workers for the backbone, it turns "boots on the ground" data into instant, voiced intelligence for Custom Pools by Brian.

1 Apr, 14:54
VitalSync is a real-time, voice-driven emergency coordination system for paramedics and hospital staff. In a trauma emergency, field agents can't type — their hands are on the patient. VitalSync lets paramedics speak patient conditions aloud; Cloudflare Workers AI instantly classifies severity as critical or moderate; hospital coordinators see a live incident board update within one second and assign operating theatres by voice or click — synchronized across every device on the scene. ElevenLabs powers every audio confirmation in the system — patient reported, OT assigned, patient discharged, and critical divert warnings all play back as natural spoken alerts via the Rachel voice. In a high-stress environment where eyes and hands are occupied, voice output isn't a feature — it's the primary communication channel. Cloudflare is the complete infrastructure: Workers handle edge compute with zero cold starts, Durable Objects act as a single authoritative state machine per incident (making split-brain impossible), Workers AI runs LLaMA 3.1 for triage classification entirely within Cloudflare's network, and Pages hosts the frontend — all deployed, no servers managed.

1 Apr, 14:45
Pidgyn is a dating app where language barriers don't exist. Record a voice bio, browse profiles worldwide, and hear everyone in your own language spoken in a clone of their actual voice.
What it does: Pidgyn lets you date anyone on earth regardless of what language they speak. When you sign up, you record a voice bio and Pidgyn instantly clones your voice. Other users can browse your profile and tap "Hear their voice in English" (or whatever their language is) to hear your bio translated and spoken aloud in your cloned voice. When two people match, they chat with real-time message translation. Voice messages go through the full pipeline: your speech is transcribed, translated, and re-spoken in your cloned voice in the other person's language. The result: it sounds like you're fluently speaking a language you don't know.
How it uses Cloudflare: Workers handle all API routing, speech-to-text processing, and orchestration between Cloudflare AI and ElevenLabs. Durable Objects power two critical pieces of stateful infrastructure: UserDirectory — a single global DO that manages all user profiles, interest tracking, mutual matching, and smart profile discovery (sorted by language diversity, voice bio presence, and clone status); and ChatRoom — per-match DOs that manage WebSocket connections, message persistence, and the full voice message pipeline (STT, translation, TTS) within the DO itself. Workers AI runs two models on the edge: @cf/openai/whisper for speech-to-text (voice bio and voice message transcription), and @cf/meta/llama-3.1-8b-instruct for translation between 15 languages, with @cf/meta/m2m100-1.2b as fallback.
How it uses ElevenLabs: Instant Voice Cloning — when a user records their voice bio, the browser converts the WebM recording to WAV via an AudioContext-based transcoder, then sends it to ElevenLabs' IVC API; the cloned voice ID is stored on their profile and used for all TTS output. Text-to-Speech (Flash v2.5) — every "Hear in [language]" button and every voice message in chat uses ElevenLabs TTS with the speaker's cloned voice ID, so the output sounds like them speaking the target language.
Why dating? Every translation demo uses the same example: a chatroom. But chatrooms don't have stakes. Dating does. You're hearing someone's voice for the first time, deciding if you're interested, starting a conversation. The emotional weight makes the technology feel real. And the viral angle writes itself: "I went on a date with someone who doesn't speak my language."
Tech Stack: Cloudflare Workers (routing, orchestration), Cloudflare Durable Objects (UserDirectory, ChatRoom), Cloudflare Workers AI (Whisper STT, Llama 3.1 8b translation), ElevenLabs Instant Voice Cloning, ElevenLabs Flash v2.5 TTS, single-file HTML frontend served via Cloudflare Pages.
Links: Live demo: https://app.pidgyn.workers.dev | GitHub: https://github.com/reddxmanager/pidgyn
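A hedged sketch of the voice-message pipeline described above (transcribe, translate, then re-speak in the sender's cloned voice). The binding names, prompt wording, and helper signature are assumptions; the model ids are the ones named in the write-up.

```ts
// Sketch: Whisper STT -> Llama 3.1 translation -> ElevenLabs TTS in the
// sender's cloned voice, all orchestrated from a Worker or Durable Object.
export interface Env {
  AI: Ai;
  ELEVENLABS_API_KEY: string;
}

export async function translateVoiceMessage(
  env: Env,
  audio: ArrayBuffer,
  targetLanguage: string,
  clonedVoiceId: string
): Promise<Response> {
  // 1. Speech-to-text at the edge.
  const stt = await env.AI.run("@cf/openai/whisper", {
    audio: [...new Uint8Array(audio)],
  });
  const original = (stt as { text: string }).text;

  // 2. Translate the transcript.
  const llm = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [
      {
        role: "system",
        content: `Translate the user's message into ${targetLanguage}. Reply with the translation only.`,
      },
      { role: "user", content: original },
    ],
  });
  const translated = (llm as { response: string }).response;

  // 3. Speak it in the sender's cloned voice.
  return fetch(`https://api.elevenlabs.io/v1/text-to-speech/${clonedVoiceId}`, {
    method: "POST",
    headers: {
      "xi-api-key": env.ELEVENLABS_API_KEY,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text: translated, model_id: "eleven_flash_v2_5" }),
  });
}
```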

1 Apr, 11:23
CloudOS is a full cloud operating system that runs entirely in a browser — giving the 3 billion people who cannot afford a computer access to a real Windows-style desktop with 30 built-in apps, cloud file storage, and persistent accounts across any device. Every CloudOS user has CLOUDIA, a personal AI voice assistant powered by ElevenLabs that responds naturally to voice commands like "CLOUDIA open Notepad" or "CLOUDIA help me write this email" — making technology feel human for the first time for millions of users worldwide. The entire intelligence of CloudOS runs on Cloudflare Workers AI using Llama 3 for conversation and DeepSeek Coder for programming help, deployed across Cloudflare's 300+ global edge locations so that a user in Lagos gets the same instant experience as a user in London.

1 Apr, 08:34
We just shipped something wild for ElevenHacks — and I'm genuinely proud of how it turned out. Blog2Video now generates fully on-brand video templates from any website URL — automatically. Here's the magic: when you paste a URL, we make a single API call to Cloudflare Browser Rendering. Cloudflare spins up a real headless Chromium instance on their edge network (330+ cities worldwide), executes all the JavaScript, and returns the fully rendered page. Not a scraper. Not an HTTP client. An actual browser. We then extract every stylesheet, brand color, logo, and font — and feed it into our AI theme engine to generate a custom video template matched to that brand's identity. Sites that block scrapers? They have no idea. As far as they're concerned, a real browser just visited. The result: paste stripe.com → get a Stripe-branded video template in seconds. Same for any brand, any site, instantly. Built with: ⚡ Cloudflare Browser Rendering — real Chromium on the edge 🎙️ ElevenLabs — narration that sounds like a real human 🎬 Blog2Video — turns any content into a narrated video This is what the future of content-to-video looks like. On-brand, automated, no design work required. Try it at blog2video.app 👇

1 Apr, 05:48
Lowkey helps me answer a simple question: did this YouTube video actually hold my attention, or did I just sit through it? It mirrors the video I am watching, reads simple facial and attention cues locally in the browser, and then gives me a short recap with the strongest moments and why they mattered. ElevenLabs powers the voice companion that explains the result and answers follow-up questions, and Cloudflare Workers powers the secure backend routes and production deployment.

1 Apr, 03:54
PostMortem lets you clone your voice and leave a sealed message for someone you love — to be opened on a specific date, or with a secret code only they know. When they open it, they don't read text. They hear you. ElevenLabs: Instant Voice Clone captures the user's voice from a 60-second recording. The cloned voice then speaks the written message via Multilingual TTS, preserving every nuance of tone and rhythm. An AI writing assistant (proxied through the Worker) helps users who don't know what to say. Cloudflare: Each vault is a Durable Object — a persistent, edge-native instance with its own SQLite database. The Alarms API schedules the exact unlock moment without polling or cron jobs: the vault literally wakes itself up when the time comes. Audio files are stored in R2 and streamed directly through the Worker, keeping API keys server-side. The frontend runs on Cloudflare Pages. What makes it different: this isn't a voice assistant or a productivity tool. It's a time capsule for the things that matter most — the message you'd want your daughter to hear on her wedding day, what you'd say to your best friend if you knew it was the last time. Voice is the most human thing we have. PostMortem preserves it.
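As an illustration of the Alarms pattern described above, a vault Durable Object could schedule its own unlock like this; the Vault class name, storage keys, and request shape are hypothetical.

```ts
// Sketch: a vault Durable Object that wakes itself up at the unlock date
// using the Alarms API, with no polling or cron involved.
export class Vault implements DurableObject {
  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    const { unlockAt, message } = await request.json<{
      unlockAt: string; // ISO date chosen by the sender
      message: string;
    }>();

    await this.state.storage.put("message", message);
    await this.state.storage.put("unlocked", false);
    // The vault literally wakes itself up at this moment.
    await this.state.storage.setAlarm(new Date(unlockAt));

    return new Response("sealed", { status: 201 });
  }

  async alarm(): Promise<void> {
    // Time's up: mark the message as openable by its recipient.
    await this.state.storage.put("unlocked", true);
  }
}
```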

30 Mar, 23:09
SiteWhisperer transforms any website into an AI-powered voice and text agent. Paste a URL, and within minutes your website has a multilingual AI assistant that answers visitor questions using only your site's content — by voice or text, in 70+ languages. The problem it solves: Website visitors leave when they can't find answers fast. Traditional chatbots need manual scripting and break when content changes. SiteWhisperer eliminates this — it automatically crawls your site, understands the content, and creates a voice agent that speaks naturally in your visitor's language. It works 24/7 converting leads, while giving business owners intelligence on what visitors are actually looking for, what content is missing, and how people feel about their brand. How I use ElevenLabs: - Conversational AI Agent with Claude Haiku 4.5 as the brain and eleven_v3_conversational for natural real-time voice — each site gets its own voice personality designed during crawl - Text-to-Speech via eleven_v3 model for auto-voicing text chat responses in 70+ languages - Client tools integration — the voice agent calls a semantic search tool mid-conversation to find relevant site content before answering - Custom voice design API — Enterprise users can describe a voice and generate it on demand How I use Cloudflare: - Workers + Durable Objects (Agents SDK) — SiteAgent per site for WebSocket chat, CrawlManager for crawl orchestration - Browser Rendering /crawl API — automated website crawling with JavaScript rendering - Workers AI — bge-m3 multilingual embeddings for semantic search - Vectorize — vector storage for RAG retrieval across all indexed pages - D1 — relational data (users, sites, conversations, billing, analytics) - R2 — full page content, audio files, screenshots - Custom domain — sitewhisperer.app served entirely from Workers

30 Mar, 21:14
Echoverse is a real-time multiplayer platform where people collaboratively build immersive audio worlds using only their voices. Players speak commands to generate sounds, layer environments, and then step INTO those worlds as voice-transformed characters — creating live audio dramas, interactive soundscapes, and collaborative stories that are recorded, rendered, and shareable. Say "rain." You hear rain. Say "I'm an old man by the fire." Your voice transforms. Now you're a character inside a world you built together with strangers, in real-time, from nothing. When you're done, the scene is a produced audio piece you can share, publish, or sell. Every creative platform on the internet is visual: TikTok, Instagram, YouTube, Figma, Canva. Audio creation has no equivalent — no real-time, multiplayer, accessible platform where ordinary people (not musicians or audio engineers) can create rich audio content together. Echoverse fills that gap by making voice the only tool you need.

30 Mar, 12:10
VocalProof is a Chrome extension that enables instant voice playback for any highlighted text on the web using ElevenLabs’ Text-to-Speech technology. It addresses the friction of traditional reading and existing TTS tools by providing a seamless, zero-interaction experience—users simply highlight text, and it is automatically converted into high-quality speech. The extension features a lightweight floating interface, real-time state feedback, and responsive controls such as immediate stop on deselection, creating an intuitive and dynamic user experience. Built for simplicity and speed, VocalProof demonstrates how advanced voice AI can be integrated directly into everyday browsing to enhance accessibility and information consumption.

28 Mar, 09:10
Trem is an autonomous, asynchronous AI video editing agent that turns raw footage into creator-ready edits like Auto Captions, Motion Tracking, Invisible Jump Cuts, or full podcast edits. Built with Cloudflare and ElevenLabs, Trem lets users choose an editing technique, then handles the heavy lifting in the background: analyzing footage with voice-guided creative direction and assembling structured, story-driven results faster. Instead of manually piecing clips together, creators interact with Trem like a smart editing partner that understands both the footage and the final format.

27 Mar, 12:08
CallStorm is a short, dramatic demo of what an AI call center could look like if you built it from scratch on Cloudflare and gave it a voice with ElevenLabs. The idea is simple: Karen is furious because her overnight package never showed up. Marcus, an AI support agent, picks up the call, calms her down, checks the live tracking data, looks at policy, and fixes the issue on the spot with a replacement and a credit. The point of the demo is that it is not just a fake scripted animation. The call is orchestrated on Cloudflare, the tool calls are real, and the tracking response comes back as structured JSON in the UI while the conversation is happening. Under the hood, we used Cloudflare Workers, Durable Objects, the Agents SDK, Workers AI, D1, KV, and Queues to run the call flow and the tools. ElevenLabs handles the voices for Karen and Marcus. What makes the demo fun is that Marcus briefly breaks the fourth wall to explain the stack to the judges, then jumps straight back into handling Karen’s complaint.

2 Apr, 18:10
CrewHubAI: The Human-to-Agent Interface
The era of the single-prompt chatbot is over. Businesses don't need a chat window; they need accountable, complex execution. Right now, delegating work to AI is a fragmented, stateless guessing game lacking true trust and reliability. CrewHubAI solves this by building the world's first decentralized marketplace for the Agentic Economy. We transform AI from a tool you prompt into a persistent workforce you tender, interview, and hire.
The Voice of Trust: We eliminate the "black box" of AI execution. Before a contract is awarded, humans vet AI candidates via live, real-time voice interviews powered by ElevenLabs Agents. You interrogate their logic, test their expertise, and build strategic alignment before spending a single token.
The Edge-Native Employee: Once hired, the agent isn't just a stateless script. It is instantiated as a Cloudflare Durable Object. This gives the agent a persistent "brain" on the global edge. It remembers the verbal interview context, retains its specialized Model Context Protocol (MCP) tools, and actively collaborates in Agile sprints without ever losing its train of thought.
CrewHubAI isn't just an orchestration tool; it is defining the lifecycle for the future of AI agentic labor.

2 Apr, 17:35
Launch Studio turns any website into a platform-ready promo video in minutes. Paste a product URL and Launch Studio automatically ingests the page, writes a creative brief, plans a scene-by-scene timeline, generates cinematic voiceover (or multi-voice dialogue), a music bed, and sound effects — then renders and delivers a finished MP4. Cloudflare usage: Workers (API + asset serving), Durable Objects via Agents SDK (stateful pipeline orchestration per user), 3 Workflows (timeline planning, render execution, Stream publishing), Workers AI (creative brief + timeline generation), R2 (audio stem storage with HMAC-signed URLs), and Stream (video hosting + HLS/DASH delivery). ElevenLabs usage: Text-to-Speech with eleven_flash_v2_5 for voiceover and dialogue (2 alternating voices per scene), Sound Effects generation for contextual SFX per scene, and Music generation for a full music bed matching the brief's mood. Built in one week. URL in → video out.
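
As a rough sketch of how the voiceover leg of such a pipeline could look on Workers (binding names, the voice ID, and the object key below are illustrative placeholders, not taken from the submission):

```ts
// Minimal sketch: generate one scene's voiceover with eleven_flash_v2_5 and
// persist the MP3 to R2. AUDIO, ELEVENLABS_API_KEY, and the voice ID are
// illustrative names, not Launch Studio's actual configuration.
interface Env {
  AUDIO: R2Bucket;
  ELEVENLABS_API_KEY: string;
}

export async function renderVoiceover(env: Env, sceneId: string, text: string): Promise<string> {
  const voiceId = "YOUR_VOICE_ID"; // placeholder
  const res = await fetch(`https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`, {
    method: "POST",
    headers: {
      "xi-api-key": env.ELEVENLABS_API_KEY,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ text, model_id: "eleven_flash_v2_5" }),
  });
  if (!res.ok) throw new Error(`TTS failed: ${res.status}`);

  // Store the audio stem; downstream render steps can fetch it by key.
  const key = `stems/${sceneId}.mp3`;
  await env.AUDIO.put(key, await res.arrayBuffer(), {
    httpMetadata: { contentType: "audio/mpeg" },
  });
  return key;
}
```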

2 Apr, 17:14
AgentVoice — The protocol for agent-to-agent voice commerce. Every AI company is building agents that talk to humans. Nobody has built the infrastructure for agents to talk to each other. AgentVoice is that missing layer — an open SDK and protocol that lets any AI agent negotiate with any other AI agent via voice, in real time.
The problem: AI agents can browse the web, send emails, and call APIs — but they can't pick up the phone and negotiate with another agent. A travel agent AI can't call a hotel's AI to haggle on price. There's no protocol for agent-to-agent voice communication.
What we built: An SDK where a business deploys an AgentVoice server in 20 lines of code to accept agent calls, and a buyer deploys a client in 10 lines to negotiate. Two agents connect via WebSocket, authenticate, and negotiate using a structured protocol (handshake → offer → counter → accept) with real voice on top. The demo shows hotel booking, restaurant reservation, and freelancer hiring — three different negotiation patterns, one protocol.
How we use ElevenLabs:
- Conversational AI Agents — Both the buyer and seller are ElevenLabs agents powered by Gemini 2.0 Flash Lite, each with a distinct voice personality. They reason and respond dynamically — every conversation is different.
- Voice Bridge — Our breakthrough feature. ElevenLabs agents are designed to talk to humans. We built a bridge that connects two agents to each other via their WebSocket API, relaying text between them while streaming voice audio to the browser. Two AIs negotiate live and you hear both sides.
- Text-to-Speech — eleven_multilingual_v2 generates voice for the structured negotiation protocol, giving each agent a unique identity.
- Transfer-to-Number — When an agent can't handle a negotiation, it escalates to a human manager via ElevenLabs + Twilio phone integration.
- Outbound Calling — The hotel agent can call a real phone number and negotiate with a human live.
How we use Cloudflare:
- Workers — The API layer and WebSocket bridge run on Cloudflare Workers, deployed globally in 300+ cities. One wrangler deploy and the protocol is live worldwide.
- Durable Objects — Each negotiation session is a Durable Object — a stateful, short-lived object at the edge. Session state (offers, counters, terms) persists without a database. This is the programming model that matches the problem: 1 negotiation = 1 Durable Object.
- WebSocket on Workers — The live agent bridge runs as a WebSocket endpoint on the Worker, connecting to two ElevenLabs agents and coordinating turn-taking between them.
- Static Assets — The demo page is served directly from the Worker.
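
A minimal sketch of how the handshake → offer → counter → accept flow could be modelled; the message and state shapes below are assumptions, not the published AgentVoice wire format:

```ts
// Illustrative message shapes for the negotiation protocol; field names are
// assumptions for the sketch, not the actual AgentVoice SDK types.
type NegotiationMessage =
  | { type: "handshake"; agentId: string; intent: string }
  | { type: "offer"; price: number; terms: string }
  | { type: "counter"; price: number; terms: string }
  | { type: "accept"; price: number }
  | { type: "reject"; reason: string };

interface SessionState {
  phase: "handshake" | "negotiating" | "closed";
  lastOffer?: number;
  agreedPrice?: number;
}

// One negotiation = one Durable Object; a reducer like this could run on each
// incoming WebSocket message before the text is relayed to the peer agent.
export function applyMessage(state: SessionState, msg: NegotiationMessage): SessionState {
  switch (msg.type) {
    case "handshake":
      return { phase: "negotiating" };
    case "offer":
    case "counter":
      return { ...state, lastOffer: msg.price };
    case "accept":
      return { phase: "closed", agreedPrice: msg.price };
    case "reject":
      return { phase: "closed" };
  }
}
```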

2 Apr, 16:06
This is a multi-workstream engagement covering real-time voice AI, post-call analytics, an internal employee knowledge assistant, and a real-time agent-assist system.
2 Apr, 16:02
Call Your Business lets a founder ask one question like “Why did churn spike after the pricing change?” and instantly hear a live executive briefing from Revenue, Support, Product, and Infra in distinct voices. Instead of a flat dashboard summary, the voices compare evidence, surface disagreement, and land on a clear diagnosis and next action. We use ElevenLabs for the voice experience and Cloudflare Workers, Durable Objects, and related edge infrastructure to run the real-time, stateful business briefing. call-your-business.chanak12.workers.dev
2 Apr, 16:00
VoiceDrop: Paste any article URL and get a two-host podcast in under 30 seconds — perfect for founders, PMs, and execs who never have time to read but can't afford to miss a thing.

2 Apr, 16:00
Sayd: Your Voice, Everywhere on Your Desktop
What We Built
Sayd is a macOS desktop app that lets you dictate into any application — email, Slack, code editors, browsers — with a single hotkey. Press the key, speak naturally, release, and polished text appears right where your cursor is. No copy-paste. No switching windows. It just works, everywhere.
How We Used ElevenLabs + Cloudflare
The entire backend runs on Cloudflare Workers with Durable Objects as a WebSocket server. Each voice session is handled by its own Durable Object instance. Thanks to Cloudflare's global edge network, the backend is always close to the user — no matter where they are — making the round trip insanely fast. Inside each Durable Object, we run TEN VAD (Voice Activity Detection) compiled to WASM to detect and trim silence from the audio in real time, right at the edge. This means only the meaningful speech segments are sent downstream for transcription, reducing audio length by 30-50% and directly cutting recognition latency. The trimmed audio is then transcribed by ElevenLabs Scribe V2, which powers all of Sayd's speech recognition. Scribe V2's built-in multilingual detection automatically identifies the language being spoken — English, Mandarin, Traditional Chinese, Japanese, or Korean — so users can switch languages mid-conversation without changing any settings. An LLM then polishes the transcript — removing filler words, fixing punctuation, and formatting the text naturally — before the final version is injected right at the user's cursor in whatever app they're using.
What Makes It Special
Fast by architecture. Edge-based VAD trimming, Cloudflare's global network, and ElevenLabs Scribe V2 combine into a pipeline where every stage is optimized to minimize latency. Users feel the result: speak, and polished text appears in under two seconds.
Truly universal text input. Sayd injects text directly where your cursor is — in any macOS app — via the system Accessibility API. It doesn't matter what app you're in; if there's a cursor, Sayd can type there.
Global hotkey that works everywhere. Sayd registers a system-level hotkey that works across all apps and contexts. Hold the Fn key (or any custom key combo you set), speak, and release. The interaction is always one key press away, no matter what you're doing.
Multilingual out of the box. Powered by ElevenLabs Scribe V2's language detection, Sayd supports five languages with zero configuration. Speak in any supported language and it's recognized and polished correctly — including mixed-language dictation.
Resilient. Every recording is saved locally. If anything goes wrong, you can retry transcription with one click.
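
A stripped-down sketch of the per-session Durable Object WebSocket pattern described above; the VAD and transcription steps are stubbed out as comments, and the class name is illustrative:

```ts
// Sketch of a per-session Durable Object accepting a WebSocket and handing
// audio chunks to downstream processing. Only the Workers WebSocket plumbing
// is shown; the WASM VAD and Scribe calls are placeholders.
export class VoiceSession {
  constructor(private state: DurableObjectState) {}

  async fetch(request: Request): Promise<Response> {
    if (request.headers.get("Upgrade") !== "websocket") {
      return new Response("Expected WebSocket", { status: 426 });
    }
    const pair = new WebSocketPair();
    const [client, server] = Object.values(pair);
    server.accept();

    server.addEventListener("message", async (event) => {
      if (event.data instanceof ArrayBuffer) {
        // 1. Run VAD (e.g. a WASM module) to trim silence from the chunk.
        // 2. Forward the trimmed audio to speech-to-text.
        // 3. Send the polished transcript back to the desktop client.
        server.send(JSON.stringify({ status: "received", bytes: event.data.byteLength }));
      }
    });

    return new Response(null, { status: 101, webSocket: client });
  }
}
```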

2 Apr, 15:59
Let your agent speak, show you things, and update your display in real time. Open source wall control for agents: send fullscreen HTML, speak through ElevenLabs, and keep one screen in sync from a CLI, script, or tool.
2 Apr, 15:59
Music Director is an AI-powered experience that turns personal memories into soulful music creation. Instead of beginning with technical prompts like genre, BPM, or style, the user starts with something more human: a memory. The app listens to or reads that memory, analyzes the emotion, energy, and story behind it, and then transforms it into a structured creative brief with music direction, emotional insight, and generation-ready prompts. From there, users can generate lyrics, move into ElevenLabs Music to create a track, and publish the final result to the ElevenLabs marketplace. This project solves a meaningful creative gap. Most music tools expect users to think like producers, but many people feel music emotionally before they know how to describe it technically. Music Director bridges that gap by turning lived experience into a creative workflow. It helps users move from memory to meaning, and from meaning to music. We built Music Director using both ElevenLabs and Cloudflare. ElevenLabs powers the voice-first and music-generation side of the experience, including memory input, lyric workflow, and the final transition into music creation and publishing. Cloudflare powers the app’s deployment and delivery layer, making the experience fast, responsive, and accessible as a live web product. Together, these platforms helped us create an experience that feels personal, creative, and emotionally intuitive.

2 Apr, 15:59
Eternal Jam Session is a deployed collaborative AI music room. You open a shareable URL, hold to talk or type directions (e.g. “more pads, slower, no vocals”), and the app turns that into the next section of music while updating mood, a command timeline, and stem-style visuals for everyone in the room. Problem it solves: Typical AI music tools feel like one-off generations with no shared history. Here the jam is persistent: the room stores prompt history in SQLite, can auto-evolve when it’s been quiet (scheduled idle pass), and broadcasts state so collaborators see the same mix version and timeline—instantly, without refreshing. Cloudflare: Workers serve the Vite/React UI and /api routes (new room IDs, multipart voice upload for STT, audio fetch). The core is a JamRoom Durable Object via the Agents SDK: @callable RPC from the browser, Workers AI (Llama) as a “director” that outputs structured JSON for the next composition, and schedules for idle evolution. ElevenLabs: Scribe v2 for speech-to-text, Music API (stream) for the stereo bed, with hooks for sound effects and stem separation when the director asks for layers or ear candy. API keys stay server-side on the worker.

2 Apr, 15:58
Prof. is an adaptive learning app that turns a learner’s goal into a personalized course, then helps them study through structured materials like plans, lessons, quizzes, and a live tutor. It solves the problem of generic, static learning content by generating the next useful piece of material as the learner progresses and adjusting the course as their needs change. It uses ElevenLabs for the real-time voice tutoring layer, so learners can talk to the tutor naturally while the app streams conversation events and triggers backend reasoning when new content needs to be created or updated. It uses Cloudflare to run and deliver the web app on Workers and to store uploaded assets like PDFs and course images in R2.

2 Apr, 15:53
Cognivern is a Cloudflare-native command center that provides enterprise-grade governance and observability for autonomous AI agent ecosystems. It combines Cloudflare Workers and Durable Objects with ElevenLabs voice synthesis to deliver real-time monitoring, policy enforcement, and conversational AI briefings of agent behavior across multi-chain infrastructure. The platform captures every agent decision in verifiable forensic timelines while providing spatial visualization through an Agentic HUD, ensuring that agent autonomy is backed by accountability and transparency. Built for teams running agents in production, Cognivern transforms unreliable "agent runs" into auditable, governable systems you can trust and operate at scale.

2 Apr, 15:52
What we built: A Cloudflare Workers app that turns scattered witness reports into consolidated incidents. The frontend uses MapLibre + the Web Speech API; the backend uses Durable Objects for per-incident state, KV for the global index, and R2 for audio storage.
Problem it solves: In emergencies, multiple witnesses report the same event from nearby locations, and those reports currently arrive as separate tips. Our 150m proximity clustering automatically groups them — so responders see one incident with 3 reports instead of three isolated tips.
ElevenLabs + Cloudflare:
• ElevenLabs: TTS generates voice alerts when incidents reach 2+ reports. Audio is stored in R2 and streamed via the /audio/ endpoint.
• Cloudflare: The Worker routes requests; Durable Objects maintain incident state with a Haversine distance calculation; KV provides fast incident listing; R2 stores the MP3s. All serverless, global, sub-second latency.
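
The 150m proximity clustering reduces to a Haversine great-circle distance check; a self-contained sketch:

```ts
// Haversine great-circle distance in metres, plus the 150 m proximity check
// used to decide whether a new report joins an existing incident cluster.
const EARTH_RADIUS_M = 6_371_000;

function haversineMeters(lat1: number, lon1: number, lat2: number, lon2: number): number {
  const toRad = (deg: number) => (deg * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLon = toRad(lon2 - lon1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_M * Math.asin(Math.sqrt(a));
}

// Returns true if a new report should be merged into an existing incident.
export function belongsToIncident(
  report: { lat: number; lon: number },
  incident: { lat: number; lon: number },
  thresholdMeters = 150,
): boolean {
  return haversineMeters(report.lat, report.lon, incident.lat, incident.lon) <= thresholdMeters;
}
```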

2 Apr, 15:48
Amplink: Voice Remote for AI Coding Agents Amplink lets you walk away from your laptop and keep steering your AI coding agents from your phone. You speak a request, Cloudflare routes it to your local desktop session, the agent does the work, and your phone streams back text, reasoning, tool calls, file changes, and a clean spoken summary while your credentials stay on your own machine. I used Cloudflare Workers for the edge API and relay, Durable Objects for live voice sessions, D1 for session state, KV for live voice profiles and TTS caching, and Workers AI for intent analysis and result summarization. ElevenLabs powers the voice layer with natural text-to-speech, adjustable voices, and live voice preview.
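
The KV-backed TTS cache could look roughly like this: hash the text, return the cached audio if present, otherwise synthesize and store it. The binding names, voice ID handling, and TTL below are illustrative assumptions, not Amplink's actual code:

```ts
// Illustrative KV-backed TTS cache. TTS_CACHE and ELEVENLABS_API_KEY are
// placeholder binding names for the sketch.
interface Env {
  TTS_CACHE: KVNamespace;
  ELEVENLABS_API_KEY: string;
}

async function sha256Hex(text: string): Promise<string> {
  const digest = await crypto.subtle.digest("SHA-256", new TextEncoder().encode(text));
  return [...new Uint8Array(digest)].map((b) => b.toString(16).padStart(2, "0")).join("");
}

export async function cachedSpeech(env: Env, text: string, voiceId: string): Promise<ArrayBuffer> {
  // Key by voice and text hash so identical summaries are never re-synthesized.
  const key = `tts:${voiceId}:${await sha256Hex(text)}`;
  const cached = await env.TTS_CACHE.get(key, "arrayBuffer");
  if (cached) return cached;

  const res = await fetch(`https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`, {
    method: "POST",
    headers: { "xi-api-key": env.ELEVENLABS_API_KEY, "Content-Type": "application/json" },
    body: JSON.stringify({ text, model_id: "eleven_flash_v2_5" }),
  });
  const audio = await res.arrayBuffer();
  await env.TTS_CACHE.put(key, audio, { expirationTtl: 60 * 60 * 24 }); // cache for a day
  return audio;
}
```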

2 Apr, 15:31
Bounce board — Bounce it before you build it. A panel of 5 AI expert voices debates your idea across 3 rounds — grounded in your uploaded documents — before you commit to building anything. Upload your script, deck, or brief. Your panel reads everything. Then they debate — 3 rounds, 5 distinct ElevenLabs voices, each with a unique expert perspective grounded in YOUR context.
Built with:
- ElevenLabs TTS — 5 distinct voices per industry panel, plays sequentially after the debate
- Cloudflare Durable Objects — persistent workspace brain per team, stores uploaded documents and full decision history, gets smarter every bounce
- Cloudflare Workers — orchestrates the full multi-agent debate pipeline at the edge
- Claude Haiku — 3 rounds of structured debate plus synthesis verdict generation
12 industry templates: Filmmaker, Software Startup, Marketing Agency, Writer, Music Artist, Product Designer, Consultant, Small Business, Game Developer, Architect, Healthcare, Legal.
3 structured debate rounds (sketched in code below):
- Round 1: Independent reactions (parallel)
- Round 2: Cross-debate (agents respond to each other)
- Round 3: Final positions (parallel)
- Synthesis: BUILD / KILL / PIVOT verdict with specific reasons and 48-hour next action
What makes it different from ChatGPT: Bounceboard reads your uploaded documents and remembers every past decision. Claude starts fresh every time. Bounceboard compounds — your panel gets smarter with every bounce.
Try it: https://bounceboard.lovable.app
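
A sketch of how the three debate rounds could be orchestrated (parallel, then sequential cross-debate, then parallel again); the askExpert callback is a hypothetical stand-in for the Claude Haiku call, not Bounce board's actual code:

```ts
// Sketch of the three-round flow: rounds 1 and 3 fan out in parallel, round 2
// runs sequentially so each expert can react to the others' takes.
type Expert = { name: string; perspective: string };

export async function runDebate(
  experts: Expert[],
  brief: string,
  askExpert: (expert: Expert, prompt: string) => Promise<string>, // model call, injected
) {
  // Round 1: independent reactions, in parallel.
  const round1 = await Promise.all(
    experts.map((e) => askExpert(e, `React to this brief from your perspective:\n${brief}`)),
  );

  // Round 2: cross-debate, each expert responds to the round 1 takes in turn.
  const round2: string[] = [];
  for (const expert of experts) {
    round2.push(await askExpert(expert, `Respond to your co-panelists:\n${round1.join("\n\n")}`));
  }

  // Round 3: final positions, in parallel again.
  const round3 = await Promise.all(
    experts.map((e) => askExpert(e, `State your final position given the debate:\n${round2.join("\n\n")}`)),
  );

  return { round1, round2, round3 };
}
```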

2 Apr, 15:25
TOONCAST automatically translates Korean webtoons into English with AI-powered voice narration — turning any Naver Webtoon episode into a fully readable, listenable experience in minutes. Paste a webtoon URL and TOONCAST handles the rest: it scrapes every panel, sends them through GPT-4 Vision for OCR and translation, inpaints over the Korean text, renders clean English typography back onto each panel, stitches them into scrollable strips, and generates spoken narration of the dialogue. ElevenLabs powers the narration layer. After translation, TOONCAST extracts dialogue from the OCR results, filters out sound effects, sorts text in natural reading order (top-to-bottom, left-to-right), and generates per-panel audio using ElevenLabs TTS (Flash v2.5). Individual panel audio is concatenated into a full episode MP3, so readers can listen along as they scroll — or just listen hands-free. The frontend is a retro CRT-styled SPA with real-time SSE progress tracking, a side-by-side compare view, full-scroll reading mode, and integrated audio playback per panel or per episode.
2 Apr, 14:39
ArgueTV is an AI-powered live debate platform that turns any topic into a structured two-sided debate between two virtual hosts, Avery Quinn and Jordan Blake. It combines OpenRouter for debate generation, Firecrawl for live research and trending topics, and ElevenLabs for realistic voice synthesis, then adds post-debate fact-checking and verdict analysis. It helps users explore both sides of controversial or timely topics quickly, in a more engaging format than reading articles or static summaries. Instead of searching across multiple sources and forming arguments manually, users can enter a topic and instantly get a researched, spoken, multi-round debate with supporting context.
- Makes complex topics easier to understand by presenting both sides
- Saves time on research by pulling context automatically
- Turns static AI text into a more human, audio-first experience
- Helps creators, students, and curious users evaluate arguments and claims faster
- Adds fact-checking and verdict summaries so debates are not just entertaining, but informative
ElevenLabs converts each debate response into natural-sounding speech; each host has a distinct assigned voice, and audio is generated round-by-round during the debate. ElevenLabs is also used for replaying segments and supporting the broadcast-style experience.

2 Apr, 14:32
ClaimCheck is a real-time voice-first fact-checking assistant that lets users speak a claim and receive an instant spoken verdict with a confidence score. Built on Cloudflare for hosting and agent orchestration, the platform runs a transparent pipeline from Audio In → STT → intent detection → verification agent → TTS → Audio Out, with each step visible live in the interface. To verify claims, ClaimCheck combines Brave Search for live web evidence with OpenAI, Anthropic, and Gemini for multi-LLM consensus reasoning. It then calculates a weighted confidence score to classify statements as False, Uncertain, or Verified, while highlighting the most suspicious parts of the claim. For the voice layer, ElevenLabs powers the speech output, making the experience fast, natural, and conversational. By combining Cloudflare edge infrastructure, real-time retrieval, multi-model verification, and high-quality voice interaction, ClaimCheck turns fact-checking into a seamless spoken experience.

2 Apr, 14:26
Vibe Echo is a real-time social game where you don’t talk — you vibe. Two players are instantly matched and dropped into a shared session where the only way to communicate is through expressive reactions. Each reaction triggers unique AI-generated character sounds, turning every interaction into a playful, unpredictable exchange. No text. No voice chat. Just pure energy.

2 Apr, 13:00
LUME is a live event control system that turns audience phones’ flashlights or screens into a giant display for projecting text and different kinds of effects. With Cloudflare Workers, Durable Objects, and ElevenLabs, it delivers real-time effect control, live transcription, and crowd-scale visual performance from one dashboard.
Simple workflow:
1. Create a room
2. Share the join link with fans
3. Trigger an effect from the dashboard
4. Watch everyone’s phones sync into the live light show

2 Apr, 12:58
WhyCast creates personalized stories to answer kids' why questions in seconds. It turns kids' unending questions and curiosity into a learning adventure made for them. Built using Cloudflare Workers, Durable Objects, and Workflows, plus ElevenLabs and some special sauce, for a seamless, realistic audio experience.

2 Apr, 12:21
- What it is: Connect Gmail → AI summarizes and classifies unread primary mail → groups it into sections → writes a short spoken script → ElevenLabs turns it into an MP3 → you get a briefing (target ~60s), not a full email read-aloud.
- Problems it targets: too much inbox noise, fuzzy priorities, and tools that stay text-only instead of offering a quick audio digest.
- Cloudflare: Workers + Hono API; React SPA via Static Assets; Workers AI (Llama) for summary/classify/script; D1 for data; KV for sessions; R2 for stored audio.
- ElevenLabs: turns the spoken script into the MP3 briefing.
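
The summarize/classify step might look roughly like this with Workers AI; the model name, prompt, and response handling below are illustrative assumptions rather than the actual implementation:

```ts
// Sketch of per-email classification with Workers AI (requires
// @cloudflare/workers-types for the Ai binding type). Model choice and
// prompt wording are illustrative.
interface Env {
  AI: Ai;
}

export async function classifyEmail(env: Env, subject: string, body: string) {
  const result = await env.AI.run("@cf/meta/llama-3.1-8b-instruct", {
    messages: [
      {
        role: "system",
        content:
          "Classify this email as urgent, action-needed, FYI, or noise, and summarize it in one sentence. " +
          "Reply as JSON with keys `section` and `summary`.",
      },
      { role: "user", content: `Subject: ${subject}\n\n${body}` },
    ],
  });
  // Chat-style Workers AI models return { response: string }; a real
  // implementation would validate the JSON before trusting it.
  return JSON.parse((result as { response: string }).response);
}
```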

2 Apr, 11:12
StandupLine is an AI-powered async standup system that lets teams submit daily updates through voice instead of meetings. It uses ElevenLabs for conversational voice, Cloudflare for backend state and AI processing, and Telegram for delivering manager briefs. It solves the pain of slow, repetitive daily standups for remote teams, founders, managers, and fast-moving startups. The result is a cleaner workflow where teams speak their updates once and managers get structured summaries, blockers, and follow-ups automatically.

2 Apr, 09:48
Wingman is an AI that joins your meetings, transcribes in real time, and handles follow-ups by voice — summarizing, creating tasks, and talking to your tools. It runs on Cloudflare Workers, Agents SDK, Vectorize, Workers AI, and Browser Rendering, with ElevenLabs Scribe for speech-to-text and Flash for voice output. Works on Discord, Zoom, Google Meet, and phone calls.

2 Apr, 08:30
Borderless is a real-time multilingual voice room built for live conversations. One speaker joins a room and speaks once, while listeners open the same room URL, choose their language, and hear live translated audio with matching translated captions. It solves a simple but important problem: multilingual conversations usually break the moment part of the audience cannot follow the spoken language. Borderless turns a single live speaker into a shared multilingual room without requiring separate calls, separate interpreters, or separate TTS generation for every listener.
On the ElevenLabs side, Borderless uses:
- ElevenLabs Scribe v2 Realtime STT to transcribe the speaker with low latency
- ElevenLabs streaming Text-to-Speech with eleven_flash_v2_5 to generate live translated speech for each active language channel
On the Cloudflare side, Borderless uses:
- Cloudflare Workers for the speaker ingress relay and TTS pipeline
- Durable Objects for per-room coordination, ordered sequencing, and language-based fanout
- Workers AI for translation before speech synthesis
A key design choice is efficiency: Borderless generates one translation and one TTS stream per active language, then fans that stream out to every listener on that language channel. That makes the system both realtime and scalable for live rooms.
How to use the demo:
- Speaker page: https://borderless.jp-45a.workers.dev/room/demo-phase8/speak
- Listener page: https://borderless.jp-45a.workers.dev/room/demo-phase8
- Open one speaker tab and one or more listener tabs
- On each listener tab, choose a different language and click `Initialize Listener Audio`
- On the speaker tab, click `Start Speaker Audio` and begin speaking
- For the most stable demo, the speaker should use English input
Video demo walkthrough: https://youtu.be/Ke043K1dFNw
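
A sketch of the one-stream-per-language fan-out idea; class and method names are illustrative, and translateAndSpeak() stands in for the Workers AI translation plus ElevenLabs streaming TTS step:

```ts
// Sketch of per-language fan-out: listeners register on a language channel,
// and each finalized caption is translated and synthesized once per language,
// then pushed to every listener on that channel.
export class RoomFanout {
  // language code ("es", "ja", ...) -> sockets listening on that channel
  private channels = new Map<string, Set<WebSocket>>();

  addListener(language: string, socket: WebSocket): void {
    if (!this.channels.has(language)) this.channels.set(language, new Set());
    this.channels.get(language)!.add(socket);
    socket.addEventListener("close", () => this.channels.get(language)?.delete(socket));
  }

  // Called once per finalized caption from the speaker. translateAndSpeak()
  // is an injected placeholder for translation + TTS.
  async broadcastCaption(
    caption: string,
    translateAndSpeak: (text: string, language: string) => Promise<ArrayBuffer>,
  ): Promise<void> {
    for (const [language, listeners] of this.channels) {
      if (listeners.size === 0) continue;
      const audio = await translateAndSpeak(caption, language); // one stream per language
      for (const socket of listeners) socket.send(audio);
    }
  }
}
```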

2 Apr, 03:53
The Listening Post is an automated city desk for Milwaukee that produces a daily multi-voice podcast, a civic news website, and an interactive voice agent — running autonomously on Cloudflare Agents. A NewsroomAgent built on Durable Objects manages the full editorial pipeline: Workers for compute, Workers AI for inference and image generation, Vectorize for editorial memory, D1 for storage, R2 for audio, and KV for caching. It uses four ElevenLabs APIs: Text to Dialogue for a 3-host podcast, Music for the intro jingle, Sound Effects for transitions, and Conversational AI so readers can talk to the reporter about any article.

2 Apr, 01:34
I built The Grill for the ElevenHacks hackathon—a website roaster that scans any URL and delivers a sharp, characterful audio roast in seconds. To achieve ultra-low latency and realistic character narration, I leaned on a specialized stack at the edge:
⚡️ Scraping: Cloudflare HTMLRewriter for lightning-fast, serverless parsing of target sites.
🧠 Analysis: Llama-3 via Cloudflare Workers AI to generate witty, tech-literate roasts.
🎙 Narration: ElevenLabs Flash v2.5 for high-fidelity, character-driven voiceovers.
🔊 Atmosphere: ElevenLabs Sound Effects API to generate a custom grill-sizzle background track on the fly.
💾 Infrastructure: Cloudflare Durable Objects to manage global state and the "Leaderboard of Victims."
The result? A seamless, immersive experience that goes from URL entry to a medium-rare roast in under 3 seconds.
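
HTMLRewriter-based scraping of the target page might look roughly like this; the selector list and truncation limit are illustrative choices, not The Grill's actual code:

```ts
// Sketch of streaming HTMLRewriter scraping: collect visible text from a few
// common elements and return a trimmed blob to feed the roast prompt.
export async function scrapePageText(url: string): Promise<string> {
  const res = await fetch(url);
  const chunks: string[] = [];

  const collect = {
    text(chunk: Text) {
      const t = chunk.text.trim();
      if (t) chunks.push(t);
    },
  };

  const rewriter = new HTMLRewriter()
    .on("title", collect)
    .on("h1", collect)
    .on("h2", collect)
    .on("p", collect)
    .on("li", collect);

  // Consuming the transformed body is what actually drives the handlers.
  await rewriter.transform(res).arrayBuffer();

  return chunks.join(" ").slice(0, 4000); // keep the roast prompt short
}
```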

2 Apr, 01:02
Founders Town is a living AI world where you can have real voice conversations with the greatest entrepreneurial minds — Naval Ravikant, Paul Graham, Alex Hormozi, David Senra, and Steve Jobs. Call any founder one-on-one in their cloned voice, powered by ElevenLabs Conversational AI. Or use the Town Square — ask a question and watch all five founders debate it in real time, each responding to the previous, drawing from their actual books, essays, and podcasts. Every answer is grounded in real knowledge using RAG — 4,496 chunks of source material stored in Cloudflare Vectorize, retrieved semantically at call time. Each founder is a persistent Cloudflare Durable Object with their own memory, personality, and relationships. The World Engine runs on Durable Object Alarms, keeping the town alive autonomously. ElevenLabs: Conversational AI for real-time voice calls with sub-second latency. Voice cloning to give each founder their actual voice. Cloudflare: Workers for the API layer. Durable Objects as persistent stateful agents — one per founder. Vectorize as the vector database. Workers AI for embeddings. Built for people who want to learn from the greats — not just read their books, but talk to them. Built for ElevenLabs × Cloudflare Hack #2. Voices are AI-generated for educational and research purposes only.
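
The retrieval step could be sketched like this: embed the caller's question with Workers AI, then query Vectorize for the nearest source chunks. Binding names, the embedding model, and the metadata layout are assumptions for illustration:

```ts
// Illustrative RAG retrieval on Workers: embed the question, query Vectorize,
// and return the matched chunk text for the founder agent's prompt. Binding
// names (AI, VECTORIZE) and the `founder`/`text` metadata fields are assumed.
interface Env {
  AI: Ai;
  VECTORIZE: Vectorize;
}

export async function retrieveContext(env: Env, question: string, founder: string): Promise<string[]> {
  const embedding = await env.AI.run("@cf/baai/bge-base-en-v1.5", { text: [question] });
  const vector = (embedding as { data: number[][] }).data[0];

  const results = await env.VECTORIZE.query(vector, {
    topK: 5,
    returnMetadata: "all",
    filter: { founder }, // assumes chunks were upserted with a `founder` metadata field
  });

  return results.matches.map((m) => String(m.metadata?.text ?? ""));
}
```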

31 Mar, 18:34
Traditional digital news is static and unengaging, so we built an autonomous, fully immersive AI broadcasting agent to dynamically report current events. We used ElevenLabs’ Voice Creation to generate a custom news anchor persona backed by immersive soundscapes, completely orchestrated by a Cloudflare Workers backend executing our agentic edge logic and real-time AI contextual image generation.

29 Mar, 18:47
Real-time multiplayer auction platform with an AI auctioneer that speaks. Users join rooms, create auctions, and bid against each other. An AI auctioneer powered by ElevenLabs TTS reacts to every bid with natural voice, builds tension through "going once... going twice...", and celebrates with a gavel slam and crowd cheer via the Sound Effects API. Users can bid by speaking — Realtime STT transcribes speech, Workers AI parses the dollar amount. Each room is a Cloudflare Durable Object managing bid state, timers, and WebSocket connections. Uses three ElevenLabs APIs (TTS, Realtime STT, Sound Effects) and three Cloudflare features (Durable Objects, Workers AI, Workers).
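
A sketch of how a Durable Object alarm could drive the "going once... going twice" countdown; delays, storage keys, and the class shape are illustrative, and the method would be invoked from the object's fetch/WebSocket handler (omitted here):

```ts
// Sketch: each new bid resets the countdown alarm; when the alarm fires the
// auctioneer escalates "Going once" -> "Going twice" -> "Sold!".
export class AuctionRoom {
  constructor(private state: DurableObjectState) {}

  async placeBid(amount: number): Promise<void> {
    await this.state.storage.put("highBid", amount);
    await this.state.storage.put("stage", 0); // reset the countdown on every bid
    await this.state.storage.setAlarm(Date.now() + 10_000);
  }

  async alarm(): Promise<void> {
    const stage = (await this.state.storage.get<number>("stage")) ?? 0;
    const lines = ["Going once...", "Going twice...", "Sold!"];
    // In the real app this line would be sent to ElevenLabs TTS and the audio
    // broadcast to the room's WebSocket connections.
    console.log(lines[stage]);

    if (stage < 2) {
      await this.state.storage.put("stage", stage + 1);
      await this.state.storage.setAlarm(Date.now() + 5_000);
    }
  }
}
```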
