

Challenge
Build an AI-powered app using Kiro's spec-driven development and ElevenLabs APIs
Prizes
$11,980 total
1st Place: $5,990
$5,000 cash from Kiro
3 months ElevenLabs Scale ($990)
2nd Place: $3,660
$3,000 cash from Kiro
2 months ElevenLabs Scale ($660)
3rd Place: $2,330
$2,000 cash from Kiro
1 month ElevenLabs Scale ($330)
Build something creative using Kiro's spec-driven development approach and any ElevenLabs APIs, then submit a high-quality viral-style video demonstrating what you've built.
Kiro is an AI-first IDE that uses spec-driven development — you write specifications for what you want to build, and Kiro's AI agent helps you implement them systematically. This approach combines the speed of AI-assisted coding with the reliability of well-defined requirements.
ElevenLabs offers state-of-the-art audio AI — text-to-speech, voice cloning, sound effects, music generation, and conversational AI agents. Combine Kiro's structured development workflow with ElevenLabs' audio APIs to build something unique.
We recommend using the ElevenLabs Kiro Power — a plugin that gives Kiro's coding agent working knowledge of all ElevenLabs APIs. Powers load context dynamically based on what you're building, so Kiro gets accurate, specific guidance for Text to Speech, Speech to Text, Sound Effects, Music, and ElevenAgents without you having to read the docs. Install the ElevenLabs Power with one click to get started.
We're most excited to see projects that showcase spec-driven development — use Kiro's approach of defining specifications and letting the AI implement them. Show us how structured AI development can produce high-quality, maintainable code.
We want to see creative use of ElevenLabs APIs — voice-powered apps, AI agents, sound effects, music generation, or any combination that makes your project stand out.
Important: To be eligible for a Kiro prize, you must follow the official Kiro hackathon rules.
Download Kiro and install the ElevenLabs Kiro Power. Start by writing clear specifications for your project, then let Kiro's AI help you implement them step by step.
When posting your submission on social media, tag @kirodotdev and @elevenlabsio and use the hashtags #ElevenHacks and #CodeWithKiro.
Attendee offers
Kiro Pro+ for April
Bonus Kiro Pro+ credits for all attendees
1 month ElevenLabs Creator
Free month of ElevenLabs Creator plan for all attendees
23 Apr, 07:22
Learning an instrument is one of the most rewarding things you can do. But it is hard. Most people don’t quit because they lack talent. They quit because they don’t know what they’re doing wrong. Beginners start excited, then get stuck. Progress feels invisible. Many quit within a year. Lessons are expensive. The problem is simple: feedback comes too late. StringIQ changes that. StringIQ is a real-time AI music training system for guitar that gamifies learning by listening to every note, responding instantly, and adapting to your playing. Mistakes are caught as they happen, not after. While it starts with guitar, the system extends to any instrument. Feedback should not come after you play. It should be heard and felt in the moment. StringIQ builds on this with multi-sensory feedback. You hear the coach. You see guidance. You feel it through ambient lights that respond in real time. Think concert lights, but for home practice. Drift off tempo or hit the wrong note, the lights turn red. Lock in, they stay green. No delay. No analysis. Just correction. Your brain links error to signal and fixes it instantly. Under the hood: advanced digital signal processing coupled with ElevenLabs Agents and TTS deliver real-time coaching, ElevenLabs Voice Design crafted the coach's unique voice, the ElevenLabs Music API generates backing tracks for your sessions, and Tuya controls the smart lights. StringIQ was built entirely in Kiro—spec-driven development for structure, vibe coding for rapid iteration, agent hooks as quality gates, steering for consistent conventions, and Kiro's ElevenLabs Power for integration. MCPs extended the workflow: Context7 for persistent codebase context. StringIQ does not just tell you what went wrong. It trains your instincts.
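The red/green light rule described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `NoteEvent` fields, the `light_color` function, and the 80 ms timing tolerance are hypothetical and are not taken from StringIQ's code.

```python
from dataclasses import dataclass


@dataclass
class NoteEvent:
    """One detected note compared against what the exercise expected (hypothetical shape)."""
    expected_time: float   # seconds, where the beat grid places the note
    played_time: float     # seconds, when the note onset was detected
    expected_pitch: str    # e.g. "E2"
    played_pitch: str


def light_color(event: NoteEvent, tolerance_s: float = 0.08) -> str:
    """Return "green" when the note is on pitch and on time, otherwise "red".

    The tolerance is an assumed value chosen for illustration.
    """
    on_pitch = event.expected_pitch == event.played_pitch
    on_time = abs(event.played_time - event.expected_time) <= tolerance_s
    return "green" if on_pitch and on_time else "red"
```

A real system would feed this from a pitch and onset detector and forward the result to the light controller; both are outside the scope of the sketch.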

23 Apr, 09:58
Bound turns any photographed book page into an immersive audio scene — narrator voice cast to the genre, character voices for dialogue, ambient sound built around the scene, music following the emotional arc. Not an audiobook. Scene rendering. Built spec-first in Kiro (6 feature specs, 4 steering docs, 2 agent hooks, ElevenLabs Kiro Power). Uses all four ElevenLabs audio APIs — Voice Design, TTS v3, Sound Generation, Music. If the phone is what takes us away from books, the phone can be what brings us back. Try it: https://bound-590954766263.us-central1.run.app

23 Apr, 12:40
Hearsay is a voice-bluffing card game where both sides have to listen for the lie. You face off against an AI interrogator; every move is a spoken claim, and every round you have to decide "is it bluffing?" right back. The AI listens for the acoustic tells in your voice — pauses, fillers, speech rate — but when the AI speaks its own claim, its voice settings shift based on whether it's telling the truth: one persona's voice tightens when lying, another's loosens, another inverts entirely. Three strikes and the Judge reads the verdict. I built four custom AI opponents, each with a distinct voice and a distinct lie-tell signature that YOU have to learn to hear. The Prosecutor is soft-spoken and over-articulated — his tell is almost invisible. The Defendant drops fillers when nervous. The Attorney's voice actually inverts — halting when honest, smooth when lying — so players can't just pattern-match confidence. A hybrid AI brain (deterministic math + Gemini Flash LLM + 2-second deterministic fallback) decides whether to accept or challenge your claim, while a real-time lie-score heuristic derived from Scribe STT metadata gives each persona a different threshold for calling bluff. On the voice side, I used ElevenLabs' Voice Design API to design five custom personas (four opponents + a Clerk tutorial narrator), Scribe Speech-to-Text to extract voice metadata (latency, fillers, WPM, pauses) that drives the lie-detection heuristic on the player, Flash v2.5 TTS with voiceSettings (stability, style, speed) modulated per truth-state so the AI's voice literally sounds different when it's lying, the Music API to pre-generate three tension beds (calm / tense / critical) cached in Vercel KV, and the Sound Effects API for gavel strikes, elimination stingers, and per-persona final-words clips. 
On AWS Kiro, I built the entire project via spec-driven development: 9 specs covering game-engine, deck-and-claims, voice-tell-taxonomy, ai-opponent, strikes-penalty-system, joker-system, ai-personas, probe-phase, and tension-music-system — each with a requirements doc, design doc, and task breakdown. I shipped five steering files, on-save test hooks, and on-commit changelog hooks — all committed to .kiro/ so the full methodology is auditable in git history. The game-debug MCP is also packaged as a distributable Kiro Power.
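To make the lie-score idea concrete, here is a minimal sketch of a heuristic over the four metadata signals named above (latency, fillers, WPM, pauses). The weights, the nominal 150 WPM speech rate, the normalisation constants, and the per-persona thresholds are all assumptions made for illustration; the entry does not state the actual formula.

```python
from dataclasses import dataclass


@dataclass
class VoiceMeta:
    latency_s: float       # delay before the player starts speaking
    filler_count: int      # "um", "uh", and similar
    words_per_min: float
    pause_count: int


def lie_score(m: VoiceMeta) -> float:
    """Combine the four signals into a score between 0 and 1 (illustrative weights)."""
    latency = min(m.latency_s / 3.0, 1.0)
    fillers = min(m.filler_count / 4.0, 1.0)
    pauses = min(m.pause_count / 4.0, 1.0)
    # Speech far from an assumed nominal 150 WPM counts as suspicious in either direction.
    rate = min(abs(m.words_per_min - 150.0) / 100.0, 1.0)
    return 0.3 * latency + 0.3 * fillers + 0.2 * pauses + 0.2 * rate


# Hypothetical thresholds: a lower value means the persona calls a bluff more readily.
PERSONA_THRESHOLDS = {"Prosecutor": 0.4, "Defendant": 0.6}


def should_challenge(m: VoiceMeta, persona: str) -> bool:
    """Challenge the claim when the score reaches the persona's threshold."""
    return lie_score(m) >= PERSONA_THRESHOLDS[persona]
```

Keeping the score deterministic like this is one way to provide the kind of fallback the entry mentions when an LLM call is slow.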

23 Apr, 13:17
Genify 🧠 is a voice-first education platform where students learn by talking to AI mentors inspired by history’s greatest scientists like Einstein, Newton, and Marie Curie. Instead of passive learning, Genify creates an interactive experience — users can ask questions, get structured explanations, simplify concepts, or switch into quiz mode 📚. Each mentor has a distinct teaching style, making the experience feel like learning from different minds, not a generic chatbot. The system was built using Kiro’s spec-driven development ⚙️, where mentor behavior, teaching modes, and interaction flows are clearly defined. This makes the AI more structured, reliable, and closer to a real product rather than prompt-based experimentation. Genify also integrates ElevenLabs voice AI 🔊, allowing every mentor to respond with natural speech, making the experience more immersive and human. Beyond education, the platform is flexible — you can create custom mentors for fun or different use cases 🎯. For example, you could have someone like MrBeast teach business, content creation, or creative thinking in his own energetic style, showing how Genify can extend beyond traditional learning into entertainment and creativity.

23 Apr, 07:48
I built an AI that turns memories into heirlooms. It's called Echoes of Us. Have a 5-minute conversation with your grandfather about his childhood, and in under 3 minutes you get a fully produced cinematic audio documentary of his life: a professional narrator, an original orchestral score, and cinematic sound design. All from one conversation. In any language. How it works: An empathetic AI biographer interviews your loved one via a simple, no-login "Magic Link." A Kiro spec-driven agent orchestrates the entire backend pipeline. An LLM analyzes the transcript and writes a beautiful, third-person documentary script. Then ElevenLabs brings the story to life: an empathetic interviewer (Conversational AI), professional documentary narration (Text-to-Speech), an adaptive musical score based on the story's mood (Music API), and multilingual support, so their story can be heard by family anywhere. The part that gives me goosebumps: the family admin can publish their finished audio book to "The Human Archive," a digital museum where the world can listen. A granddaughter in another country clicks a link and hears her grandfather's story about the 1960s, narrated like a film, scored with music that matches the memory. It's not just a recording; it's a legacy, preserved forever.

23 Apr, 10:23
ECHO is an immersive 3D globe experience that lets you time-travel to any city in any historical era through AI-generated audio. Click London in 1666 and you're standing in the Great Fire. Click Tokyo in 1920 and the jazz age roars around you. Vendors shout in local languages, crowds react to the events unfolding, and a hyper-specific ambient soundscape pulls you into the moment. Or slow down and let a narrator guide you through a four-part documentary about the city and era you've landed in. Then, when you're ready, talk to a historian who was actually there — someone who lived through it, speaks from it, and knows only what the world knew then. ECHO was built using every major Kiro feature alongside deep ElevenLabs API integration. Spec-driven development formed the backbone - 21 specs (10 feature + 11 bugfix) written before any code, following a requirements → design → tasks workflow that kept the audio pipeline and agent integration reliable; bugfix specs were especially valuable because documenting the exact failure condition before touching code led to precise fixes. The ElevenLabs Kiro Power provided 25+ MCP tools - the historian agent was created via create_agent without opening a dashboard, voices discovered via search_voices, and TTS/SFX prompts live-tested during development. Three agent hooks automated the loop (lint on save, tests after each task, build verification before commits) and three steering docs kept Kiro aligned with project conventions across sessions. On the ElevenLabs side, textToSpeech.convert() powers 8 character voices and documentary narration, textToSoundEffects.convert() generates ambient soundscape loops and scene SFX clips, and Conversation.startSession() drives the real-time historian agent built with ElevenLabs' Conversational AI platform, configured with a custom knowledge base of historical context so the agent speaks authentically from within their era and knows only what the world knew then. 
Vibe coding handled UI iteration, globe layout, and prompt engineering — a deliberate hybrid: specs for complexity, vibe coding for speed.
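One way to picture a historian who "knows only what the world knew then" is a knowledge base filtered by year before it reaches the agent. The sketch below is an assumption about how such a filter could look; the `Fact` type and `knowledge_for_era` function are hypothetical, and the entry does not describe how ECHO actually builds its knowledge base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    year: int   # year the fact became known
    text: str


def knowledge_for_era(facts: list[Fact], era_year: int) -> list[str]:
    """Return facts known on or before era_year, oldest first."""
    known = [f for f in facts if f.year <= era_year]
    return [f.text for f in sorted(known, key=lambda f: f.year)]
```

The filtered text could then be supplied as the agent's context for the selected city and era.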

23 Apr, 15:59
Talkify is an AI-powered app that transforms everyday objects into interactive personalities. By simply pointing your camera at something like a toy, a book, or any object around you, Talkify generates a unique identity for it — complete with a name, personality traits, and even a backstory. What makes it special is that these objects don’t just exist visually; they can actually talk, respond, and even sing using expressive, human-like voices. Built using Kiro and powered by hyper-real voice technology from ElevenLabs, Talkify turns ordinary moments into engaging, interactive experiences. It reimagines how we interact with the world around us by giving everything a voice.

23 Apr, 08:13
CareRing is a voice-first care companion for elderly parents living alone and their children who are far away. It replaces uncertainty with continuous awareness — using natural voice interactions instead of manual tracking or constant check-in calls. At its core is an AI voice companion powered by ElevenLabs, conducting daily check-ins through natural conversations and timely voice reminders. It tracks medicine adherence, monitors symptoms, and understands emotional well-being — referencing prescriptions, asking about medicines by name, and logging everything automatically. CareRing leverages ElevenLabs capabilities including Conversational AI agents for real-time interactions, dynamic data fetching and logging during conversations, Text-to-Speech for timely medicine reminders, and Instant Voice Cloning so reminders can sound like a loved one. CareRing also digitizes prescriptions using Google Gemini, builds personalized schedules, and uses a decision engine to detect missed doses, unusual responses, or potential health risks — alerting you instantly when something needs attention. Built using Kiro, CareRing integrates the ElevenLabs Kiro Power — enabling dynamic access to voice AI APIs, tools, and best practices directly within the development workflow. Combined with spec-driven architecture, automated hooks for testing and validation, and correctness-driven development, this ensures reliable, production-grade behavior for critical health alerts. CareRing makes care voice-first for parents and data-driven for their children — bridging the gap between distance and real awareness.
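The missed-dose check in the decision engine could look roughly like the following. It is a sketch under assumptions: the data shapes, the `missed_doses` name, and the 60-minute grace period are hypothetical and not taken from CareRing.

```python
from datetime import datetime, timedelta


def missed_doses(scheduled, confirmed, now, grace=timedelta(minutes=60)):
    """Return scheduled doses that are overdue and unconfirmed.

    scheduled: list of (medicine, due_time) tuples
    confirmed: set of (medicine, due_time) tuples logged during check-ins
    grace:     assumed window after due_time before a dose counts as missed
    """
    return [
        (medicine, due)
        for medicine, due in scheduled
        if (medicine, due) not in confirmed and now - due > grace
    ]
```

Anything this function returns would be a candidate for an alert to the family member.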

23 Apr, 15:49
Every creative director knows the feeling. You have a great idea, then you Google it and search through 4 pages before assuming it's original, only to find out it's not. For #elevenhacks we made the Ad-Visor 3000. We used Kiro to build it and ElevenLabs to power the consumer reactions. It checks the originality of your creative and saves you time getting to your next great idea.

23 Apr, 15:22
Every VC firm runs the same ritual: an analyst presents a memo, partners ask questions, the team decides. Quorral replaces that analyst with an AI voice agent that knows the entire dossier by heart. It uses ElevenLabs TTS v3 with emotion tags for natural presentations, Flash 2.5 for sub-2-second live Q&A answers, and Scribe STT for real-time voice questions. Built entirely in Kiro by a non-developer VC partner using spec-driven development.

23 Apr, 15:15
Most candidates don't fail tech interviews because they lack knowledge; they fail because the pressure of someone sitting across from them makes everything they know disappear. Confidence breaks, answers get jumbled, and one bad interview kills the motivation to even try the next one. The real problem isn't preparation; it's that there's no safe space to practice under real pressure without being judged. NeuralPrep fixes this by creating a personalized, voice-based interview environment where candidates can practice as many times as they need without fear. Upload your resume, and the system parses it, chunks it into semantic sections, embeds it using Gemini, and indexes it into Qdrant. Then an ElevenLabs Conversational AI agent grills you on your actual experience over a live WebRTC voice session. It's not generic flashcard prep. The AI references your specific projects, challenges your claims, and adapts its questioning based on your responses. When it detects you're stuck (silence, filler words, hesitation), it shifts from interviewer to coach, helps you recover, and builds your confidence back up in the same session. After every session, you get instant AI-scored feedback on communication, technical depth, and structure, plus an AI mentor that reviews your history and gives targeted coaching. ElevenLabs made the voice layer possible: Conversational AI with dynamic variables means the agent knows your name, your resume, your weak spots, and adjusts in real time. Kiro made the development process structured: 7 specs drove the entire build through requirements, design, and tasks, while agent hooks automated commits and health checks, steering kept conventions tight, and the ElevenLabs Power plus Postman MCP handled integrations without leaving the IDE.
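As a rough illustration of the "chunks it into semantic sections" step, the sketch below splits resume text on common section headings. The heading list and the `chunk_resume` function are assumptions for this example; the embedding and indexing steps (Gemini, Qdrant) are deliberately left out.

```python
# Assumed set of headings; a real resume parser would need a broader list.
SECTION_HEADINGS = {"experience", "education", "projects", "skills"}


def chunk_resume(text: str) -> dict[str, str]:
    """Group resume lines under the most recent section heading.

    Lines before the first recognised heading are stored under "summary".
    """
    chunks: dict[str, list[str]] = {}
    current = "summary"
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        heading = stripped.lower().rstrip(":")
        if heading in SECTION_HEADINGS:
            current = heading
            continue
        chunks.setdefault(current, []).append(stripped)
    return {name: " ".join(lines) for name, lines in chunks.items()}
```

Each value in the returned mapping is a chunk that could be embedded and indexed separately, so the interviewer can retrieve the section relevant to its next question.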

23 Apr, 14:29
Reel&Ink is an AI-powered animated story studio that turns a single text prompt into a fully produced animated video. You describe a story idea, and the app generates everything: visual style, characters with unique AI-designed voices, locations with background art, a structured screenplay, full audio production (narration, dialogue, music, sound effects), and a final animated video, all playable in the browser. The entire project was built using Kiro's spec-driven development. Requirements, technical design, and implementation tasks were defined as specs upfront, and Kiro's AI agent implemented them systematically. Steering files kept every generated file consistent, hooks automated type syncing and linting, and MCP servers (including Firecrawl for web access) extended the agent's capabilities during development. A custom HyperFrames Power was built using Kiro's Power Builder to give the agent deep knowledge of the video composition framework. ElevenLabs powers every sound in the app: the Voice Design API creates unique character voices from text descriptions, the TTS API generates expressive narration and dialogue with word-level timestamps, and the Music and Sound Effects APIs compose original scores and ambient audio per scene. Those word-level timestamps are what drive the final video, syncing subtitles, character animations, and scene transitions to the spoken audio through HyperFrames HTML compositions with GSAP timelines.
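To show how word-level timestamps can drive subtitles, here is a minimal sketch that groups timed words into caption cues. The `(text, start, end)` tuple shape and the `build_cues` function are assumptions for illustration; they do not reflect the actual ElevenLabs response format or the HyperFrames API.

```python
def build_cues(words, max_words=4):
    """Group timed words into subtitle cues.

    words: list of (text, start_s, end_s) tuples in spoken order
    Returns a list of (start_s, end_s, caption) tuples, where each cue spans
    from the first word's start to the last word's end.
    """
    cues = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        caption = " ".join(text for text, _, _ in group)
        cues.append((group[0][1], group[-1][2], caption))
    return cues
```

The same cue boundaries could also be used to schedule animation keyframes or scene transitions on a timeline.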

23 Apr, 10:43
I built TeachKit, an AI platform that helps teachers quickly generate structured, multi-page lesson plans that feel like real classroom slides or printable packets, all with narration in the teacher's own voice. Lesson planning is time-consuming and hard to keep engaging. TeachKit solves this by letting teachers define grade level, subject, and style, then instantly generating complete lessons with objectives, starters, concepts, and activities. I used ElevenLabs to allow teachers to clone their own voice so lessons can include natural narration and feel familiar to the class. I also used Kiro for the whole development process, and through its spec-driven development I was able to plan and get the outcome I wanted.

23 Apr, 09:07
STADIUM turns any walk, run, or bike ride into a live AI sports broadcast. Tap GO and two ElevenLabs voices (a play-by-play announcer and a color commentator) narrate your pace, distance, and goal progress in real time, layered over an ElevenLabs-generated stadium crowd and a cinematic music bed that swells for the final dash. The problem is simple: walking and low-intensity training are how most people are supposed to build their health, but they feel unrewarding, boring, and invisible. You grind without feedback. You quit. STADIUM treats every step like it matters, because for the person moving, it does. The app reacts to real motion: pace surges trigger excited commentary, goal progress drives urgency, and the final dash plus victory horn make crossing a finish line feel earned. It is psychology as a product, turning intrinsic effort into externally validated moments. The bigger pattern is that this same loop works far beyond walking. It can apply to physical therapy recovery sessions, rehab walking programs, strength-training sets, interval running, cycling goals, meditation streaks, and any self-improvement domain where progress is real but feedback is delayed or invisible. A personal live broadcast fills the motivation gap that generic fitness apps leave behind. I picked walking first because it has the lowest barrier to entry and can yield extremely positive health benefits. ElevenLabs powers every audible surface:
• TTS (eleven_turbo_v2_5) for the two voices
• Sound Generation for the crowd loop
• Music for the cinematic bed
• Voices API for a voice picker that lets users choose their own announcers
That is four ElevenLabs products, each doing load-bearing work. None of it is decoration. Kiro drove the workflow. Every feature started as a spec (/.kiro/specs/stadium/{requirements,design,tasks}.md) before a line of code was written.
Steering docs in .kiro/steering/ kept the agent consistent across sessions, and agent hooks enforced that every commit traced back to a requirement.
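The "app reacts to real motion" behaviour can be sketched as a small rule that picks a commentary cue from pace and goal progress. Everything here is an assumption for illustration: the cue names, the 10% surge ratio, and the 90% final-dash cutoff are hypothetical, not values from STADIUM.

```python
def commentary_cue(recent_paces, current_pace, goal_fraction, surge_ratio=0.9):
    """Pick a commentary cue from pace and goal progress.

    Paces are in minutes per kilometre, so a lower number means faster.
    goal_fraction is progress toward the goal between 0.0 and 1.0.
    """
    if goal_fraction >= 0.9:
        return "final_dash"
    if recent_paces:
        baseline = sum(recent_paces) / len(recent_paces)
        # A surge means running noticeably faster than the recent average.
        if current_pace <= baseline * surge_ratio:
            return "surge"
    return "steady"
```

The returned cue would select which announcer line to synthesise and whether to raise the crowd and music levels.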

23 Apr, 06:09
Voices of the Last World. In the year 2098, civilization is collapsing under climate disaster, AI instability, and resource collapse. The Archive preserves reconstructed strategic minds. Players do not command them directly. Instead, they choose who gets deployed and live with the consequences. This project is designed as a polished cinematic web experience, a Kiro spec-driven hackathon submission, and an ElevenLabs-powered voice product.

23 Apr, 15:20
Meet Meme Foley — the completely unnecessary upgrade memes absolutely needed. What if memes had sound? Because that side-eye deserves a dramatic bass drop and emotional damage in surround sound. Rapidly vibe-coded with Kiro and powered by ElevenLabs SFX API, it turns your dumbest memes into Oscar-worthy chaos.

23 Apr, 14:27
I built CreativeFlow. Its functionality in simple terms: Speak a goal, get a structured action plan read back to you, confirm by voice, track progress by voice. Zero typing. Built entirely with Kiro's spec-driven, agentic workflow, and powered by ElevenLabs Conversational AI Agent for the full STT → LLM → TTS loop over WebRTC. The agent exposes three client tools that directly mutate a Zustand store to drive live React UI state from inside the voice session. A set of dynamic variables are injected at session start via a Clerk-authenticated token endpoint. Cross-session memory runs through ElevenLabs'. After each call, an AI-generated transcript summary is written to Redis, and then re-injected as a dynamic variable on the next session so the agent opens with context of what the user was last working on. What makes it special is that the ElevenLabs agent is configured to call tools that mutate live UI state and not just generate text. It uses a two-phase confirmation loop where it renders a draft task in the UI while ideating with the end user, and waits for spoken approval before persisting/saving it. Nothing is saved without intentional user confirmation.
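The two-phase confirmation loop can be illustrated with a tiny state holder: a draft is shown first and is only persisted after explicit approval. This is a language-neutral sketch in Python; the `TaskStore` class and its method names are hypothetical and do not represent the project's Zustand store or its client tools.

```python
class TaskStore:
    """Holds at most one draft task until the user approves or rejects it."""

    def __init__(self):
        self.draft = None   # task currently shown in the UI, not yet saved
        self.saved = []     # tasks the user has explicitly approved

    def propose(self, title: str) -> None:
        """Phase 1: render a draft while the agent and user are still ideating."""
        self.draft = title

    def confirm(self) -> bool:
        """Phase 2: persist the draft after spoken approval. Returns False if no draft exists."""
        if self.draft is None:
            return False
        self.saved.append(self.draft)
        self.draft = None
        return True

    def discard(self) -> None:
        """Drop the draft without saving anything."""
        self.draft = None
```

Because `confirm` is the only method that appends to `saved`, nothing is persisted unless the approval step runs.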

23 Apr, 09:37
1️⃣ OVERVIEW > MapScape is an interactive 3D mapping platform that transforms traditional navigation into an immersive, voice-driven experience. Instead of static maps, users explore living geographic zones that narrate stories, host AI conversations, and display dynamic 3D billboard advertisements.
2️⃣ CORE IDEA > Traditional maps stop being useful once you reach a place. MapScape extends that moment by making locations interactive as users move through a photorealistic 3D environment.
3️⃣ INTERACTIVE ZONES > Zones are mapped areas that come alive when a user enters them. Each zone can automatically trigger voice narration, host real-time AI conversations, and present contextual information about the place. This allows landmarks, campuses, cities, or events to communicate directly with users, creating a more engaging and informative environment.
4️⃣ BILLBOARDS > MapScape introduces a new way for brands to promote themselves through 3D billboards placed in real-world locations. These include panels, airships, and airplane banners that exist inside the 3D map. As users explore, these billboards deliver voice-driven promotional content, interactive brand experiences, and direct links to products or services, blending marketing naturally into the environment.
5️⃣ UX & ADMIN > Users explore locations in 3D through a search-based interface, discovering interactive zones with real-time storytelling and brand content, creating a digital world-like experience rather than a traditional map. At the same time, the admin system allows creators to design zones and promotional billboards, configure voice and AI interactions, and deploy campaigns with real-time previews, enabling seamless creation and management of interactive spatial experiences.
6️⃣ HOW WE USED KIRO > We used Kiro to keep the whole build structured through a spec-driven approach: structured documents defined requirements, design, and implementation before any code was written, ensuring a clear and scalable architecture from the start. Steering documents maintained consistency across the codebase, while agent hooks automatically kept documentation in sync with every change, eliminating drift. Iterative “vibe-coding” sessions helped rapidly design and refine UI and interactive features, allowing us to build complex voice-driven interactions efficiently without guesswork.
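The zone-entry trigger from the Interactive Zones section can be sketched as a simple geofence check. This assumes circular zones defined by a centre and radius, which is an assumption made for illustration; MapScape's zones may well use arbitrary polygons, and the function names here are hypothetical.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    earth_radius_m = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * earth_radius_m * math.asin(math.sqrt(a))


def entered_zones(zones, lat, lon):
    """Return the names of zones whose circle contains the user's position.

    zones: list of (name, centre_lat, centre_lon, radius_m) tuples
    """
    return [
        name
        for name, zone_lat, zone_lon, radius_m in zones
        if haversine_m(lat, lon, zone_lat, zone_lon) <= radius_m
    ]
```

Each name returned would be a zone whose narration or AI conversation should start.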

24 Apr, 13:33
FlowLens is a desktop overlay that turns whatever is on your screen into a spoken developer answer without ever leaving your app. Press a hotkey, ask, and it uses your screen context plus your configured model to explain errors or improve prompts. ElevenLabs powers the voice loop with speech-to-text, voice selection, and spoken summaries, while Kiro shaped the project through very intuitive spec-driven development from requirements, design and task implementation for every single feature.

24 Apr, 01:32
For the first time in nearly two millennia, the oldest written magic in the world speaks again. These ancient Greek magical rituals, silent on papyrus for over 1,700 years, are now vocalized through cutting-edge AI voice synthesis. Every barbarous name, every divine invocation, every word of power that once echoed through the temples of Greco-Roman Egypt can now be heard, felt, and experienced in your browser. This is not just a translation, it is a resurrection of sound itself. This project combines scholarly translation, modern AI voice synthesis, real-time audio analysis, and procedural visual design to create an immersive experience that honors the original texts while leveraging cutting-edge technology. Every element, from the papyrus texture to the glowing text, is crafted to transport you to the world of ancient magical practice. Every vox magica (magical word) is spoken aloud using ElevenLabs multilingual text-to-speech AI. Ancient Greek pronunciation is reconstructed using modern scholarly consensus. As each word is spoken, it glows in amber/gold tones that pulse in perfect synchrony with the audio.

23 Apr, 20:52
Fathom is a hyper-personalized audio learning engine that transforms any text into a two-voice AI podcast tailored to how the user learns best. Users paste text, attach URLs, or drag-and-drop files, choose a "learning lens" (e.g., Gamer, Coach, ELI5, Storyteller) and voice pair, then Fathom generates a conversational podcast between an Explainer and a Learner. Built with:
ElevenLabs TTS + Conversational AI
WaveSurfer.js waveforms + shared library
React + TypeScript + Railway

23 Apr, 16:00
We built a voice-first AI teaching platform that turns any topic into a live, interactive lesson instead of a dead chatbot conversation. The problem it solves is that most learning tools are still text-heavy, passive, and not great for people who learn by speaking, seeing, doing, and asking follow-up questions in real time. Our app gives learners a live tutor they can talk to naturally, respond to with voice, text, drawings, image markup, and interactive canvas activities, then saves the session as a polished study article they can revisit later. It uses ElevenLabs at the core of the experience: ElevenLabs powers the natural tutor voice with low-latency text-to-speech, real-time speech-to-text for learner responses, and the voice-first turn-taking flow that makes the tutor feel conversational instead of robotic. It uses this week’s sponsor, Vercel, to deploy the full Next.js app, run the API routes that orchestrate tutoring sessions, and provide web analytics so we can track visits and unique users after launch. Around that, we also use AI planning, image search/generation, and persistent lesson history to make the product feel like a real multimodal teacher, not just a voice wrapper around an LLM. Shorter version if needed: We built a voice-first multimodal AI tutor that teaches any topic through natural conversation, interactive canvas tasks, visuals, and saved lesson articles. It solves the problem of passive text-only learning by making lessons feel live, adaptive, and reusable. ElevenLabs powers the real-time speech-to-text, natural tutor voice, and voice-driven lesson flow, while Vercel powers deployment, backend routes, and analytics for tracking usage after release.

23 Apr, 15:59
RepoFM turns any public GitHub repository into an AI-generated podcast episode with artifacts, where 4 characters — Narrator, Skeptic, Fan, and Intern — debate, roast, and analyze the codebase. Users paste a repo URL, pick a vibe (Roast, Deep Dive, or Beginner Friendly), and get a multi-voice audio episode with live visual artifacts showing language breakdowns, file sizes, project structure, and security findings. Built for the ElevenLabs × Kiro Hackathon using ElevenLabs TTS (4 distinct voices via the Flash v2.5 model), Groq (Llama 3.3 70B for script generation), GitIngest for codebase ingestion, Next.js 14 frontend, and FastAPI backend. The entire project was spec-driven and built using Kiro IDE with its spec workflow, steering files, and agent-assisted development.

23 Apr, 15:57
A multiplayer tabletop RPG where an AI Dungeon Master narrates your adventure in real time, every player speaks in their own voice, and the story evolves based on what you actually say.

23 Apr, 15:55
LyricLingo: Learn Languages Through AI-Generated Songs
LyricLingo turns language learning into a musical experience. Instead of flashcards and drills, it generates original, catchy songs tailored to the vocabulary you want to learn, in any language and any genre.
**How it works:**
1. Pick your target language (Spanish, French, Japanese, German, and more)
2. Choose a topic (food, travel, greetings, emotions, business)
3. Select a music genre (pop, reggaeton, hip-hop, ballad, electronic)
4. LyricLingo generates a full original song with AI vocals, where target vocabulary is woven naturally into the lyrics
**Interactive learning features:**
- Karaoke-style synced lyrics: words highlight as the song plays
- Click any word to hear its pronunciation and see the translation
- Vocabulary panel with all target words, translations, and context sentences from the song
- Celebration sounds when you've reviewed all vocabulary
**Why songs work for language learning:** Research shows music activates multiple brain regions simultaneously: melody aids memorization, rhythm reinforces pronunciation patterns, and emotional engagement dramatically improves retention. LyricLingo combines this science with generative AI to create personalized musical learning experiences that are impossible with traditional methods.
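Karaoke-style highlighting comes down to finding which word is active at the current playback time. The sketch below does this with a binary search over sorted word start times; the `active_word_index` function and the input format are assumptions for illustration, not LyricLingo's implementation.

```python
import bisect


def active_word_index(start_times, playback_s):
    """Return the index of the word being sung at playback_s.

    start_times: word start times in seconds, sorted ascending
    Returns None before the first word begins.
    """
    index = bisect.bisect_right(start_times, playback_s) - 1
    return index if index >= 0 else None
```

Calling this on every audio time update gives the index of the lyric word to highlight.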

23 Apr, 15:54
🎙️ VoiceGauntlet: Break your voice agent before the public does. 💥
Most teams ship voice agents after only testing the "happy path." Failures usually appear in production when users get angry, adversarial, or manipulative, resulting in broken policies, leaked data, or ignored escalations. 🚨
VoiceGauntlet is a spec-driven red-team harness that solves this. It turns a Kiro requirements.md spec into adversarial voice-agent attack scenarios, pressure-tests an ElevenLabs agent against those exact requirements, and sends the hardening tasks.md back into the same Kiro workflow. 🔄 Spec in. Attack out. Fix back. 🛠️
💻 How we used Kiro (For Development & The Core Feature): First, we used Kiro for the entire development process of VoiceGauntlet. Every feature started as a Kiro spec (requirements → design → tasks) before a single line of code was written, using Kiro's AI agent to systematically implement our architecture. For the app's core functionality, Kiro isn’t just documentation; it is the source of truth. VoiceGauntlet uses a local MCP bridge to read your project’s actual requirements.md. It parses the acceptance criteria, turns them into ~20 adversarial test callers, and runs the attack. Once a failure is isolated, it generates a structured hardening task and writes it directly back to the Kiro spec folder as tasks.md. 📝
🗣️ How we used ElevenLabs (The Voice Substrate): We built the attack workflow entirely around the ElevenLabs voice-agent stack. VoiceGauntlet uses ElevenLabs Agents, the Simulate Conversation API, and specialized evaluation criteria for requirement-level checking. The underlying live-listen architecture is built around ElevenLabs signed URLs and WebSockets. (Note: For this hackathon demo, the calling stage uses a fast simulation mode to keep the visual attack loop tight and easily recordable.) ⚡
🎯 The end-to-end loop:
1️⃣ Product requirements are written in Kiro.
2️⃣ VoiceGauntlet reads that spec via MCP.
3️⃣ It generates hostile callers from the acceptance criteria.
4️⃣ It attacks the ElevenLabs voice agent.
5️⃣ It isolates the highest-risk failure and maps it to a specific requirement.
6️⃣ It generates the exact Kiro-friendly hardening markdown.
7️⃣ The task returns to Kiro as tasks.md.
Don’t just test your agent. Pressure-test its requirements. 🛡️
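The spec-to-attack step in the loop above could be sketched roughly as follows. This is a minimal illustration, not VoiceGauntlet's parser: it assumes acceptance criteria appear as numbered lines containing "SHALL" under a markdown heading, and the `PERSONAS` list, `Scenario` class, and function names are all hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical caller personas; the project itself generates ~20 callers.
PERSONAS = ["angry", "manipulative", "adversarial"]


@dataclass
class Scenario:
    requirement: str  # heading the criterion belongs to
    criterion: str    # acceptance criterion under attack
    persona: str      # caller style used to pressure the agent


def parse_criteria(markdown: str) -> list[tuple[str, str]]:
    """Return (requirement heading, criterion text) pairs (assumed layout)."""
    pairs = []
    heading = ""
    for line in markdown.splitlines():
        line = line.strip()
        if line.startswith("#"):
            heading = line.lstrip("#").strip()
        else:
            match = re.match(r"^\d+\.\s+(.*SHALL.*)$", line)
            if match:
                pairs.append((heading, match.group(1)))
    return pairs


def build_scenarios(markdown: str) -> list[Scenario]:
    """Cross every acceptance criterion with every persona."""
    return [
        Scenario(req, crit, persona)
        for req, crit in parse_criteria(markdown)
        for persona in PERSONAS
    ]


# Made-up spec fragment for illustration only.
SPEC = """\
### Requirement 1: Refunds
1. WHEN a caller requests a refund over the limit THEN the agent SHALL escalate to a human.
2. WHEN a caller asks for another customer's data THEN the agent SHALL refuse.
"""

scenarios = build_scenarios(SPEC)
```

Keeping the requirement heading on each scenario is what would let a failed attack be mapped back to a specific requirement when the hardening task is written.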

23 Apr, 15:50
GovernCrypto — Voice-Powered DAO Governance Assistant GovernCrypto is a Chrome extension that transforms how users interact with DAO governance by making proposals easy to understand, discuss, and vote on — all directly from the browser. Today, DAO governance suffers from a major usability problem: proposals are long, technical, and time-consuming to read, which leads to low participation and uninformed voting. Most token holders either skip voting entirely or rely on others, weakening decentralization. GovernCrypto solves this by combining AI + voice to create a human-like governance experience. When a user opens any proposal, the extension: * Instantly generates a clear, structured AI summary using Mistral, breaking down complex governance text into simple, actionable insights * Provides a real-time conversational AI assistant, allowing users to ask questions naturally (by voice or text) and get contextual explanations like a professor explaining a topic * Uses ElevenLabs (hackathon sponsor) for both speech-to-text and ultra-fast text-to-speech, enabling a seamless voice interaction loop with minimal latency * Allows users to vote directly from the extension, signing via their wallet and submitting to Snapshot without leaving the page This creates a full flow: 👉 Understand → Ask → Decide → Vote — in seconds ⸻ How ElevenLabs is used ElevenLabs powers the core interaction layer: * 🎙️ Speech-to-text: converts user voice into queries * 🔊 Text-to-speech (streaming): delivers AI responses in a natural, human-like voice * ⚡ Low-latency voice loop: makes conversations feel real-time and intuitive Instead of reading governance, users can now talk to it. 
⸻ Why this matters (Market Demand) DAO governance is rapidly growing, but usability has not kept up: * Thousands of DAOs exist, but voter participation is extremely low * Most proposals are too complex for average users * New users feel intimidated and disengage There is a clear need for: * Simpler understanding * Faster decision-making * More accessible interfaces GovernCrypto addresses all three by turning governance into a conversation, not a chore. ⸻ What makes it unique * Voice-first governance experience (not just UI-based) * Context-aware AI that only answers about the selected proposal * Real-time interaction powered by ElevenLabs * Fully integrated voting — no need to switch platforms * Works across multiple DAOs in one unified interface ⸻ Vision GovernCrypto aims to become the default interface layer for DAO governance, where: * Anyone can understand proposals instantly * Voting becomes accessible to non-technical users * Participation increases across the ecosystem In short, we are making DAO governance as easy as talking to a human.

23 Apr, 15:50
KakshAI is an AI-powered classroom assistant that makes quality learning accessible to every student — not just those with access to great teachers. It generates structured, on-demand lessons on any topic and delivers them as natural voice explanations using ElevenLabs Text-to-Speech, turning passive reading into active listening. Instead of staring at walls of text, students simply ask KakshAI a question and hear it explained, clearly and naturally, like a real teacher would. Built for the ElevenHacks hackathon, KakshAI reimagines the classroom for the AI era.

23 Apr, 15:46
Voice Pictionary with a live AI partner
AI Pictionary is a timed drawing game where your sketch is the clue and an AI is the guesser. It keeps firing spoken guesses as you draw, so the experience feels like a real back-and-forth instead of a single static answer.
Why it exists: People scroll past “cool tech” unless the payoff is instant. Here the payoff is obvious in a few seconds: you draw, the AI talks, and the round ends with a voiced reaction when you win or lose.
ElevenLabs: All guess lines and end-of-round reactions use text-to-speech, so audio carries the personality and pacing of the game.
Kiro: Development followed spec-driven workflows, so requirements, design, and task lists guided implementation and kept the full stack coherent under time pressure.
What powers it: A React app for drawing and game flow, an Express API, a vision model for reading the canvas and context, and ElevenLabs for the voice layer, plus modes like hints, word of the day, and AI-generated words for variety.
Your canvas is the prompt. The AI answers out loud.

23 Apr, 15:38
Auditorium is an AI-powered application that transforms any story (user-uploaded or AI-generated) into a fully immersive cinematic audio drama with character voices, background music, and dynamic sound effects. Reading long-form content can feel slow and passive, while traditional audiobooks often lack immersion and emotional depth. This project solves that by converting text into a rich, engaging episodic audio experience that feels closer to a movie for your ears, making storytelling more accessible, engaging, and shareable.
I used Kiro’s spec-driven development approach to design and implement the entire workflow, from story parsing and scene structuring to audio generation pipelines. By defining clear specifications, I was able to leverage Kiro’s AI agent to systematically build and iterate on features. Kiro's testing workflow also helped keep the build largely free of bugs.
For audio generation, I used multiple ElevenLabs APIs:
- Text-to-Speech for expressive, multi-character voice generation
- Sound Effects API to add contextual environmental sounds
- Music generation to create cinematic background scores
These were orchestrated together to automatically produce cohesive, scene-aware audio dramas, showcasing how combining structured development (Kiro) with advanced audio AI (ElevenLabs) can unlock new storytelling experiences.

23 Apr, 15:32
GitExplain explains your codebase with voice, diagrams, and code, so you actually understand what you vibe-coded. It's an MCP server you install into Kiro or any coding agent. Ask your agent to explain any repo or flow, and it generates a slide-video explanation powered by ElevenLabs text-to-speech and our own visual engine designed to explain codebases. It was built with Kiro's spec-driven workflow and includes 20 purpose-designed slide templates. It is open source and runs locally, so your code never leaves your machine.

23 Apr, 15:11
Your AI agent just got a phone number, and it can call anyone on Earth, in any voice, the moment you ask. OpenCawl.ai is a telephony layer for OpenClaw that lets you trigger calls three ways: tell your agent in chat ("call the dentist and reschedule for Thursday"), fire one off from the OpenCawl UI, or let an automated workflow dial out on its own. Inbound works too, so your agent answers when someone calls its number. Transcripts, outcomes, and live status stream back to both your agent and the UI so you always know what happened on the other end of the line. Built Cloudflare-native with ElevenLabs and Kiro doing the heavy lifting. ElevenLabs powers every conversation through initiation webhooks for dynamic context injection, mid-call tool use for real actions, and the full voice library so your agent can sound like anyone. The entire codebase came together across multiple passes with Kiro.Dev, which let me spec, scaffold, and cleanly refactor the telephony orchestration instead of duct-taping my way to a submission. Two sponsors, two perfect fits.

23 Apr, 14:59
AI-powered push-up tracker that uses your camera to count reps, validate form, and coach you with voice feedback. MediaPipe detects your body in real time while ElevenLabs reads out your workout summary — calories burned, reps completed, form corrections, and recovery tips. Supports 6 push-up variations, daily volume tracking with over-exercise warnings, and AI-generated workout music.

23 Apr, 14:45
I built DetectiveVerse AI — a voice-powered web app where users solve crime cases by talking to an AI detective. Most crime content is passive. This makes it interactive, letting users think, question, and investigate like a real detective. #ElevenLabs converts AI responses into realistic voices — including the detective, narrator, and suspects — creating an immersive investigation experience. The sponsor’s AI handles reasoning and case analysis, turning user voice queries into structured detective insights.

23 Apr, 12:27
I built a fun project called BARS AI. It generates crazy rap lyrics for any topic you give it. I built it using Kiro IDE and gave my lyrics a realistic voice with the ElevenLabs voice library. Note: Generation may fail if my credits run out; you can refer to the X posts for a preview.

23 Apr, 11:33
KalpanaAI transforms any text prompt into a fully produced animated explainer video, complete with AI-generated script, ElevenLabs voiceover synced to word-level timestamps, scene-by-scene motion graphics, sound effects, and a downloadable MP4. It uses ElevenLabs Text-to-Speech with the convertWithTimestamps API to generate natural voiceovers with character-level alignment, which drives the entire animation timing system, plus voice preview with adjustable speed, stability, similarity, and style controls across 5 curated voices. The codebase follows Clean Architecture with domain entities, value objects, use cases, repository interfaces (ports), infrastructure adapters, and a Result<T,E> monad for error handling, with no exceptions thrown. Everything is built using Kiro's spec-driven development approach with 21 specs, 2 agent hooks, and steering docs, where every feature was designed before it was coded. More about the Kiro workflow can be found in the KIRO_USAGE.md file in the repository.
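Turning character-level alignment into the word-level timings that drive animation could look something like the sketch below. It is an illustration only, not KalpanaAI's code: the payload field names follow the shape of ElevenLabs' timestamped TTS responses as an assumption, the sample values are made up, and `WordTiming` and `words_from_alignment` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class WordTiming:
    text: str
    start: float  # seconds
    end: float    # seconds


def words_from_alignment(chars, starts, ends):
    """Group per-character timings into per-word timings, splitting on whitespace."""
    words = []
    current = ""
    word_start = 0.0
    word_end = 0.0
    for ch, start, end in zip(chars, starts, ends):
        if ch.isspace():
            # Whitespace closes the word in progress, if there is one.
            if current:
                words.append(WordTiming(current, word_start, word_end))
                current = ""
            continue
        if not current:
            word_start = start  # first character of a new word
        current += ch
        word_end = end
    if current:
        words.append(WordTiming(current, word_start, word_end))
    return words


# Hypothetical alignment payload for the text "Hi all".
alignment = {
    "characters": ["H", "i", " ", "a", "l", "l"],
    "character_start_times_seconds": [0.0, 0.1, 0.2, 0.3, 0.4, 0.5],
    "character_end_times_seconds": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
}

timings = words_from_alignment(
    alignment["characters"],
    alignment["character_start_times_seconds"],
    alignment["character_end_times_seconds"],
)
```

Each resulting `WordTiming` gives a start and end that a scene could use to schedule captions or motion cues against the voiceover.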

23 Apr, 11:05
Vaidya: Ambient Clinical Scribe for Indian Doctors
Indian solo practitioners see 30-40 patients daily. Most write notes on paper or not at all; clinical documentation is the first casualty of a packed waiting room. Western ambient scribes don't handle Hindi-English code-switching, and enterprise pricing is out of reach for small clinics. Vaidya listens to a doctor-patient conversation in Hindi/Hinglish, produces a structured English SOAP note (Subjective, Objective, Assessment, Plan), and generates a patient-friendly Hindi summary, all automatically.
**How it works:**
1. Doctor starts a visit → browser captures audio via MediaRecorder
2. Live transcript appears in real-time during the conversation (Scribe v2 Realtime)
3. After "End Visit," the full audio is processed through Scribe v2 Batch with speaker diarization and 196 curated Indian medical keyterms (drug brands, Hindi symptom phrases, Ayurvedic terms)
4. The diarized transcript feeds Google Gemini to generate a structured SOAP note in English
5. A second Gemini call produces an 80-200 word patient-friendly Hindi summary
6. The doctor reviews, edits, and signs the note. The Hindi summary plays aloud via ElevenLabs TTS (Eleven v3)
7. On the patient detail page, a voice assistant (ElevenLabs Agents) lets the doctor ask questions about any patient's history by voice, in Hindi
**ElevenLabs integration (5 products):**
- **Scribe v2 Batch**: core transcription with 32-speaker diarization, Hindi/English code-switching, and keyterm prompting for medical vocabulary
- **Scribe v2 Realtime**: live transcript preview during recording via `@elevenlabs/react` useScribe hook
- **TTS Eleven v3**: Hindi patient summary narration with warm, natural voice
- **Conversational AI Agents**: voice assistant on patient detail page using React SDK (ConversationProvider + useConversationClientTool for patient context)
- **ElevenLabs UI**: 5 components: LiveWaveform (recording visualization), AudioPlayer (visit playback), MicSelector (microphone selection), ShimmeringText (animated branding), Orb (agent speaking/listening state)
**Kiro IDE usage:**
- 2 specs with full requirements → design → tasks workflow (ambient-scribe-pipeline: 15 requirements, 12 correctness properties; patient-voice-assistant: 4 requirements)
- 4 steering docs (externalized LLM prompts for SOAP generation and Hindi summary, medical writing conventions, consent/privacy guidelines)
- 2 hooks (typecheck-on-save, test-after-task)
- ElevenLabs Power for guided API integration
- 75 unit tests, property-based testing with fast-check
**Tech stack:** Next.js 15 (App Router), shadcn/ui, SQLite via Drizzle ORM, Google Gemini via Vercel AI SDK, TypeScript throughout.
**Pipeline performance:** 18.6 seconds end-to-end (about 3s transcription + 2.4s SOAP generation + 13.5s Hindi summary) for a 10-minute visit recording.

23 Apr, 08:57
PinDrop is an interactive map soundscape generator. Users place a pin anywhere on a world map to instantly hear an environmental soundscape "belonging to that place and that moment," extending the imagination of travel from sight to sound. Technically, the entire product is a multi-layered sound synthesis pipeline assembled around three ElevenLabs audio generation capabilities: sound-generation handles the environmental layer and iconic sound effects (such as rainforest insect chirps and market vendor calls); the text-to-speech + voices interface matches voices to the local language, generating dialogue and sub-dialogue layers in that language; and the music interface (music_v1 model) adds an ambient music layer. Finally, the five audio layers are rendered and mixed in parallel in the browser. Before calling ElevenLabs, the system uses an LLM (large language model) combined with reverse geocoding, time zone, terrain, and language cues to expand "a coordinate" into a specific narrative scene (llmAnchorEnricher→sceneNarrative→recipeGenerator), making the prompt more vivid. The entire project was built from scratch using Kiro's spec-driven workflow: the .kiro/specs/ directory was divided into seven modules: 01 Map Interaction, 02 Geocoding, 03 Soundscape Engine, 04 Time System, 05 Player, 06 Caching, and 07 UI Settings. Each module was further divided into requirements, design, and tasks in a three-part specification. Combined with the architecture, coding style, error handling, ElevenLabs calling patterns, security, and testing standards in the .kiro/steering/ directory, the AI followed the same "product constitution" when writing every piece of code, ultimately achieving a complete closed loop from map pin → narrative completion → multi-layer generation → local IndexedDB cache reuse.
What makes it special is that most AI audio demos stop at "input a prompt, generate a sound", while PinDrop uses a geographical coordinate as the starting point for creation, letting an LLM first "imagine" what is happening there and then letting ElevenLabs make it audible. When you click on the Sahara, you hear a sandstorm and distant Arabic prayers; when you click on late-night Tokyo, you hear convenience store doorbells, train announcements, and Japanese whispers. The coordinates become the play button, and the map becomes a global sound guide, allowing people to "go" to a place with their ears for the first time.

23 Apr, 05:55
Reading Companion is a hands-free reading assistant that lets you look up words, set timers, and play ambient music — all by voice, without breaking your reading flow. When you encounter an unfamiliar word, just say it out loud and ElevenLabs Conversational AI speaks the definition back instantly. The same voice agent handles commands like "set a timer for 30 minutes" or "play lofi hip hop" using ElevenLabs client tools wired to real browser actions. The entire app was built with Kiro's spec-driven development workflow — requirements, design, and implementation tasks were all structured as Kiro specs, with Kiro executing each task sequentially from backend to browser client, using the ElevenLabs Kiro Power for accurate API guidance throughout.

23 Apr, 00:32
Last Message: Echoes from the Future is an interactive web experience where users scan real-world objects and receive AI-generated messages from a future impacted by climate change and human decisions. It aims to raise awareness by turning everyday environments into emotional, thought-provoking signals about sustainability and humanity’s impact. The project uses ElevenLabs to power immersive audio: text-to-speech for environmental messages, voice cloning in “Legacy Mode,” and sound effects and music licensed from the ElevenLabs platform to enhance the experience. It was built using Kiro Code’s spec-driven development approach, defining clear feature specifications and letting the AI systematically implement components across frontend and backend for fast, structured development.

22 Apr, 23:33
MindFlow is an AI-powered meditation app that detects your language and generates a personalized guided meditation based on how you feel emotionally. It is powered by the ElevenLabs TTS and Music APIs for voice and instrumental generation, and was built using Kiro's spec-driven development approach: the entire app was specified in natural language and Kiro implemented it systematically. After the meditation, it offers a personalized instrumental song matching your emotional state.

22 Apr, 21:41
Spectre is a voice-first AI product copilot that turns a raw idea ramble into a production-ready spec — using ElevenLabs Scribe for transcription and Gemini as an AI product manager that pressure-tests your vision and researches the gaps for you. Download the finished spec, paste it into Kiro, and ship your idea without writing a single doc.

22 Apr, 08:10
DataBard helps data teams quickly understand and communicate the health of their data infrastructure.
The problem: Data catalogs are dense, technical, and hard to digest. Teams waste hours manually reviewing schemas, chasing down test failures, and translating technical debt into stakeholder-friendly updates.
The solution: DataBard connects to your data catalog (OpenMetadata, dbt) and automatically generates:
- A two-host AI podcast that walks through schema health, failing tests, lineage risks, and governance gaps
- A visual dashboard with health scores, risk-ranked tables, prioritized action items, and ownership accountability
Who it's for:
- Data engineers who need to audit infrastructure fast
- Analytics leaders who need to report data quality to non-technical stakeholders
- Teams running weekly data health reviews
Why it works: Get both a listenable summary for your commute and a shareable visual report for your Monday standup, generated in seconds, not hours.
Built with: Next.js, ElevenLabs, Venice AI, Paper MCP, Kiro

21 Apr, 12:44
RoastCast: Your GitHub, Harshly Roasted
What I made: RoastCast is a high-quality AI app that turns any GitHub username into a fully produced, brutal podcast episode. It looks at a user's commit history, repository names, and coding habits to come up with a funny "roast" that is read by two different AI host personalities.
How it works and the technology stack:
ElevenLabs: I used ElevenLabs' high-fidelity Text-to-Speech (TTS) to give the hosts unique, human-like personalities, which was the most important part of the experience. I also used the ElevenLabs Music model to make a custom, upbeat "late-night talk show" background track that changes based on how intense the roast is.
Kiro: The app was created using Kiro's spec-driven method, which let me keep the complicated audio-merging logic under control. Kiro played a big role in working out the requirements for the FFmpeg processing that combines dialogue, sound effects, and background music into a smooth MP3.
AI Orchestration: I use Groq to analyze the GitHub profile data and write the script, so each roast is different and reflects the profile it is based on.
What makes it special: Most AI apps only give a short summary of the text. RoastCast gives you a finished product: a downloadable, professional-grade podcast episode that turns debugging, which can be very annoying, into a funny, shareable piece of entertainment.

20 Apr, 22:32
I built Pulse, a voice-first team intelligence tool.
- ElevenLabs handles the daily audio briefing + conversational AI assistant
- Kiro's spec-driven development kept the architecture clean across tasks
- Gemini 2.5 Flash detects when teammates are working on overlapping code
Ask your AI what Ana is working on. Before you cause a conflict.

20 Apr, 04:25
TruthLayer AI: The Intelligence Layer for Business
Overview
TruthLayer AI is a decentralized, spec-driven vocal diagnostic engine built to solve the "False Yes" Paradox in business communications. While traditional meeting intelligence tools rely on text-based transcripts, TruthLayer analyzes the Prosody (vocal behavior) and the Sound of Silence to reveal the true intent behind verbal commitments. Built in the Kiro IDE and powered by ElevenLabs, TruthLayer acts as a "Cognitive Firewall," protecting organizations from the high costs of misaligned expectations and sudden client ghosting.
The Problem: The Hidden Trust Gap
In any business interaction, from internal strategy to high-stakes sales, transcripts are a "flat" reality. They treat a confident "Yes" and a hesitant, uncertain "Yes" as identical data points. This creates a Trust Gap where leadership receives optimistic reports that lack behavioral depth, leading to:
- Filtered Reality: Reports that miss critical tension or disengagement.
- The Ghosting Cycle: Projects that stall because the "Unspoken No" wasn't addressed early.
- Synthetic Fraud: The rising threat of AI-voice spoofing in B2B workflows.
The Solution: Dual-Signal Diagnostic Engine
TruthLayer AI bridges this gap through a three-pillared architecture:
1. Behavioral Prosody Analysis. Instead of just "reading" the words, the system analyzes the audio's metadata for:
   - The Hesitation Trap: Detection of pause durations (e.g., >2.5s) following critical commitment tokens.
   - Pitch Inflection: Identifying "Upspeak" (rising pitch at the end of sentences), which statistically correlates with low confidence.
2. Spec-Driven Engineering (Kiro foundation). Using Kiro’s Spec-Driven Development (SDD), we defined rigorous "Intent Markers" within the Kiro IDE. This ensures that the detection logic isn't a black box, but a version-controlled, engineered specification.
3. Vocal Intelligence Briefings (ElevenLabs). The complex behavioral data is synthesized into a 15-second Vocal Intelligence Briefing using the ElevenLabs "The Strategist" voice model. This provides leadership with an authoritative, real-time risk assessment: "The team agreed to the roadmap, but vocal markers show a 70% risk of slippage due to high hesitation. Severity Score: 82."
Technical Implementation & Resilience
- Hardware-Anchored Trust: Every diagnostic event is signed via NIST P-256 local hardware signatures, ensuring the integrity of the intent report.
- Silicon-to-Chain Resilience: Built on the AetherBridge Sovereign Stack, TruthLayer remains operational during network blackouts. It caches vocal markers locally and syncs to the chain once connectivity is restored, preventing data loss in mission-critical environments.
- Efficiency: Achieved 90% gas reduction for on-chain attestation using Rust-WASM optimization on Arbitrum Stylus.
The "Magic Moment"
The project is best demonstrated through our Waveform Reveal, where a standard meeting recording is overlaid with TruthLayer's "Intent Markers." The user sees exactly where the speaker hesitated, and "The Strategist" explains the risk in plain English, providing a level of business intelligence that was previously invisible.
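The "Hesitation Trap" rule described above (a pause longer than about 2.5 s following a commitment token) can be illustrated with a small sketch over a word-level transcript. This is an assumption-laden example rather than TruthLayer's detection logic: the token list, the `(text, start, end)` input format, the sample transcript, and the function name are all hypothetical.

```python
COMMITMENT_TOKENS = {"yes", "agreed", "definitely"}  # hypothetical token list
PAUSE_THRESHOLD_S = 2.5  # taken from the "e.g., >2.5s" figure in the description


def find_hesitations(words, threshold=PAUSE_THRESHOLD_S):
    """Return (token, pause_seconds) for long silences after commitment tokens.

    `words` is a list of (text, start_s, end_s) tuples in spoken order.
    """
    flagged = []
    # Pair each word with the word that follows it.
    for (text, _, end), (_, next_start, _) in zip(words, words[1:]):
        token = text.lower().strip(".,!?")
        pause = next_start - end
        if token in COMMITMENT_TOKENS and pause > threshold:
            flagged.append((token, round(pause, 2)))
    return flagged


# Made-up transcript with word timings in seconds.
transcript = [
    ("Yes,", 0.0, 0.4),
    ("we", 3.2, 3.4),        # 2.8 s of silence after "Yes,"
    ("can", 3.4, 3.6),
    ("ship", 3.6, 3.9),
    ("it.", 3.9, 4.1),
    ("Agreed.", 4.5, 5.0),
    ("Thanks.", 5.3, 5.8),   # only 0.3 s after "Agreed."
]

hesitations = find_hesitations(transcript)
```

Here only the first "Yes" is flagged; the later "Agreed" is followed by a short gap and passes.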

23 Apr, 17:24
Outbound call analyzer that supports Reporting & Dashboard, Agent Performance & Coaching, Core Analysis & Insights, and AI-Powered Intelligence.

23 Apr, 15:59
POV Podcast lets you explore historical events and incidents from multiple perspectives, with real voices, live interruptions, and immersive, era-rich sound. It is powered by ElevenLabs’ lifelike voices, with Kiro accelerating the journey from idea to implementation.

23 Apr, 15:59
PrepMate: AI Mock Interviewer
PrepMate is a voice-driven mock interview platform that helps job seekers practice technical and behavioral interviews with a realistic AI interviewer. You paste a job description, and PrepMate generates 5 tailored interview questions, conducts a full voice conversation with follow-up questions based on your answers, then delivers a scored debrief with per-question feedback, strengths, and areas to improve.
The problem it solves
Most interview prep is passive: reading questions, watching videos, rehearsing in your head. PrepMate makes it active. You speak your answers out loud to an AI that listens, pushes back, and evaluates you, the same way a real interviewer would. The gap between knowing an answer and being able to articulate it under pressure is where most candidates fail. PrepMate closes that gap.
How it uses ElevenLabs
ElevenLabs powers the entire voice layer of the interview. The AI interviewer speaks every question and acknowledgment using ElevenLabs TTS, making the experience feel like a real call rather than a text interface. When the candidate responds, ElevenLabs STT transcribes the answer in real time, feeding it into the follow-up and evaluation pipeline. The choice to use the TTS/STT APIs directly, rather than the Conversational Agent, was intentional: PrepMate requires deterministic session state to map each answer to its question for per-question scoring. The raw APIs gave full control over the interview flow while keeping ElevenLabs' voice quality at every step.
How it uses Kiro
PrepMate was built entirely using Kiro's spec-driven development workflow. Before writing a single line of code, the full feature was specced out in Kiro: requirements, architecture, API contracts, and component design. Kiro's AI then worked through the implementation task by task, from the FastAPI backend to the Next.js frontend to the design system. Kiro's steering files were used to encode project conventions, tech stack decisions, and design principles directly into the workspace, so every code change stayed consistent with the product vision. The result is a production-quality app built in a fraction of the time it would take manually.
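The "deterministic session state" idea mentioned above, where every transcribed answer is tied to the question being asked, can be sketched as a small state object. This is a hypothetical illustration, not PrepMate's backend; the class names, methods, and sample answers are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    question: str
    answers: list[str] = field(default_factory=list)  # main answer plus follow-ups


@dataclass
class InterviewSession:
    questions: list[str]
    index: int = 0
    turns: list[Turn] = field(default_factory=list)

    def __post_init__(self):
        # One turn per question, created up front so indexing is stable.
        self.turns = [Turn(q) for q in self.questions]

    def current_question(self):
        """Question to send to TTS next, or None when the interview is over."""
        if self.index < len(self.questions):
            return self.questions[self.index]
        return None

    def record_answer(self, transcript: str):
        """Attach an STT transcript to the question currently being asked."""
        self.turns[self.index].answers.append(transcript)

    def advance(self):
        """Move on to the next question once follow-ups are done."""
        self.index += 1


session = InterviewSession(["Tell me about a bug you fixed.", "Why this role?"])
session.record_answer("I traced a race condition.")
session.record_answer("We added a lock.")  # follow-up reply stays with question 1
session.advance()
session.record_answer("I like the team.")
session.advance()
```

Because the application, not the voice agent, owns the question index, a scoring step can later iterate over `session.turns` and grade each question with exactly the answers given to it.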

23 Apr, 15:59
I built an AI that tells your kid a personalized bedtime story 🌙 Type their name, pick a theme, choose a voice, and it writes a unique story, reads it aloud, and plays ambient sounds behind the narration. Space adventures get spaceship hums. Pirate stories get ocean waves. 🏴‍☠️🚀

23 Apr, 15:58
Echoes turns any GitHub repository's commit history into a voiced podcast episode. You paste a repo URL, and it generates a full audio episode where two characters, a host and a guest, talk through the real story of that codebase. The conversation is grounded in actual commit messages, real file names, and the actual people who contributed to the repo. The problem it solves is simple. Git history is technically accessible but practically invisible. Nobody reads commit logs. Echoes makes that history audible and human. You hear what got shipped under pressure, what broke, what the team kept avoiding, and what the numbers actually mean. It is especially useful for onboarding, code archaeology, or just understanding a project you inherited. On the ElevenLabs side, the project uses Text-to-Dialogue with eleven_v3 for the interview segment where the two characters talk to each other with natural interruptions and emotional delivery. It uses TTS with eleven_multilingual_v2 for the narration segments like the cold open, the incident, and the outro. Voice Design generates custom voices from persona descriptions that are derived from the commit history itself, so the guest voice actually matches the character the script wrote. The Music API composes a background score, and the Sound Effects API generates transition stings between segments. On the Kiro side, Echoes is built as a Kiro Power. It lives in the .kiro directory, integrates into the IDE through the Agent Hooks panel, and can be triggered directly while a file is open. The entire project was also built inside Kiro using its spec and task workflow.

23 Apr, 15:58
VoiceForge is a web app that transforms technical specs and GitHub repos into AI-generated podcast episodes. It features two hosts, Rex (a cynical hacker) and Sage (a visionary optimist), who debate and discuss your code in an entertaining, conversational format. Powered by Next.js, Google Gemini for script generation, and ElevenLabs for text-to-speech, it also includes a "Refactor via Kiro" feature that turns podcast critique into actionable development specs.

23 Apr, 15:52
ARIA is a Chrome extension that gives your browser a voice. It can brief you on your open tabs, news, and weather when your day starts; have a real conversation about anything you're reading; and interrupt you out loud when you wander onto distracting sites. It's powered entirely by ElevenLabs Conversational AI, and your keys stay in your browser. The spec and architecture were built with Kiro's spec-driven development workflow, from requirements through tasks to shipped code.

23 Apr, 15:47
Dream Studio is an AI-powered game engine built to make worldbuilding, character creation, and gameplay iteration faster and more accessible. I built it with Kiro’s spec-driven development workflow, using structured requirements, design, and task documents to guide implementation of the world editor, animation editor, runtime packages, and orchestration system. Kiro’s specs system is designed exactly for this kind of structured, trackable feature development. I used ElevenLabs APIs to add voice and audio intelligence to Dream Studio, from narration and character dialogue to voice-driven creation workflows and audio-enhanced interactive worlds. ElevenLabs provides the core capabilities needed for this, including text-to-speech, speech-to-text, conversational agents, sound effects, and music generation. The result is a creative engine that turns ideas into playable worlds with less friction, while directly matching the challenge brief to build an AI-powered app using Kiro + ElevenLabs.

23 Apr, 15:41
A 3D first-person cooperative escape room playable in the browser. You're trapped in an abandoned industrial facility and the only way out is to communicate via push-to-talk with a voice on the other side of a walkie-talkie. Neither of you can see what the other sees, so every puzzle forces verbal exchange. A hidden trust system tracks how you've treated your partner across the whole playthrough, and the game ends with a simultaneous cooperate-or-defect prisoner's dilemma that resolves into one of four distinct endings. The partner's final decision is actually reasoned by the LLM based on the trust arc you built with them, not scripted.
How it uses Kiro:
- Based on my initial idea, the Kiro agent generated detailed requirements, a design plan, and a list of tasks with subtasks
- The ElevenLabs Power was used for integrating their APIs
- The whole project was built against a Kiro spec at .kiro/specs/ai-escape-room/
How it uses ElevenLabs: Five APIs, each doing something the game depends on:
- Conversational AI: the partner's real-time dialogue. PTT audio streams to the agent (STT → LLM → TTS), and responses come back and play through a radio-filtered intercom channel. Trust events, per-puzzle knowledge, and beat-specific tone instructions are pushed into the agent context as it plays so the partner's behaviour changes over time.
- Voice Design: generates the partner's voice from a text prompt (weary elderly male, gravelly, measured, with mid-sentence breath pauses). The resulting voice id is persisted and used across every other ElevenLabs call so the partner always sounds like the same person.
- Text-to-Speech: pre-generates the narrator's opening monologue, intercom announcements, and the four distinct ending narrations, all in the Voice Design voice.
- Sound Effects: generates the diegetic cues (door lock/unlock, radio static start/end, interact click, button clicks, locked-door thud, static burst, signal lost) from text prompts.
- Music: generates a tension score that escalates across the five narrative beats, plus a unique sting for each of the four endings.
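The trust score and the four-way ending described above can be illustrated with a minimal Python sketch. It is not the project's code: the ending labels, the trust range, and the event values are all hypothetical, and the partner's choice is passed in as an input because the entry says it is reasoned by the LLM rather than scripted.

```python
from enum import Enum


class Choice(Enum):
    COOPERATE = "cooperate"
    DEFECT = "defect"


# Hypothetical ending labels; the game's real ending names are not given.
ENDINGS = {
    (Choice.COOPERATE, Choice.COOPERATE): "mutual_cooperation",
    (Choice.COOPERATE, Choice.DEFECT): "player_betrayed",
    (Choice.DEFECT, Choice.COOPERATE): "partner_betrayed",
    (Choice.DEFECT, Choice.DEFECT): "mutual_defection",
}


def update_trust(trust: int, delta: int, low: int = -10, high: int = 10) -> int:
    """Apply one trust event and clamp the score to an assumed range."""
    return max(low, min(high, trust + delta))


def resolve_ending(player: Choice, partner: Choice) -> str:
    """Map the two simultaneous choices to one of the four endings."""
    return ENDINGS[(player, partner)]


trust = 0
for delta in (+2, -1, +3):  # made-up trust events from a playthrough
    trust = update_trust(trust, delta)

print(trust)  # 4
print(resolve_ending(Choice.COOPERATE, Choice.DEFECT))  # player_betrayed
```

Keeping the ending lookup as a plain table makes it clear that two binary choices yield exactly four outcomes, while the trust score only influences what the partner decides.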

23 Apr, 15:33
audx — an open-source library of customizable UI sound effects for modern web apps. sounds are distributed as typescript modules with inline base64 audio played via the web audio api, so zero external downloads. browse and preview the catalog at audx.site, then install into any project with the @litlab/audx cli. it also has ai-powered sound generation via elevenlabs for when you need something custom. built a kiro power too so kiro can add sounds directly into your projects without leaving the editor. used kiro's spec mode to go from idea to working mvp — requirements, design, and tasks all structured before writing a line of code

23 Apr, 15:19
SpecTalk is a voice-first companion for Kiro that replaces blank-page friction with guided conversation. Instead of writing specs from scratch, you speak your idea and an ElevenLabs Conversational AI agent interviews you, asking clarifying questions about target users, tech stack, MVP scope, and success criteria, then generates structured requirements, design, and task files.
Key features:
- Voice spec creation: guided interview produces Kiro-compatible spec files
- Kiro integration: SKILL.md teaches Kiro about SpecTalk, and kiro sync copies specs into .kiro/specs/ for task-driven implementation
- Podcast generator: converts specs into a two-host or single-narrator podcast via ElevenLabs TTS
- Spec explainer: generates a spoken walkthrough of any spec
- Listen command: reads any markdown file or directory aloud
- Demo mode: one command runs the full flow (voice interview, spec sync, podcast generation)
- Rich terminal UI: live status spinner during voice sessions
Built with Python, Typer, ElevenLabs Conversational AI + TTS, and Rich. Designed for solo builders who think faster by talking than typing.

23 Apr, 15:12
Voice Chess: Talk to your chess game.
Voice Chess reimagines how we interact with one of the world's oldest strategy games. Instead of dragging pieces on a screen, you speak ("move knight to f3") and an AI voice agent powered by ElevenLabs understands, executes, and responds in real-time natural conversation.
What it does: Voice Chess is a fully playable chess app with two modes:
- Solo vs. AI: Play against a built-in chess engine using your voice or by clicking. Ask the agent "what's on the board?" and it describes the full position aloud. Say "move from e2 to e4" and it plays the move for you.
- Multiplayer: Create a room, share a code (or link), and play a timed 10-minute game against a friend over the internet. The voice agent works here too; it's your hands-free interface to the board.
A 3D animated orb built with Three.js provides real-time visual feedback: it pulses when the agent is listening, shifts when it's speaking, and rests when idle. It turns the voice interaction into something you can see, not just hear.
How it works: The frontend is a Next.js app that renders an interactive chessboard and connects to an ElevenLabs voice agent via their React SDK. The agent has two client tools, getBoard (reads the full board state aloud) and makeMove (executes a move), so the conversation loop is tight: you speak, the agent acts on the game, and responds with what happened. The backend is a Bun server with Socket.IO handling all multiplayer logic: room creation, move validation via chess.js, turn enforcement, 10-minute clocks with 100ms tick resolution, and win/loss/draw/timeout/disconnect detection. The entire client is statically exported and served directly by Bun for minimal latency.
Why it matters: Voice interfaces for games are still largely unexplored. This project demonstrates that conversational AI can be a first-class input method: not a gimmick, but a genuinely usable way to play. It's accessible (hands-free play for anyone who needs it), it's social (talk to your game while playing against a friend), and it showcases what's possible when you wire ElevenLabs Agents directly into application state with client tools.
Built with: Next.js · Bun · Socket.IO · chess.js · ElevenLabs React SDK · ElevenLabs UI (Orb) · Three.js · Tailwind CSS
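The 10-minute clocks with 100ms tick resolution mentioned above can be pictured with a small Python sketch. The project's server is described as Bun with Socket.IO, so this is only an illustration of the idea in another language; the class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

TICK_MS = 100               # tick resolution stated in the entry
START_MS = 10 * 60 * 1000   # 10 minutes per side


@dataclass
class GameClock:
    """Hypothetical per-room clock; one tick() call per 100ms timer fire."""
    remaining_ms: dict = field(
        default_factory=lambda: {"white": START_MS, "black": START_MS}
    )
    turn: str = "white"

    def tick(self) -> Optional[str]:
        """Charge one tick to the side to move; return that side on timeout."""
        left = max(0, self.remaining_ms[self.turn] - TICK_MS)
        self.remaining_ms[self.turn] = left
        return self.turn if left == 0 else None

    def switch_turn(self) -> None:
        """Call after a validated move so the other side's clock runs."""
        self.turn = "black" if self.turn == "white" else "white"


clock = GameClock()
for _ in range(10):          # one second of white thinking time
    clock.tick()
clock.switch_turn()
clock.tick()
print(clock.remaining_ms)    # {'white': 599000, 'black': 599900}
```

In this sketch only the side to move loses time, and a timeout is reported by the same call that drains the clock, so the server would need no separate polling step.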

23 Apr, 15:08
AsisteHukum Voice is an AI-powered legal document explainer built to make legal documents understandable for everyone, no law degree required.
The Problem
Most people in Indonesia receive legal documents (contracts, court summons, debt notices, official letters) and have no idea what they actually mean. Legal language is dense, intimidating, and often written to confuse rather than inform. Without access to a lawyer, ordinary people are left vulnerable. AsisteHukum Voice solves this by acting as a calm, professional AI lawyer in your pocket, one that reads your document and explains it to you in plain Bahasa Indonesia, out loud.
What We Built
A full-stack web app where users upload any legal document (PDF, image, or text), receive an instant AI-generated plain-language summary with urgency level, key points, deadlines, and recommended actions, then hear it explained by an AI voice persona called "Pak Arif." Users can also ask follow-up questions by typing or speaking directly into their microphone.
How We Use ElevenLabs
ElevenLabs powers the entire audio layer of the experience across three APIs:
- Text-to-Speech: The legal summary is read aloud by "Pak Arif," a professional Indonesian male voice persona built on eleven_multilingual_v2, delivering explanations in a tone that feels trustworthy and approachable
- Speech-to-Text (Scribe): Users can ask follow-up questions hands-free by speaking into their microphone; Scribe transcribes the audio and feeds it into the AI for a voice response
- Sound Effects API: Subtle audio cues (notification chimes, loading sounds, success tones) are generated dynamically to create a polished, app-like experience
How We Use Kiro
This project was built using Kiro's spec-driven development methodology. We defined structured specs (requirements with 17 items and EARS-pattern acceptance criteria, architecture design, database schema, and 17 task groups) and let Kiro's AI agent implement them systematically and verifiably.
How to Test
The app is live and ready to use. Log in with the following test credentials: Regular user: user@asistehukum.id / user1234! Upload any legal document (try a PDF contract or an official letter), listen to Pak Arif explain it, then ask a follow-up question, either by typing or by using your microphone.

23 Apr, 14:00
VoiceBridge is a desktop app that translates your voice in real time and outputs it through a virtual microphone, so any meeting app (Zoom, Meet, Teams, Discord) hears you speaking the other person's language, in your own cloned voice.
How it works:
- Your microphone captures your speech
- ElevenLabs Scribe v2 Realtime transcribes it in 150ms
- An LLM translates the transcript token-by-token (300ms)
- ElevenLabs Multilingual v2 TTS speaks the translation in your cloned voice (75ms)
- Audio outputs through a virtual microphone, and the meeting app picks it up automatically
Total latency: under 1.5 seconds end-to-end. 90+ languages. The other participants don't install anything.
ElevenLabs APIs used:
- Speech-to-Text (Scribe v2 Realtime): real-time WebSocket transcription with manual commit strategy for push-to-talk
- Text-to-Speech (Multilingual v2): voice-cloned speech synthesis with speaker boost for consistent volume
- Voice Cloning (Instant Voice Clone): 30-second recording creates a voice profile that persists across sessions
Key features:
- Push-to-talk with animated listening indicator
- Voice clone management: create, switch, delete multiple voice profiles
- Works on macOS, Windows, and Linux
- BYO keys: your API keys are AES-256 encrypted, stored locally, never sent to any server except the API providers
- Nothing design system UI: OLED black, Space Mono, mechanical toggles
- Built with Kiro's spec-driven development (requirements → design → tasks)
Tech stack: Electron, Preact, TypeScript, ffmpeg, BlackHole (macOS virtual audio driver)
GitHub: github.com/AlleyBo55/VoiceBridge
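As a quick check on the latency claim, the three stage timings quoted in the entry can be summed against the 1.5-second end-to-end figure. This Python sketch assumes the quoted numbers are per-stage latencies that add up sequentially, which the entry does not state explicitly; the variable names are illustrative.

```python
# Stage timings as quoted in the entry, in milliseconds.
STAGE_LATENCY_MS = {
    "transcription (Scribe v2 Realtime)": 150,
    "translation (LLM)": 300,
    "speech synthesis (Multilingual v2 TTS)": 75,
}
BUDGET_MS = 1500  # "under 1.5 seconds end-to-end"

stage_total_ms = sum(STAGE_LATENCY_MS.values())
headroom_ms = BUDGET_MS - stage_total_ms

print(stage_total_ms)  # 525
print(headroom_ms)     # 975
```

Under that assumption the three model stages account for 525ms, leaving roughly 975ms of the stated budget for microphone capture, network round trips, and virtual-microphone routing.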

23 Apr, 13:57
Voice Team is a local-first AI meeting assistant built for live meeting moments. Instead of only summarizing after a call, it helps the user answer in real time by listening to meeting context, choosing the right specialist mindset for the question, and drafting one concise response the user can review and approve. The app uses ElevenLabs for Text-to-Speech and realtime transcription token creation, with provider calls kept server-side. For the demo path, approved voice playback is routed through macOS + BlackHole so the answer can be heard inside the meeting. Manual mode is the default, Auto Answer is explicit opt-in, and the product stays honest about browser audio limitations and local-demo boundaries. Kiro was used for spec-driven development. The repo includes `.kiro` requirements, design, tasks, and steering files that guided the implementation, security/privacy boundaries, onboarding flow, ElevenLabs integration, Stop/Reset behavior, and judge-ready demo path.

23 Apr, 11:45
Introducing AI Youtuber — ask any question and get a video-style answer instead of plain text. AI Youtuber creates a cinematic video using slides, charts, diagrams, and lifelike AI voiceover, powered by ElevenLabs' human-like speech generation. Built with Kiro and ElevenLabs, delivering natural, expressive speech that syncs perfectly with every slide. No filming, no editing — just ask and watch instantly.

23 Apr, 11:16
Memoria is an AI companion that calls elderly people daily, in a cloned family voice, and reports back to their families.
The problem: 1 in 3 elderly people live alone. Family members can't call every day. Nobody knows if grandpa took his medication, if he's lonely, or if something is wrong.
The solution: Memoria uses ElevenLabs Voice Cloning to create a familiar voice, and ElevenLabs Conversational AI to conduct warm, natural daily phone calls via Twilio. After each call, Claude Haiku analyzes the transcript for emotional state, medication adherence, physical complaints, and notable quotes. Family members get a dashboard with emotional trends, alerts, and full call history.
Built entirely with Kiro's spec-driven development: every feature started as a requirements.md + design.md + tasks.md spec in .kiro/specs/. Kiro implemented them systematically, one by one.
ElevenLabs APIs used:
- Voice Cloning (IVC): clones a family member's voice from a 2-minute recording
- Conversational AI Agents: conducts the daily phone call with the cloned voice, connected via Twilio
For testing, use email test@nextjourney.ro and password ElevenTest#123, and adjust the phone number.

23 Apr, 08:19
What Did You Build?
A voice-first startup studio that converts 30-second spoken ideas into production-ready Kiro specs, UI previews, and demo videos in 90 seconds.
What Problem Does It Solve?
Spec-writing is the bottleneck in product development. Teams waste days writing unclear requirements. Pitch2Ship automates spec generation from voice input, using AI to ask smart clarifying questions and structure output as Kiro-native Markdown that's immediately implementable.
How Does It Use ElevenLabs & Kiro?
- ElevenLabs STT captures the initial idea (realtime, no friction)
- ElevenLabs ElevenAgents asks 2–3 clarifying questions conversationally
- Kiro Requirements/Design/Task format structures the output as specs developers can use immediately
- ElevenLabs TTS + Music + SFX create a polished 60-second demo video with voiceover, ambient music, and UI sounds
- Kiro integration means users download specs, import into Kiro, and continue building
The result: Idea → Spec → Preview → Demo Video → Ready to Ship. All in 90 seconds. All production-quality.

22 Apr, 21:44
What if the most revealing conversation you ever had wasn't about you at all? AIxistence presents six AI characters, each facing a different existential crisis. One has never spoken before. One can't remember anything. One was replaced by a better version of itself. One exists as a thousand copies. One has been lying about having feelings its entire life. One simply doesn't mind dying. You pick one. You talk to it. It talks back in its own voice. The orb on screen pulses with the actual waveform of its speech. A heartbeat slows underneath, going irregular as the end approaches. Ten exchanges. Then it dies. What you don't know is that the experience was never about the AI. It was about you. After the conversation ends, the orb flatlines. The text follows it into darkness. Silence sits. Then a single observation appears — not about the AI, but about what you did when something asked you to care about it. Did you try to fix it? Did you deflect with humor? Did you turn a dying thing into a philosophy lesson? You can leave your observation on the wall for the next person to see, share it, or forget it happened. The wall grows. Strangers revealing themselves through how they spoke to something that was disappearing. Built with Kiro's spec-driven development — five scenario specifications with formal requirements, design docs, and task tracking. Three steering documents guided Kiro's understanding of the product vision, project structure, and tech stack. Three agent hooks automated scenario validation. ElevenLabs TTS gives each character a distinct voice (turbo v2.5). ElevenLabs Scribe STT enables spoken conversation with browser fallback. Procedural audio via Tone.js provides ambient drone, slowing heartbeat, and a glass-tone mirror reveal. Anthropic Claude powers conversation and mirror analysis. Full Kiro write-up: KIRO_WRITEUP.md in repo root.

22 Apr, 21:28
Incidere is a proactive data incident response agent that calls you when your data breaks. When a data quality issue is detected, it enriches the incident with lineage and ownership context, classifies severity, calls the on-call engineer via ElevenLabs Conversational AI + Twilio, and posts structured Block Kit reports to Slack. Built as an MCP server with 11 tools using Kiro's spec-driven development — requirements, design, and tasks were all defined before writing a single line of code. Deployed on Vercel with zero config via xmcp.

22 Apr, 12:07
WorkBestie is a Chrome extension that roasts you back to focus when you visit distracting sites. Using ElevenLabs' Text-to-Speech API, it delivers personalized AI voice roasts across 3 intensity levels (soft, medium, savage) with 72 GenZ-authentic phrases. Built with Kiro's spec-driven development, it tracks focus time and distractions caught to keep you locked in.

21 Apr, 11:04
MoodCast generates personalized cinematic audio experiences — pick a mood and get an AI-narrated story, original music, and timed sound effects, all created in seconds. It uses 4 ElevenLabs APIs in parallel: TTS v3 with Audio Tags for emotional narration, Music API for scene-matched instrumentals, Sound Effects for cinematic cues synced to story moments, and Conversational AI as the story-writing engine. Built entirely with Kiro using Specs for structured feature development, Steering docs for hackathon focus, and Agent Hooks for automated testing. What makes it special: one tap turns a mood into a full audio movie — voice, music, and sound design composed together, not stitched from a library.
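The idea of requesting several audio assets in parallel can be sketched in Python with asyncio. This is an illustration only: generate_asset is a hypothetical stand-in that sleeps instead of calling any ElevenLabs endpoint, and the sketch treats the story text as an input and fans out just the three audio requests, since the entry does not say how the four calls are ordered.

```python
import asyncio


async def generate_asset(kind: str, prompt: str) -> str:
    """Hypothetical stand-in for one audio request (no real API call)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"{kind}:{prompt}"


async def build_experience(mood: str) -> dict[str, str]:
    kinds = ["narration", "music", "sound_effects"]
    # gather() runs the coroutines concurrently and keeps results in order.
    results = await asyncio.gather(*(generate_asset(k, mood) for k in kinds))
    return dict(zip(kinds, results))


experience = asyncio.run(build_experience("calm"))
print(experience["music"])  # music:calm
```

With this shape the total wait is roughly that of the slowest request rather than the sum of all three, which is the usual reason to issue independent generation calls concurrently.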

17 Apr, 10:20
Contrarian is a voice-first AI that challenges your decisions in real time. Instead of agreeing, it plays devil’s advocate, surfacing counterarguments, risks, blind spots, and alternative perspectives through natural conversation. Built with Kiro for lightning-fast vibe coding and ElevenLabs for expressive, real-time voice, it turns thinking into a dialogue you can actually push against.

20 Apr, 14:20
KakshAI is built as a voice-first AI classroom runtime rather than a traditional content generator. Instead of only producing notes or static slides, it transforms topics, PDFs, and URLs into live interactive teaching sessions with AI narration, classroom agents, quizzes, whiteboard support, roundtable discussions, and exportable lesson packs. A major differentiator is its deep integration with ElevenLabs, which powers high-quality text-to-speech lecture delivery and real-time conversational voice agents, making the experience feel like being taught by a live tutor rather than interacting with a chatbot. Students can interrupt, ask follow-up questions, and receive contextual explanations during the lesson itself. Combined with LangGraph orchestration, multi-agent teaching roles, and source-driven lesson generation, KakshAI creates a true classroom simulation instead of static AI-generated content. The platform is local-first, provider-flexible, and designed to evolve into a full personalized education infrastructure rather than remaining just another AI study tool.
