1,600 points · 8 submissions
with Cursor
Hands covered in grease, yet you still have to tap the screen? Busy stir-frying, but constantly pausing just to check the next step? Enter CookTalk: a voice-first AI kitchen assistant. You speak; it executes.

We built the entire application in Cursor, using its AI-assisted coding to construct a full-stack architecture on TanStack Start and React 19. The ElevenLabs API is deeply integrated: it generates speech for recipe search and cooking conversations, and it also powers voice-library browsing, voice-clone management, and binding a specific voice to a recipe for step-by-step read-aloud guidance, so every dish can talk you through cooking in a voice you genuinely like. OpenAI-compatible interfaces handle converting videos into structured recipes, AI-driven recipe conversations, and cover-image generation. Video and audio extraction run locally in the browser via ffmpeg.wasm, and all data is encrypted with AES-GCM before being stored in IndexedDB: a local-first workflow from end to end that needs no backend server.

What makes CookTalk unique is how far it pushes "hands-free in the kitchen": with parallel timers, an always-on screen, and wake-word activation, you can go from searching for a recipe to plating the finished dish without ever touching the screen. Voice is elevated from an assistive feature to the primary mode of interaction, so your full attention can return to the pot on the stove.
Submitted 14 May 2026
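CookTalk's local-first storage path is straightforward to sketch. Below is a minimal TypeScript example of the pattern the description implies: encrypting a recipe with AES-GCM via the Web Crypto API before writing it to IndexedDB. The database name, store name, and key handling are illustrative assumptions; CookTalk's actual schema and key management are not shown in the submission.

```typescript
// Minimal sketch of the AES-GCM-before-IndexedDB pattern described above.
// Store name, record shape, and key handling are illustrative assumptions.

async function makeKey(): Promise<CryptoKey> {
  // In a real app the key would be derived (e.g. via PBKDF2) or persisted,
  // not regenerated on every run.
  return crypto.subtle.generateKey({ name: "AES-GCM", length: 256 }, false, [
    "encrypt",
    "decrypt",
  ]);
}

async function encryptRecipe(key: CryptoKey, recipe: object) {
  const iv = crypto.getRandomValues(new Uint8Array(12)); // fresh IV per record
  const plaintext = new TextEncoder().encode(JSON.stringify(recipe));
  const ciphertext = await crypto.subtle.encrypt({ name: "AES-GCM", iv }, key, plaintext);
  return { iv, ciphertext };
}

function saveEncrypted(record: { iv: Uint8Array; ciphertext: ArrayBuffer }) {
  const open = indexedDB.open("cooktalk-db", 1);
  open.onupgradeneeded = () =>
    open.result.createObjectStore("recipes", { autoIncrement: true });
  open.onsuccess = () => {
    const tx = open.result.transaction("recipes", "readwrite");
    tx.objectStore("recipes").add(record); // only ciphertext + IV ever hit disk
  };
}
```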
with v0
We used v0 to give the open-source cheat-sheet site QuickRef.ME a complete visual redesign, turning a cold, utilitarian document list into a developer "knowledge wall" styled like kraft paper, handwritten notes, and rubber stamps, so the tedious chore of looking up commands and syntax feels like flipping through a notebook.

Technically, v0 generated and iterated all of the React components and the Tailwind visual system from scratch (card rotation, paper textures, tape and pins, hover lift, detail-page expansion animations, bilingual i18n switching, recent browsing history, and site-wide search), and it also scaffolded the Next.js 16 / React 19 project structure. ElevenLabs is what "brings the paper to life": the Sound Generation API (/v1/sound-generation) produces paper sound effects such as page turns and tears in real time from prompts, and the Text-to-Speech API (the eleven_multilingual_v2 multilingual model with the Rachel voice) lets the note assistant in the lower-right corner read commands aloud in both Chinese and English. Combined with browser voice input, this forms a natural-language query loop: say a sentence, find the command, and hear it read back to you.

What makes it special is that this is a "reskin" of a tool site (quickref.me) that developers around the world use every day. Generative AI repackages the most familiar, rational content an engineer touches into a warm, tactile, even audible desktop experience, showing that v0 + ElevenLabs can upgrade a plain-text tool site into a multimodal product with real emotional hooks.

Submitted 7 May 2026
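As a sketch of the sound-effect path: the call below POSTs a prompt to the ElevenLabs Sound Generation endpoint named above and plays the returned audio in the browser. The request fields shown (text, duration_seconds) follow ElevenLabs' public documentation, but treat the exact payload as an assumption to verify rather than the project's verbatim code.

```typescript
// Hedged sketch: generate a "page turn" sound effect and play it in the browser.
// Endpoint and body fields follow ElevenLabs' public docs; verify before use.
async function playPaperSound(apiKey: string, prompt: string): Promise<void> {
  const res = await fetch("https://api.elevenlabs.io/v1/sound-generation", {
    method: "POST",
    headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({
      text: prompt,          // e.g. "a single crisp page turn on thick kraft paper"
      duration_seconds: 1.5, // keep UI sounds short
    }),
  });
  if (!res.ok) throw new Error(`sound-generation failed: ${res.status}`);
  const blob = await res.blob(); // MP3 audio bytes
  await new Audio(URL.createObjectURL(blob)).play();
}
```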
with Zed
Rhythm Grid transforms the minimalist black-and-white aesthetic of the 2010s into a portal to infinite soundscapes. Remember that feeling of pure tension? The screen scrolling endlessly downward, your fingertips hovering between black and white, where a single touch decides everything. We have preserved the soul of that minimalism: no flashy lanes, just the binary aesthetic of right and wrong; a tempo that accelerates with your heartbeat; and the unforgiving arcade brutality where one mistake ends the run.

But every successful tap is now a creative brushstroke. When the final black tile vanishes beneath your fingertips, what plays is no longer a pre-programmed MIDI loop but an original AI composition generated in real time via the ElevenLabs Music API (music_v1). Your combo count, your speed tier, and your chosen genre tags are assembled on the fly into a prompt for the model. "Easy" mode might conjure a flowing lo-fi piano melody, while the dizzying descent speed of "Expert" mode unlocks a fully reconstructed drum-and-bass track or a symphonic variation. The same level, three distinct universes, and zero canned music.

Built on React, TypeScript, and Vite (developed in Zed), the game ships with three built-in starter tracks, and no API key is required to start, so you can dive back into that "purely for the high score" golden era in under ten seconds. This time, though, every S-rank playthrough grows a personal music library that never repeats itself. The black-and-white aesthetic is the nostalgic shell; AI generation is the futuristic core.

Submitted 28 Apr 2026
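The interesting part is how game state becomes a music prompt. Below is a hedged TypeScript sketch of that mapping: the function names, prompt wording, and thresholds are illustrative, and the /v1/music endpoint and its payload are assumptions based on ElevenLabs' Music API rather than the game's actual code.

```typescript
// Illustrative sketch: turn end-of-run stats into a generation prompt.
type SpeedTier = "easy" | "normal" | "expert";

function buildMusicPrompt(combo: number, tier: SpeedTier, genres: string[]): string {
  const energy =
    tier === "expert" ? "high-energy, fast, dense percussion"
    : tier === "normal" ? "mid-tempo, driving"
    : "calm, flowing, mellow";
  const reward = combo > 100 ? "triumphant and layered" : "simple and clean";
  return `An instrumental ${genres.join(", ")} track, ${energy}, ${reward}, 30 seconds.`;
}

// Hedged call shape for the ElevenLabs Music API (music_v1); verify the
// endpoint and fields against current docs before relying on this.
async function generateVictoryTrack(apiKey: string, prompt: string): Promise<Blob> {
  const res = await fetch("https://api.elevenlabs.io/v1/music", {
    method: "POST",
    headers: { "xi-api-key": apiKey, "Content-Type": "application/json" },
    body: JSON.stringify({ prompt, model_id: "music_v1" }),
  });
  if (!res.ok) throw new Error(`music generation failed: ${res.status}`);
  return res.blob();
}
```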
with AWS Kiro
PinDrop is an interactive map-soundscape application. Drop a pin anywhere on a world map and you instantly hear an environmental soundscape belonging to that place at that moment, extending the imagination of travel from sight to sound.

Technically, the product is a multi-layered sound-synthesis pipeline assembled around three ElevenLabs audio capabilities: sound-generation handles the ambient layer and iconic sound effects (rainforest insect chirps, market vendor calls); the text-to-speech + voices interfaces match voices to the local language and generate the dialogue and sub-dialogue layers; and the music interface (the music_v1 model) adds an ambient music layer. The five audio layers are then rendered and mixed in parallel in the browser. Before anything reaches ElevenLabs, an LLM (large language model), combined with reverse geocoding, time zone, terrain, and language cues, expands a single coordinate into a concrete narrative scene (llmAnchorEnricher → sceneNarrative → recipeGenerator), making the prompts far more vivid.

The entire project was built from scratch with Kiro's spec-driven workflow. The .kiro/specs/ directory is divided into seven modules: 01 Map Interaction, 02 Geocoding, 03 Soundscape Engine, 04 Time System, 05 Player, 06 Caching, and 07 UI Settings, each split into a three-part specification of requirements, design, and tasks. Combined with the architecture, coding-style, error-handling, ElevenLabs-calling, security, and testing standards in the .kiro/steering/ directory, the AI followed the same "product constitution" for every line of code, closing the loop from map pin → narrative enrichment → multi-layer generation → local IndexedDB cache reuse.

What makes it special: most AI audio demos stop at "type a prompt, get a sound". PinDrop starts from a geographic coordinate, lets the LLM first imagine what is happening there, and then lets ElevenLabs make it audible. Click the Sahara and you hear a sandstorm and distant Arabic prayers; click late-night Tokyo and you hear convenience-store door chimes, train announcements, and Japanese whispers. The coordinate becomes the play button, the map becomes a global audio guide, and for the first time you can "go" somewhere with your ears.

Submitted 23 Apr 2026
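The parallel five-layer mix is easy to picture with the standard Web Audio API. Here is a minimal sketch, assuming each layer arrives as a fetchable audio URL; the layer list and gain values are illustrative, not PinDrop's actual mixer.

```typescript
// Minimal sketch: fetch soundscape layers in parallel, then mix them with
// per-layer gain through a shared AudioContext. Names and gains are assumptions.
async function playSoundscape(ctx: AudioContext, layers: { url: string; gain: number }[]) {
  const buffers = await Promise.all(
    layers.map(async ({ url }) => {
      const bytes = await (await fetch(url)).arrayBuffer();
      return ctx.decodeAudioData(bytes);
    }),
  );
  buffers.forEach((buffer, i) => {
    const src = ctx.createBufferSource();
    src.buffer = buffer;
    src.loop = true; // ambient layers loop; one-shot effects would skip this
    const gain = ctx.createGain();
    gain.gain.value = layers[i].gain; // e.g. ambience 0.8, dialogue 0.5, music 0.3
    src.connect(gain).connect(ctx.destination);
    src.start();
  });
}
```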
with turbopuffer
Echoverse is a web-based interactive AI audio narrative engine. Enter a story premise and you get an immersive audio experience of narration, sound effects, and background music, with real-time choices driving the plot. ElevenLabs is the "voice of the world": the TTS API generates narration, the Sound Effects API generates scene sound effects, and the Music API generates adaptive background music, with the three audio layers mixed and played in real time via the Web Audio API. Turbopuffer is the "memory of the world": world elements, player decisions, and character profiles are stored as vectors for RAG retrieval that drives narrative generation.

Turbopuffer also provides semantic caching of generated sound effects and background music: when a new request is similar enough to an existing asset (above a similarity threshold), the asset is reused directly instead of calling the generation API again. This semantic cache is the project's core innovation, creating a cost flywheel between ElevenLabs and turbopuffer: more stories mean a richer cache, fewer API calls, faster responses, and lower cost. Over the course of a single story, the cache hit rate climbs from roughly 10% at the start to 40-50% by the end. All user data and API keys live locally in the browser (localStorage + IndexedDB) with zero server-side persistence, so privacy comes first.

Submitted 15 Apr 2026
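The cache logic itself is simple to sketch. In the TypeScript outline below, embed, queryNearest, and storeAsset are hypothetical helpers standing in for an embedding call and turbopuffer's query and upsert operations, and the 0.92 threshold is an illustrative number, not the project's tuned value.

```typescript
// Sketch of threshold-based semantic caching for generated audio.
// embed / queryNearest / storeAsset are hypothetical wrappers around an
// embedding model and turbopuffer's query + upsert; not real client calls.

interface CachedAsset { url: string; similarity: number }

declare function embed(text: string): Promise<number[]>;
declare function queryNearest(ns: string, vector: number[]): Promise<CachedAsset | null>;
declare function storeAsset(ns: string, vector: number[], url: string): Promise<void>;
declare function generateSoundEffect(prompt: string): Promise<string>; // returns asset URL

const SIMILARITY_THRESHOLD = 0.92; // illustrative; tune per asset type

async function getSoundEffect(prompt: string): Promise<string> {
  const vector = await embed(prompt);
  const hit = await queryNearest("sfx-cache", vector);
  if (hit && hit.similarity >= SIMILARITY_THRESHOLD) {
    return hit.url; // cache hit: reuse an existing asset, skip the API call
  }
  const url = await generateSoundEffect(prompt); // cache miss: generate once...
  await storeAsset("sfx-cache", vector, url);    // ...then index it for future reuse
  return url;
}
```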
with Replit
PodChat turns podcasts from one-way listening into two-way dialogue. Upload an episode and the system automatically identifies each speaker, clones each voice, builds a knowledge base, and generates a full cast of AI host avatars. In a monologue podcast you can have a one-on-one voice conversation with the host at any time; in a multi-person podcast you can enter group-chat mode, put a question to everyone or @ a specific guest for a follow-up, and each AI character answers in their own realistic voice.

The product is built almost entirely on ElevenLabs: Speaker Diarization separates the speakers, Instant Voice Cloning clones each character, the Voice Design API analyzes each person's emotional style from the transcript to generate tuning parameters, the Conversational AI WebSocket supports real-time voice calls, and Text-to-Speech drives emotion-aware quick-summary narration. A single API key runs the whole chain from recognition through cloning to dialogue. The app is deployed on Replit with no registration required, all data stays local in the browser, and Firecrawl enriches the knowledge base by crawling web content mentioned in the episodes.

Existing podcast AI tools stop at transcription and summarization; PodChat lets you chat with the "host" in person whenever you like, turning one-way content consumption into a multi-role, interactive conversation with follow-up questions.

Submitted 9 Apr 2026
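One link in that chain, cloning a voice from a separated speaker track, can be sketched against ElevenLabs' voice-add endpoint. A multipart POST to /v1/voices/add with a name and sample files is the documented shape of Instant Voice Cloning, but treat the details below as an assumption to verify, not PodChat's actual code.

```typescript
// Hedged sketch: create an instant voice clone from one diarized speaker's
// audio. Endpoint and fields follow ElevenLabs' docs; verify before use.
async function cloneSpeakerVoice(
  apiKey: string,
  speakerName: string,
  sample: Blob, // concatenated audio of this speaker, produced by diarization
): Promise<string> {
  const form = new FormData();
  form.append("name", speakerName);
  form.append("files", sample, `${speakerName}.mp3`);
  const res = await fetch("https://api.elevenlabs.io/v1/voices/add", {
    method: "POST",
    headers: { "xi-api-key": apiKey }, // no Content-Type: browser sets the multipart boundary
    body: form,
  });
  if (!res.ok) throw new Error(`voice clone failed: ${res.status}`);
  const { voice_id } = await res.json();
  return voice_id; // used later for TTS and real-time conversation
}
```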
with Cloudflare
VoxDaily is an AI-powered platform for fully automated podcast generation. Describe the content you want in one sentence, and the AI sets up a continuously updating podcast channel: a new episode is generated on schedule every day and delivered straight to your email inbox. Just open your mail to hear a podcast tailored specifically to you, with no manual effort at all.

Technically, we use ElevenLabs' `eleven_multilingual_v2` model to synthesize single-host narration, and the Text-to-Dialogue API for natural multi-character conversational episodes. Firecrawl scrapes the web in real time to source the freshest material for each episode. The entire application is built on the Cloudflare ecosystem: the Agents SDK powers stateful AI agents that manage the complete workflow, from creating a channel out of a simple prompt to producing episodes automatically; Workers AI handles script generation and cover-art creation; and Cron Triggers scan for active channels every hour to trigger scheduled episode generation and email delivery.

What truly sets VoxDaily apart is that it is more than a personal podcasting tool; it is a community-driven podcast network. You can publish the channels you create to a public "Channel Plaza," where other users can discover, preview, and subscribe to your content with a single click. Once subscribed, every new episode lands automatically in their inbox, as simple as following an RSS feed. You are both creator and listener: with a single prompt you run your own channel and build a subscriber base, while also subscribing to high-quality channels others have created in the Plaza, so fresh podcasts are waiting in your inbox every day.

Submitted 2 Apr 2026
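The hourly scan maps naturally onto a Workers Cron Trigger. The sketch below uses the standard scheduled handler plus a hypothetical CHANNELS KV binding and produceEpisode helper; VoxDaily's real state lives in the Agents SDK and its episode pipeline is not shown in the submission.

```typescript
// Hedged sketch of an hourly Cron Trigger scan in a Cloudflare Worker.
// `CHANNELS` (a KV namespace) and `produceEpisode` are illustrative stand-ins.
interface Env {
  CHANNELS: KVNamespace;
}

declare function produceEpisode(channelId: string, env: Env): Promise<void>;

export default {
  // Runs on the schedule configured in wrangler.toml, e.g. crons = ["0 * * * *"]
  async scheduled(controller: ScheduledController, env: Env, ctx: ExecutionContext) {
    const { keys } = await env.CHANNELS.list({ prefix: "channel:" });
    for (const key of keys) {
      const raw = await env.CHANNELS.get(key.name);
      if (!raw) continue;
      const channel = JSON.parse(raw) as { id: string; nextRunAt: number };
      if (channel.nextRunAt <= controller.scheduledTime) {
        // Generate the episode and send the email without blocking the scan.
        ctx.waitUntil(produceEpisode(channel.id, env));
      }
    }
  },
};
```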
with Firecrawl
PodCraft – create sound with sound; AI podcasting has never been easier. Simply speak, and PodCraft handles everything for you: gathering material, writing the script, customizing voices, and producing a complete podcast episode. The workflow is fully voice-interactive, freeing your hands and returning creation to its most natural state. ElevenLabs powers the voice experience, Firecrawl powers intelligent content search, and the AI collaborates with you on every episode.

Submitted 26 Mar 2026