2,550 points · 8 submissions
with Cursor
Super Nova - an AI companion you operate entirely with your voice. Tap the orb, say what you want — "play lofi", "weather in Tokyo", "directions to Times Square", "today's NASA picture" — and answers appear as living widgets while Vee, an ElevenLabs voice, speaks the reply. Zero keyboard. Zero clicks. https://voice-companion-rust.vercel.app
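As a sketch of the zero-keyboard flow, here is a hypothetical router that turns a transcript into the widget to render plus the line Vee should speak. The types and rules are illustrative only, not taken from the Super Nova code.

```ts
// Hypothetical intent router: maps a raw transcript to the widget the UI should render.
// WidgetKind, WidgetRequest and routeCommand are illustrative names, not from the app.
type WidgetKind = "music" | "weather" | "directions" | "nasa-apod" | "unknown";

interface WidgetRequest {
  kind: WidgetKind;
  query: string;       // the part of the utterance the widget needs
  spokenReply: string;  // what the TTS voice ("Vee") should say
}

export function routeCommand(transcript: string): WidgetRequest {
  const t = transcript.toLowerCase().trim();

  if (t.startsWith("play ")) {
    const query = t.slice("play ".length);
    return { kind: "music", query, spokenReply: `Playing ${query}.` };
  }
  if (t.startsWith("weather in ")) {
    const query = t.slice("weather in ".length);
    return { kind: "weather", query, spokenReply: `Here's the weather in ${query}.` };
  }
  if (t.startsWith("directions to ")) {
    const query = t.slice("directions to ".length);
    return { kind: "directions", query, spokenReply: `Routing you to ${query}.` };
  }
  if (t.includes("nasa picture")) {
    return { kind: "nasa-apod", query: "today", spokenReply: "Here's today's NASA picture." };
  }
  return { kind: "unknown", query: t, spokenReply: "Sorry, I didn't catch that." };
}
```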
Submitted 14 May 2026
with v0
spotify.trm - A terminalcore redesign of Spotify with an AI radio DJ. I cloned Spotify's product shape (library, search, queue, player) in v0 and reimagined every pixel as a terminal: monospace fonts, green-on-black palette, command logs, splash gate, and a real CLI tab where you can type "play weeknd", "queue lofi", or "dj on" and it just works. Same mental model people already know — re-skinned for keyboard-first users.

Then I added a voice layer with ElevenLabs. The app has four AI DJ hosts — Nova, Velvet, Pulse, Kai — each mapped to a distinct ElevenLabs voiceId plus a per-host delivery string for v3-style acting control. When a track is about to end, Groq's llama-3.1-8b-instant reads the next track's metadata and writes a one-line intro in that host's persona. ElevenLabs speaks it server-side via /api/tts. The next iTunes preview starts quietly under the voice-over so handoffs feel like a real broadcast instead of dead air. Same queue, different host, completely different vibe.

Stack: Next.js 16, React 19, Tailwind 4, v0, Groq, ElevenLabs, Vercel. Code: github.com/anirxdh/spotify-for-hackers
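A rough sketch of that handoff, assuming a server-side route that calls Groq's OpenAI-compatible chat endpoint and the ElevenLabs text-to-speech REST API. The Host shape, model_id, and prompt wording are assumptions, not the actual /api/tts implementation.

```ts
// Sketch: Groq writes the one-line intro, ElevenLabs speaks it server-side.
interface Track { title: string; artist: string }
interface Host { name: string; voiceId: string; delivery: string } // delivery = per-host acting hint

async function writeIntro(host: Host, next: Track): Promise<string> {
  const res = await fetch("https://api.groq.com/openai/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.GROQ_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "llama-3.1-8b-instant",
      messages: [
        { role: "system", content: `You are ${host.name}, a radio DJ. ${host.delivery} Reply with one short line.` },
        { role: "user", content: `Introduce the next track: "${next.title}" by ${next.artist}.` },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content.trim();
}

async function speakIntro(host: Host, text: string): Promise<ArrayBuffer> {
  const res = await fetch(`https://api.elevenlabs.io/v1/text-to-speech/${host.voiceId}`, {
    method: "POST",
    headers: {
      "xi-api-key": process.env.ELEVENLABS_API_KEY!,
      "Content-Type": "application/json",
    },
    // model choice is an assumption; the app's actual model/settings may differ
    body: JSON.stringify({ text, model_id: "eleven_multilingual_v2" }),
  });
  return res.arrayBuffer(); // MP3 bytes to stream back from the /api/tts route
}
```

Starting the next iTunes preview at low volume while these bytes play is what sells the "real broadcast" feel.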
Submitted 7 May 2026
with Zed
It's 2 AM. Joe wakes to a scream from the apartment next door. He goes up to check on Mrs. Hollis. He won't be coming back. Five cassette tapes. Four locks. One key. One way out — and it's not really a way out at all.

Apartment 4B is a 15–20 minute browser horror game built in 36 hours for ElevenHacks. The thesis is simple: audio IS the game. Every meaningful beat of fear, bonding, and betrayal lands through ElevenLabs-generated voice — the cold-open narration, Mrs. Hollis's six tape recordings, Joe's reactions, the screams, the heartbeat under the climax, even the music bed. Three custom voices carry the entire emotional arc: Mora as Mrs. Hollis, Zelda as Joe, Jerry B. as the narrator. Every audio file is pre-baked at build time and committed as MP3 — the deployed site makes zero runtime API calls.

The visual side is intentionally bare-bones. Every mesh is a Three.js primitive (box, cylinder, plane). Every texture — wallpaper, polaroids, the wall calendar, the spice rack — is drawn programmatically into a CanvasTexture. No GLTF imports, no Sketchfab. The PS1-era lo-fi aesthetic is on-trend for indie horror and lets sound design carry the weight.

A 10-minute lighter battery counts down whenever you're playing. When it runs out, the game ends. Solve five gated tape puzzles to find the front-door key, turn it, and trigger the time-loop ending — pixel-identical to the cold open. You wake on the couch at 2:47 AM. There is no escape.

Built with: Vite · React · TypeScript · React Three Fiber · Three.js · Drei · Howler.js · Zustand · @react-three/postprocessing · ElevenLabs (TTS + Sound Generation + Music API) · Zed (with its AI agent for pair-programming).
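A minimal sketch of what the build-time pre-bake step could look like, assuming a Node 18+ script that hits the ElevenLabs text-to-speech REST endpoint and writes MP3s into the repo. The file layout, voice ids, and line text are placeholders rather than the game's real script.

```ts
// scripts/bake-audio.ts: generate every voice line once at build time, commit the MP3s,
// and make zero API calls from the deployed site. All ids/text below are placeholders.
import { writeFile, mkdir } from "node:fs/promises";

const lines = [
  { id: "cold-open", voiceId: "NARRATOR_VOICE_ID", text: "It's 2 AM. Joe wakes to a scream..." },
  { id: "tape-1", voiceId: "HOLLIS_VOICE_ID", text: "(placeholder: real tape script goes here)" },
];

async function bake() {
  await mkdir("public/audio", { recursive: true });
  for (const line of lines) {
    const res = await fetch(`https://api.elevenlabs.io/v1/text-to-speech/${line.voiceId}`, {
      method: "POST",
      headers: { "xi-api-key": process.env.ELEVENLABS_API_KEY!, "Content-Type": "application/json" },
      body: JSON.stringify({ text: line.text }),
    });
    if (!res.ok) throw new Error(`TTS failed for ${line.id}: ${res.status}`);
    await writeFile(`public/audio/${line.id}.mp3`, Buffer.from(await res.arrayBuffer()));
    console.log(`baked ${line.id}.mp3`);
  }
}

bake().catch(console.error);
```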
Submitted 30 Apr 2026
with AWS Kiro
Voices of the Last World - In the year 2098, civilization is collapsing under climate disaster, AI instability, and resource exhaustion. The Archive preserves reconstructed strategic minds. Players do not command them directly. Instead, they choose who gets deployed and live with the consequences. This project is designed as a polished cinematic web experience, a Kiro spec-driven hackathon submission, and an ElevenLabs-powered voice product.
Submitted 23 Apr 2026
with turbopuffer
Jungle Safari 🌿 - an audio adventure for toddlers (ages 1–3). A kid hears a real animal sound, taps the animal they think made it, and a warm AI mascot voice responds with a tailored fun fact — different every time, based on exactly what they picked.

ElevenLabs does double duty: Sound Effects for every animal cry, Text-to-Speech ("Bella" voice) for the mascot. Each mascot reply is generated fresh by GPT-4o-mini reacting to the specific wrong guess vs the right answer, then spoken aloud. No pre-recorded lines. turbopuffer indexes every animal as a 1536-dim OpenAI embedding with metadata, making the library semantically searchable for themed expeditions — RAG for audio. Upstash Redis caches every mascot response as base64 audio so no voice reply is ever computed twice for any user worldwide.

Live: thejunglesafari.netlify.app · github.com/anirxdh/JungleSafari
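A minimal sketch of the caching layer, assuming @upstash/redis plus the OpenAI and ElevenLabs REST APIs. The key scheme, prompt, and BELLA_VOICE_ID env var are illustrative, and the turbopuffer lookup that selects the animals is omitted here.

```ts
// Sketch: one mascot reply per (picked, correct) pair, spoken once, cached forever.
import { Redis } from "@upstash/redis";

const redis = Redis.fromEnv(); // reads UPSTASH_REDIS_REST_URL / _TOKEN

export async function mascotAudio(picked: string, correct: string): Promise<string> {
  const key = `mascot:${correct}:${picked}`;
  const cached = await redis.get<string>(key);
  if (cached) return cached; // base64 MP3 already generated for someone, somewhere

  // 1. Fresh one-liner from GPT-4o-mini reacting to this exact guess.
  const chat = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: { Authorization: `Bearer ${process.env.OPENAI_API_KEY}`, "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{
        role: "user",
        content: `A toddler picked the ${picked}; the sound was a ${correct}. Reply warmly in one short sentence with a fun fact.`,
      }],
    }),
  }).then(r => r.json());
  const line: string = chat.choices[0].message.content;

  // 2. Speak it with the mascot voice and cache the bytes as base64.
  const audio = await fetch(`https://api.elevenlabs.io/v1/text-to-speech/${process.env.BELLA_VOICE_ID}`, {
    method: "POST",
    headers: { "xi-api-key": process.env.ELEVENLABS_API_KEY!, "Content-Type": "application/json" },
    body: JSON.stringify({ text: line }),
  }).then(r => r.arrayBuffer());

  const b64 = Buffer.from(audio).toString("base64");
  await redis.set(key, b64);
  return b64;
}
```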
Submitted 16 Apr 2026
with Replit
FaceTime From Mars 2159 - A voice-powered web app where you call AI colonists living on Mars in the year 2159. Humans migrated to Mars in 2070. Now, 90 years later, you can pick up a quantum relay and have a real voice conversation with three Mars colonists — each with a unique ElevenLabs voice and AI personality powered by Claude.

The colonists:
• Zeph (16) — born on Mars, never seen Earth, thinks rain is terrifying
• Chef Riku (34) — recreates Earth food from archives, has never tasted pizza
• Dr. Nova (41) — terraforming chief, making Mars breathable

How it uses ElevenLabs: Each character has a custom ElevenLabs voice ID with tuned settings (low stability for expressiveness, high style for personality). Voice responses are processed through a Web Audio API radio filter with static bursts to simulate a real deep-space transmission. Three distinct voices — a kid, an adult male, and an adult female — make each conversation feel completely different.

How it uses Replit: Built entirely with Replit Agent assistance. Backend (FastAPI) and frontend (Next.js) both deployed on Replit as a single project with two workflows. Secrets manager handles API keys. Published as a web app anyone can use.

Additional features: 3D interactive Mars/Earth scene (React Three Fiber), walkie-talkie push-to-talk, camera transitions between planets, ambient space sounds, signal glitches, Mars clock, suggested conversation topics, and end-call transmission report.
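A rough sketch of how a Web Audio API radio filter with static bursts could be wired up, assuming a browser AudioContext and an already-decoded ElevenLabs reply. The band-pass frequency, noise level, and burst timing are guesses, not the app's actual settings.

```ts
// Sketch of the "quantum relay" effect: band-pass the voice, layer in random static bursts.
export function playAsTransmission(ctx: AudioContext, voice: AudioBuffer) {
  // Narrow band-pass makes the ElevenLabs voice sound like a cheap long-range radio.
  const src = ctx.createBufferSource();
  src.buffer = voice;
  const bandpass = ctx.createBiquadFilter();
  bandpass.type = "bandpass";
  bandpass.frequency.value = 1800; // rough radio band, illustrative value
  bandpass.Q.value = 0.8;
  src.connect(bandpass).connect(ctx.destination);

  // White noise the same length as the line, gated open in short bursts.
  const noiseBuf = ctx.createBuffer(1, ctx.sampleRate * voice.duration, ctx.sampleRate);
  const data = noiseBuf.getChannelData(0);
  for (let i = 0; i < data.length; i++) data[i] = Math.random() * 2 - 1;
  const noise = ctx.createBufferSource();
  noise.buffer = noiseBuf;
  const noiseGain = ctx.createGain();
  noiseGain.gain.value = 0;
  noise.connect(noiseGain).connect(ctx.destination);
  for (let t = 0; t < voice.duration; t += 1 + Math.random() * 2) {
    noiseGain.gain.setValueAtTime(0.15, ctx.currentTime + t);      // static burst
    noiseGain.gain.setValueAtTime(0, ctx.currentTime + t + 0.1);   // back to silence
  }

  src.start();
  noise.start();
}
```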
Submitted 6 Apr 2026
with Cloudflare
VoiceCaptcha (the CAPTCHA that finally shuts up the bots by making you speak up) - A voice-based human verification system that goes beyond traditional CAPTCHAs. Instead of clicking checkboxes or solving image puzzles, users speak a randomly generated phrase. Their audio is sent to a Cloudflare Worker at the edge, transcribed in milliseconds using Groq Whisper, and fuzzy-matched against the expected phrase using dual scoring (ordered word sequence + bag-of-words). Challenge sessions are managed by a Durable Object with automatic TTL cleanup.

After passing, users can tap "Play ElevenLabs voice" to hear the same phrase spoken by ElevenLabs' multilingual TTS model — creating a compelling human vs. AI contrast. Same words, completely different origin. That's the story.

Tech: Cloudflare Workers + Durable Objects (SQLite-backed) | Groq Whisper Large v3 | ElevenLabs TTS v2 | React + Vite + Tailwind | Web Speech API (live word highlighting) | Web Audio API (Siri-style orb visualizer)

Live: https://voicecaptcha.vercel.app
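A minimal sketch of the dual-scoring fuzzy match described above, assuming the two scores are averaged against a pass threshold; the normalization, weights, and threshold are illustrative.

```ts
// Sketch: ordered-sequence score + bag-of-words score, averaged, compared to a threshold.
const normalize = (s: string) =>
  s.toLowerCase().replace(/[^a-z0-9\s]/g, "").split(/\s+/).filter(Boolean);

// Score 1: longest common subsequence of words, rewards saying them in order.
function orderedScore(expected: string[], heard: string[]): number {
  const dp = Array.from({ length: expected.length + 1 }, () =>
    new Array<number>(heard.length + 1).fill(0));
  for (let i = 1; i <= expected.length; i++)
    for (let j = 1; j <= heard.length; j++)
      dp[i][j] = expected[i - 1] === heard[j - 1]
        ? dp[i - 1][j - 1] + 1
        : Math.max(dp[i - 1][j], dp[i][j - 1]);
  return dp[expected.length][heard.length] / expected.length;
}

// Score 2: bag-of-words overlap, forgiving about order and filler words.
function bagScore(expected: string[], heard: string[]): number {
  const bag = new Set(heard);
  return expected.filter(w => bag.has(w)).length / expected.length;
}

export function passes(expectedPhrase: string, transcript: string, threshold = 0.8): boolean {
  const e = normalize(expectedPhrase);
  const h = normalize(transcript);
  return (orderedScore(e, h) + bagScore(e, h)) / 2 >= threshold;
}
```

Averaging the two keeps the check forgiving when the transcription drops a word, while still rewarding the phrase being spoken in order.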
Submitted 2 Apr 2026
with Firecrawl
ScreenSense Voice is a multi-agent browser orchestrator that replaces the screenshot-paste-wait-read loop with a single voice command.

The problem: Every day, millions of people screenshot their screen, paste it into ChatGPT, ask what to do, read the answer, then manually go back and do it themselves. Over and over.

ScreenSense eliminates this entirely. Hold one key, speak naturally — and a pipeline of 6 AI agents kicks in:
1. ElevenLabs transcribes your voice in real time
2. Firecrawl scrapes the full page into clean markdown — giving the AI complete page context, not just what's visible
3. A vision agent captures your screen and extracts every interactive element with precise CSS selectors
4. Claude reasons about the screenshot + full page content + your command — and returns a structured action
5. The browser executes it autonomously — clicking, typing, scrolling, navigating
6. The loop repeats with fresh context until your task is complete (up to 25 steps)

How it uses ElevenLabs: Voice-to-text transcription (primary STT) and natural voice readback (TTS) using the streaming API for instant audio response.

How it uses Firecrawl: Every voice command triggers a Firecrawl scrape that converts the full page into LLM-ready markdown. This gives the AI agent context about content below the fold — forms it can't see in the screenshot, data hidden in tabs, full article text. The agent reads the entire page before acting, enabling intelligent form-filling (it knows which fields to ask about) and deep page understanding.

Built as a Chrome MV3 extension with a FastAPI backend. 434 tests. Open source.
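A small sketch of that Firecrawl step (step 2 of the pipeline), assuming Firecrawl's v1 REST scrape endpoint; the response shape and helper name are assumptions based on the public docs, not ScreenSense's actual code.

```ts
// Sketch: turn the active tab into LLM-ready markdown before the reasoning agent acts.
export async function pageContext(url: string): Promise<string> {
  const res = await fetch("https://api.firecrawl.dev/v1/scrape", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.FIRECRAWL_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url, formats: ["markdown"] }),
  });
  if (!res.ok) throw new Error(`Firecrawl scrape failed: ${res.status}`);
  const { data } = await res.json();
  // Full-page markdown: everything the screenshot can't show (forms below the fold,
  // tab content, full article text) gets handed to the reasoning agent.
  return data.markdown as string;
}
```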
Submitted 23 Mar 2026