Hack #1: Firecrawl
25 Mar, 04:39
Quantext is like TV with a commentator who actually did the homework.

The problem: YouTube is full of confident claims and hot takes. Watching alone means you either swallow them or pause and Google forever. I wanted watch-together energy: someone in the room who can fact-check, rebut, or roast at the right beat, then get out of the way.

What I built: Paste any YouTube URL. The app pulls the transcript (or transcribes the audio), uses an LLM to find high-signal moments (stats, bold claims, missing context), then Firecrawl runs parallel search + scrape across those moments so every interjection ships with real sources, not vibes. At playback, an ElevenLabs Conversational AI agent fires on timestamp (pause → speak → auto-resume), so it feels like the video was edited to include commentary. Ask a question mid-watch and the agent can search again live via a custom Firecrawl client-side tool, so ad-hoc rabbit holes don't break the illusion.

Clever Firecrawl use: Research isn't one big dump. It's per-claim and pre-orchestrated, so interjections are instant at runtime, plus on-demand when the user interrupts. That's the hack: pre-compute the expensive part, and keep Firecrawl in the loop when the human goes off-script.

Clever ElevenLabs use: This isn't TTS reading a script. It's a full agent session with tooling (pause/resume the player, live search) and persona-conditioned behavior. I cloned my voice and fed roughly a decade of my stand-up comedy writing into the agent's world model, so "me" sounds like my timing and voice, not a generic narrator. On top of that: 23 distinct commentator personas (e.g. Grandma Gloria, Rage Rick, Sir David Attenburro, Ackshually Alex, Conspiracy Carl) and two modes, The Full Picture (sourced fact-checking) and The Fool Picture (MST3K-style roasting), so the same pipeline feels like a roster of shows, not one bot.

Engineering that matters: Caching (transcripts, prep artifacts, session payloads) so repeat watches and shares don't re-pay the full prep tax.
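The per-claim pre-orchestration can be sketched as a small fan-out: one research call per flagged moment, run in parallel ahead of playback, keyed by timestamp so interjections are a dictionary lookup at runtime. This is a minimal sketch, not the actual app code; `search_and_scrape` is a hypothetical stand-in for the real Firecrawl search + scrape step, and the `{"t": ..., "claim": ...}` moment shape is an assumption about what the LLM pass emits.

```python
from concurrent.futures import ThreadPoolExecutor

def search_and_scrape(query: str) -> list[dict]:
    """Hypothetical placeholder for a Firecrawl search + scrape call.

    A real implementation would query Firecrawl's search API and scrape
    the top hits; here we return a stub record so the orchestration logic
    is runnable on its own.
    """
    return [{"query": query, "url": "https://example.com/source"}]

def precompute_research(moments: list[dict], max_workers: int = 8) -> dict:
    """Run one research call per flagged moment in parallel, keyed by timestamp.

    Each moment is assumed to look like {"t": seconds, "claim": text}.
    The expensive part happens here, before playback ever starts.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {m["t"]: pool.submit(search_and_scrape, m["claim"]) for m in moments}
        return {t: f.result() for t, f in futures.items()}

# At playback, an interjection at timestamp t just indexes into this dict.
research = precompute_research([
    {"t": 42.0, "claim": "GDP grew 8% last quarter"},
    {"t": 97.5, "claim": "this battery charges in 5 minutes"},
])
```

The same `search_and_scrape` hook doubles as the live, on-demand path when the viewer interrupts mid-watch; only the pre-computed path pays its cost up front.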
Share links so you can send a URL that opens the same video, persona, and mode: demoable, viral-friendly, and actually usable after the hackathon.

Why it's a flex: It chains Firecrawl's research graph with ElevenLabs' realtime voice + tools in a loop built around real video time, not a chat window dressed up as a product. The "wait, it stopped the video and cited that?" moment is the whole thesis.
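A share link only needs to carry three things: the video URL, the persona, and the mode. A minimal sketch of that round trip, using Python's standard `urllib.parse` (the `quantext.example` origin and the `v`/`persona`/`mode` parameter names are illustrative assumptions, not the app's actual scheme):

```python
from urllib.parse import urlencode, urlparse, parse_qs

BASE = "https://quantext.example/watch"  # hypothetical app origin

def make_share_link(video_url: str, persona: str, mode: str) -> str:
    """Encode the full session (video, persona, mode) into one shareable URL."""
    return f"{BASE}?{urlencode({'v': video_url, 'persona': persona, 'mode': mode})}"

def parse_share_link(link: str) -> dict:
    """Recover the session parameters so the app can rebuild the same view."""
    query = parse_qs(urlparse(link).query)
    return {key: values[0] for key, values in query.items()}

link = make_share_link("https://youtu.be/dQw4w9WgXcQ", "Rage Rick", "full-picture")
session = parse_share_link(link)
```

Because the link is pure state, anyone opening it replays the same cached prep artifacts instead of re-paying the research cost.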
