350 points · 2 submissions
with Cloudflare
1:1 video calls on Cloudflare Workers with real-time translation: Whisper (STT) → m2m100 (Workers AI) → ElevenLabs (TTS). Signaling and translated audio go through a Durable Object per room; WebRTC carries video only.
Submitted 2 Apr 2026
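A minimal sketch of the flow described in the Cloudflare submission above: one relay per room takes a speaker's audio, runs it through speech-to-text, translation, and text-to-speech, and delivers the result to the other participant. Everything named here (`Room`, `Peer`, `on_audio`, the three stage callables, and the `fake_*` stand-ins) is hypothetical and is not taken from the project's code. The real stages would call Whisper, m2m100, and ElevenLabs, and the relay would live in a Durable Object rather than a Python class.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical stage signatures. The real stages would call Whisper (STT),
# m2m100 (translation) and ElevenLabs (TTS); none of those APIs are used here.
Transcribe = Callable[[bytes], str]
Translate = Callable[[str, str, str], str]  # (text, source_lang, target_lang)
Synthesize = Callable[[str], bytes]


@dataclass
class Peer:
    peer_id: str
    language: str
    # Translated audio delivered to this peer, in arrival order.
    inbox: List[bytes] = field(default_factory=list)


@dataclass
class Room:
    """Stand-in for the per-room relay (a Durable Object in the submission)."""
    transcribe: Transcribe
    translate: Translate
    synthesize: Synthesize
    peers: Dict[str, Peer] = field(default_factory=dict)

    def join(self, peer: Peer) -> None:
        # Calls are 1:1, so a third distinct participant is rejected.
        if peer.peer_id not in self.peers and len(self.peers) >= 2:
            raise ValueError("room is full: calls are 1:1")
        self.peers[peer.peer_id] = peer

    def on_audio(self, sender_id: str, audio: bytes) -> None:
        # STT once, then translate and synthesize for every other peer.
        sender = self.peers[sender_id]
        text = self.transcribe(audio)
        for peer in self.peers.values():
            if peer.peer_id == sender_id:
                continue
            translated = self.translate(text, sender.language, peer.language)
            peer.inbox.append(self.synthesize(translated))


# Fake stages so the sketch runs without any network access.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_translate(text: str, src: str, dst: str) -> str:
    return f"[{src}->{dst}] {text}"


def fake_synthesize(text: str) -> bytes:
    return text.encode("utf-8")
```

With the fake stages, audio sent by an `en` peer lands in the `es` peer's inbox tagged `[en->es]`, and the sender receives nothing back. Video is not modeled, since the submission says WebRTC carries it directly.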
with Firecrawl
Otto turns your camera into a real-world AI brain. Point it at a book, a pair of shoes, a restaurant menu, or a supplement bottle and ask "What is this?", "Cheaper nearby?", "Veg options?", or "Safe dosage?". Otto uses Gemini Vision to understand what it sees, Firecrawl to scrape up-to-date web data (prices, reviews, medical guidance), and ElevenLabs to speak structured answers aloud. No typing exact names or juggling tabs: just point, ask, know.
Real examples:
Books: identifies title/edition, pulls price/ratings/summary
Shopping: finds the same item cheaper nearby or online, with maps
Menus: extracts veg/non-veg options, plus reviews and phone number
Health: dosage ranges, safety warnings, alternatives
Why it matters: 90% of decisions happen physically (shops, streets, pharmacies). Otto eliminates the gap between seeing something and knowing what to do about it. Vision-first AI for everyday life.
Submitted 23 Mar 2026
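The Otto description above amounts to a three-step chain: identify the object, look up live data about it, and speak the result. The sketch below illustrates that chain only, with each step passed in as a plain function. `answer_question`, `Answer`, and the parameter names are hypothetical and not from the project. A real version would back `identify` with Gemini Vision, `search` with Firecrawl, and `speak` with ElevenLabs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Answer:
    item: str          # what the vision step says the camera is pointed at
    facts: List[str]   # live facts returned by the lookup step
    summary: str       # text handed to the speech step
    spoken: bytes      # audio returned by the speech step


def answer_question(
    image: bytes,
    question: str,
    identify: Callable[[bytes], str],       # assumed: image -> item name
    search: Callable[[str], List[str]],     # assumed: query -> list of facts
    speak: Callable[[str], bytes],          # assumed: text -> audio bytes
) -> Answer:
    item = identify(image)
    # The query joins what was seen with what was asked.
    facts = search(f"{item} {question}")
    if facts:
        summary = f"{item}: " + "; ".join(facts)
    else:
        summary = f"{item}: no live data found"
    return Answer(item=item, facts=facts, summary=summary, spoken=speak(summary))
```

Passing the steps in as functions keeps the chain testable without network access, and makes the fallback explicit when the lookup returns nothing.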