Hack #5: Kiro · AWS Kiro
23 Apr, 08:57
PinDrop is an interactive map application that generates soundscapes. Users place a pin anywhere on a world map and instantly hear an environmental soundscape "belonging to that place and that moment," extending the imagination of travel from sight to sound.
Technically, the product is a multi-layered sound synthesis pipeline built around three ElevenLabs audio generation capabilities: sound-generation handles the environmental layer and iconic sound effects (such as rainforest insect chirps and market vendor calls); the text-to-speech and voices interfaces pick a voice that matches the local language and generate the dialogue and sub-dialogue layers in that language; and the music interface (music_v1 model) adds an ambient music layer. The five audio layers are then rendered and mixed in parallel in the browser. Before anything is sent to ElevenLabs, the system uses an LLM (large language model), combined with reverse geocoding, time zone, terrain, and language cues, to expand a bare coordinate into a specific narrative scene (llmAnchorEnricher→sceneNarrative→recipeGenerator), which makes the prompts more vivid.
The entire project was built from scratch using Kiro's spec-driven workflow. The .kiro/specs/ directory is divided into seven modules: 01 Map Interaction, 02 Geocoding, 03 Soundscape Engine, 04 Time System, 05 Player, 06 Caching, and 07 UI Settings. Each module is specified in three parts: requirements, design, and tasks. Together with the steering documents in the .kiro/steering/ directory (architecture, coding style, error handling, ElevenLabs calling patterns, security, and testing standards), this meant the AI followed the same "product constitution" when writing every piece of code, ultimately closing the full loop from pin placement → narrative enrichment → multi-layer generation → reuse from a local IndexedDB cache.
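The flow described above (coordinate → narrative → recipe → five layers rendered in parallel, with cache reuse) could be sketched roughly as follows. This is a minimal illustration, not PinDrop's actual code: every type and function name here (SceneContext, LayerSpec, buildRecipe, renderAll, cacheKeyFor) is hypothetical, the gain values and the coordinate-rounding cache policy are assumptions, an in-memory Map stands in for IndexedDB, and the ElevenLabs calls are left behind an injected render function rather than shown.

```typescript
// Hypothetical names throughout; none of these come from the PinDrop codebase.
type LayerKind = "ambience" | "effects" | "dialogue" | "subDialogue" | "music";

interface SceneContext {
  lat: number;
  lon: number;
  placeName: string; // assumed to come from reverse geocoding
  localHour: number; // 0-23, assumed to come from the time zone lookup
  language: string;  // language code for the location, e.g. "ja"
}

interface LayerSpec {
  kind: LayerKind;
  prompt: string;
  gain: number; // linear mix gain in [0, 1]; values below are illustrative
}

interface Recipe {
  cacheKey: string;
  layers: LayerSpec[];
}

// Assumed cache policy: round coordinates to two decimals so that nearby pins
// dropped in the same local hour share one cache entry.
function cacheKeyFor(ctx: SceneContext): string {
  return `${ctx.lat.toFixed(2)},${ctx.lon.toFixed(2)}@${ctx.localHour}`;
}

// Turn the enriched scene into one prompt per audio layer (five in total).
function buildRecipe(ctx: SceneContext, narrative: string): Recipe {
  const when = ctx.localHour >= 6 && ctx.localHour < 18 ? "daytime" : "night";
  const layers: LayerSpec[] = [
    { kind: "ambience", prompt: `${when} ambience in ${ctx.placeName}: ${narrative}`, gain: 0.8 },
    { kind: "effects", prompt: `iconic sound effects of ${ctx.placeName}`, gain: 0.6 },
    { kind: "dialogue", prompt: `foreground speech in language ${ctx.language}`, gain: 0.5 },
    { kind: "subDialogue", prompt: `background chatter in language ${ctx.language}`, gain: 0.3 },
    { kind: "music", prompt: `ambient music for ${ctx.placeName} at ${when}`, gain: 0.4 },
  ];
  return { cacheKey: cacheKeyFor(ctx), layers };
}

// The renderer is injected: in a real app it would call the audio service
// and return something playable. Here it only needs to return a string.
type Renderer = (layer: LayerSpec) => Promise<string>;

// Render all layers concurrently, reusing cached clips when the key matches.
async function renderAll(
  recipe: Recipe,
  render: Renderer,
  cache: Map<string, string[]>,
): Promise<string[]> {
  const hit = cache.get(recipe.cacheKey);
  if (hit) return hit;
  const clips = await Promise.all(recipe.layers.map(render));
  cache.set(recipe.cacheKey, clips);
  return clips;
}
```

Injecting the renderer keeps the sketch independent of any particular API client, and a second call with the same recipe returns the cached clips without invoking the renderer again.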
What makes it special is that most AI audio demos stop at "type a prompt, get a sound," whereas PinDrop starts from a geographic coordinate: an LLM first "imagines" what is happening at that place, and ElevenLabs then makes it audible. Click on the Sahara and you hear a sandstorm and distant Arabic prayers; click on late-night Tokyo and you hear convenience store door chimes, train announcements, and whispered Japanese. The coordinates become the play button and the map becomes a global sound guide, letting people "go" to a place with their ears for the first time.
