Audio to Text — Gemini Free AI
Transcribe audio directly in your browser sidebar — powered by Google Gemini AI.
Audio to Text - Gemini Free AI is a Chrome extension that brings fast, accurate speech-to-text into your browser, so you never have to leave the page you are on. It uses the Google Gemini 2.0 Flash-Lite model to transcribe audio files that you upload from your computer.
Why This Extension?
Seamless Browser Integration
The extension lives in Chrome's side panel, so you can upload audio, monitor transcription progress, and copy results — all while continuing to browse the web. There is no need to switch tabs or open another application.
Powered by Gemini AI
Google Gemini 2.0 Flash-Lite offers high-quality speech recognition with low latency. The extension handles the full lifecycle: upload, server-side processing (polling), transcription generation, and file cleanup.
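The lifecycle described above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: the `TranscriptionBackend` interface, the state names, the `transcribe` function, and the polling interval are all assumptions standing in for whatever calls the extension makes through the Google GenAI SDK.

```typescript
// Hypothetical abstraction over the remote file API. In the real extension
// these operations would go through the @google/genai SDK.
type FileState = "PROCESSING" | "ACTIVE" | "FAILED";

type RemoteFile = { name: string; state: FileState };

interface TranscriptionBackend {
  upload(audio: Uint8Array, mimeType: string): Promise<RemoteFile>;
  getFile(name: string): Promise<RemoteFile>;
  generate(file: RemoteFile, prompt: string): Promise<string>;
  deleteFile(name: string): Promise<void>;
}

const sleep = (ms: number) =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

async function transcribe(
  backend: TranscriptionBackend,
  audio: Uint8Array,
  mimeType: string,
  prompt: string,
  pollIntervalMs = 2000, // assumed value, not taken from the extension
): Promise<string> {
  // 1. Upload the audio file.
  let file = await backend.upload(audio, mimeType);
  try {
    // 2. Poll until server-side processing has finished.
    while (file.state === "PROCESSING") {
      await sleep(pollIntervalMs);
      file = await backend.getFile(file.name);
    }
    if (file.state === "FAILED") {
      throw new Error(`Processing failed for ${file.name}`);
    }
    // 3. Generate the transcription from the processed file.
    return await backend.generate(file, prompt);
  } finally {
    // 4. Clean up the remote file on success and on failure alike.
    //    A failed delete is swallowed so it cannot mask the real result.
    await backend.deleteFile(file.name).catch(() => undefined);
  }
}
```

Putting the delete in a `finally` block is one way to make sure a failed or interrupted transcription does not leave the audio file behind on the server.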
Privacy-First Design
Your API key and preferences are stored locally in chrome.storage.local. The API key is sent only to the Google Gemini API, as part of the transcription request itself, and to no other server. Uploaded audio files are deleted from Google's servers as soon as processing completes. The extension has no backend infrastructure of its own.
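A minimal sketch of what local-only settings persistence could look like is shown below. The `KeyValueArea` interface mirrors the promise-based `get`/`set` calls that chrome.storage.local provides in Manifest V3; the `SettingsStore` class and the key names `apiKey` and `customPrompt` are hypothetical and are not taken from the extension's source.

```typescript
// Minimal slice of a storage area. chrome.storage.local exposes
// promise-based get/set calls of this general shape in Manifest V3.
interface KeyValueArea {
  get(keys: string[]): Promise<Record<string, unknown>>;
  set(items: Record<string, unknown>): Promise<void>;
}

// Hypothetical settings shape; the real key names may differ.
type Settings = { apiKey: string; customPrompt: string };

class SettingsStore {
  private readonly area: KeyValueArea;

  constructor(area: KeyValueArea) {
    this.area = area;
  }

  // Reads settings, falling back to empty strings for missing or invalid values.
  async load(): Promise<Settings> {
    const raw = await this.area.get(["apiKey", "customPrompt"]);
    const apiKey = raw["apiKey"];
    const customPrompt = raw["customPrompt"];
    return {
      apiKey: typeof apiKey === "string" ? apiKey : "",
      customPrompt: typeof customPrompt === "string" ? customPrompt : "",
    };
  }

  // Writes only the fields that were passed in.
  async save(patch: Partial<Settings>): Promise<void> {
    await this.area.set({ ...patch });
  }
}

// In an extension, an object such as chrome.storage.local would be passed in.
```

Injecting the storage area keeps the class free of network code and makes it easy to exercise with an in-memory stand-in.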
Customizable Prompts
Fine-tune transcription output with custom prompts. You can specify formatting rules, request translations, summarize content, or adapt the output style to your needs — all from the Settings panel.
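As a rough illustration of how a default prompt and a user-defined prompt might be combined, consider the sketch below. The default prompt text, the `resolvePrompt` helper, and the example prompts are all hypothetical; the extension's real default prompt is not shown in this document.

```typescript
// Hypothetical default; the extension's actual default prompt may differ.
const DEFAULT_PROMPT = "Transcribe this audio accurately.";

// Uses the custom prompt when it has non-whitespace content, else the fallback.
function resolvePrompt(
  custom: string | undefined,
  fallback: string = DEFAULT_PROMPT,
): string {
  const trimmed = custom?.trim() ?? "";
  return trimmed.length > 0 ? trimmed : fallback;
}

// Illustrative custom prompts for the uses mentioned above.
const examplePrompts = {
  formatting:
    "Transcribe the audio. Start a new paragraph at each change of speaker.",
  translation:
    "Transcribe the audio, then translate the transcript into English.",
  summary: "Transcribe the audio and append a three-sentence summary.",
};
```

Treating a blank custom prompt as "not set" avoids sending an empty instruction to the model when the Settings field has been cleared.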
Key Features
- Side Panel UI — Access transcription from Chrome's side panel without disrupting your workflow
- Audio File Upload — Upload MP3, WAV, M4A, and other common audio formats
- Drag & Drop Support — Drag audio files directly into the extension
- Real-time Progress — Visual feedback for upload, processing, and transcription stages
- Custom Prompts — Default or user-defined prompts to control output format
- One-Click Copy — Copy transcription text or error details with a single click
- Export as TXT — Download transcription results as a plain-text file
- Task Cancellation — Cancel an in-progress transcription at any time
- Stale File Cleanup — Automatic cleanup of orphaned files on Google servers at startup
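The stale file cleanup feature could be approached along the following lines. How the extension actually identifies orphaned files is not stated here, so this sketch assumes an age threshold; the `FileStore` interface, the `cleanupStaleFiles` function, and the `maxAgeMs` parameter are hypothetical.

```typescript
type StoredFile = {
  name: string;
  createTime: string; // ISO 8601 timestamp
};

// Hypothetical abstraction over the remote file listing and delete calls.
interface FileStore {
  list(): Promise<StoredFile[]>;
  remove(name: string): Promise<void>;
}

// Deletes files older than maxAgeMs and returns the names actually removed.
async function cleanupStaleFiles(
  store: FileStore,
  maxAgeMs: number,
  now: number = Date.now(),
): Promise<string[]> {
  const removed: string[] = [];
  for (const file of await store.list()) {
    const age = now - Date.parse(file.createTime);
    // Skip files that are still fresh or whose timestamp cannot be parsed.
    if (Number.isNaN(age) || age < maxAgeMs) continue;
    try {
      await store.remove(file.name);
      removed.push(file.name);
    } catch {
      // A failed delete should not block startup; the file stays listed
      // and is picked up again on the next cleanup run.
    }
  }
  return removed;
}
```

Skipping recent files is a cautious choice in this sketch: it avoids deleting a file that belongs to a transcription still in progress in another window.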
How It Works
- Configure: Enter your Google Gemini API key in Settings (get one free from Google AI Studio).
- Upload: Select or drag an audio file into the extension.
- Transcribe: Click "Start Transcription". The extension uploads the file, waits for Google Gemini to process it, and streams the transcription result.
- Use: Copy the transcribed text or export it as a .txt file.
Use Cases
- Journalists & Researchers — Quickly transcribe interview recordings and meeting notes.
- Students — Convert lecture recordings into searchable text for study notes.
- Content Creators — Generate captions or transcripts for video and podcast content.
- Professionals — Document meetings and conference calls efficiently.
- Language Learners — Practice listening comprehension with accurate text transcripts.
Technical Overview
- Framework: WXT (Web Extension Toolbox)
- Build Tool: Vite
- Styling: Tailwind CSS via PostCSS
- Language: TypeScript
- AI SDK: @google/genai (Google Gemini SDK)
- Storage: chrome.storage.local
- Permissions: sidePanel, storage
- Minimal Service Worker: Only used to open the side panel on icon click
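A service worker that only opens the side panel on icon click can be very small. The sketch below assumes it relies on `chrome.sidePanel.setPanelBehavior`, which requires the sidePanel permission listed above; the `SidePanelApi` interface and the `enableOpenOnIconClick` helper are hypothetical names introduced for illustration.

```typescript
// Minimal slice of the chrome.sidePanel API used by this sketch.
interface SidePanelApi {
  setPanelBehavior(behavior: {
    openPanelOnActionClick: boolean;
  }): Promise<void>;
}

// Asks Chrome to open the side panel when the toolbar icon is clicked.
// Returns false instead of throwing if the call is rejected.
async function enableOpenOnIconClick(sidePanel: SidePanelApi): Promise<boolean> {
  try {
    await sidePanel.setPanelBehavior({ openPanelOnActionClick: true });
    return true;
  } catch (error) {
    console.error("Could not configure side panel:", error);
    return false;
  }
}

// In a WXT background entrypoint this might be wired up as:
//   export default defineBackground(() => {
//     void enableOpenOnIconClick(chrome.sidePanel);
//   });
```

With this behavior set, no click listener is needed, which fits the description of a service worker that does nothing else.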
Architecture
All core logic runs directly within the side panel context, eliminating cross-process messaging overhead. The service layer consists of three classes:
- GeminiService: Handles file upload, polling, transcription generation, and cleanup via the Google GenAI SDK.
- StorageService: Encapsulates chrome.storage.local for API key, prompt, and settings persistence.
- FileStateManager: Manages task cancellation via AbortController and reports progress back to the UI.
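To make the cancellation and progress-reporting role more concrete, here is a minimal sketch built on the standard `AbortController`. It is not the extension's FileStateManager: the `TaskController` class, the stage names, and the one-task-at-a-time rule are assumptions made for illustration.

```typescript
// Hypothetical stage names, loosely following the stages listed under Key Features.
type Stage = "uploading" | "processing" | "transcribing" | "done" | "cancelled";
type ProgressListener = (stage: Stage) => void;

class TaskController {
  private controller: AbortController | null = null;
  private readonly onProgress: ProgressListener;

  constructor(onProgress: ProgressListener) {
    this.onProgress = onProgress;
  }

  // Starts a new task and returns the signal long-running steps should observe.
  start(): AbortSignal {
    this.controller?.abort(); // this sketch allows only one task at a time
    this.controller = new AbortController();
    return this.controller.signal;
  }

  // Forwards progress to the UI unless no task is active or it was cancelled.
  report(stage: Stage): void {
    if (this.controller && !this.controller.signal.aborted) {
      this.onProgress(stage);
    }
  }

  // Aborts the active task once and notifies the UI.
  cancel(): void {
    if (!this.controller || this.controller.signal.aborted) return;
    this.controller.abort();
    this.onProgress("cancelled");
  }
}
```

The signal returned by `start()` can be checked between the upload, polling, and generation steps, so a cancel request takes effect at the next step boundary.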
Version: 1.0.0 — MIT License