Audio to Text — Gemini Free AI

Transcribe audio directly in your browser sidebar — powered by Google Gemini AI.

Audio to Text - Gemini Free AI is a Chrome extension that adds speech-to-text to your browser, so you can transcribe audio without leaving the page you are on. It uses the Google Gemini 2.0 Flash-Lite model to transcribe audio files uploaded from your computer.

Why This Extension?

Seamless Browser Integration

The extension lives in Chrome's side panel, so you can upload audio, monitor transcription progress, and copy results — all while continuing to browse the web. There is no need to switch tabs or open another application.

Powered by Gemini AI

Google Gemini 2.0 Flash-Lite offers high-quality speech recognition with low latency. The extension handles the full lifecycle: file upload, polling until server-side processing finishes, transcription generation, and file cleanup.
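The lifecycle described above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: the `FileApi` interface, the `FileState` values, and the `transcribeWithCleanup` function are hypothetical stand-ins, and the real calls go through the @google/genai SDK, which is not reproduced here.

```typescript
// Hypothetical, simplified view of a remote file API (NOT the @google/genai surface).
type FileState = "PROCESSING" | "ACTIVE" | "FAILED";

interface RemoteFile {
  name: string;
  state: FileState;
}

interface FileApi {
  upload(data: Uint8Array, mimeType: string): Promise<RemoteFile>;
  get(name: string): Promise<RemoteFile>;
  generate(fileName: string, prompt: string): Promise<string>;
  delete(name: string): Promise<void>;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function transcribeWithCleanup(
  api: FileApi,
  data: Uint8Array,
  mimeType: string,
  prompt: string,
  pollIntervalMs = 1000,
  maxPolls = 60,
): Promise<string> {
  // 1. Upload the audio.
  let file = await api.upload(data, mimeType);
  try {
    // 2. Poll until the server reports that processing has finished.
    for (let i = 0; file.state === "PROCESSING"; i++) {
      if (i >= maxPolls) throw new Error("Timed out waiting for processing");
      await sleep(pollIntervalMs);
      file = await api.get(file.name);
    }
    if (file.state === "FAILED") throw new Error("Remote processing failed");
    // 3. Generate the transcription from the processed file.
    return await api.generate(file.name, prompt);
  } finally {
    // 4. Delete the remote file whether the steps above succeeded or threw.
    await api.delete(file.name);
  }
}
```

Putting the delete call in `finally` is one way to make sure a failed or timed-out transcription does not leave the uploaded file behind.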

Privacy-First Design

Your API key and preferences are stored locally in chrome.storage.local. They are sent only to the Google Gemini API, as part of the requests that transcription requires. Audio files are deleted from Google's servers immediately after transcription completes. The extension has no backend of its own.
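As an illustration of local-only key storage, here is a minimal sketch. It assumes a `StorageAreaLike` interface that mirrors the promise-based `get`/`set` subset of chrome.storage.local; the interface, the `geminiApiKey` key name, and the helper functions are hypothetical and are not taken from the extension's source.

```typescript
// Hypothetical subset of a storage area; modeled on the promise-based
// get/set methods of chrome.storage.local.
interface StorageAreaLike {
  get(keys: string[]): Promise<Record<string, unknown>>;
  set(items: Record<string, unknown>): Promise<void>;
}

const API_KEY_FIELD = "geminiApiKey"; // assumed key name, for illustration only

async function saveApiKey(area: StorageAreaLike, key: string): Promise<void> {
  const trimmed = key.trim();
  if (trimmed === "") throw new Error("API key must not be empty");
  await area.set({ [API_KEY_FIELD]: trimmed });
}

async function loadApiKey(area: StorageAreaLike): Promise<string | undefined> {
  const items = await area.get([API_KEY_FIELD]);
  const value = items[API_KEY_FIELD];
  return typeof value === "string" ? value : undefined;
}

// In-memory stand-in so the sketch can run outside a browser.
function createMemoryArea(): StorageAreaLike {
  const data = new Map<string, unknown>();
  return {
    async get(keys) {
      const out: Record<string, unknown> = {};
      for (const k of keys) {
        if (data.has(k)) out[k] = data.get(k);
      }
      return out;
    },
    async set(items) {
      for (const [k, v] of Object.entries(items)) data.set(k, v);
    },
  };
}
```

Inside the extension, the storage area passed to these helpers would be chrome.storage.local (possibly through a thin adapter); nothing in the sketch makes a network request.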

Customizable Prompts

Fine-tune transcription output with custom prompts. You can specify formatting rules, request a translation or a summary, or adapt the output style to your needs, all from the Settings panel.
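One plausible way to choose between the default and a user-defined prompt is shown below. The `DEFAULT_PROMPT` wording and the `resolvePrompt` helper are assumptions made for illustration; the extension's actual default prompt is not stated here.

```typescript
// Assumed default wording; the extension's real default prompt may differ.
const DEFAULT_PROMPT = "Transcribe this audio accurately.";

// Use the user's prompt when it contains non-whitespace text,
// otherwise fall back to the default.
function resolvePrompt(customPrompt: string | undefined): string {
  const trimmed = customPrompt?.trim() ?? "";
  return trimmed.length > 0 ? trimmed : DEFAULT_PROMPT;
}
```

For example, `resolvePrompt("Translate to English and add timestamps")` returns the custom text unchanged, while `resolvePrompt("   ")` falls back to the default.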

Key Features

Side Panel UI — Access transcription from Chrome's side panel without disrupting your workflow
Audio File Upload — Upload MP3, WAV, M4A, and other common audio formats
Drag & Drop Support — Drag audio files directly into the extension
Real-time Progress — Visual feedback for upload, processing, and transcription stages
Custom Prompts — Default or user-defined prompts to control output format
One-Click Copy — Copy transcription text or error details with a single click
Export as TXT — Download transcription results as a plain-text file
Task Cancellation — Cancel an in-progress transcription at any time
Stale File Cleanup — Automatic cleanup of orphaned files on Google servers at startup
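For the TXT export listed above, a download name can be derived from the uploaded audio file's name. The sketch below is hypothetical: `toTxtFileName` and the `transcription` fallback name are illustrative and do not come from the extension.

```typescript
// Replace the final extension of the audio file name with ".txt".
// Falls back to "transcription.txt" when nothing is left of the name.
function toTxtFileName(audioFileName: string): string {
  const base = audioFileName.replace(/\.[^./\\]+$/, "");
  return `${base.length > 0 ? base : "transcription"}.txt`;
}
```

In a browser context, the text itself would typically be wrapped in a `Blob` with type `text/plain` and offered for download under this name.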

How It Works

Configure: Enter your Google Gemini API key in Settings (get one free from Google AI Studio).
Upload: Select or drag an audio file into the extension.
Transcribe: Click "Start Transcription". The extension uploads the file, waits for Google Gemini to process it, and streams the transcription result.
Use: Copy the transcribed text or export it as a .txt file.
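The streaming part of the Transcribe step can be pictured as consuming text chunks as they arrive and updating the UI after each one. The sketch below assumes the result is exposed as an `AsyncIterable<string>`; `collectStream` and `fakeStream` are hypothetical names, and the real SDK's chunk shape may differ.

```typescript
// Accumulate streamed text chunks, notifying the caller after each one.
async function collectStream(
  chunks: AsyncIterable<string>,
  onUpdate: (textSoFar: string) => void,
): Promise<string> {
  let text = "";
  for await (const chunk of chunks) {
    text += chunk;
    onUpdate(text); // e.g. re-render the transcription area
  }
  return text;
}

// Stand-in stream used only to exercise the sketch.
async function* fakeStream(): AsyncGenerator<string> {
  yield "Hello, ";
  yield "world.";
}
```

Calling `collectStream(fakeStream(), render)` invokes `render` once per chunk with the text accumulated so far and resolves to the full text.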

Use Cases

Journalists & Researchers — Quickly transcribe interview recordings and meeting notes.
Students — Convert lecture recordings into searchable text for study notes.
Content Creators — Generate captions or transcripts for video and podcast content.
Professionals — Document meetings and conference calls efficiently.
Language Learners — Practice listening comprehension with accurate text transcripts.

Technical Overview

Framework: WXT (Web Extension Toolbox)
Build Tool: Vite
Styling: Tailwind CSS via PostCSS
Language: TypeScript
AI SDK: @google/genai (Google Gemini SDK)
Storage: chrome.storage.local
Permissions: sidePanel, storage
Minimal Service Worker: Only used to open the side panel on icon click
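A service worker that only opens the side panel on icon click can be very small. The sketch below assumes it relies on Chrome's `chrome.sidePanel.setPanelBehavior` with `openPanelOnActionClick`; whether the extension uses this call or an explicit click listener is not stated. `SidePanelLike` and `enableOpenOnIconClick` are hypothetical names introduced so the example can run outside a browser.

```typescript
// Hypothetical subset of the chrome.sidePanel API used by this sketch.
interface SidePanelLike {
  setPanelBehavior(behavior: { openPanelOnActionClick: boolean }): Promise<void>;
}

// Ask the browser to open the side panel when the toolbar icon is clicked.
async function enableOpenOnIconClick(sidePanel: SidePanelLike): Promise<void> {
  await sidePanel.setPanelBehavior({ openPanelOnActionClick: true });
}
```

In the extension's background script, this would be called once at startup with `chrome.sidePanel` as the argument.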

Architecture

All core logic runs directly within the side panel context, so no message passing between the side panel and the service worker is needed. The service layer consists of three classes:

GeminiService — Handles file upload, polling, transcription generation, and cleanup via the Google GenAI SDK.
StorageService — Encapsulates chrome.storage.local for API key, prompt, and settings persistence.
FileStateManager — Manages task cancellation via AbortController and reports progress back to the UI.
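To illustrate cancellation via AbortController combined with progress reporting, here is a minimal sketch. `TaskController`, the `Stage` names, and the step signature are hypothetical; they show the general pattern and are not the extension's FileStateManager.

```typescript
type Stage = "uploading" | "processing" | "transcribing" | "done" | "cancelled";
type Step = (signal: AbortSignal) => Promise<void>;

class TaskController {
  private controller: AbortController | null = null;
  private readonly onProgress: (stage: Stage) => void;

  constructor(onProgress: (stage: Stage) => void) {
    this.onProgress = onProgress;
  }

  // Run the steps in order, reporting each stage and stopping
  // before the next step once cancel() has been called.
  async run(steps: Array<[Stage, Step]>): Promise<boolean> {
    const controller = new AbortController();
    this.controller = controller;
    try {
      for (const [stage, step] of steps) {
        if (controller.signal.aborted) break;
        this.onProgress(stage);
        await step(controller.signal); // steps may also observe the signal
      }
    } finally {
      this.controller = null;
    }
    const completed = !controller.signal.aborted;
    this.onProgress(completed ? "done" : "cancelled");
    return completed;
  }

  // Abort the task that is currently running, if any.
  cancel(): void {
    this.controller?.abort();
  }
}
```

Passing the `AbortSignal` into each step lets long-running work, such as an upload or a polling loop, stop early instead of waiting for the next stage boundary.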

Version: 1.0.0 — MIT License