Audio to Text — Gemini Free AI

Transcribe audio directly in your browser sidebar — powered by Google Gemini AI.

Audio to Text - Gemini Free AI is a Chrome extension that adds speech-to-text to your browser, so you can transcribe audio without leaving the page you are on. It uses the Google Gemini 2.0 Flash-Lite model to transcribe audio files that you upload from your computer.

Why This Extension?

Seamless Browser Integration

The extension lives in Chrome's side panel, so you can upload audio, monitor transcription progress, and copy results — all while continuing to browse the web. There is no need to switch tabs or open another application.

Powered by Gemini AI

Google Gemini 2.0 Flash-Lite offers high-quality speech recognition with low latency. The extension handles the full lifecycle: uploading the file, polling until server-side processing finishes, generating the transcription, and deleting the uploaded file.
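
The lifecycle described above can be sketched roughly as follows. This is a minimal illustration, not the extension's actual code: the TranscriptionClient interface, its method names, and the polling defaults are all assumptions made for the example, and the state names are only modeled on the kind of status a file-processing API might report.

```typescript
// Hypothetical client interface: these names are illustrative and are not
// taken from the extension's source or from any Gemini SDK.
type FileState = "PROCESSING" | "ACTIVE" | "FAILED";

interface TranscriptionClient {
  upload(audio: Uint8Array, mimeType: string): Promise<string>; // resolves to a file ID
  getState(fileId: string): Promise<FileState>;
  generate(fileId: string, prompt: string): Promise<string>;
  remove(fileId: string): Promise<void>;
}

const sleep = (ms: number): Promise<void> =>
  new Promise((resolve) => setTimeout(resolve, ms));

async function transcribe(
  client: TranscriptionClient,
  audio: Uint8Array,
  mimeType: string,
  prompt: string,
  pollIntervalMs = 2000, // assumed default, not a documented value
  maxPolls = 60,         // assumed default, not a documented value
): Promise<string> {
  const fileId = await client.upload(audio, mimeType);
  try {
    // Poll until the server reports the file as ready, or give up.
    for (let attempt = 0; attempt < maxPolls; attempt++) {
      const state = await client.getState(fileId);
      if (state === "ACTIVE") {
        return await client.generate(fileId, prompt);
      }
      if (state === "FAILED") {
        throw new Error("Server-side processing failed");
      }
      await sleep(pollIntervalMs);
    }
    throw new Error("Timed out waiting for processing");
  } finally {
    // Cleanup runs whether transcription succeeded, failed, or timed out.
    await client.remove(fileId);
  }
}
```

Placing the delete call in a finally block is one way to make sure the uploaded file is removed even when processing fails or the wait times out.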

Privacy-First Design

Your API key and preferences are stored locally in chrome.storage.local and are sent only to the Google Gemini API, as part of the transcription request itself. Uploaded audio files are deleted from Google's servers as soon as processing completes. The extension has no backend infrastructure of its own.
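
As an illustration of local-only settings storage, the sketch below reads and writes settings through a small interface that matches the promise-based get and set calls of chrome.storage.local. The Settings shape, the key names, and the helper functions are assumptions made for this example. The extension's real keys are not documented here.

```typescript
// Minimal subset of the promise-based chrome.storage.local API used here.
interface StorageAreaLike {
  get(keys: string[]): Promise<Record<string, unknown>>;
  set(items: Record<string, unknown>): Promise<void>;
}

// Hypothetical settings shape and key names, chosen for illustration only.
interface Settings {
  apiKey: string;
  customPrompt: string;
}

const DEFAULTS: Settings = { apiKey: "", customPrompt: "" };

async function loadSettings(area: StorageAreaLike): Promise<Settings> {
  const stored = await area.get(Object.keys(DEFAULTS));
  const apiKey = stored["apiKey"];
  const customPrompt = stored["customPrompt"];
  // Fall back to defaults for missing or non-string values.
  return {
    apiKey: typeof apiKey === "string" ? apiKey : DEFAULTS.apiKey,
    customPrompt: typeof customPrompt === "string" ? customPrompt : DEFAULTS.customPrompt,
  };
}

async function saveSettings(area: StorageAreaLike, patch: Partial<Settings>): Promise<void> {
  await area.set({ ...patch });
}

// In-memory stand-in so the sketch also runs outside Chrome.
function createMemoryArea(): StorageAreaLike {
  const data = new Map<string, unknown>();
  return {
    async get(keys) {
      const out: Record<string, unknown> = {};
      for (const key of keys) {
        if (data.has(key)) out[key] = data.get(key);
      }
      return out;
    },
    async set(items) {
      for (const [key, value] of Object.entries(items)) data.set(key, value);
    },
  };
}
```

Inside the extension, chrome.storage.local itself would presumably be passed as the storage area. The in-memory version exists only so the example can be run and tested without a browser.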

Customizable Prompts

Fine-tune transcription output with custom prompts. From the Settings panel you can specify formatting rules, request a translation or a summary, or adapt the output style to your needs.
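
One plausible way a custom prompt could be combined with a built-in instruction is shown below. Both the default instruction text and the buildPrompt helper are hypothetical. How the extension actually assembles its prompt is not described here.

```typescript
// Hypothetical default instruction; the extension's actual wording is not documented here.
const DEFAULT_INSTRUCTION = "Transcribe the audio accurately.";

function buildPrompt(customPrompt: string, base: string = DEFAULT_INSTRUCTION): string {
  const extra = customPrompt.trim();
  // With no custom text, send the default instruction unchanged.
  return extra.length === 0 ? base : `${base}\n\n${extra}`;
}
```

For example, a custom prompt such as "Translate the result into English." would be appended to the default instruction as a second paragraph, while an empty or whitespace-only prompt leaves the default untouched.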

Key Features

How It Works

  1. Configure: Enter your Google Gemini API key in Settings (get one free from Google AI Studio).
  2. Upload: Select or drag an audio file into the extension.
  3. Transcribe: Click "Start Transcription". The extension uploads the file, waits for Google Gemini to process it, and streams the transcription result.
  4. Use: Copy the transcribed text or export it as a .txt file.
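
The export in step 4 could be prepared along these lines. This is a sketch under assumptions: the helper names and the filename rule are invented for illustration and may differ from what the extension does.

```typescript
// Derive "interview.txt" from "interview.mp3"; names without an extension just get ".txt".
function toTxtFilename(audioName: string): string {
  const dot = audioName.lastIndexOf(".");
  // A leading dot (as in ".hidden") is treated as part of the name, not an extension.
  const stem = dot > 0 ? audioName.slice(0, dot) : audioName;
  return `${stem}.txt`;
}

// Wrap the transcript in a plain-text Blob that is ready to be offered as a download.
function toTxtBlob(transcript: string): Blob {
  return new Blob([transcript], { type: "text/plain;charset=utf-8" });
}
```

In a side panel page, the resulting Blob could then be handed to URL.createObjectURL and attached to a link with a download attribute set to the derived filename.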

Use Cases

Technical Overview

Architecture

All core logic runs directly within the side panel context, eliminating cross-process messaging overhead. The service layer consists of three classes:

Version: 1.0.0 — MIT License
