Farsi Transcriber - Jonah Saidian

Live Farsi Audio Transcription Tool

A web app that transcribes Farsi (Persian) audio files to text. It is deployed live and handles long, real-world recordings by splitting them into fixed-length chunks before transcription.

Farsi Transcriber was built to solve a practical problem: accurately converting spoken Farsi audio into clean, readable text. The app uses OpenAI's gpt-4o-transcribe speech-to-text model and wraps it in a simple Streamlit interface that anyone can use without technical knowledge.

A core challenge with audio transcription is handling large files reliably. OpenAI's transcription endpoint accepts at most 25 MB per uploaded file, so a long recording cannot simply be sent in one request. Instead, the app uses pydub to split the audio into 200-second chunks and transcribes them sequentially, showing live progress as each segment completes. This keeps the tool dependable for long recordings such as lectures or interviews.
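The project's own chunking code is not shown here, so what follows is only a minimal sketch of how 200-second splitting can be done with pydub, assuming pydub and FFmpeg are installed. The boundary arithmetic is kept in a separate pure function; `chunk_bounds`, the output file naming, and the temp-directory handling are illustrative assumptions, and this `chunk_audio` is a hypothetical reimplementation rather than the one in api/transcriber.py.

```python
import os
import tempfile
from typing import List, Optional, Tuple

CHUNK_SECONDS = 200  # segment length stated in the project write-up


def chunk_bounds(duration_ms: int, chunk_ms: int = CHUNK_SECONDS * 1000) -> List[Tuple[int, int]]:
    """Return (start_ms, end_ms) pairs that cover the whole recording.

    The final pair is shorter when the duration is not a multiple of chunk_ms.
    """
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [(start, min(start + chunk_ms, duration_ms))
            for start in range(0, duration_ms, chunk_ms)]


def chunk_audio(path: str, out_dir: Optional[str] = None) -> List[str]:
    """Hypothetical: split an audio file into 200-second WAV files, return their paths."""
    from pydub import AudioSegment  # lazy import so chunk_bounds works without pydub

    audio = AudioSegment.from_file(path)  # needs FFmpeg for MP3/M4A/WEBM input
    out_dir = out_dir or tempfile.mkdtemp(prefix="chunks_")
    paths: List[str] = []
    # len(audio) is the duration in milliseconds; slicing also works in milliseconds.
    for i, (start, end) in enumerate(chunk_bounds(len(audio))):
        chunk_path = os.path.join(out_dir, f"chunk_{i:04d}.wav")
        audio[start:end].export(chunk_path, format="wav").close()  # export returns an open file
        paths.append(chunk_path)
    return paths
```

For a 7.5-minute recording (450,000 ms) this yields three segments: two full 200-second chunks and a 50-second remainder. A fixed cut can land mid-word, which is one trade-off of time-based rather than silence-based splitting.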

Technical Implementation

Architecture

UI Layer: Streamlit app managing multi-step session state (API setup → upload → transcription → results)
Core Logic: Dedicated api/transcriber.py module handling chunking and OpenAI calls
Audio Processing: pydub + FFmpeg for format conversion and segmentation
Deployment: Containerized via Docker, live at farsi-transcriber.jonahsaidian.com
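The multi-step flow in the UI layer can be modeled as a small state machine stored in session state. This is a sketch under assumptions: the `Step` names, the `"step"` key, and the rule that a preconfigured key skips the setup screen are hypothetical, not taken from the app. `init_state` accepts any mutable mapping, so in a Streamlit script it could be called as `init_state(st.session_state, has_env_key)` and the page could branch on `st.session_state["step"]`.

```python
from enum import Enum
from typing import Any, MutableMapping


class Step(Enum):
    """Hypothetical names for the four screens: API setup, upload, transcription, results."""
    API_SETUP = 1
    UPLOAD = 2
    TRANSCRIPTION = 3
    RESULTS = 4


_ORDER = list(Step)  # Enum iteration follows definition order


def next_step(current: Step) -> Step:
    """Advance one screen; RESULTS stays put until the user resets."""
    idx = _ORDER.index(current)
    return _ORDER[min(idx + 1, len(_ORDER) - 1)]


def init_state(state: MutableMapping[str, Any], has_env_key: bool) -> None:
    """Set the starting screen once; later reruns keep the existing value."""
    state.setdefault("step", Step.UPLOAD if has_env_key else Step.API_SETUP)
```

Keeping the current screen in session state matters because Streamlit reruns the whole script on every interaction; without it, each click would start the flow over.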

Processing Pipeline

User uploads audio file via drag-and-drop
File loaded in a background thread with progress estimation
chunk_audio() splits into 200-second WAV segments
Each segment sent to OpenAI with language="fa"
Transcribed text accumulates live in the UI
Temp files cleaned up after each chunk
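The per-chunk part of the pipeline (send each segment with language="fa", accumulate text, delete the temp file) could look roughly like this. It is a sketch, not the app's code: it assumes the v1 OpenAI Python SDK, where `client.audio.transcriptions.create(...)` returns an object with a `.text` field, and the helper names `make_openai_transcriber`, `transcribe_chunks`, and the `on_chunk` callback are hypothetical. Injecting the transcribe function lets the loop be exercised without network access.

```python
import os
from typing import Callable, Iterable, List, Optional

TranscribeFn = Callable[[str], str]  # maps one chunk file path to its transcript


def make_openai_transcriber(api_key: Optional[str] = None,
                            model: str = "gpt-4o-transcribe") -> TranscribeFn:
    """Build a transcriber backed by the OpenAI Python SDK (v1 client interface)."""
    from openai import OpenAI  # lazy import so the loop below can be tested offline

    client = OpenAI(api_key=api_key)  # None falls back to the OPENAI_API_KEY variable

    def transcribe(path: str) -> str:
        with open(path, "rb") as f:
            result = client.audio.transcriptions.create(
                model=model, file=f, language="fa"  # ISO-639-1 code for Persian
            )
        return result.text

    return transcribe


def transcribe_chunks(
    chunk_paths: Iterable[str],
    transcribe: TranscribeFn,
    on_chunk: Optional[Callable[[int, str], None]] = None,
) -> str:
    """Transcribe chunks in order, report each result, and delete each temp file."""
    parts: List[str] = []
    for i, path in enumerate(chunk_paths):
        try:
            text = transcribe(path)
        finally:
            os.remove(path)  # delete this chunk's file even if the API call raised
        parts.append(text.strip())
        if on_chunk is not None:
            on_chunk(i, text)  # e.g. refresh a Streamlit placeholder with the running text
    return " ".join(parts)
```

Processing chunks one at a time keeps memory and disk use bounded by a single segment and makes it easy to stream partial results into the UI, at the cost of total wall-clock time compared with sending chunks in parallel.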

Features

File Upload

Drag-and-drop support for MP3, MP4, MPEG, MPGA, M4A, WAV, and WEBM formats.
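A minimal sketch of an extension check for the formats listed above; `SUPPORTED_EXTENSIONS` and `is_supported` are hypothetical names. In Streamlit, passing a list such as `type=sorted(SUPPORTED_EXTENSIONS)` to `st.file_uploader` already restricts what the picker accepts, so a check like this would mainly serve as a server-side safeguard.

```python
from pathlib import Path

# Extensions from the supported-format list above.
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}


def is_supported(filename: str) -> bool:
    """Case-insensitive check on the final extension only; file contents are not inspected."""
    return Path(filename).suffix.lower().lstrip(".") in SUPPORTED_EXTENSIONS
```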
Real-Time Progress

Live progress bar with time estimates, plus a transcript that updates as each chunk completes.
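The write-up does not say how the time estimates are computed. One simple option, assumed here, is a linear estimate: average the time spent on finished chunks and multiply by the number left. Both function names are hypothetical; `progress_fraction` returns a value in the 0.0 to 1.0 range that `st.progress` accepts.

```python
from typing import Optional


def progress_fraction(chunks_done: int, chunks_total: int) -> float:
    """Fraction complete, clamped to [0.0, 1.0]; 0.0 when there are no chunks."""
    if chunks_total <= 0:
        return 0.0
    return min(max(chunks_done / chunks_total, 0.0), 1.0)


def estimate_remaining(elapsed_s: float, chunks_done: int, chunks_total: int) -> Optional[float]:
    """Seconds left, assuming remaining chunks take the average time of finished ones.

    Returns None before the first chunk finishes, since there is no timing data yet.
    """
    if chunks_done <= 0:
        return None
    remaining = max(chunks_total - chunks_done, 0)
    return elapsed_s / chunks_done * remaining
```

For example, if two of five chunks took 60 seconds in total, the estimate is 30 × 3 = 90 seconds. Because the last chunk is usually shorter than 200 seconds, this estimate tends to run slightly high.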
Text Export

Download the full transcription as a .txt file with one click.
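A sketch of how the export payload might be built, assuming the app collects one transcript string per chunk; `build_export` is a hypothetical helper. Encoding explicitly as UTF-8 matters because Persian script cannot be represented in ASCII or Latin-1. The resulting bytes could be handed to Streamlit's `st.download_button(label, data=..., file_name="transcription.txt", mime="text/plain")`.

```python
from typing import List


def build_export(chunk_texts: List[str]) -> bytes:
    """Join non-empty chunk transcripts, one per line, as UTF-8 bytes for a .txt download."""
    body = "\n".join(t.strip() for t in chunk_texts if t.strip())
    return (body + "\n").encode("utf-8") if body else b""
```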

Chunked Processing

Handles large audio files by processing them in 200-second segments, so no single API request has to carry the entire recording.
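Whether a 200-second WAV segment stays under the 25 MB upload limit depends on the export format, which the write-up does not state. Uncompressed PCM size is sample rate × channels × bytes per sample × duration, so the worked check below compares two common formats. The 25,000,000-byte constant is a conservative reading of "25 MB", and the helper names are hypothetical.

```python
# OpenAI documents a 25 MB per-file limit; taken conservatively as 25,000,000 bytes.
API_UPLOAD_LIMIT_BYTES = 25_000_000


def wav_bytes(seconds: float, sample_rate: int, channels: int, sample_width: int = 2) -> int:
    """PCM WAV payload size: rate x channels x bytes per sample x duration.

    The ~44-byte header is ignored; sample_width is in bytes (2 = 16-bit).
    """
    return int(sample_rate * channels * sample_width * seconds)


def max_chunk_seconds(sample_rate: int, channels: int, sample_width: int = 2,
                      limit: int = API_UPLOAD_LIMIT_BYTES) -> float:
    """Longest chunk whose PCM payload fits under the upload limit."""
    return limit / (sample_rate * channels * sample_width)
```

At 16 kHz mono 16-bit, a 200-second chunk is 16,000 × 1 × 2 × 200 = 6,400,000 bytes, comfortably under the limit. At 44.1 kHz stereo 16-bit it is 35,280,000 bytes, which is over it; the longest chunk that fits at that rate is about 141.7 seconds. pydub keeps the source's frame rate and channel count on export unless told otherwise, so downmixing with `set_channels(1).set_frame_rate(16000)` before exporting is one way to keep 200-second chunks well within the limit.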
Auto API Key

Reads OPENAI_API_KEY from the environment or a .env file, with a fallback input field in the UI.
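A sketch of the lookup order described above, assuming the optional python-dotenv package for .env support; `resolve_api_key` is a hypothetical name. `load_dotenv` does not override variables that are already set, so a real environment variable wins over the file, and the UI value is used only when neither provides a key.

```python
import os
from typing import Optional


def resolve_api_key(ui_value: Optional[str] = None, dotenv_path: str = ".env") -> Optional[str]:
    """Return OPENAI_API_KEY from the environment (optionally loaded from .env), else the UI value."""
    try:
        from dotenv import load_dotenv  # python-dotenv; treated as optional here
        load_dotenv(dotenv_path)  # a missing file is skipped; existing variables are kept
    except ImportError:
        pass
    key = os.getenv("OPENAI_API_KEY", "").strip()
    if key:
        return key
    ui_value = (ui_value or "").strip()
    return ui_value or None  # None signals that the UI should keep prompting
```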
Clean Reset

One-click restart clears all session state and resets the file uploader, so the app is ready for a new transcription.
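Streamlit's file uploader keeps its selected file until the widget itself is replaced, so a common pattern (assumed here, not confirmed by the write-up) is to give the uploader a key derived from a counter and bump that counter on reset. The sketch works on any mutable mapping; in the app it would receive `st.session_state`, the uploader would be created with `key=f"uploader_{st.session_state['uploader_key']}"`, and a call such as `st.rerun()` would redraw the page. The `"uploader_key"` name is hypothetical.

```python
from typing import Any, MutableMapping


def reset_session(state: MutableMapping[str, Any]) -> None:
    """Clear every session key, then store an incremented uploader key.

    A new uploader key makes Streamlit treat the uploader as a fresh, empty widget.
    """
    next_key = state.get("uploader_key", 0) + 1
    for key in list(state.keys()):  # copy keys first; deleting while iterating is unsafe
        del state[key]
    state["uploader_key"] = next_key
```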

Live Deployment

The app is publicly accessible at farsi-transcriber.jonahsaidian.com and uses OpenAI's gpt-4o-transcribe model for Farsi speech recognition.

Project Info

Technology