A web app that transcribes Farsi (Persian) audio files to text. It is deployed live and handles real-world file sizes by splitting long recordings into chunks before transcribing them.
Farsi Transcriber was built to solve a practical problem: converting spoken Farsi into clean, readable text. The app uses OpenAI's gpt-4o-transcribe model, one of the most capable multilingual speech recognition models available, and wraps it in a simple Streamlit interface that anyone can use without technical knowledge.
A core challenge in audio transcription is handling large files reliably. OpenAI's transcription endpoint limits the size of each uploaded file (25 MB at the time of writing), so rather than trying to send a multi-hundred-megabyte recording in one request, the app uses pydub to split the audio into 200-second chunks and transcribes them sequentially, showing live progress as each segment completes. This makes the tool robust even for long recordings such as lectures or interviews.
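The chunk-and-loop approach can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the code in api/transcriber.py: `chunk_ranges`, `transcribe_sequentially`, and `transcribe_file` are hypothetical names, and both the naive time estimate and the mono 16 kHz downmix are assumptions. The library calls it does use (`AudioSegment.from_file`, millisecond slicing, `export`, and `client.audio.transcriptions.create` with `language="fa"`) are real pydub and OpenAI SDK calls.

```python
import io
import time
from typing import Callable, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")

CHUNK_MS = 200_000  # 200-second segments; pydub measures and slices audio in milliseconds


def chunk_ranges(duration_ms: int, chunk_ms: int = CHUNK_MS) -> List[Tuple[int, int]]:
    """Return (start, end) millisecond offsets that cover the whole recording."""
    if chunk_ms <= 0:
        raise ValueError("chunk_ms must be positive")
    return [
        (start, min(start + chunk_ms, duration_ms))
        for start in range(0, duration_ms, chunk_ms)
    ]


def transcribe_sequentially(
    chunks: Sequence[T],
    transcribe_chunk: Callable[[T], str],
    on_progress: Optional[Callable[[int, int, float], None]] = None,
) -> str:
    """Transcribe chunks one at a time, reporting (done, total, eta_seconds) after each."""
    parts: List[str] = []
    total = len(chunks)
    started = time.monotonic()
    for done, chunk in enumerate(chunks, start=1):
        parts.append(transcribe_chunk(chunk).strip())
        if on_progress is not None:
            elapsed = time.monotonic() - started
            # Naive estimate (assumption): average seconds per finished chunk x chunks left.
            on_progress(done, total, elapsed / done * (total - done))
    return " ".join(part for part in parts if part)


def transcribe_file(path: str, api_key: str) -> str:
    """Hypothetical end-to-end wiring; needs ffmpeg plus the pydub and openai packages."""
    # Imported lazily so the pure helpers above run without these dependencies.
    from openai import OpenAI
    from pydub import AudioSegment

    client = OpenAI(api_key=api_key)
    audio = AudioSegment.from_file(path)  # len(audio) is the duration in ms
    segments = [audio[start:end] for start, end in chunk_ranges(len(audio))]

    def transcribe_chunk(segment) -> str:
        buf = io.BytesIO()
        # Assumption: mono 16 kHz, 16-bit keeps a 200 s WAV near 6.4 MB per upload.
        segment.set_channels(1).set_frame_rate(16_000).export(buf, format="wav")
        result = client.audio.transcriptions.create(
            model="gpt-4o-transcribe",
            file=("chunk.wav", buf.getvalue()),  # (filename, bytes) tuple
            language="fa",  # ISO-639-1 code for Persian
        )
        return result.text

    return transcribe_sequentially(segments, transcribe_chunk)
```

In a Streamlit app, the `on_progress` callback would presumably update the progress bar and append each finished chunk's text to the page. Fixed 200-second cuts can split a word at a boundary; overlapping or silence-aligned cuts are a common refinement, though nothing here indicates the app does that.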
api/transcriber.py: the module that handles chunking and the OpenAI calls.
chunk_audio(): splits the audio into 200-second WAV segments.
language="fa": sets Farsi as the transcription language.
Drag-and-drop support for MP3, MP4, MPEG, MPGA, M4A, WAV, and WEBM formats.
Live progress bar with time estimates and streaming transcription display as each chunk completes.
Download the full transcription as a .txt file with one click.
Handles large audio files reliably by processing them in 200-second segments, so no single API request has to carry the whole recording.
Reads OPENAI_API_KEY from the environment or a .env file, with a fallback input field in the UI.
One-click restart clears all state and the file uploader, ready for a new transcription.
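The API-key lookup can be sketched as a simple precedence chain. This is a sketch under assumptions: `resolve_api_key` and `read_dotenv` are hypothetical names, the order (environment, then .env, then the UI field) is assumed, and the tiny parser stands in for python-dotenv's `load_dotenv()`, which the real app may use instead. In Streamlit, `ui_value` would presumably come from an `st.text_input(..., type="password")` field.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def read_dotenv(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines; a stand-in for python-dotenv, not a full parser."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        values[key] = value.strip().strip("\"'")  # drop surrounding quotes
    return values


def resolve_api_key(
    env: Mapping[str, str] = os.environ,
    dotenv_path: Path = Path(".env"),
    ui_value: Optional[str] = None,
) -> Optional[str]:
    """Return the first non-blank key: environment, then .env, then the UI field."""
    candidates = (
        env.get("OPENAI_API_KEY"),
        read_dotenv(dotenv_path).get("OPENAI_API_KEY"),
        ui_value,
    )
    for candidate in candidates:
        if candidate and candidate.strip():
            return candidate.strip()
    return None  # caller should prompt the user to enter a key
```

Checking the environment first mirrors python-dotenv's default behaviour, where `load_dotenv()` does not override variables that are already set.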
The app is publicly accessible at farsi-transcriber.jonahsaidian.com — built with OpenAI gpt-4o-transcribe for state-of-the-art Farsi speech recognition.