An AI-based system that converts speech into captions in multiple Indian languages for Deaf users and low-literacy individuals.
- Real-time speech-to-text captioning in major Indian languages for Deaf users
- Simplified captions for easy comprehension by low-literacy users
- Low latency, high accuracy, and accessibility across diverse platforms
captionist-ISL/
|
|-- testing/ <- Phase 1 test scripts and sample audio
| |-- testing_dependencies.py <- Whisper install and Hindi transcription test
| |-- conversion.py <- Audio format conversion (m4a, mp3, wav)
| |-- audio.wav <- Test audio (mic recorded)
| |-- audio.mp3 <- Converted audio
| `-- Recording.m4a <- Raw recording for testing
|
|-- Video_captioning.py <- Phase 2A main script
|-- extract_audios/ <- Output folder (auto-created)
| |-- {filename}_audio.wav <- Extracted audio from video
| `-- {filename}_captions.srt <- Generated subtitle file
|
|-- README.md
|-- Requirements.txt
`-- .gitignore
| Component | Tool |
|---|---|
| Audio capture | sounddevice + scipy |
| Format convert | ffmpeg |
| Transcription | faster-whisper |
| GPU acceleration | CUDA 12.7 + RTX 2050 (float16) |
| Caption format | SRT (universal subtitle format) |
| Video playback | VLC |
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install openai-whisper faster-whisper sounddevice numpy scipy
Windows: Download from https://ffmpeg.org/download.html
Ubuntu : sudo apt install ffmpeg
macOS : brew install ffmpeg
Download from: https://www.videolan.org/
python -c "import torch; print(torch.cuda.is_available())"
# Should print: True

| Code | Language | Script |
|---|---|---|
| hi | Hindi | हिन्दी |
| ta | Tamil | தமிழ் |
| te | Telugu | తెలుగు |
| bn | Bengali | বাংলা |
| mr | Marathi | मराठी |
| gu | Gujarati | ગુજરાતી |
| kn | Kannada | ಕನ್ನಡ |
| ml | Malayalam | മലയാളം |
Tests Whisper installation, Hindi transcription, and audio conversion.
cd testing
python testing_dependencies.py
python conversion.py
What was tested:
- Whisper base and small model loading
- Hindi Devanagari script output with language="hi"
- Audio format conversion using ffmpeg (m4a to wav, mp4 to wav)
- GPU detection with torch.cuda.is_available()
Full pipeline: video or audio file in, SRT captions out, VLC auto-launched.
python Video_captioning.py
Enter video/audio file path: E:\videos\speech.mp4
Enter language code (default hi): hi
Step 1 - extract_audio() Video/audio file -> 16kHz WAV (ffmpeg)
Step 2 - transcribe() WAV -> timestamped segments (faster-whisper GPU)
Step 3 - generate_srt() Segments -> .SRT subtitle file
Step 4 - play_vlc() Original video + SRT -> VLC player
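Step 1 can be sketched as follows. This is a minimal illustration, not the actual contents of Video_captioning.py: the helper name build_ffmpeg_command, the mono downmix, and the 16-bit PCM codec are assumptions. Only the 16 kHz WAV target, the use of ffmpeg, and the {filename}_audio.wav naming come from this README.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src: str, dst: str) -> list[str]:
    """Return an ffmpeg command that writes a 16 kHz mono 16-bit PCM WAV."""
    return [
        "ffmpeg",
        "-y",                  # overwrite the output file if it already exists
        "-i", src,             # input video or audio file
        "-vn",                 # drop any video stream
        "-ac", "1",            # mono downmix (assumption, not stated in the README)
        "-ar", "16000",        # 16 kHz sample rate, as in Step 1
        "-c:a", "pcm_s16le",   # 16-bit PCM (assumption)
        dst,
    ]


def extract_audio(src: str, out_dir: str = "extract_audios") -> Path:
    """Run ffmpeg on src and return the path of the extracted WAV file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)  # output folder is auto-created
    dst = out / f"{Path(src).stem}_audio.wav"
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(build_ffmpeg_command(src, str(dst)), check=True)
    return dst
```

Keeping the command construction separate from the subprocess call makes it possible to inspect or log the exact ffmpeg invocation without running it.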
Two files are saved to the script directory:
speech_audio.wav <- Extracted audio
speech_captions.srt <- Subtitle file
SRT format example:
1
00:00:00,000 --> 00:00:03,640
तो जब मैं मुंबई आया
2
00:00:03,640 --> 00:00:07,200
मेरे दिमाग में बहुत सारे सपने थे
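The SRT layout above can be produced with a few lines of Python. The sketch below assumes segments arrive as (start, end, text) tuples with times in seconds. The function names mirror the pipeline steps, but the bodies are illustrative and not taken from Video_captioning.py.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def generate_srt(segments) -> str:
    """Build SRT text from an iterable of (start, end, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)
```

When saving the result, write it with an explicit encoding, for example `Path("speech_captions.srt").write_text(srt, encoding="utf-8")`, so that Indic scripts are stored correctly regardless of the platform default.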
| Model | Speed | VRAM used |
|---|---|---|
| small | 2.5x realtime | ~1.5 GB |
| medium | 1.5x realtime | ~2.5 GB |
| Model | Speed | Accuracy | RAM needed |
|---|---|---|---|
| tiny | fastest | low | ~1 GB |
| base | fast | medium | ~1 GB |
| small | balanced | good | ~2 GB |
| medium | slow | better | ~5 GB |
| large | slowest | best | ~10 GB |
Recommended: small for speed, medium for accuracy on Indian languages.
- 48 kbps audio produces garbled transcription - use audio above 128 kbps for best results
- Spoken Hindi and Urdu are nearly indistinguishable to Whisper, so auto-detection can produce Urdu script - always set language="hi" explicitly
- The small model may mix Hindi and English on code-switched speech - use the medium model instead
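The language and model advice above might look like this in code. It is a hedged sketch using the faster-whisper WhisperModel API: the helper choose_runtime and the int8 CPU fallback are assumptions for illustration, and the real transcribe() in Video_captioning.py may differ.

```python
def choose_runtime(cuda_available: bool) -> tuple[str, str]:
    """Pick a (device, compute_type) pair for faster-whisper."""
    # float16 on GPU matches the tools table; int8 on CPU is an assumed fallback.
    return ("cuda", "float16") if cuda_available else ("cpu", "int8")


def transcribe(wav_path: str, language: str = "hi", model_size: str = "medium"):
    """Return a list of (start, end, text) tuples for one audio file."""
    # Imported lazily so choose_runtime() works without these packages installed.
    import torch
    from faster_whisper import WhisperModel

    device, compute_type = choose_runtime(torch.cuda.is_available())
    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    # Passing language explicitly skips auto-detection.
    segments, _info = model.transcribe(wav_path, language=language)
    # segments is a lazy generator; iterating it runs the actual decoding.
    return [(seg.start, seg.end, seg.text) for seg in segments]
```

Defaulting model_size to "medium" follows the recommendation for code-switched Hindi and English speech; pass "small" when speed matters more.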
Phase 1 (done) Dependency testing, Hindi transcription, audio conversion
Phase 2A (done) Video -> SRT -> VLC pipeline with GPU acceleration
Phase 2B (next) Live mic streaming, chunked real-time captions under 2s
Phase 3 Bhashini API integration for more regional languages
Phase 4 Low-literacy feature: chunk audio -> LLM summarize -> simple caption
Phase 5 Web UI and platform integration
torch.cuda.is_available() returns False
Reinstall PyTorch with correct CUDA version:
pip uninstall torch torchaudio -y
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121

ffmpeg not found
Add ffmpeg to system PATH or reinstall from https://ffmpeg.org/download.html

VLC not launching
Check VLC is installed at one of these paths:
C:\Program Files\VideoLAN\VLC\vlc.exe
C:\Program Files (x86)\VideoLAN\VLC\vlc.exe
Or add your custom VLC path to the VLC_PATHS list in Video_captioning.py
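A lookup over VLC_PATHS could work roughly as shown below. The list contents are the two default paths listed above. The helper find_vlc and the body of play_vlc are an illustrative guess at the logic, not a copy of the script. `--sub-file` is VLC's command-line option for loading an external subtitle file.

```python
import os
import subprocess

# The two default install locations; append a custom path here if needed.
VLC_PATHS = [
    r"C:\Program Files\VideoLAN\VLC\vlc.exe",
    r"C:\Program Files (x86)\VideoLAN\VLC\vlc.exe",
]


def find_vlc(paths=VLC_PATHS, exists=os.path.isfile):
    """Return the first path in paths that exists, or None."""
    for path in paths:
        if exists(path):
            return path
    return None


def play_vlc(video_path: str, srt_path: str) -> None:
    """Open the video in VLC with the generated subtitles loaded."""
    vlc = find_vlc()
    if vlc is None:
        raise FileNotFoundError("VLC not found; add its path to VLC_PATHS")
    # Popen returns immediately, so the script does not block on playback.
    subprocess.Popen([vlc, video_path, f"--sub-file={srt_path}"])
```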

Hindi output comes in Urdu script
Always pass language="hi" explicitly - never let Whisper auto-detect for Indian languages

Script exits after transcription with no error
Add this line at the top of Video_captioning.py after imports:
sys.stdout.reconfigure(encoding='utf-8')
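One plausible cause, assumed here rather than confirmed, is a console encoding that cannot represent Devanagari, so printing a transcribed segment fails. The sketch below wraps the same reconfigure call in a hypothetical helper, force_utf8, that also tolerates streams without a reconfigure method, such as redirected or replaced stdout objects.

```python
import io
import sys


def force_utf8(stream) -> bool:
    """Switch a text stream to UTF-8; return False if it cannot be reconfigured."""
    if not hasattr(stream, "reconfigure"):
        return False
    stream.reconfigure(encoding="utf-8")
    return True


# Apply to standard output once, right after the imports.
force_utf8(sys.stdout)

# Demonstration on an in-memory stream that starts out ASCII-only.
demo = io.TextIOWrapper(io.BytesIO(), encoding="ascii")
force_utf8(demo)
demo.write("तो जब मैं मुंबई आया")  # succeeds because the stream is now UTF-8
demo.flush()
```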