captionist-ISL

An AI-based system that converts speech into multilingual Indian language captions for Deaf users and low-literacy individuals.

Project Aim

Real-time speech-to-text captioning in major Indian languages for Deaf users
Simplified captions for easy comprehension by low-literacy users
Low latency, high accuracy, and accessibility across diverse platforms

Project Structure

captionist-ISL/
|
|-- testing/                        <- Phase 1 test scripts and sample audio
|   |-- testing_dependencies.py     <- Whisper install and Hindi transcription test
|   |-- conversion.py               <- Audio format conversion (m4a, mp3, wav)
|   |-- audio.wav                   <- Test audio (mic recorded)
|   |-- audio.mp3                   <- Converted audio
|   `-- Recording.m4a               <- Raw recording for testing
|
|-- Video_captioning.py             <- Phase 2A main script
|-- extract_audios/                 <- Output folder (auto-created)
|   |-- {filename}_audio.wav        <- Extracted audio from video
|   `-- {filename}_captions.srt     <- Generated subtitle file
|
|-- README.md
|-- Requirements.txt
`-- .gitignore

Tech Stack

Component	Tool
Audio capture	sounddevice + scipy
Format conversion	ffmpeg
Transcription	faster-whisper
GPU acceleration	CUDA 12.7 + RTX 2050 (float16)
Caption format	SRT (universal subtitle format)
Video playback	VLC

Installation

Step 1 - Install PyTorch with CUDA

pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121

Step 2 - Install Whisper and audio packages

pip install openai-whisper faster-whisper sounddevice numpy scipy

Step 3 - Install ffmpeg

Windows: Download from https://ffmpeg.org/download.html
Ubuntu : sudo apt install ffmpeg
macOS  : brew install ffmpeg

Step 4 - Install VLC

Download from: https://www.videolan.org/

Step 5 - Verify GPU

python -c "import torch; print(torch.cuda.is_available())"
# Should print: True

Supported Languages

Code	Language	Script
hi	Hindi	हिन्दी
ta	Tamil	தமிழ்
te	Telugu	తెలుగు
bn	Bengali	বাংলা
mr	Marathi	मराठी
gu	Gujarati	ગુજરાતી
kn	Kannada	ಕನ್ನಡ
ml	Malayalam	മലയാളം
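
A minimal sketch of how a user-supplied language code could be checked against the table above. The names SUPPORTED_LANGUAGES and normalize_language are hypothetical and are not taken from Video_captioning.py; falling back to hi on empty input is an assumption that mirrors the default shown in the Phase 2A prompt.

```python
# Hypothetical helper: the names below are illustrative, not from the project code.
SUPPORTED_LANGUAGES = {
    "hi": "Hindi",
    "ta": "Tamil",
    "te": "Telugu",
    "bn": "Bengali",
    "mr": "Marathi",
    "gu": "Gujarati",
    "kn": "Kannada",
    "ml": "Malayalam",
}


def normalize_language(raw: str, default: str = "hi") -> str:
    """Return a supported language code, or raise ValueError for unknown codes."""
    code = raw.strip().lower()
    if not code:
        # Empty input falls back to the default (assumed to be Hindi).
        return default
    if code not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"Unsupported language code {code!r}; expected one of "
            f"{', '.join(sorted(SUPPORTED_LANGUAGES))}"
        )
    return code
```

Rejecting unknown codes early avoids loading a model only to fail, or silently auto-detect, later in the run.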

Phase 1 - Dependency Testing

Tests Whisper installation, Hindi transcription, and audio conversion.

cd testing
python testing_dependencies.py
python conversion.py

What was tested:

Whisper base and small model loading
Hindi Devanagari script output with language="hi"
Audio format conversion using ffmpeg (m4a to wav, mp4 to wav)
GPU detection with torch.cuda.is_available()

Phase 2A - Video Caption Pipeline

Full pipeline: video or audio file in, SRT captions out, VLC auto-launched.

Usage

python Video_captioning.py

Enter video/audio file path: E:\videos\speech.mp4
Enter language code (default hi): hi

Pipeline Steps

Step 1 - extract_audio()    Video/audio file -> 16kHz WAV (ffmpeg)
Step 2 - transcribe()       WAV -> timestamped segments (faster-whisper GPU)
Step 3 - generate_srt()     Segments -> .SRT subtitle file
Step 4 - play_vlc()         Original video + SRT -> VLC player
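
Step 1 above could be sketched roughly as follows. This is not the code from Video_captioning.py: build_extract_command is a hypothetical helper, the extract_audios output folder is taken from the Project Structure listing, and the mono and 16-bit PCM settings are assumptions (only the 16 kHz rate is stated here).

```python
import subprocess
from pathlib import Path


def build_extract_command(source: str, out_dir: str = "extract_audios") -> list[str]:
    """Build an ffmpeg argument list that converts `source` to a 16 kHz WAV."""
    wav_path = Path(out_dir) / f"{Path(source).stem}_audio.wav"
    return [
        "ffmpeg",
        "-y",                  # overwrite an existing output file
        "-i", source,          # input video or audio file
        "-vn",                 # drop any video stream
        "-ac", "1",            # mono (assumption)
        "-ar", "16000",        # 16 kHz sample rate
        "-c:a", "pcm_s16le",   # 16-bit PCM (assumption)
        str(wav_path),
    ]


def extract_audio(source: str, out_dir: str = "extract_audios") -> str:
    """Run ffmpeg and return the path of the extracted WAV file."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    command = build_extract_command(source, out_dir)
    # check=True raises CalledProcessError if ffmpeg exits with an error.
    subprocess.run(command, check=True, capture_output=True)
    return command[-1]
```

Keeping the command builder separate from the subprocess call makes the ffmpeg arguments easy to inspect without running ffmpeg.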

Output

Two files are saved to the script directory:

speech_audio.wav        <- Extracted audio
speech_captions.srt     <- Subtitle file

SRT format example:

1
00:00:00,000 --> 00:00:03,640
तो जब मैं मुंबई आया

2
00:00:03,640 --> 00:00:07,200
मेरे दिमाग में बहुत सारे सपने थे
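
A minimal sketch of how segments could be rendered in the format above. The function names and the (start, end, text) tuple input are assumptions for illustration; faster-whisper segments expose start, end, and text attributes, so a caller would convert them to tuples first.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def generate_srt(segments) -> str:
    """Render (start, end, text) tuples as SRT text with 1-based cue numbers."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Joining with a newline leaves one blank line between cues.
    return "\n".join(cues)
```

When writing the result to disk, open the file with encoding="utf-8" so that Devanagari and other Indic scripts are preserved.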

Performance on RTX 2050

Model	Speed	VRAM used
small	2.5x realtime	~1.5 GB
medium	1.5x realtime	~2.5 GB

Whisper Model Reference

Model	Speed	Accuracy	VRAM needed
tiny	fastest	low	~1 GB
base	fast	medium	~1 GB
small	balanced	good	~2 GB
medium	slow	better	~5 GB
large	slowest	best	~10 GB

Recommended: small for speed, medium for accuracy on Indian languages.
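
A hedged sketch of loading the recommended model with faster-whisper. WhisperModel and its transcribe method are part of the faster-whisper API; pick_device is a hypothetical helper, and the int8 CPU fallback is an assumption (the Tech Stack table only mentions float16 on GPU).

```python
def pick_device(cuda_available: bool) -> tuple[str, str]:
    """Return (device, compute_type) for faster-whisper."""
    # float16 on GPU matches the Tech Stack table; int8 on CPU is an assumption.
    return ("cuda", "float16") if cuda_available else ("cpu", "int8")


def transcribe(wav_path: str, language: str = "hi", model_size: str = "small"):
    """Transcribe a WAV file and return a list of (start, end, text) tuples."""
    # Imported lazily so pick_device works without the heavy dependencies.
    import torch
    from faster_whisper import WhisperModel

    device, compute_type = pick_device(torch.cuda.is_available())
    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    # Passing the language explicitly skips auto-detection.
    segments, _info = model.transcribe(wav_path, language=language)
    # `segments` is a generator; iterating it is what runs the transcription.
    return [(seg.start, seg.end, seg.text) for seg in segments]
```

Switching model_size to "medium" trades speed for accuracy, in line with the recommendation above.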

Known Issues

48 kbps audio produces garbled transcription - use audio above 128 kbps for best results
Spoken Hindi and Urdu are nearly indistinguishable to Whisper, so auto-detection can return Urdu script - always set language="hi" explicitly
The small model may mix Hindi and English on code-switched speech - use the medium model instead

Phase Roadmap

Phase 1     (done)    Dependency testing, Hindi transcription, audio conversion
Phase 2A    (done)    Video -> SRT -> VLC pipeline with GPU acceleration
Phase 2B    (next)    Live mic streaming, chunked real-time captions under 2s
Phase 3               Bhashini API integration for more regional languages
Phase 4               Low-literacy feature: chunk audio -> LLM summarize -> simple caption
Phase 5               Web UI and platform integration

Troubleshooting

torch.cuda.is_available() returns False

Reinstall PyTorch with the correct CUDA version:

pip uninstall torch torchaudio -y
pip install torch torchaudio --index-url https://download.pytorch.org/whl/cu121

ffmpeg not found

Add ffmpeg to the system PATH or reinstall from https://ffmpeg.org/download.html

VLC not launching

Check that VLC is installed at one of these paths:

C:\Program Files\VideoLAN\VLC\vlc.exe
C:\Program Files (x86)\VideoLAN\VLC\vlc.exe

Or add your custom VLC path to the VLC_PATHS list in Video_captioning.py
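
The lookup described above might resemble the sketch below. VLC_PATHS is the list name mentioned here, but its contents in Video_captioning.py are assumed to be the two default Windows locations; find_vlc and the PATH fallback via shutil.which are hypothetical additions. The --sub-file option is VLC's command-line flag for attaching a subtitle file.

```python
import os
import shutil
import subprocess

# Assumed contents: the two default Windows install locations listed above.
VLC_PATHS = [
    r"C:\Program Files\VideoLAN\VLC\vlc.exe",
    r"C:\Program Files (x86)\VideoLAN\VLC\vlc.exe",
]


def find_vlc(candidates=VLC_PATHS):
    """Return the first existing candidate path, else `vlc` from PATH (or None)."""
    for path in candidates:
        if os.path.isfile(path):
            return path
    # Fallback (assumption): look for a `vlc` executable on the system PATH.
    return shutil.which("vlc")


def play_vlc(video_path: str, srt_path: str) -> None:
    """Launch VLC with the subtitle file attached, without blocking the script."""
    vlc = find_vlc()
    if vlc is None:
        raise FileNotFoundError("VLC not found; add its path to VLC_PATHS")
    subprocess.Popen([vlc, video_path, f"--sub-file={srt_path}"])
```

Raising a clear error when no executable is found makes a missing VLC install easier to diagnose than a silent failure to launch.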

Hindi output appears in Urdu script

Always pass language="hi" explicitly - never let Whisper auto-detect for Indian languages

Script exits after transcription with no error

Add this line near the top of Video_captioning.py, after the imports (sys must be imported):

sys.stdout.reconfigure(encoding='utf-8')
