# Quick Reference: PaddleOCRVL Timeout Fix

## Problem Solved

- ✓ Program no longer hangs when PaddleOCRVL encounters problematic seal images
- ✓ 60-second timeout protection on all PaddleOCRVL calls
- ✓ Graceful degradation to other OCR methods

## Quick Commands

### Run Test with Timeout Protection

```bash
python test_accuracy_batch_full.py --ocr-model paddleocr_vl --batch --batch-size 20
```

### Run Test Without PaddleOCRVL (Faster)

```bash
python test_accuracy_batch_full.py --ocr-model ppocr_v5 --batch --batch-size 20 --disable-paddleocrvl
```

### Verify Timeout Mechanism

```bash
python test_paddleocrvl_timeout.py
```

## What Changed

| File | Change | Lines |
|------|--------|-------|
| test_accuracy_batch_full.py | Added `_run_ocr_vl_wrapper()` | 721-784 |
| test_accuracy_batch_full.py | Updated `run_ocr_recognition_vl()` | 787-850 |
| test_accuracy_batch_full.py | Updated call site 1 | 1334 |
| test_accuracy_batch_full.py | Updated call site 2 | 1356 |
| test_accuracy_batch_full.py | Added `--disable-paddleocrvl` | 2419, 2495-2500 |

## Command-Line Options

| Option | Description |
|--------|-------------|
| `--ocr-model ppocr_v5` | Use PP-OCRv5 model (faster, 85% accuracy) |
| `--ocr-model paddleocr_vl` | Use PaddleOCRVL (slower, with timeout protection) |
| `--disable-paddleocrvl` | Skip PaddleOCRVL initialization entirely |
| `--batch` | Run batch testing mode |
| `--batch-size N` | Process N PDFs |

## Expected Behavior

### Before Fix

```
2026-03-03 09:43:56,229 - WARNING - Seal #1: Unwarp OCR failed...
[program hangs indefinitely]
```

### After Fix

```
2026-03-03 09:43:56,229 - WARNING - Seal #1: Unwarp OCR failed...
2026-03-03 09:44:56,229 - WARNING - PaddleOCRVL recognition timeout (60s) for ...
[continues to next seal]
```

## Key Features

- ✓ **60-second timeout** per PaddleOCRVL call
- ✓ **Automatic cleanup** of hung processes
- ✓ **Graceful degradation** to other OCR methods
- ✓ **Windows compatible** (uses spawn mode)
- ✓ **User control** via the `--disable-paddleocrvl` flag

## Test Results

```
Timeout mechanism: PASSED
Normal completion: PASSED
```

## Troubleshooting

### Issue: Still seeing timeouts

**Solution**: Use the `--disable-paddleocrvl` flag or switch to the `ppocr_v5` model

### Issue: Processing is too slow

**Solution**: Use `--ocr-model ppocr_v5` for faster processing (85% accuracy)

### Issue: Need to debug timeout

**Solution**: Check logs for "timeout after 60s" messages and examine seal images

## Technical Details

- **Implementation**: Multiprocessing with 60s timeout
- **Process**: terminate() → wait 5s → kill() if needed
- **Result**: Returns empty dict on timeout, allows fallback OCR
- **Compatibility**: Windows (spawn), Linux (fork)

## Files

- `test_accuracy_batch_full.py` - Main implementation
- `test_paddleocrvl_timeout.py` - Verification test
- `PADDLEOCRVL_TIMEOUT_FIX_SUMMARY.md` - Detailed documentation
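
## Timeout Sketch

The terminate-then-kill sequence listed under Technical Details can be illustrated with a minimal sketch. This is not the code in `_run_ocr_vl_wrapper()`; the names `run_with_timeout`, `_worker`, and `demo_task` are hypothetical, and passing the result back through a queue is an assumption made for the example. It only shows the general pattern: run the call in a child process, wait up to the timeout, then `terminate()`, wait for a grace period, and `kill()` if the child is still alive, returning an empty dict so the caller can fall back to another OCR method.

```python
import multiprocessing as mp
import queue
import time


def _worker(func, args, result_queue):
    """Run func(*args) in the child process and send the outcome to the parent."""
    try:
        result_queue.put(("ok", func(*args)))
    except Exception as exc:  # report the failure instead of exiting silently
        result_queue.put(("error", repr(exc)))


def run_with_timeout(func, args=(), timeout=60.0, grace=5.0, start_method=None):
    """Hypothetical helper: return func(*args), or {} on timeout or error.

    start_method=None uses the platform default start method; with "spawn",
    func must be defined at module level so the child can import it.
    """
    ctx = mp.get_context(start_method)
    result_queue = ctx.Queue()
    proc = ctx.Process(target=_worker, args=(func, args, result_queue), daemon=True)
    proc.start()
    try:
        # Block until the child reports a result or the timeout expires.
        status, payload = result_queue.get(timeout=timeout)
    except queue.Empty:
        # No result in time: ask the child to stop, then force it if needed.
        proc.terminate()
        proc.join(grace)
        if proc.is_alive():
            proc.kill()
            proc.join()
        return {}
    proc.join(grace)
    return payload if status == "ok" else {}


def demo_task(seconds):
    """Stand-in for an OCR call: sleep, then return a result dict."""
    time.sleep(seconds)
    return {"text": "ok"}
```

With `spawn` (the Windows mode), any script that calls such a helper should do so under an `if __name__ == "__main__":` guard, because the child process re-imports the main module. A call like `run_with_timeout(demo_task, (120,), timeout=60)` would give up after 60 seconds and return `{}`.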