Quick Reference: PaddleOCRVL Timeout Fix
Problem Solved
✓ Program no longer hangs when PaddleOCRVL encounters problematic seal images
✓ 60-second timeout protection on all PaddleOCRVL calls
✓ Graceful degradation to other OCR methods
Quick Commands
Run Test with Timeout Protection
python test_accuracy_batch_full.py --ocr-model paddleocr_vl --batch --batch-size 20
Run Test Without PaddleOCRVL (Faster)
python test_accuracy_batch_full.py --ocr-model ppocr_v5 --batch --batch-size 20 --disable-paddleocrvl
Verify Timeout Mechanism
python test_paddleocrvl_timeout.py
What Changed
| File | Change | Lines |
|---|---|---|
| test_accuracy_batch_full.py | Added _run_ocr_vl_wrapper() | 721-784 |
| test_accuracy_batch_full.py | Updated run_ocr_recognition_vl() | 787-850 |
| test_accuracy_batch_full.py | Updated call site 1 | 1334 |
| test_accuracy_batch_full.py | Updated call site 2 | 1356 |
| test_accuracy_batch_full.py | Added --disable-paddleocrvl | 2419, 2495-2500 |
Command-Line Options
| Option | Description |
|---|---|
| --ocr-model ppocr_v5 | Use PP-OCRv5 model (faster, 85% accuracy) |
| --ocr-model paddleocr_vl | Use PaddleOCRVL (slower, with timeout protection) |
| --disable-paddleocrvl | Skip PaddleOCRVL initialization entirely |
| --batch | Run batch testing mode |
| --batch-size N | Process N PDFs |
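The options in the table could be declared with argparse roughly as follows. This is a sketch only, not the parser from test_accuracy_batch_full.py: the function name build_parser, the default values, and the help strings are assumptions made for illustration.

```python
import argparse


def build_parser():
    # Hypothetical parser mirroring the options listed in the table.
    parser = argparse.ArgumentParser(description="Seal OCR accuracy test (sketch)")
    parser.add_argument(
        "--ocr-model",
        choices=["ppocr_v5", "paddleocr_vl"],
        default="ppocr_v5",  # assumed default
        help="OCR model to use",
    )
    parser.add_argument(
        "--disable-paddleocrvl",
        action="store_true",
        help="Skip PaddleOCRVL initialization entirely",
    )
    parser.add_argument("--batch", action="store_true", help="Run batch testing mode")
    parser.add_argument(
        "--batch-size",
        type=int,
        default=20,  # assumed default
        help="Number of PDFs to process",
    )
    return parser
```

argparse converts dashes to underscores, so the flag is read as args.disable_paddleocrvl and the batch size as args.batch_size.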
Expected Behavior
Before Fix
2026-03-03 09:43:56,229 - WARNING - Seal #1: Unwarp OCR failed...
[program hangs indefinitely]
After Fix
2026-03-03 09:43:56,229 - WARNING - Seal #1: Unwarp OCR failed...
2026-03-03 09:44:56,229 - WARNING - PaddleOCRVL recognition timeout (60s) for ...
[continues to next seal]
Key Features
✓ 60-second timeout per PaddleOCRVL call
✓ Automatic cleanup of hung processes
✓ Graceful degradation to other OCR methods
✓ Windows compatible (uses spawn mode)
✓ User control via --disable-paddleocrvl flag
Test Results
Timeout mechanism: PASSED
Normal completion: PASSED
Troubleshooting
Issue: Still seeing timeouts
Solution: Use the --disable-paddleocrvl flag or switch to the ppocr_v5 model
Issue: Processing is too slow
Solution: Use --ocr-model ppocr_v5 for faster processing (85% accuracy)
Issue: Need to debug timeout
Solution: Check logs for "PaddleOCRVL recognition timeout (60s)" messages and examine the seal images they refer to
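When debugging, a small script can pull the timeout warnings out of a log. The sketch below assumes the log lines follow the format shown under Expected Behavior; the names TIMEOUT_RE and find_timeouts are hypothetical and not part of the project.

```python
import re

# Matches lines such as:
# 2026-03-03 09:44:56,229 - WARNING - PaddleOCRVL recognition timeout (60s) for ...
TIMEOUT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) - WARNING - "
    r"PaddleOCRVL recognition timeout \((?P<secs>\d+)s\)(?P<rest>.*)$"
)


def find_timeouts(lines):
    """Return (timestamp, seconds, trailing text) for each timeout warning."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.match(line.strip())
        if match:
            hits.append(
                (match.group("ts"), int(match.group("secs")), match.group("rest").strip())
            )
    return hits
```

Passing an open log file works as well as a list, since the function only iterates over lines.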
Technical Details
Implementation: Multiprocessing with 60s timeout
Process: terminate() → wait 5s → kill() if needed
Result: Returns empty dict on timeout, allows fallback OCR
Compatibility: Windows (spawn), Linux (fork)
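The terminate-then-kill pattern described above can be sketched as follows. This is an illustration, not the actual _run_ocr_vl_wrapper() code: run_with_timeout, _child, and fake_ocr are hypothetical names, and the grace and start_method parameters are assumptions.

```python
import multiprocessing as mp
import queue as queue_mod
import time


def _child(func, args, out):
    # Runs in the child process: forward the result, or {} if func raises.
    try:
        out.put(func(*args))
    except Exception:
        out.put({})


def run_with_timeout(func, args=(), timeout=60.0, grace=5.0, start_method=None):
    """Run func(*args) in a child process; return {} if it exceeds timeout."""
    ctx = mp.get_context(start_method)  # None selects the platform default
    out = ctx.Queue()
    proc = ctx.Process(target=_child, args=(func, args, out), daemon=True)
    proc.start()
    try:
        # Read the result before joining so a large payload cannot block the child.
        result = out.get(timeout=timeout)
    except queue_mod.Empty:
        result = {}            # empty dict lets the caller fall back to other OCR
        proc.terminate()       # polite stop first
        proc.join(grace)       # wait up to `grace` seconds
        if proc.is_alive():
            proc.kill()        # force kill if still running
    proc.join()
    return result


def fake_ocr(delay):
    # Hypothetical stand-in for a PaddleOCRVL call that may hang.
    time.sleep(delay)
    return {"text": "ok"}
```

Under the spawn start method, the target must be a picklable module-level function and the calling script needs an `if __name__ == "__main__":` guard. A call such as run_with_timeout(fake_ocr, (120,), timeout=60) would give up after 60 seconds and return an empty dict.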
Files
test_accuracy_batch_full.py - Main implementation
test_paddleocrvl_timeout.py - Verification test
PADDLEOCRVL_TIMEOUT_FIX_SUMMARY.md - Detailed documentation