Quick Reference: PaddleOCRVL Timeout Fix

Problem Solved

✓ Program no longer hangs when PaddleOCRVL encounters problematic seal images
✓ 60-second timeout protection on all PaddleOCRVL calls
✓ Graceful degradation to other OCR methods

Quick Commands

Run Test with Timeout Protection

python test_accuracy_batch_full.py --ocr-model paddleocr_vl --batch --batch-size 20

Run Test Without PaddleOCRVL (Faster)

python test_accuracy_batch_full.py --ocr-model ppocr_v5 --batch --batch-size 20 --disable-paddleocrvl

Verify Timeout Mechanism

python test_paddleocrvl_timeout.py

What Changed

| File | Change | Lines |
|------|--------|-------|
| test_accuracy_batch_full.py | Added _run_ocr_vl_wrapper() | 721-784 |
| test_accuracy_batch_full.py | Updated run_ocr_recognition_vl() | 787-850 |
| test_accuracy_batch_full.py | Updated call site 1 | 1334 |
| test_accuracy_batch_full.py | Updated call site 2 | 1356 |
| test_accuracy_batch_full.py | Added --disable-paddleocrvl | 2419, 2495-2500 |

Command-Line Options

| Option | Description |
|--------|-------------|
| --ocr-model ppocr_v5 | Use PP-OCRv5 model (faster, 85% accuracy) |
| --ocr-model paddleocr_vl | Use PaddleOCRVL (slower, with timeout protection) |
| --disable-paddleocrvl | Skip PaddleOCRVL initialization entirely |
| --batch | Run batch testing mode |
| --batch-size N | Process N PDFs |
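
The options above could be declared with argparse roughly as follows. This is a minimal sketch, not the parser in test_accuracy_batch_full.py: the option names come from the table, while the defaults, the description string, and the helper `use_paddleocrvl()` are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser for the options listed in the table (sketch only)."""
    parser = argparse.ArgumentParser(description="Batch OCR accuracy test (sketch)")
    parser.add_argument(
        "--ocr-model",
        choices=["ppocr_v5", "paddleocr_vl"],
        default="ppocr_v5",  # assumed default; the real default is not documented here
    )
    parser.add_argument(
        "--disable-paddleocrvl",
        action="store_true",  # flag is off unless given on the command line
    )
    parser.add_argument("--batch", action="store_true")
    parser.add_argument("--batch-size", type=int, default=20)  # assumed default
    return parser


def use_paddleocrvl(args):
    """Hypothetical helper: initialize PaddleOCRVL only if it was not disabled."""
    return not args.disable_paddleocrvl


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.ocr_model, args.batch, args.batch_size, use_paddleocrvl(args))
```

Using `action="store_true"` keeps PaddleOCRVL enabled by default, so existing invocations behave as before unless the flag is passed.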

Expected Behavior

Before Fix

2026-03-03 09:43:56,229 - WARNING - Seal #1: Unwarp OCR failed...
[program hangs indefinitely]

After Fix

2026-03-03 09:43:56,229 - WARNING - Seal #1: Unwarp OCR failed...
2026-03-03 09:44:56,229 - WARNING - PaddleOCRVL recognition timeout (60s) for ...
[continues to next seal]

Key Features

✓ 60-second timeout per PaddleOCRVL call
✓ Automatic cleanup of hung processes
✓ Graceful degradation to other OCR methods
✓ Windows compatible (uses spawn mode)
✓ User control via --disable-paddleocrvl flag
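
Graceful degradation can be pictured as trying recognizers in order and taking the first non-empty result. The sketch below assumes each engine is a callable that returns a dict, and that an empty dict means "no result". The function `recognize_with_fallback()` and the engine names are hypothetical and do not reflect the actual fallback order in the script.

```python
def recognize_with_fallback(image, engines):
    """Try each (name, engine) pair in order; return the first non-empty result.

    Returns (None, {}) if every engine fails or returns an empty dict.
    """
    for name, engine in engines:
        try:
            result = engine(image)
        except Exception:
            result = {}  # treat a crash like an empty result and move on
        if result:
            return name, result
    return None, {}


if __name__ == "__main__":
    # Hypothetical engines: the first one "times out" and yields an empty dict.
    engines = [
        ("paddleocr_vl", lambda img: {}),
        ("ppocr_v5", lambda img: {"text": "example"}),
    ]
    print(recognize_with_fallback("seal.png", engines))
```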

Test Results

Timeout mechanism: PASSED
Normal completion: PASSED

Troubleshooting

Issue: Still seeing timeouts

Solution: Pass the --disable-paddleocrvl flag, or switch to the ppocr_v5 model with --ocr-model ppocr_v5

Issue: Processing is too slow

Solution: Use --ocr-model ppocr_v5 for faster processing (85% accuracy)

Issue: Need to debug timeout

Solution: Search the logs for timeout warnings (such as the "PaddleOCRVL recognition timeout (60s)" line shown under After Fix) and examine the seal images they refer to
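
To pull timeout warnings out of a log, a small filter along these lines may help. It is a sketch that assumes each message contains the word "timeout" followed by a duration such as "60s", as in the sample output under After Fix. The exact wording the script logs may differ, and `find_timeouts()` is a hypothetical helper.

```python
import re
import sys

# Matches e.g. "recognition timeout (60s)" or "timeout after 60s".
TIMEOUT_RE = re.compile(r"timeout\D*(\d+)s", re.IGNORECASE)


def find_timeouts(lines):
    """Return (seconds, line) for every log line that reports a timeout."""
    hits = []
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            hits.append((int(match.group(1)), line.rstrip("\n")))
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python find_timeouts.py path/to/logfile
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for seconds, line in find_timeouts(handle):
            print(f"[{seconds}s] {line}")
```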

Technical Details

Implementation: Multiprocessing with 60s timeout
Process: terminate() → wait 5s → kill() if needed
Result: Returns empty dict on timeout, allows fallback OCR
Compatibility: Windows (spawn), Linux (fork)
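
The terminate-then-kill pattern described above can be sketched with the standard multiprocessing module. This is not the code of _run_ocr_vl_wrapper(). The names `run_with_timeout()`, `_child()`, and `fake_ocr()` are hypothetical, and returning an empty dict on errors (not only on timeouts) is an assumption.

```python
import multiprocessing as mp
import queue
import sys
import time


def _child(func, args, out):
    """Run func(*args) in the child and send back a (status, payload) pair."""
    try:
        out.put(("ok", func(*args)))
    except Exception as exc:
        out.put(("error", repr(exc)))


def run_with_timeout(func, args=(), timeout=60.0, grace=5.0):
    """Return func(*args), or {} on timeout or error so the caller can fall back."""
    # Start method per the compatibility note: spawn on Windows, fork elsewhere.
    ctx = mp.get_context("spawn" if sys.platform == "win32" else "fork")
    out = ctx.Queue()
    proc = ctx.Process(target=_child, args=(func, args, out), daemon=True)
    proc.start()
    try:
        status, payload = out.get(timeout=timeout)
    except queue.Empty:
        status, payload = "timeout", None
    if status == "timeout":
        proc.terminate()      # ask the hung process to stop
        proc.join(grace)      # wait up to `grace` seconds for it to exit
        if proc.is_alive():
            proc.kill()       # force-kill if terminate() was not enough
            proc.join()
    else:
        proc.join()
    return payload if status == "ok" else {}


def fake_ocr(delay):
    """Hypothetical stand-in for a PaddleOCRVL call that takes `delay` seconds."""
    time.sleep(delay)
    return {"text": "seal"}


if __name__ == "__main__":
    print(run_with_timeout(fake_ocr, (0.1,), timeout=5.0))
    print(run_with_timeout(fake_ocr, (30.0,), timeout=1.0, grace=1.0))
```

Reading the result from the queue before joining avoids blocking on a child that is still flushing a large result. With spawn, the target function must be defined at module level so the child process can import it.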

Files

  • test_accuracy_batch_full.py - Main implementation
  • test_paddleocrvl_timeout.py - Verification test
  • PADDLEOCRVL_TIMEOUT_FIX_SUMMARY.md - Detailed documentation