PaddleOCRVL Timeout Fix - Implementation Summary

Problem

The test_accuracy_batch_full.py script hung indefinitely when PaddleOCRVL's predict() method encountered certain seal images. With no timeout protection in place, the program simply stopped responding.

Root Cause

PaddleOCRVL's predict() method has no built-in timeout mechanism. When processing certain problematic images, the method can block indefinitely, causing the entire program to hang.

Solution Implemented

The fix adds timeout protection around every PaddleOCRVL call, using Python's multiprocessing module:

1. Module-Level Wrapper Function

Added a _run_ocr_vl_wrapper() function (line 721) that:

  • Can be pickled and run in a subprocess (required for Windows compatibility)
  • Re-initializes the PaddleOCRVL pipeline in the subprocess
  • Handles exceptions gracefully
  • Returns results via a multiprocessing.Queue
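The wrapper itself is not reproduced in this summary, so the following is only a minimal sketch of the pattern the bullets describe. The names run_ocr_wrapper and recognize_stub and the shape of the message dict are assumptions made for illustration; recognize_stub stands in for re-initializing the pipeline and calling predict().

```python
import queue


def recognize_stub(image_path):
    """Hypothetical stand-in for building the pipeline and calling predict()."""
    if not image_path:
        raise ValueError("empty image path")
    return {"text": f"recognized:{image_path}"}


def run_ocr_wrapper(image_path, result_queue, recognize=recognize_stub):
    """Module-level worker: run recognition and report the outcome via a queue.

    Every outcome, including a failure, is sent as a plain dict, so the
    parent process never has to unpickle an exception object.
    """
    try:
        result = recognize(image_path)
        result_queue.put({"ok": True, "result": result, "error": None})
    except Exception as exc:
        result_queue.put({
            "ok": False,
            "result": None,
            "error": f"{type(exc).__name__}: {exc}",
        })


if __name__ == "__main__":
    # queue.Queue is an in-process stand-in for multiprocessing.Queue here;
    # both expose the put()/get() calls the wrapper relies on.
    q = queue.Queue()
    run_ocr_wrapper("seal_001.png", q)
    print(q.get())
```

Because the function lives at module level and only sends plain dicts, it can be used as a subprocess target and its messages can cross the process boundary without custom pickling.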

2. Timeout-Protected OCR Function

Replaced the run_ocr_recognition_vl() function (line 787) with a version that provides:

  • Default timeout of 60 seconds
  • Subprocess-based execution
  • Automatic termination after timeout
  • Graceful cleanup with terminate() and fallback to kill()
  • Proper error handling and logging

3. Updated Call Sites

Updated both PaddleOCRVL call sites:

  • Line 1334: Backup OCR after unwarp failure
  • Line 1356: Direct OCR when unwarp is unavailable

Both call sites now pass timeout=60.

4. Command-Line Option

Added a --disable-paddleocrvl flag to:

  • Allow users to completely skip PaddleOCRVL initialization
  • Provide faster execution for batch testing
  • Enable a quick workaround if timeout issues persist
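A flag like this is typically a boolean argparse switch. The sketch below is an assumption about how it could be wired, not the script's actual parser: the default model, the choices list, and the helper should_init_paddleocrvl are hypothetical.

```python
import argparse


def build_parser():
    """Build a reduced parser with only the options relevant to this fix."""
    parser = argparse.ArgumentParser(description="Batch seal OCR accuracy test")
    # Default and choices are assumptions for illustration.
    parser.add_argument("--ocr-model", default="ppocr_v5",
                        choices=["ppocr_v5", "paddleocr_vl"])
    # store_true makes the flag False unless it is given on the command line;
    # argparse exposes it as args.disable_paddleocrvl.
    parser.add_argument("--disable-paddleocrvl", action="store_true",
                        help="skip PaddleOCRVL initialization entirely")
    return parser


def should_init_paddleocrvl(args):
    """Return True when the PaddleOCRVL pipeline should be initialized."""
    return not args.disable_paddleocrvl


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("init PaddleOCRVL:", should_init_paddleocrvl(args))
```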

Files Modified

  1. test_accuracy_batch_full.py - Main implementation

    • Added _run_ocr_vl_wrapper() function
    • Replaced run_ocr_recognition_vl() function
    • Updated 2 call sites with timeout parameter
    • Added --disable-paddleocrvl command-line option
  2. test_paddleocrvl_timeout.py - New test script

    • Verifies timeout mechanism works correctly
    • Tests both timeout and normal completion scenarios
    • All tests PASSED

Usage

Option 1: Use with Timeout Protection (Default)

# Uses PaddleOCRVL with 60s timeout protection
python test_accuracy_batch_full.py --ocr-model paddleocr_vl --batch --batch-size 20

Option 2: Disable PaddleOCRVL (Faster)

# Skip PaddleOCRVL entirely, use only ppocr_v5
python test_accuracy_batch_full.py --ocr-model ppocr_v5 --batch --batch-size 20 --disable-paddleocrvl
# Use ppocr_v5 for both primary and backup OCR
python test_accuracy_batch_full.py --ocr-model ppocr_v5 --batch --batch-size 20

Test Results

Timeout Test

Timeout mechanism: PASSED
Normal completion: PASSED

[OK] All tests passed! The multiprocessing timeout mechanism works correctly.
  PaddleOCRVL calls will be protected from hanging indefinitely.

Key Features

  1. 60-Second Timeout: Each PaddleOCRVL call is limited to 60 seconds
  2. Graceful Degradation: Timeout returns empty result, allowing other OCR methods to be tried
  3. Resource Cleanup: Subprocesses are properly terminated even if they hang
  4. Windows Compatible: Uses module-level functions to avoid pickle issues
  5. Detailed Logging: All timeouts are logged with context for debugging
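Graceful degradation can be pictured as trying OCR engines in order and moving on whenever one reports a failure such as a timeout. The sketch below uses hypothetical stub engines and a hypothetical result dict; it is not the script's real call-site code.

```python
def vl_engine_timed_out(image_path):
    """Hypothetical stub: an engine whose call hit the timeout."""
    return {"ok": False, "result": None, "error": "timeout after 60s"}


def fallback_engine(image_path):
    """Hypothetical stub: an engine that succeeds."""
    return {"ok": True, "result": f"text from {image_path}", "error": None}


def first_successful(image_path, engines):
    """Try (name, engine) pairs in order and return the first non-empty result.

    Failures are collected rather than raised, so one timed-out engine
    does not stop the remaining engines from being tried.
    """
    errors = []
    for name, engine in engines:
        outcome = engine(image_path)
        if outcome["ok"] and outcome["result"]:
            return {"engine": name, "text": outcome["result"], "errors": errors}
        errors.append(f"{name}: {outcome['error'] or 'empty result'}")
    return {"engine": None, "text": "", "errors": errors}


if __name__ == "__main__":
    engines = [("paddleocr_vl", vl_engine_timed_out), ("ppocr_v5", fallback_engine)]
    print(first_successful("seal_001.png", engines))
```

Collecting the error strings alongside the result keeps the timeout visible in logs even when a later engine succeeds.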

Benefits

  1. No More Hanging: Program will never block indefinitely on PaddleOCRVL
  2. Predictable Runtime: At most 60 seconds per PaddleOCRVL call, plus up to 5 seconds for subprocess cleanup
  3. Better Error Handling: Clear error messages when timeouts occur
  4. User Control: Option to disable PaddleOCRVL if needed
  5. Backward Compatible: Existing code continues to work with minimal changes

Technical Details

Multiprocessing on Windows

Windows uses the "spawn" start method for multiprocessing, which starts each subprocess in a fresh interpreter. This requires:

  • Target functions to be picklable
  • Functions defined at module level (not nested)
  • Modules to be re-imported in the subprocess

This is why _run_ocr_vl_wrapper is defined at module level and re-initializes the PaddleOCRVL pipeline.
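The picklability constraint is easy to check directly. The following sketch, with hypothetical function names, compares a module-level function against a nested one. Functions are pickled by reference to their importable name, which a nested function does not have.

```python
import pickle


def module_level_worker(x):
    """Defined at module level, so pickle can refer to it by name."""
    return x * 2


def make_nested_worker():
    """Return a nested function, which has no importable name."""
    def nested_worker(x):
        return x * 2
    return nested_worker


def is_picklable(obj):
    """Return True if pickle.dumps() accepts the object."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        # The exact exception type for local objects varies by Python version.
        return False


if __name__ == "__main__":
    print("module-level:", is_picklable(module_level_worker))
    print("nested:", is_picklable(make_nested_worker()))
```

Run as a script, the module-level function should be reported as picklable and the nested one as not, which matches the reason given above for keeping the wrapper at module level.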

Timeout Mechanism Flow

  1. Main process creates multiprocessing.Queue
  2. Subprocess starts with wrapper function
  3. Main process waits with 60-second timeout
  4. If timeout occurs:
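    • terminate() stops the subprocess (SIGTERM on POSIX, TerminateProcess() on Windows)
    • Wait 5 seconds for cleanup
    • If still alive, kill() forces termination (SIGKILL on POSIX; on Windows it behaves like terminate())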
  5. Return failure result to allow fallback
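The flow above can be sketched as follows. This is an illustration under stated assumptions rather than the code in test_accuracy_batch_full.py: run_with_timeout, stop_process, sleepy_worker, the start_method parameter, and the message dict are all hypothetical names, and sleepy_worker stands in for the OCR wrapper.

```python
import multiprocessing as mp
import queue
import time


def sleepy_worker(seconds, result_queue):
    """Hypothetical stand-in for the OCR wrapper: sleep, then report success."""
    time.sleep(seconds)
    result_queue.put({"ok": True, "result": f"slept {seconds}s", "error": None})


def stop_process(proc, grace):
    """Ask the process to stop, then force it if still alive after `grace` seconds."""
    if proc.is_alive():
        proc.terminate()
        proc.join(grace)
    if proc.is_alive():
        proc.kill()
        proc.join()


def run_with_timeout(target, args, timeout=60, grace=5, start_method=None):
    """Run target(*args, result_queue) in a subprocess, bounded by `timeout`.

    start_method=None uses the platform default ("spawn" on Windows).
    """
    ctx = mp.get_context(start_method)
    result_queue = ctx.Queue()
    proc = ctx.Process(target=target, args=(*args, result_queue), daemon=True)
    proc.start()
    try:
        # Wait on the queue rather than join(): a child that is still
        # flushing a large result would not exit until the queue is read.
        message = result_queue.get(timeout=timeout)
        proc.join(grace)  # let a finished worker exit by itself
    except queue.Empty:
        if proc.is_alive():
            error = f"timeout after {timeout}s"
        else:
            error = f"worker exited with code {proc.exitcode} and no result"
        message = {"ok": False, "result": None, "error": error}
    finally:
        stop_process(proc, grace)
        result_queue.close()
    return message
```

On Windows, the call needs to sit under an `if __name__ == "__main__":` guard, for example `run_with_timeout(sleepy_worker, (1,), timeout=5)`. One limitation of this simple version is that a worker that crashes without reporting is only noticed once the timeout expires.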

Error Handling

The implementation handles multiple error scenarios:

  • Process timeout (most common)
  • Process crash during execution
  • Queue communication failures
  • PaddleOCRVL initialization failures
  • File I/O errors

Recommendations

  1. For Testing: Use --ocr-model ppocr_v5 for faster batch processing
  2. For Production: Keep default timeout (60s) for PaddleOCRVL backup
  3. For Debugging: Check logs for "timeout after 60s" messages to identify problematic seals
  4. For Slow Images: Increase the timeout only if legitimate cases need more than 60 seconds

Future Improvements

  1. Add adaptive timeout based on image size
  2. Cache PaddleOCRVL results to avoid re-processing
  3. Add statistics on timeout frequency
  4. Consider using ProcessPoolExecutor for better resource management
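As an illustration of the first item, an adaptive timeout could scale with pixel count and be clamped to a fixed range. The function name and every constant below are hypothetical placeholders. Nothing here has been measured against PaddleOCRVL.

```python
def adaptive_timeout(width, height, base=30.0, per_megapixel=10.0,
                     minimum=30.0, maximum=180.0):
    """Scale the timeout with image size, clamped to [minimum, maximum].

    All constants are placeholder values for illustration only.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    megapixels = (width * height) / 1_000_000
    return max(minimum, min(maximum, base + per_megapixel * megapixels))


if __name__ == "__main__":
    for size in [(1000, 1000), (2000, 3000), (10000, 10000)]:
        print(size, adaptive_timeout(*size))
```

The upper clamp matters more than the scaling: without it, a very large image would reintroduce an effectively unbounded wait.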

Verification

To verify the fix works:

# Run timeout test
python test_paddleocrvl_timeout.py

# Run batch test with PaddleOCRVL
python test_accuracy_batch_full.py --ocr-model paddleocr_vl --batch --batch-size 5

# Verify no hanging occurs
# Check test_reports_full/test_report.json for results

Related Files

  • test_accuracy_batch_full.py - Main implementation (lines 721-850)
  • test_paddleocrvl_timeout.py - Timeout verification test
  • test_reports_full/test_report.json - Test results output

Conclusion

The PaddleOCRVL timeout issue has been successfully resolved. The program will no longer hang indefinitely when processing problematic seal images. The timeout mechanism provides a balance between allowing sufficient time for legitimate processing and preventing indefinite blocks.