# Quick Reference: PaddleOCRVL Timeout Fix
## Problem Solved
- ✓ The program no longer hangs when PaddleOCRVL encounters problematic seal images
- ✓ 60-second timeout protection on all PaddleOCRVL calls
- ✓ Graceful degradation to other OCR methods
## Quick Commands
### Run Test with Timeout Protection
```bash
python test_accuracy_batch_full.py --ocr-model paddleocr_vl --batch --batch-size 20
```
### Run Test Without PaddleOCRVL (Faster)
```bash
python test_accuracy_batch_full.py --ocr-model ppocr_v5 --batch --batch-size 20 --disable-paddleocrvl
```
### Verify Timeout Mechanism
```bash
python test_paddleocrvl_timeout.py
```
## What Changed
| File | Change | Lines |
|------|--------|-------|
| test_accuracy_batch_full.py | Added `_run_ocr_vl_wrapper()` | 721-784 |
| test_accuracy_batch_full.py | Updated `run_ocr_recognition_vl()` | 787-850 |
| test_accuracy_batch_full.py | Updated call site 1 | 1334 |
| test_accuracy_batch_full.py | Updated call site 2 | 1356 |
| test_accuracy_batch_full.py | Added `--disable-paddleocrvl` | 2419, 2495-2500 |
## Command-Line Options
| Option | Description |
|--------|-------------|
| `--ocr-model ppocr_v5` | Use PP-OCRv5 model (faster, 85% accuracy) |
| `--ocr-model paddleocr_vl` | Use PaddleOCRVL (slower, with timeout protection) |
| `--disable-paddleocrvl` | Skip PaddleOCRVL initialization entirely |
| `--batch` | Run batch testing mode |
| `--batch-size N` | Process N PDFs |
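
For orientation, the options above could be declared with `argparse` roughly as follows. This is a minimal sketch, not the actual parser in `test_accuracy_batch_full.py`: the function name `build_parser`, the defaults, and the help strings are assumptions made for illustration; only the flag names and the two model values come from the table.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the options table (defaults are assumed)."""
    parser = argparse.ArgumentParser(description="Batch OCR accuracy test (illustrative)")
    parser.add_argument(
        "--ocr-model",
        choices=["ppocr_v5", "paddleocr_vl"],
        default="ppocr_v5",  # assumed default, not confirmed by the source
        help="OCR model to use",
    )
    parser.add_argument("--batch", action="store_true", help="Run batch testing mode")
    parser.add_argument(
        "--batch-size",
        type=int,
        default=20,  # assumed default, taken from the example commands
        help="Number of PDFs to process",
    )
    parser.add_argument(
        "--disable-paddleocrvl",
        action="store_true",
        help="Skip PaddleOCRVL initialization entirely",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    # argparse turns dashes into underscores: args.ocr_model, args.batch_size, ...
    print(args)
```

With a parser like this, `--disable-paddleocrvl` surfaces as the boolean `args.disable_paddleocrvl`, which the initialization code can check before loading the model.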
## Expected Behavior
### Before Fix
```
2026-03-03 09:43:56,229 - WARNING - Seal #1: Unwarp OCR failed...
[program hangs indefinitely]
```
### After Fix
```
2026-03-03 09:43:56,229 - WARNING - Seal #1: Unwarp OCR failed...
2026-03-03 09:44:56,229 - WARNING - PaddleOCRVL recognition timeout (60s) for ...
[continues to next seal]
```
## Key Features
- **60-second timeout** per PaddleOCRVL call
- **Automatic cleanup** of hung processes
- **Graceful degradation** to other OCR methods
- **Windows compatible** (uses spawn mode)
- **User control** via the `--disable-paddleocrvl` flag
## Test Results
```
Timeout mechanism: PASSED
Normal completion: PASSED
```
## Troubleshooting
### Issue: Still seeing timeouts
**Solution**: Use `--disable-paddleocrvl` flag or switch to `ppocr_v5` model
### Issue: Processing is too slow
**Solution**: Use `--ocr-model ppocr_v5` for faster processing (85% accuracy)
### Issue: Need to debug timeout
**Solution**: Search the logs for `PaddleOCRVL recognition timeout (60s)` messages and examine the corresponding seal images
## Technical Details
- **Implementation**: Multiprocessing with a 60s timeout
- **Process**: `terminate()` → wait 5s → `kill()` if needed
- **Result**: Returns an empty dict on timeout, which allows fallback OCR
- **Compatibility**: Windows (spawn), Linux (fork)
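
The pattern described above can be sketched as follows. This is an illustration under stated assumptions, not the code of `_run_ocr_vl_wrapper()`: the names `run_with_timeout` and `_worker`, the `grace` and `start_method` parameters, and the choice to return an empty dict when the worker raises are all hypothetical.

```python
import multiprocessing as mp
import queue


def _worker(func, args, result_queue):
    # Runs in the child process and sends the result back to the parent.
    try:
        result_queue.put(func(*args))
    except Exception:
        # Assumption: a failed call is reported like a timeout (empty dict).
        result_queue.put({})


def run_with_timeout(func, args=(), timeout=60.0, grace=5.0, start_method=None):
    """Run func(*args) in a child process; return {} if it exceeds `timeout`."""
    # start_method=None uses the platform default; pass "spawn" to force it.
    ctx = mp.get_context(start_method)
    result_queue = ctx.Queue()
    proc = ctx.Process(target=_worker, args=(func, args, result_queue))
    proc.start()
    try:
        # Read the result before joining so a large result cannot block the child.
        result = result_queue.get(timeout=timeout)
        proc.join(grace)  # normal completion: let the child exit on its own
    except queue.Empty:
        result = {}  # timed out: the caller can fall back to another OCR method
    if proc.is_alive():
        proc.terminate()  # polite stop first
        proc.join(grace)  # wait up to `grace` seconds
        if proc.is_alive():
            proc.kill()  # force stop if terminate() was not enough
            proc.join()
    return result
```

Under the spawn start method, `func` has to be a picklable module-level function and the calling code has to sit behind an `if __name__ == "__main__":` guard, because the child process re-imports the main module.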
## Files
- `test_accuracy_batch_full.py` - Main implementation
- `test_paddleocrvl_timeout.py` - Verification test
- `PADDLEOCRVL_TIMEOUT_FIX_SUMMARY.md` - Detailed documentation