- Add PaddleOCRVL as an optional OCR model for seal text recognition
  - New parameter: --ocr-model {ppocr_v5,paddleocr_vl}
  - PaddleOCRVL achieves 100% accuracy on the test cases (vs. 84% for PP-OCRv5)
  - Backward compatible: defaults to PP-OCRv5
- Fix CMA recognition regression
  - Ensure ocr_engine is always initialized for CMA extraction
  - PaddleOCRVL is used only for seal text, not for CMA recognition
- Add comprehensive integration guide
  - PADDLEOCRVL_INTEGRATION.md with usage examples
  - test_paddleocr_vl_quick.py for validation

Implementation details:
- run_ocr_recognition_vl(): New function for PaddleOCRVL recognition
- extract_seals_and_institutions(): Enhanced with OCR model selection
- Automatic fallback to PP-OCRv5 if PaddleOCRVL is unavailable
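
The model selection and fallback described above could look roughly like the sketch below. It is illustrative only and not the code in this commit: build_parser(), paddleocr_vl_available(), and resolve_ocr_model() are hypothetical names, and the availability check assumes that PaddleOCRVL is exposed as an attribute of the paddleocr package.

```python
import argparse
import importlib

OCR_MODELS = ("ppocr_v5", "paddleocr_vl")
DEFAULT_OCR_MODEL = "ppocr_v5"  # default keeps existing behavior


def build_parser():
    """Hypothetical CLI parser exposing only the --ocr-model option."""
    parser = argparse.ArgumentParser(description="Seal text recognition")
    parser.add_argument(
        "--ocr-model",
        choices=OCR_MODELS,
        default=DEFAULT_OCR_MODEL,
        help="OCR model used for seal text (default: %(default)s)",
    )
    return parser


def paddleocr_vl_available():
    """Assumption: PaddleOCRVL is importable from the paddleocr package."""
    try:
        module = importlib.import_module("paddleocr")
    except ImportError:
        return False
    return hasattr(module, "PaddleOCRVL")


def resolve_ocr_model(requested, is_vl_available=paddleocr_vl_available):
    """Return the model to use, falling back to PP-OCRv5 if VL is missing."""
    if requested == "paddleocr_vl" and not is_vl_available():
        return DEFAULT_OCR_MODEL
    return requested
```

Passing the availability check as an argument keeps the fallback testable on machines without PaddleOCRVL installed, e.g. resolve_ocr_model("paddleocr_vl", lambda: False).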

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>