report-detect

Commit Graph

Author: 黄仁欢

SHA1: 5baf0ac18e

Message:

fix(cma): implement robust CMA code extraction with fallback mechanism

Add comprehensive CMA code extraction module with template matching
primary method and full-page OCR fallback to handle various PDF formats.

Key improvements:
- Add cma_extraction_template_primary.py module
- Support 11-12 digit CMA codes (prioritize 12-digit matches)
- Implement template matching + ROI OCR as primary method
- Add full-page OCR fallback when template matching fails
- Fix critical bug where low template match confidence prevented fallback
- Improve scoring algorithm considering position, confidence, and format
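
As a rough illustration of the 11-12 digit support listed above, the sketch below pulls digit runs out of OCR text and orders 12-digit matches first. The function name `find_cma_candidates` and the regular expression are assumptions made for illustration, not the code in cma_extraction_template_primary.py.

```python
from __future__ import annotations

import re

# Hypothetical pattern: a run of exactly 11 or 12 digits that is not part of
# a longer digit run (the lookarounds reject 13+ digit sequences).
_CMA_PATTERN = re.compile(r"(?<!\d)\d{11,12}(?!\d)")


def find_cma_candidates(ocr_text: str) -> list[str]:
    """Return 11-12 digit candidates with 12-digit matches first."""
    matches = _CMA_PATTERN.findall(ocr_text)
    # sorted() is stable, so candidates of equal length keep reading order.
    return sorted(matches, key=lambda code: -len(code))
```

Ordering by length is only one way to "prioritize 12-digit matches"; the real module may instead fold length into its scoring.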

Fixed issues:
- YDQ23_001838.pdf: Extracts 210020349096 (12-digit code)
- WTS2025-21283.pdf: Extracts 220020349627 (12-digit code)
- Both PDFs now use fullpage_fallback successfully

Technical details:
- Template match threshold: 0.4 confidence
- ROI calculation: extends rightward from logo center
- Fallback triggers on: template load failure, match failure, or low confidence
- Scoring weights: confidence*100 + starts_with_2*50 + top_right*30
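
The fallback triggers and scoring weights above could be sketched roughly as follows. The `Candidate` class, the function names, the assumption that OCR confidence lies in [0, 1], and the strict `<` comparison against the threshold are all hypothetical; only the 0.4 threshold and the three weights come from the commit message.

```python
from __future__ import annotations

from dataclasses import dataclass

MATCH_THRESHOLD = 0.4  # template match threshold stated in the commit message


@dataclass
class Candidate:
    code: str          # extracted digit string
    confidence: float  # OCR confidence, assumed to be in [0, 1]
    top_right: bool    # assumed flag: candidate sits in the page's top-right area


def score(c: Candidate) -> float:
    """confidence*100 + starts_with_2*50 + top_right*30 (booleans count as 0 or 1)."""
    return c.confidence * 100 + c.code.startswith("2") * 50 + c.top_right * 30


def should_fall_back(template_loaded: bool, match_confidence: float | None) -> bool:
    """Decide whether full-page OCR should run.

    match_confidence is None when template matching produced no match.
    """
    if not template_loaded:       # template load failure
        return True
    if match_confidence is None:  # match failure
        return True
    return match_confidence < MATCH_THRESHOLD  # low confidence (assumed strict <)


def pick_best(candidates: list[Candidate]) -> Candidate | None:
    """Return the highest-scoring candidate, or None if the list is empty."""
    return max(candidates, key=score, default=None)
```

For example, a candidate starting with "2" in the top-right area with confidence 0.5 scores 50 + 50 + 30 = 130 under these weights, which outranks an equally confident candidate elsewhere on the page.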

Co-Authored-By: Claude Code <noreply@anthropic.com>

Date: 2026-02-16 14:16:34 +08:00

1 Commit