Integrated Workflow: PaddleX Layout Analysis + OCR

CMA Code Extraction (Full-page OCR + Position Filtering)

Method: Full-page OCR with position-based filtering (top-right area priority)

Algorithm: Extract all text → Filter by position → Regex match → Score candidates
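
The four steps above can be sketched roughly as follows. This is a minimal illustration, not the report's actual implementation: the `OcrLine` type, the 12-digit pattern, the definition of "top-right" as the upper-right quadrant, and the additive bonus are all assumptions made for the example.

```python
import re
from dataclasses import dataclass


@dataclass
class OcrLine:
    """One recognised text line (hypothetical structure)."""
    text: str
    score: float  # OCR confidence in [0, 1]
    x: float      # centre of the text box, in pixels
    y: float


# Assumption: a CMA code is a standalone run of exactly 12 digits.
CMA_PATTERN = re.compile(r"(?<!\d)\d{12}(?!\d)")


def in_top_right(line, page_width, page_height):
    # Assumption: "top-right area" means the upper-right quadrant of the page.
    return line.x >= page_width / 2 and line.y <= page_height / 2


def extract_cma_code(lines, page_width, page_height):
    """Return (code, source_line) for the best candidate, or None."""
    best = None
    for line in lines:
        # Regex match on the text with spaces removed.
        match = CMA_PATTERN.search(line.text.replace(" ", ""))
        if match is None:
            continue
        # Score: OCR confidence plus a bonus for the prioritised area.
        bonus = 1.0 if in_top_right(line, page_width, page_height) else 0.0
        rank = line.score + bonus
        if best is None or rank > best[0]:
            best = (rank, match.group(), line)
    return None if best is None else (best[1], best[2])
```

Using a bonus rather than a hard filter keeps candidates outside the top-right area as a fallback, which is one way to read "priority" in the method description.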

Extracted CMA Code

202319017008

Confidence: 99.93%

Raw Text: "202319017008"

Position: (376, 411)

Detection Visualization:

1. Document Layout Detection (PaddleX PP-DocLayout-L)

File: 关于中检测试技术(广东)集团有限公司检验检测资质的调查取证函(局长件)_pages11-14.pdf | Detected Regions: 21

2. Refined Seal Extraction, Unwarping & OCR Recognition

Seal Area #0

Detection Overlay

Unwarped Image

OCR Recognition Result

江西省润华教育装备集团有限公司

Confidence: 92.02%

Seal Area #1

Detection Overlay

Unwarped Image

OCR Recognition Result

中检广东)集务限公司

Confidence: 79.85%

OCR Results Summary (JSON)

[
  {
    "seal_index": 0,
    "text": "江西省润华教育装备集团有限公司",
    "score": 0.9202076196670532,
    "success": true
  },
  {
    "seal_index": 1,
    "text": "中检广东)集务限公司",
    "score": 0.7985407114028931,
    "success": true
  }
]
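
Both entries report "success": true, yet the second seal's text is visibly incomplete and its score is noticeably lower. A small post-processing step could flag such results for manual review. The sketch below assumes the summary is available as a JSON string in the format shown above; the function name and the 0.85 threshold are hypothetical choices, not part of the pipeline.

```python
import json


def flag_low_confidence(summary_json, threshold=0.85):
    """Return seal_index values whose OCR score is below the threshold.

    The threshold is an illustrative assumption; tune it on real data.
    """
    records = json.loads(summary_json)
    return [
        record["seal_index"]
        for record in records
        if record["success"] and record["score"] < threshold
    ]
```

Applied to the summary above, this would flag seal 1 (score 0.7985) and leave seal 0 (score 0.9202) unflagged.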