Inspection and Testing Report Recognition (检验检测报告识别)
黄仁欢 0d760ee656 fix(ocr): remove multiprocessing to fix Windows Queue synchronization issue
PROBLEM:
- Institution names were successfully extracted by PaddleOCRVL subprocess
- But main process received empty result due to Windows multiprocessing Queue delay
- Result: API returned empty institutions array despite successful OCR extraction

ROOT CAUSE:
- Used multiprocessing.Process with Queue for inter-process communication
- On Windows, data put into the Queue may not yet be readable when process.join() returns
- The subprocess put data in the Queue, but the main process called get_nowait() before the data arrived
- Result: data was lost even though the subprocess succeeded
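
For illustration only, the timing problem described above can be sketched as follows. This is not the code from the commit (which removes multiprocessing entirely); it shows one alternative ordering under the assumption that a subprocess is kept. The names `fetch_result`, `ocr_worker`, and `run_in_subprocess` are hypothetical, and the worker payload is a placeholder. The key point is that a blocking `get(timeout=...)` issued before `join()` waits for the queue's feeder thread, whereas `get_nowait()` can raise `queue.Empty` even though an item was already put.

```python
import multiprocessing as mp
import queue


def fetch_result(result_queue, timeout=5.0):
    """Wait up to `timeout` seconds for one result; return None if none arrives."""
    try:
        # A blocking get with a timeout, unlike get_nowait(), gives the
        # queue's background feeder thread time to deliver the item.
        return result_queue.get(timeout=timeout)
    except queue.Empty:
        return None


def ocr_worker(result_queue, pdf_path):
    """Hypothetical worker standing in for the OCR subprocess."""
    result_queue.put({"source": pdf_path, "institutions": ["<name from OCR>"]})


def run_in_subprocess(pdf_path, timeout=60.0):
    """Read the result first, then join, so nothing is left unread."""
    result_queue = mp.Queue()
    proc = mp.Process(target=ocr_worker, args=(result_queue, pdf_path))
    proc.start()
    result = fetch_result(result_queue, timeout)  # read before join()
    proc.join()
    return result
```

On Windows, `run_in_subprocess` would need to be called from under an `if __name__ == "__main__":` guard, because child processes are started with the spawn method there.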

SOLUTION:
- Remove multiprocessing entirely
- Direct call to vl_pipeline.predict() in main process
- No Queue synchronization issues
- Simpler code (150 lines → 100 lines)
- Faster execution (no subprocess overhead)
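
A minimal sketch of the direct-call approach listed above, assuming a pipeline object that exposes `predict()`. The `StubPipeline` class, the `run_ocr_direct` function, the `min_score` parameter, and the result fields (`text`, `score`) are all hypothetical; the real `vl_pipeline.predict()` return format is not shown in this commit and may differ.

```python
class StubPipeline:
    """Hypothetical stand-in for vl_pipeline; the real result schema may differ."""

    def predict(self, pdf_path):
        return [{"text": "Example Testing Co., Ltd.", "score": 0.99}]


def run_ocr_direct(pipeline, pdf_path, min_score=0.5):
    """Call predict() in the current process and collect institution names."""
    try:
        items = pipeline.predict(pdf_path)
    except Exception as exc:
        # Report OCR failures in the result instead of crashing the caller.
        return {"institutions": [], "error": str(exc)}
    institutions = [
        item["text"] for item in items if item.get("score", 0.0) >= min_score
    ]
    return {"institutions": institutions, "error": None}


result = run_ocr_direct(StubPipeline(), "1.pdf")
print(result["institutions"])
```

Because the call runs in the main process, the return value is available as soon as `predict()` finishes, with no Queue or `join()` ordering to get right.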

TESTING:
- Tested with 1.pdf: CMA 20211901583 extracted (99.91% confidence)
- Institution extracted: 深圳市中多质量检验认证有限公司 (15 chars)
- Flask API returns populated institutions array
- Java backend successfully saves to database
- End-to-end integration verified

CHANGES:
- test_accuracy_batch_full.py: run_ocr_recognition_vl() function
  - Removed: multiprocessing.Process, Queue, subprocess wrapper
  - Added: Direct call to vl_pipeline.predict()
  - Simplified error handling and result parsing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-05 09:52:45 +08:00
archive chore(project): conservative cleanup - archive temp scripts and old docs 2026-03-03 14:35:06 +08:00
data temporary save 2026-02-05 13:57:22 +08:00
report_viz chore(project): conservative cleanup - archive temp scripts and old docs 2026-03-03 14:35:06 +08:00
scripts temporary save 2026-02-05 13:57:22 +08:00
src chore(project): conservative cleanup - archive temp scripts and old docs 2026-03-03 14:35:06 +08:00
.gitignore chore(project): conservative cleanup - archive temp scripts and old docs 2026-03-03 14:35:06 +08:00
CLEANUP_COMPLETE.md docs(cleanup): add cleanup completion report 2026-03-03 14:35:50 +08:00
CLEANUP_PLAN.md docs(test): add comprehensive documentation for batch testing script 2026-03-03 14:32:04 +08:00
IMPLEMENTATION_SUMMARY.md chore(project): conservative cleanup - archive temp scripts and old docs 2026-03-03 14:35:06 +08:00
TEST_ACCURACY_BATCH_DEPENDENCIES.md docs(test): add comprehensive documentation for batch testing script 2026-03-03 14:32:04 +08:00
TEST_ACCURACY_BATCH_README.md docs(test): add comprehensive documentation for batch testing script 2026-03-03 14:32:04 +08:00
cma_extraction_final.py feat(cma): add CMA extraction module fallback implementation 2026-03-03 14:51:58 +08:00
cma_extraction_template_primary.py chore(project): conservative cleanup - archive temp scripts and old docs 2026-03-03 14:35:06 +08:00
pom.xml chore(project): conservative cleanup - archive temp scripts and old docs 2026-03-03 14:35:06 +08:00
settings.xml chore(project): conservative cleanup - archive temp scripts and old docs 2026-03-03 14:35:06 +08:00
test_accuracy_batch_full.py fix(ocr): remove multiprocessing to fix Windows Queue synchronization issue 2026-03-05 09:52:45 +08:00