PROBLEM:
- Institution names were successfully extracted by PaddleOCRVL subprocess
- But main process received empty result due to Windows multiprocessing Queue delay
- Result: API returned empty institutions array despite successful OCR extraction

ROOT CAUSE:
- Used multiprocessing.Process with Queue for inter-process communication
- On Windows, Queue has synchronization delay when process.join() returns
- Subprocess put data in Queue, but main process called get_nowait() too early
- Result: Data loss even though subprocess succeeded

SOLUTION:
- Remove multiprocessing entirely
- Direct call to vl_pipeline.predict() in main process
- No Queue synchronization issues
- Simpler code (150 lines → 100 lines)
- Faster execution (no subprocess overhead)

TESTING:
- Tested with 1.pdf: CMA 20211901583 extracted (99.91% confidence)
- Institution extracted: 深圳市中多质量检验认证有限公司 (15 chars)
- Flask API returns populated institutions array
- Java backend successfully saves to database
- End-to-end integration verified

CHANGES:
- test_accuracy_batch_full.py: run_ocr_recognition_vl() function
- Removed: multiprocessing.Process, Queue, subprocess wrapper
- Added: Direct call to vl_pipeline.predict()
- Simplified error handling and result parsing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

| | | |
|---|---|---|
| archive | ||
| data | ||
| report_viz | ||
| scripts | ||
| src | ||
| .gitignore | ||
| CLEANUP_COMPLETE.md | ||
| CLEANUP_PLAN.md | ||
| IMPLEMENTATION_SUMMARY.md | ||
| TEST_ACCURACY_BATCH_DEPENDENCIES.md | ||
| TEST_ACCURACY_BATCH_README.md | ||
| cma_extraction_final.py | ||
| cma_extraction_template_primary.py | ||
| pom.xml | ||
| settings.xml | ||
| test_accuracy_batch_full.py | ||
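
The ROOT CAUSE and SOLUTION notes in the commit message can be illustrated with a minimal sketch. It is not the code in `test_accuracy_batch_full.py`: the names `run_ocr_direct`, `drain_then_join`, and `fake_predict` are hypothetical, and the `label`/`text` result shape is an assumption standing in for whatever `vl_pipeline.predict()` actually returns. The first function shows the direct in-process call; the second shows the ordering that avoids an early `get_nowait()` if a worker were ever reintroduced.

```python
import queue
from typing import Any, Callable, Dict, List

# Assumed result shape: each prediction is a dict with "label" and "text" keys.
Prediction = Dict[str, Any]


def run_ocr_direct(predict: Callable[[str], List[Prediction]], path: str) -> Dict[str, Any]:
    """Call the predictor in the current process, so no Queue is involved."""
    try:
        predictions = predict(path)
    except Exception as exc:  # report the failure instead of returning silently
        return {"success": False, "institutions": [], "error": str(exc)}
    institutions = [p["text"] for p in predictions if p.get("label") == "institution"]
    return {"success": True, "institutions": institutions, "error": None}


def drain_then_join(result_queue: Any, worker: Any, timeout: float = 30.0) -> Any:
    """If a worker is unavoidable: block on get() first, call join() afterwards."""
    try:
        # Blocks until an item arrives or the timeout expires.
        result = result_queue.get(timeout=timeout)
    except queue.Empty:
        result = None
    worker.join(timeout)
    return result


def fake_predict(path: str) -> List[Prediction]:
    """Hypothetical stand-in for vl_pipeline.predict(); the real output may differ."""
    return [
        {"label": "cma_number", "text": "20211901583"},
        {"label": "institution", "text": "深圳市中多质量检验认证有限公司"},
    ]


if __name__ == "__main__":
    print(run_ocr_direct(fake_predict, "1.pdf"))
```

`drain_then_join` accepts any object with a blocking `get(timeout=...)` and any worker with `join()`, so it works with `multiprocessing.Queue` and `multiprocessing.Process` as well as their threading counterparts. Reading before joining matters because items put on a `multiprocessing.Queue` are flushed by a background feeder thread, so a non-blocking read immediately after `join()` can still raise `queue.Empty`.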