report-detect/archive/tools/ocr_bridge_cross_platform.py

51 lines
1.4 KiB
Python

chore(project): conservative cleanup - archive temp scripts and old docs

Major cleanup to improve project organization and maintainability.

Changes:
- Moved 34 temp/debug/test scripts to archive/temp_scripts/
- Moved 9 auxiliary tools to archive/tools/
- Moved 3 CRT test scripts to archive/crt_tests/
- Moved 4 OCR test scripts to archive/ocr_tests/
- Moved 14 old documentation files to archive/docs/
- Deleted 4 useless files (duplicates, temp files)

Root directory:
- Before: 67 files (cluttered)
- After: 10 core files (clean and organized)

Core files retained:
- test_accuracy_batch_full.py (main script)
- cma_extraction_template_primary.py (CMA extraction)
- cma_extraction_final.py (backup CMA extraction)
- CLAUDE.md (project guide)
- TEST_ACCURACY_BATCH_README.md (usage guide)
- TEST_ACCURACY_BATCH_DEPENDENCIES.md (dependency docs)
- CLEANUP_PLAN.md (cleanup plan)
- CLEANUP_SUMMARY.md (this file)
- IMPLEMENTATION_SUMMARY.md (implementation summary)
- requirements.txt (dependencies)

Archive structure:
archive/
├── temp_scripts/ (34 files: test_, debug_, analyze_, etc.)
├── tools/ (9 files: find_, show_, visualize_, etc.)
├── crt_tests/ (3 files: CRT extraction tests)
├── ocr_tests/ (4 files: OCR timeout tests)
└── docs/ (14 files: old reports and guides)

Benefits:
✓ Cleaner root directory - easier navigation
✓ Better organization - clear separation of concerns
✓ Preserved history - all files archived, not deleted
✓ Improved maintainability - easier to find active files
✓ Better git history - removed 198 deleted files from tracking

No functional changes - all core functionality preserved.

Related:
- TEST_ACCURACY_BATCH_DEPENDENCIES.md - dependency analysis
- CLEANUP_PLAN.md - detailed cleanup plan

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 14:35:06 +08:00
#!/usr/bin/env python3
"""
OCR bridge script - cross-platform version.
Intended to be invoked from Java via ProcessBuilder.
"""
import sys
import os
import json

# Add the project root (taken to be this script's directory) and its
# python_api subdirectory to the import path.
project_root = os.path.dirname(os.path.abspath(__file__))
sys.path.insert(0, project_root)
sys.path.insert(0, os.path.join(project_root, 'python_api'))

from pdf_processor import process_pdf_standalone


def main():
    # Both arguments are required; errors are reported as JSON on stdout
    # so the calling process can parse them.
    if len(sys.argv) < 3:
        print(json.dumps({"success": False, "error": "Usage: ocr_bridge_cross_platform.py <pdf_path> <output_dir>"}, ensure_ascii=False))
        sys.exit(1)

    pdf_path = sys.argv[1]
    # The length check above guarantees argv[2] exists, so no fallback is needed.
    output_dir = sys.argv[2]

    try:
        result = process_pdf_standalone(pdf_path, output_dir, ocr_model='paddleocr_vl')

        if result.get('success'):
            # Success: emit only the fields the caller consumes.
            print(json.dumps({
                "success": True,
                "cma_code": result.get('cma_code', ''),
                "institution_name": result.get('institution_name', ''),
                "confidence": result.get('confidence', 0.0)
            }, ensure_ascii=False))
        else:
            # Processing failure: forward the error and exit non-zero.
            print(json.dumps({
                "success": False,
                "error": result.get('error', 'Unknown error')
            }, ensure_ascii=False))
            sys.exit(1)
    except Exception as e:
        # Unexpected failure: still return a JSON object and exit non-zero.
        print(json.dumps({
            "success": False,
            "error": str(e)
        }, ensure_ascii=False))
        sys.exit(1)


if __name__ == '__main__':
    main()
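
The script above communicates with its caller through a single JSON object on stdout plus the process exit code. A minimal sketch of how a caller might interpret that output is shown below. It is written in Python for illustration only (the intended caller is Java), and `parse_bridge_output` is a hypothetical helper, not part of the project. It assumes the JSON object is the last non-empty line of stdout, since imported libraries may print their own log lines first.

```python
import json
from typing import Any, Dict


def parse_bridge_output(stdout: str, returncode: int) -> Dict[str, Any]:
    """Interpret the bridge script's stdout and exit code (hypothetical helper).

    Assumption: the JSON result is the last non-empty line of stdout.
    """
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines:
        return {"success": False, "error": f"no output (exit code {returncode})"}

    try:
        payload = json.loads(lines[-1])
    except json.JSONDecodeError as exc:
        return {"success": False, "error": f"invalid JSON: {exc}"}

    if not isinstance(payload, dict):
        return {"success": False, "error": "unexpected JSON payload"}

    # Treat a non-zero exit code as failure even if the payload says otherwise.
    if returncode != 0:
        payload["success"] = False
        payload.setdefault("error", f"exit code {returncode}")
    return payload
```

A caller would typically run the script with `subprocess.run([...], capture_output=True, text=True)` and pass `proc.stdout` and `proc.returncode` to this helper. Checking both the `success` field and the exit code guards against a crash that happens after partial output was written.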