report-detect/archive/temp_scripts/analyze_logo_position.py


chore(project): conservative cleanup - archive temp scripts and old docs

Major cleanup to improve project organization and maintainability.

Changes:
- Moved 34 temp/debug/test scripts to archive/temp_scripts/
- Moved 9 auxiliary tools to archive/tools/
- Moved 3 CRT test scripts to archive/crt_tests/
- Moved 4 OCR test scripts to archive/ocr_tests/
- Moved 14 old documentation files to archive/docs/
- Deleted 4 useless files (duplicates, temp files)

Root directory:
- Before: 67 files (cluttered)
- After: 10 core files (clean and organized)

Core files retained:
- test_accuracy_batch_full.py (main script)
- cma_extraction_template_primary.py (CMA extraction)
- cma_extraction_final.py (backup CMA extraction)
- CLAUDE.md (project guide)
- TEST_ACCURACY_BATCH_README.md (usage guide)
- TEST_ACCURACY_BATCH_DEPENDENCIES.md (dependency docs)
- CLEANUP_PLAN.md (cleanup plan)
- CLEANUP_SUMMARY.md (this file)
- IMPLEMENTATION_SUMMARY.md (implementation summary)
- requirements.txt (dependencies)

Archive structure:
archive/
├── temp_scripts/ (34 files: test_, debug_, analyze_, etc.)
├── tools/ (9 files: find_, show_, visualize_, etc.)
├── crt_tests/ (3 files: CRT extraction tests)
├── ocr_tests/ (4 files: OCR timeout tests)
└── docs/ (14 files: old reports and guides)

Benefits:
✓ Cleaner root directory - easier navigation
✓ Better organization - clear separation of concerns
✓ Preserved history - all files archived, not deleted
✓ Improved maintainability - easier to find active files
✓ Better git history - removed 198 deleted files from tracking

No functional changes - all core functionality preserved.

Related:
- TEST_ACCURACY_BATCH_DEPENDENCIES.md - dependency analysis
- CLEANUP_PLAN.md - detailed cleanup plan

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 14:35:06 +08:00
"""
Analyze the CMA logo position and ROI for YDQ23_001838.pdf
"""
import cv2
import numpy as np
from pathlib import Path
pdf_name = "YDQ23_001838.pdf"
page_img_path = Path(f"test_reports_full/{pdf_name}/doc_page.png")
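# Load page image (cv2.imread returns None instead of raising if the file is missing or unreadable)
page_img = cv2.imread(str(page_img_path))
if page_img is None:
    raise FileNotFoundError(f"Could not read page image: {page_img_path}")
h, w = page_img.shape[:2]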
print(f"Page size: {w}x{h}")
print()
# Template matching result from debug output
max_loc = (2066, 2971) # From template matching
template_size = (113, 177) # Template size as (height, width), matching the indexing below
# Calculate logo center
logo_center_x = max_loc[0] + template_size[1] // 2
logo_center_y = max_loc[1] + template_size[0] // 2
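# A minimal sketch of the same centre calculation as a reusable helper.
# `match_center` is a hypothetical name, not part of the original script.
# It assumes the template size is given as (height, width), which is how the
# indexing above treats `template_size`.
def match_center(top_left, template_hw):
    """Return the (x, y) centre of a match given its top-left corner."""
    x0, y0 = top_left
    t_h, t_w = template_hw
    return x0 + t_w // 2, y0 + t_h // 2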
print("CMA Logo position:")
print(f" Match location (top-left): {max_loc}")
print(f" Logo center: ({logo_center_x}, {logo_center_y})")
print(f" Template size: {template_size}")
print()
# Calculate ROI (right side of logo)
template_h, template_w = template_size
x = logo_center_x
y = logo_center_y
roi_x1 = max(0, x)
roi_y1 = max(0, y - template_h // 2)
roi_x2 = min(w, x + 600) # At most 600 px wide, clipped to the page width
roi_y2 = min(h, y + template_h // 2 + template_h)
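# The clamping above can also be sketched as a helper, which makes it easier
# to try other logo positions. `roi_right_of` and its `max_width` parameter
# are hypothetical names for illustration; the 600 px default mirrors the
# constant used above, and the template size is assumed to be (height, width).
def roi_right_of(center, template_hw, page_wh, max_width=600):
    """Return (x1, y1, x2, y2) for a region right of `center`, clipped to the page."""
    cx, cy = center
    t_h, _ = template_hw
    page_w, page_h = page_wh
    x1 = max(0, cx)
    y1 = max(0, cy - t_h // 2)
    x2 = min(page_w, cx + max_width)
    y2 = min(page_h, cy + t_h // 2 + t_h)
    return x1, y1, x2, y2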
print("Current ROI (right side of logo):")
print(f" ROI: ({roi_x1}, {roi_y1}) -> ({roi_x2}, {roi_y2})")
print(f" Size: {roi_x2 - roi_x1}x{roi_y2 - roi_y1}")
print()
# Visualize: ROI as a green rectangle, logo center as a filled blue dot (OpenCV colors are BGR)
viz = page_img.copy()
cv2.rectangle(viz, (roi_x1, roi_y1), (roi_x2, roi_y2), (0, 255, 0), 3)
cv2.circle(viz, (logo_center_x, logo_center_y), 10, (255, 0, 0), -1)
# Save visualization
output_path = Path("test_reports_full") / pdf_name / "roi_analysis.png"
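# cv2.imwrite returns False on failure rather than raising
if cv2.imwrite(str(output_path), viz):
    print(f"Visualization saved to: {output_path}")
else:
    print(f"Failed to save visualization to: {output_path}")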
print()
# Analysis
print("ANALYSIS:")
print("=" * 80)
print(f"Logo center: y={logo_center_y}, page height={h}")
print(f"Logo center Y position: {logo_center_y / h * 100:.1f}% from top")
print()
if logo_center_y > h * 0.8:
    print("⚠️ WARNING: Logo is in the BOTTOM 20% of the page!")
    print(" This might not be the main CMA logo.")
    print(" The real CMA logo might be at the TOP of the page.")
    print()
print("Possible issues:")
print(" 1. Template matching found the WRONG logo (e.g., footer logo)")
print(" 2. ROI is in the wrong place")
print(" 3. The real CMA code (210020349096) is elsewhere on the page")