report-detect/archive/temp_scripts/quick_crt_test.py

82 lines
2.5 KiB
Python

chore(project): conservative cleanup - archive temp scripts and old docs

Major cleanup to improve project organization and maintainability.

Changes:
- Moved 34 temp/debug/test scripts to archive/temp_scripts/
- Moved 9 auxiliary tools to archive/tools/
- Moved 3 CRT test scripts to archive/crt_tests/
- Moved 4 OCR test scripts to archive/ocr_tests/
- Moved 14 old documentation files to archive/docs/
- Deleted 4 useless files (duplicates, temp files)

Root directory:
- Before: 67 files (cluttered)
- After: 10 core files (clean and organized)

Core files retained:
- test_accuracy_batch_full.py (main script)
- cma_extraction_template_primary.py (CMA extraction)
- cma_extraction_final.py (backup CMA extraction)
- CLAUDE.md (project guide)
- TEST_ACCURACY_BATCH_README.md (usage guide)
- TEST_ACCURACY_BATCH_DEPENDENCIES.md (dependency docs)
- CLEANUP_PLAN.md (cleanup plan)
- CLEANUP_SUMMARY.md (this file)
- IMPLEMENTATION_SUMMARY.md (implementation summary)
- requirements.txt (dependencies)

Archive structure:
archive/
├── temp_scripts/ (34 files: test_, debug_, analyze_, etc.)
├── tools/ (9 files: find_, show_, visualize_, etc.)
├── crt_tests/ (3 files: CRT extraction tests)
├── ocr_tests/ (4 files: OCR timeout tests)
└── docs/ (14 files: old reports and guides)

Benefits:
✓ Cleaner root directory - easier navigation
✓ Better organization - clear separation of concerns
✓ Preserved history - all files archived, not deleted
✓ Improved maintainability - easier to find active files
✓ Better git history - removed 198 deleted files from tracking

No functional changes - all core functionality preserved.

Related:
- TEST_ACCURACY_BATCH_DEPENDENCIES.md - dependency analysis
- CLEANUP_PLAN.md - detailed cleanup plan

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 14:35:06 +08:00
"""
Quick CRT extraction test - checks a single PDF only.
"""
import sys
import traceback

import pikepdf
from cryptography.hazmat.primitives.serialization.pkcs7 import load_der_pkcs7_certificates
from cryptography.x509.oid import NameOID

pdf_path = "src/test/resources/data/pdfs/YDQ25_002294.pdf"
print(f"Testing CRT extraction for: {pdf_path}")

try:
    pdf = pikepdf.Pdf.open(pdf_path)
    acroform = pdf.Root.get("/AcroForm")
    if not acroform:
        print("ERROR: No /AcroForm found")
        sys.exit(1)

    fields = acroform.get("/Fields", [])
    print(f"Found {len(fields)} fields")

    signatures = []
    for idx, field in enumerate(fields):
        field_obj = field
        # Only signature fields carry an embedded certificate blob.
        if field_obj.get("/FT") != "/Sig":
            continue
        sig_dict = field_obj.get("/V")
        if not sig_dict:
            continue
        contents_obj = sig_dict.get("/Contents")
        if contents_obj is None:
            continue

        contents = bytes(contents_obj)
        print(f"\nSignature #{len(signatures)}:")
        print(f" Size: {len(contents)} bytes")

        # Try PKCS#7 parsing
        try:
            certs = load_der_pkcs7_certificates(contents)
            print(f" PKCS#7 parsing: SUCCESS ({len(certs)} certificates)")
            for cert_idx, cert in enumerate(certs):
                print(f" Certificate #{cert_idx}:")
                print(f" Subject: {cert.subject}")
                # Try to get the common name and organization name
                for oid in [NameOID.COMMON_NAME, NameOID.ORGANIZATION_NAME]:
                    val = cert.subject.get_attributes_for_oid(oid)
                    if val:
                        print(f" {oid._name}: {val[0].value}")
        except Exception as e:
            print(f" PKCS#7 parsing: FAILED ({e})")
            # Try binary search fallback: look for known institution names
            # in the raw signature bytes (full name first, then the shorter form).
            known_institutions = [
                "广东产品质量监督检验研究院",
                "广东产品质量监督检验",
            ]
            for inst in known_institutions:
                encoded = inst.encode('utf-8')
                if encoded in contents:
                    print(f" Binary search: FOUND '{inst}'")
                    print(f" Position: {contents.find(encoded)}")
                    break

        signatures.append(contents)
        if len(signatures) >= 3:  # Only test first 3 signatures
            break

    print(f"\nTotal signatures tested: {len(signatures)}")
except Exception as e:
    print(f"ERROR: {e}")
    traceback.print_exc()
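
The binary search fallback in the script could be factored into a small helper along the following lines. This is only a sketch under stated assumptions, not part of the archived script: the name `find_institution` and its parameters are hypothetical, and the extra `utf-16-be` pass assumes that some certificates may store names as BMPString rather than UTF-8, which the original code does not check. Candidates are tried longest first so that a shorter prefix cannot shadow the full institution name.

```python
from typing import Iterable, Optional, Tuple


def find_institution(
    blob: bytes,
    names: Iterable[str],
    encodings: Iterable[str] = ("utf-8", "utf-16-be"),
) -> Optional[Tuple[str, str, int]]:
    """Return (name, encoding, offset) for the first name found in blob.

    Hypothetical helper: names are tried longest first, and each name is
    searched under every encoding listed (assumed, not taken from the script).
    """
    encodings = tuple(encodings)
    for name in sorted(names, key=len, reverse=True):
        for enc in encodings:
            pos = blob.find(name.encode(enc))
            if pos != -1:
                return name, enc, pos
    return None


if __name__ == "__main__":
    candidates = ["广东产品质量监督检验", "广东产品质量监督检验研究院"]
    # Fabricated bytes for illustration only, not a real signature blob.
    sample = b"\x30\x82" + "广东产品质量监督检验研究院".encode("utf-16-be")
    print(find_institution(sample, candidates))
```

Returning the encoding alongside the offset makes it easier to see which string type the signer used when the PKCS#7 parse fails.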