NEW FILES - Python-First Architecture Support:
1. FlaskOCRClient.java (HTTP Client):
- REST client for communicating with Python Flask API
- POST /api/ocr/pdf - PDF processing endpoint
- Configurable baseUrl and timeout
- Error handling and response parsing
- Methods: processPdf(), processImage(), healthCheck()
2. FlaskOCRResponse.java (Response DTO):
- Data transfer object for Flask API responses
- Fields: success, cma, institutions, seals, error
- JSON serialization support
3. FlaskOCRVerboseResponse.java (Verbose Response DTO):
- Extended response with detailed processing steps
- Includes timing metrics for each processing stage
- Used for debugging and performance analysis
4. OCRResultMessage.java (Result Message):
- Message format for OCR results
- Used in async processing (if needed)
5. OCRTaskMessage.java (Task Message):
- Message format for OCR task requests
- Used in async processing (if needed)
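As a rough illustration of item 2, the response DTO could be a plain getter/setter class along these lines. This is a sketch, not the committed source: the class name FlaskOCRResponseSketch, the nested Cma type, and all field types are assumptions (the message only lists the field names, and the usage example only shows getCma().getCode() and getInstitutions()). Jackson annotations are left out so the block compiles with the JDK alone.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical shape of the response DTO; field types are assumptions.
class FlaskOCRResponseSketch {

    // Nested value for the "cma" field; only getCode() is implied by the usage example.
    static class Cma {
        private String code;

        public String getCode() { return code; }
        public void setCode(String code) { this.code = code; }
    }

    private boolean success;                               // whether OCR processing succeeded
    private Cma cma;                                       // may be null when processing failed
    private List<String> institutions = new ArrayList<>(); // recognized institution names
    private List<String> seals = new ArrayList<>();        // assumed to be a list of strings
    private String error;                                  // error text when success is false

    public boolean isSuccess() { return success; }
    public void setSuccess(boolean success) { this.success = success; }

    public Cma getCma() { return cma; }
    public void setCma(Cma cma) { this.cma = cma; }

    public List<String> getInstitutions() { return institutions; }
    public void setInstitutions(List<String> institutions) { this.institutions = institutions; }

    public List<String> getSeals() { return seals; }
    public void setSeals(List<String> seals) { this.seals = seals; }

    public String getError() { return error; }
    public void setError(String error) { this.error = error; }
}
```

With a shape like this, callers would likely want to check isSuccess() before dereferencing getCma(), since a failed request may leave it null.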
USAGE:
These components are used by OcrService to communicate with
the Python Flask API server running on localhost:8081.
Example:
```java
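// pdfPath and outputDir are supplied by the caller (input PDF and output directory)
FlaskOCRClient client = new FlaskOCRClient("http://localhost:8081");
FlaskOCRResponse response = client.processPdf(pdfPath, outputDir);
// Read the CMA code and the list of institutions from the response
String cmaCode = response.getCma().getCode();
List<String> institutions = response.getInstitutions();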
```
ARCHITECTURE:
Java Backend → FlaskOCRClient → HTTP → Flask API → PaddleOCR
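The HTTP hop in this chain can be sketched as follows. The example only builds the POST request for /api/ocr/pdf and does not send it. It uses the JDK's java.net.http API rather than Spring RestTemplate so that it runs without extra dependencies. The class name OcrRequestSketch and the JSON field names pdf_path and output_dir are assumptions; the actual request format is not stated here.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.time.Duration;

// Sketch only: builds (but does not send) the request to the Flask API.
class OcrRequestSketch {

    static HttpRequest buildPdfRequest(String baseUrl, String pdfPath,
                                       String outputDir, Duration timeout) {
        // Assumed JSON body; the real field names may differ.
        String body = "{\"pdf_path\":\"" + escape(pdfPath)
                + "\",\"output_dir\":\"" + escape(outputDir) + "\"}";

        return HttpRequest.newBuilder(URI.create(stripTrailingSlash(baseUrl) + "/api/ocr/pdf"))
                .timeout(timeout)                               // configurable timeout
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    // Minimal escaping of backslashes and double quotes for JSON string values.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Avoids a double slash when baseUrl ends with "/".
    static String stripTrailingSlash(String s) {
        return s.endsWith("/") ? s.substring(0, s.length() - 1) : s;
    }
}
```

A request built this way could be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()), with the returned JSON then mapped onto the response DTO by Jackson.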
DEPENDENCIES:
- Spring RestTemplate (for HTTP calls)
- Jackson (for JSON serialization)
- No additional OCR libraries required in Java
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| Name |
|---|
| archive |
| data |
| report_viz |
| scripts |
| src |
| template |
| .gitignore |
| CLEANUP_COMPLETE.md |
| CLEANUP_PLAN.md |
| IMPLEMENTATION_SUMMARY.md |
| TEST_ACCURACY_BATCH_DEPENDENCIES.md |
| TEST_ACCURACY_BATCH_README.md |
| cma_extraction_final.py |
| cma_extraction_template_primary.py |
| pom.xml |
| settings.xml |
| test_accuracy_batch_full.py |