# GLM-OCR Java Service

A pure-Java service for running GLM-OCR locally, built on DJL (Deep Java Library) and exposed through a REST API.

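## Features

- ✅ Pure Java implementation with no dependency on external services (vLLM/Ollama/Python)
- ✅ Supports multiple OCR tasks: text recognition, formula recognition, table recognition, and information extraction
- ✅ Built on the DJL framework with the PyTorch engine
- ✅ Supports CPU and GPU inference
- ✅ Provides a complete REST API with Swagger documentation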
- ✅ Open source under permissive licenses (see the License section)

## Tech Stack

- Java 17
- Spring Boot 3.2.0
- DJL 0.27.0 (PyTorch engine)
- PyTorch 2.1.1
- HuggingFace Tokenizers

## Quick Start

### 1. Requirements

- Java 17+
- Maven 3.6+

### 2. Download the Model

Download the GLM-OCR model from ModelScope:

```bash
# Using Git LFS
git lfs install
git clone https://modelscope.cn/ZhipuAI/GLM-OCR.git ./models/GLM-OCR
```

Alternatively, download the model manually and extract it into the `./models/GLM-OCR/` directory.

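
A clone made without Git LFS installed leaves small pointer files in place of the large model files, which usually only shows up later as a model loading failure. The sketch below is not part of this project; it scans the model directory for files that still begin with the Git LFS pointer header. The class name `ModelDirCheck` and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of the project.
final class ModelDirCheck {
    // Every Git LFS pointer file begins with this line.
    private static final String LFS_HEADER = "version https://git-lfs.github.com/spec/v1";

    /** True if the file starts with the LFS pointer header, i.e. its real content was never fetched. */
    static boolean isLfsPointer(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = in.readNBytes(LFS_HEADER.length());
            return new String(head, StandardCharsets.US_ASCII).equals(LFS_HEADER);
        }
    }

    /** Returns the regular files under modelDir that are still LFS pointers. */
    static List<Path> findPointers(Path modelDir) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(modelDir)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        List<Path> pointers = new ArrayList<>();
        for (Path file : files) {
            if (isLfsPointer(file)) {
                pointers.add(file);
            }
        }
        return pointers;
    }

    public static void main(String[] args) throws IOException {
        Path modelDir = Path.of(args.length > 0 ? args[0] : "./models/GLM-OCR");
        if (!Files.isDirectory(modelDir)) {
            System.err.println("Model directory not found: " + modelDir);
            return;
        }
        List<Path> pointers = findPointers(modelDir);
        if (pointers.isEmpty()) {
            System.out.println("No unfetched LFS pointer files found.");
        } else {
            pointers.forEach(p -> System.out.println("Unfetched LFS pointer: " + p));
        }
    }
}
```

If any files are reported, running `git lfs pull` inside the clone should fetch the real content.
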
### 3. Configuration

Edit `src/main/resources/application.yml`:

```yaml
glm-ocr:
  model-path: ./models/GLM-OCR
  device: cpu  # or gpu(0) if a GPU is available
  precision: fp32
```

### 4. Build and Run

```bash
mvn clean package -DskipTests
java -jar target/glm-ocr-service-1.0.0.jar
```

The service starts at `http://localhost:9090`.

## API

### Health Check

```http
GET /api/v1/health
```

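
Because the model is loaded at startup, a client may want to wait for the health endpoint before sending OCR requests. The following is a minimal sketch using the JDK's built-in `java.net.http` client. It assumes that an HTTP 200 response means the service is ready, which this README does not state; `HealthProbe` and its methods are hypothetical names.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical client-side helper, not part of the project.
final class HealthProbe {

    /** Builds the GET request for the documented health endpoint. */
    static HttpRequest healthRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/health"))
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();
    }

    /** Polls the health endpoint until it answers 200 (assumed to mean "ready") or attempts run out. */
    static boolean waitUntilUp(String baseUrl, int attempts, Duration pause) throws InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        for (int i = 0; i < attempts; i++) {
            try {
                int status = client.send(healthRequest(baseUrl),
                        HttpResponse.BodyHandlers.discarding()).statusCode();
                if (status == 200) {
                    return true;
                }
            } catch (IOException e) {
                // Connection refused or timed out: the service is probably still starting.
            }
            if (i < attempts - 1) {
                Thread.sleep(pause.toMillis());
            }
        }
        return false;
    }

    public static void main(String[] args) throws InterruptedException {
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:9090";
        boolean up = waitUntilUp(baseUrl, 30, Duration.ofSeconds(2));
        System.out.println(up ? "Service is up" : "Service did not become healthy in time");
    }
}
```
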
### Text Recognition

```http
POST /api/v1/ocr/text
Content-Type: application/json

{
  "image": "iVBORw0KGgoAAAANSUhEUg...",
  "imageType": "base64"
}
```

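
To show how such a request might be produced from an image file, here is a minimal client sketch that uses only the JDK. It Base64-encodes the file, builds the JSON body shown above, and posts it to `/api/v1/ocr/text`. The class `OcrTextClient` is hypothetical, and because the response format is not described here, the sketch simply prints the raw status and body.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical client, not part of the project.
final class OcrTextClient {

    /** Builds the JSON request body from raw image bytes. */
    static String buildBody(byte[] imageBytes) {
        String b64 = Base64.getEncoder().encodeToString(imageBytes);
        // Standard Base64 output contains only A-Z, a-z, 0-9, '+', '/' and '=',
        // so it can be embedded in a JSON string without escaping.
        return "{\"image\":\"" + b64 + "\",\"imageType\":\"base64\"}";
    }

    /** Builds the POST request for the text recognition endpoint. */
    static HttpRequest buildRequest(String baseUrl, byte[] imageBytes) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v1/ocr/text"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildBody(imageBytes)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 1) {
            System.err.println("usage: OcrTextClient <image-file> [base-url]");
            return;
        }
        byte[] image = Files.readAllBytes(Path.of(args[0]));
        String baseUrl = args.length > 1 ? args[1] : "http://localhost:9090";
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(buildRequest(baseUrl, image), HttpResponse.BodyHandlers.ofString());
        System.out.println("HTTP " + response.statusCode());
        System.out.println(response.body());
    }
}
```

The same approach should carry over to the formula and table endpoints by changing the path, since their request bodies have the same shape.
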
### Formula Recognition

```http
POST /api/v1/ocr/formula
Content-Type: application/json

{
  "image": "iVBORw0KGgoAAAANSUhEUg...",
  "imageType": "base64"
}
```

### Table Recognition

```http
POST /api/v1/ocr/table
Content-Type: application/json

{
  "image": "iVBORw0KGgoAAAANSUhEUg...",
  "imageType": "base64"
}
```

### Information Extraction

```http
POST /api/v1/ocr/extract
Content-Type: application/json

{
  "image": "iVBORw0KGgoAAAANSUhEUg...",
  "imageType": "base64",
  "extractionTemplate": "{\n \"name\": \"\",\n \"id_number\": \"\"\n}"
}
```

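
Note that `extractionTemplate` is a JSON document carried inside a JSON string, so its quotes and newlines have to be escaped. The sketch below shows one way to build the template from a list of field names and escape it by hand. `ExtractionTemplate` and its methods are hypothetical; in a real client a JSON library would normally do the escaping.

```java
import java.util.List;

// Hypothetical helper, not part of the project.
final class ExtractionTemplate {

    /** Escapes a string as a JSON string literal, including the surrounding quotes. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    /** Renders the field names as a JSON object with empty string values, laid out like the sample above. */
    static String template(List<String> fields) {
        StringBuilder sb = new StringBuilder("{\n");
        for (int i = 0; i < fields.size(); i++) {
            sb.append(' ').append(quote(fields.get(i))).append(": \"\"");
            sb.append(i < fields.size() - 1 ? ",\n" : "\n");
        }
        return sb.append('}').toString();
    }

    /** Builds the full request body; the template is escaped a second time as a string value. */
    static String buildBody(String imageBase64, List<String> fields) {
        return "{\"image\":" + quote(imageBase64)
                + ",\"imageType\":\"base64\""
                + ",\"extractionTemplate\":" + quote(template(fields)) + "}";
    }

    public static void main(String[] args) {
        // Prints the escaped template for the two fields used in the sample request.
        System.out.println(quote(template(List.of("name", "id_number"))));
    }
}
```
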
### General Recognition

```http
POST /api/v1/ocr
Content-Type: application/json

{
  "image": "iVBORw0KGgoAAAANSUhEUg...",
  "imageType": "base64",
  "taskType": "text"
}
```

### Reload Model

```http
POST /api/v1/model/reload
```

## Swagger Documentation

Open `http://localhost:9090/swagger-ui.html` to browse the full API documentation and try requests interactively.

## Configuration Reference

| Property | Description | Default |
|----------|-------------|---------|
| `glm-ocr.model-path` | Local path to the model | `./models/GLM-OCR` |
| `glm-ocr.device` | Inference device (cpu/gpu(0)) | `cpu` |
| `glm-ocr.precision` | Precision (fp32/fp16/bf16/int8) | `fp32` |
| `glm-ocr.max-tokens` | Maximum number of generated tokens | `8192` |
| `glm-ocr.batch-size` | Batch size | `1` |
| `glm-ocr.image-size` | Image preprocessing size | `448` |
| `glm-ocr.temperature` | Sampling temperature | `0.1` |
| `glm-ocr.top-p` | Top-p sampling parameter | `0.95` |

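
As an illustration of how these settings could be checked before loading the model, here is a small plain-Java sketch. The record `GlmOcrSettings` is hypothetical and not the project's actual configuration class. The allowed device and precision values come from the table above, while the numeric ranges (positive sizes, non-negative temperature, top-p in (0, 1]) are assumptions based on common sampling conventions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical settings holder; the project's real configuration class may look different.
record GlmOcrSettings(String modelPath, String device, String precision, int maxTokens,
                      int batchSize, int imageSize, double temperature, double topP) {

    private static final Set<String> PRECISIONS = Set.of("fp32", "fp16", "bf16", "int8");
    // Accepts "cpu" or "gpu(N)" where N is a device index, as written in the table.
    private static final Pattern DEVICE = Pattern.compile("cpu|gpu\\(\\d+\\)");

    /** The defaults listed in the configuration table. */
    static GlmOcrSettings defaults() {
        return new GlmOcrSettings("./models/GLM-OCR", "cpu", "fp32", 8192, 1, 448, 0.1, 0.95);
    }

    /** Returns one message per invalid setting; an empty list means every check passed. */
    List<String> validate() {
        List<String> errors = new ArrayList<>();
        if (modelPath == null || modelPath.isBlank()) {
            errors.add("model-path must not be blank");
        }
        if (device == null || !DEVICE.matcher(device).matches()) {
            errors.add("device must be cpu or gpu(N): " + device);
        }
        if (!PRECISIONS.contains(String.valueOf(precision))) {
            errors.add("precision must be one of " + PRECISIONS + ": " + precision);
        }
        if (maxTokens <= 0) {
            errors.add("max-tokens must be positive: " + maxTokens);
        }
        if (batchSize <= 0) {
            errors.add("batch-size must be positive: " + batchSize);
        }
        if (imageSize <= 0) {
            errors.add("image-size must be positive: " + imageSize);
        }
        if (temperature < 0) {
            errors.add("temperature must not be negative: " + temperature);
        }
        if (topP <= 0 || topP > 1) {
            errors.add("top-p must be in (0, 1]: " + topP);
        }
        return errors;
    }

    public static void main(String[] args) {
        // Validates the documented defaults and prints any problems found.
        System.out.println(defaults().validate());
    }
}
```
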
## Project Structure

```
glmocrdemojava/
├── pom.xml
├── api-test.http
├── src/main/
│   ├── java/com/example/glmocr/
│   │   ├── GlmOcrApplication.java
│   │   ├── config/
│   │   ├── controller/
│   │   ├── dto/
│   │   ├── model/
│   │   ├── service/
│   │   └── tokenizer/
│   └── resources/
│       └── application.yml
└── models/
    └── GLM-OCR/
```

## License

- This project: MIT License
- DJL: Apache License 2.0
- GLM-OCR model: MIT License

## References

- [GLM-OCR model](https://modelscope.cn/models/ZhipuAI/GLM-OCR)
- [DJL documentation](https://djl.ai/)
- [Spring Boot documentation](https://spring.io/projects/spring-boot)