# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

---

# LawRisk Backend - Claude Code Analysis

## Project Overview

**LawRisk** is a Flask-based Python backend service that provides intelligent legal compliance risk retrieval for business licensing and permit requirements. It uses vector embeddings and LLM-based matching to help users find relevant permits, licenses, and associated legal risks based on natural language queries (in Chinese).

**Python Version Requirement**: Python 3.10+ (uses PEP 604 union types like `str | None`)

### Key Features

- **Semantic Search**: Uses Aliyun DashScope embeddings (text-embedding-v4) to find similar legal topics
- **LLM-Powered Matching**: Qwen (qwen-plus-latest) for intelligent subject selection
- **Two Database Architecture**:
  - `fs_law_risk`: Vector embeddings and subject-permit mappings
  - `licensing_risks`: Structured permit and risk data with regions, themes, and compliance information
- **RESTful APIs**: Clean REST endpoints for V1 (legacy) and V2 (enhanced) search
- **CORS Enabled**: Built-in CORS middleware for frontend integration

---
## Architecture & Project Structure

### Core Framework & Libraries

- **Framework**: Flask (Python web framework)
- **Database Driver**: pg8000 (PostgreSQL adapter)
- **Vector Embeddings**: Aliyun DashScope OpenAI-compatible API
- **LLM**: Qwen via DashScope (qwen-plus-latest)
- **Dependencies**: Minimal footprint - Flask, pg8000, concurrent.futures

### Directory Structure

```
市监局-lawRisk-backend/
├── app.py                          # Flask application entry point
├── requirements.txt                # Python dependencies
├── .env                            # Environment configuration
├── lawrisk/                        # Main application package
│   ├── __init__.py
│   ├── api/                        # API route handlers
│   │   ├── v1.py                   # V1 API (legacy)
│   │   └── v2.py                   # V2 API (current)
│   ├── services/                   # Business logic layer
│   │   ├── lawrisk_service.py      # Core search & embeddings
│   │   ├── lawrisk_v2_service.py   # V2 enhanced service
│   │   └── licensing_repo.py       # Data repository
│   ├── middleware/                 # HTTP middleware
│   │   └── smart_cors_middleware.py
│   └── utils/                      # Utility functions
│       ├── env_loader.py
│       ├── export_risk_json.py
│       └── ingest_lawrisk.py
├── static/                         # Static assets
│   └── v2_tester.html              # Web-based API tester
├── tests/                          # Test suite (planned)
├── data/                           # Data files
│   ├── risk_tables_export.json
│   └── licensing_risks_dump.sql
└── docs/                           # Documentation
    ├── PRD.md
    ├── API.md
    ├── V2_API文档.md
    ├── AGENTS.md
    ├── DB_GUIDE.md
    └── CLAUDE.md
```

---
## Quick Reference

### Most Common Commands

```bash
# Run the application
python app.py

# Export data from database
python export_risk_json.py

# Ingest data with embeddings (requires DASHSCOPE_API_KEY)
python ingest_lawrisk.py

# Format and lint code
black .
ruff check .

# Test locally via browser
# Open static/v2_tester.html after starting the app
```

### Key Files

- `app.py` - Flask application entry point
- `lawrisk/` - Main application package
  - `api/v1.py` - V1 API routes (legacy)
  - `api/v2.py` - V2 API routes (current)
  - `services/lawrisk_service.py` - Core search & embeddings
  - `services/lawrisk_v2_service.py` - V2 enhanced service
  - `services/licensing_repo.py` - Data repository
  - `middleware/smart_cors_middleware.py` - CORS middleware
  - `utils/` - Utility functions
- `static/v2_tester.html` - Web-based API testing interface
- `requirements.txt` - Python dependencies
- `.env` - Environment configuration

---
## Development Workflow

### Initial Setup

```bash
# 1. Create virtual environment
python -m venv .venv

# 2. Activate virtual environment (Windows PowerShell)
.venv\Scripts\activate

# 3. Install dependencies
pip install Flask pg8000 black ruff pytest

# 4. Load environment variables
# Edit .env with your database credentials
```

### Common Commands

#### Run the Application

```bash
# Development mode
python app.py

# Custom port
PORT=8000 python app.py

# With debug logging
FLASK_DEBUG=1 python app.py
```

#### Data Management

```bash
# Export data from fs_law_risk database to JSON
python export_risk_json.py

# Ingest data with embeddings into database
python ingest_lawrisk.py
# Requires DASHSCOPE_API_KEY in .env
```

#### Code Quality

```bash
# Format code with Black (100 char line length)
black .

# Lint with Ruff
ruff check .

# Run tests (when added)
pytest -q
```

#### Database Operations

```bash
# Connect to PostgreSQL
psql -h 8.138.196.105 -U postgres -d fs_law_risk

# Connect to licensing_risks database
psql -h 8.138.196.105 -U postgres -d licensing_risks
```

---
## API Endpoints

### V1 API (Legacy)

- **Path**: `/fs-ai-asistant/api/workflow/lawrisk`
- **Methods**: GET, POST
- **Mode**: `llm` (default) or `embed`
- **Input**: `query` (user question)
- **Output**: Simple array of matching subjects with permit IDs

### V2 API (Current/Recommended)

- **Base Path**: `/fs-ai-asistant/api/workflow/lawrisk/v2`
- **Methods**: GET, POST
- **Features**:
  - Structured results with regions, themes, permits, and risks
  - Optional region filtering
  - Debug mode with detailed execution info
  - Direct permit matching by name

#### V2 Sub-endpoints

1. **Search Endpoint**
   - Path: `/fs-ai-asistant/api/workflow/lawrisk/v2`
   - Parameters:
     - `query` (required): User question
     - `region` (optional): Filter by region (市级, 禅城区, etc.)
     - `debug` (optional): Enable debug output (1/true/yes/on)
     - `top` (optional): Number of recommendations (default: 5)

2. **Regions List**
   - Path: `/fs-ai-asistant/api/workflow/lawrisk/v2/regions`
   - Method: GET
   - Returns: All available regions for filtering

3. **Get Permits**
   - Path: `/fs-ai-asistant/api/workflow/lawrisk/getPermits`
   - Method: GET, POST
   - Input: `region` (region ID or name)
   - Returns: All permits for a specific region

### Health Check

- Path: `/healthz`
- Method: GET
- Returns: `{"status": "ok"}`

---
## Database Schema

### Database 1: fs_law_risk

Used for vector embeddings and semantic search.

#### Tables

- **`law_sub`**: Subject matter with embeddings
  - `id` (TEXT, PK): Subject ID
  - `name` (TEXT): Subject name
  - `vector` (JSONB): Embedding vector

- **`law_sub_per`**: Subject-permit mappings
  - `sub_id` (TEXT, PK): Subject ID
  - `per_ids` (JSONB): Array of permit IDs

- **`law_per`**: Permit information
  - `id` (TEXT, PK): Permit ID
  - `name` (TEXT): Permit name
  - `risk_ids` (JSONB): Array of risk IDs
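
To make the relationship between these three tables concrete, here is a minimal sketch that follows a subject to its permits and risk IDs using in-memory rows instead of a live database. The sample rows and the `resolve_subject` helper are invented for illustration, and the sketch assumes the JSONB columns arrive as JSON text, which depends on how the driver is configured.

```python
import json

# Made-up sample rows mirroring the documented columns (not real data).
LAW_SUB_PER = {"S1": '["P1", "P2"]'}  # sub_id -> per_ids (JSON text)
LAW_PER = {
    "P1": {"name": "Permit A", "risk_ids": '["R1"]'},
    "P2": {"name": "Permit B", "risk_ids": '["R2", "R3"]'},
}


def resolve_subject(sub_id: str) -> list[dict]:
    """Hypothetical helper: list permits and their risk IDs for one subject."""
    per_ids = json.loads(LAW_SUB_PER.get(sub_id, "[]"))
    permits = []
    for per_id in per_ids:
        row = LAW_PER.get(per_id)
        if row is None:
            continue  # mapping points at a permit that is not in law_per
        permits.append({
            "id": per_id,
            "name": row["name"],
            "risk_ids": json.loads(row["risk_ids"]),
        })
    return permits
```

For the sample rows, `resolve_subject("S1")` yields two permit entries, and an unknown subject ID yields an empty list.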

### Database 2: licensing_risks

Used for structured compliance data.

#### Tables

- **`regions`**: Administrative areas
  - `id` (PK), `name` (unique)

- **`business_scopes`**: Business scope definitions
  - `id` (PK), `description`

- **`region_scopes`**: Region-scope mappings

- **`themes`**: Legal themes/subjects
  - `id` (PK), `name`

- **`region_themes`**: Region-theme mappings

- **`permits`**: License/permit items
  - `id` (PK), `name`

- **`region_theme_permits`**: Tripartite linkage

- **`risks`**: Risk information
  - `id` (PK), `risk_content`, `legal_basis`, `document_no`, `summary`

- **`region_permit_risks`**: Risk associations

---
## Configuration

### Environment Variables (.env)

#### DashScope (Embeddings & LLM)

```
DASHSCOPE_API_KEY=sk-288824ef003e4e02bb963b8b3024b06a
DASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
DASHSCOPE_EMBED_MODEL=text-embedding-v4
DASHSCOPE_EMBED_DIM=1024
DASHSCOPE_MAX_BATCH=10
DASHSCOPE_CHAT_MODEL=qwen-plus-latest
```
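
`DASHSCOPE_MAX_BATCH` suggests that texts are embedded in groups of at most 10 per request. A minimal sketch of that batching step follows; the `batched` helper is a hypothetical name, and the ingestion script may split its input differently.

```python
import os
from collections.abc import Iterator


def batched(texts: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `texts` with at most `size` items each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


# Fall back to 10, the value shown in the sample configuration.
max_batch = int(os.environ.get("DASHSCOPE_MAX_BATCH", "10"))
```

Each yielded slice would then be sent as one embedding request, so 23 texts with a batch size of 10 become three requests.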

#### PostgreSQL Configuration

```
# fs_law_risk database
PG_HOST=8.138.196.105
PG_PORT=5432
PG_USER=postgres
PG_PASSWORD=difyai123456
PG_DATABASE=fs_law_risk
PG_ADMIN_DB=postgres

# licensing_risks database
LIC_PG_HOST=8.138.196.105
LIC_PG_PORT=5432
LIC_PG_USER=postgres
LIC_PG_PASSWORD=difyai123456
LIC_PG_DATABASE=licensing_risks
```

#### Application Settings

```
FLASK_ENV=development

# Search thresholds (tunable)
LAWRISK_RETURN_IF_GE=0.7
LAWRISK_FALLBACK_GT=0.5
```
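
The PostgreSQL variables for the two databases differ only by prefix (`PG_` versus `LIC_PG_`), so one loader could serve both. The sketch below is an assumption about how such settings might be read; `PgSettings` and `load_pg_settings` are hypothetical names, not functions from `env_loader.py`.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class PgSettings:
    host: str
    port: int
    user: str
    password: str
    database: str


def load_pg_settings(prefix: str = "PG_",
                     env: Mapping[str, str] = os.environ) -> PgSettings:
    """Read `<prefix>HOST`, `<prefix>PORT`, ... and fail loudly if one is missing."""
    def need(name: str) -> str:
        key = f"{prefix}{name}"
        if key not in env:
            raise KeyError(f"missing environment variable: {key}")
        return env[key]

    return PgSettings(
        host=need("HOST"),
        port=int(env.get(f"{prefix}PORT", "5432")),  # assumed default port
        user=need("USER"),
        password=need("PASSWORD"),
        database=need("DATABASE"),
    )
```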

---

## Testing

### Current Testing Status

- **No dedicated test suite** in the repository
- pytest framework is recommended but not configured
- Manual testing available via `static/v2_tester.html`

### Testing Framework & Guidelines

When adding tests, use:

- **Framework**: pytest with Flask test client
- **Focus Areas**:
  - API endpoints (V1 and V2)
  - Middleware behavior (CORS)
  - Database operations
  - LLM selection logic
  - Region filtering

**Recommended test structure:**

```
tests/
├── test_api_v1.py          # V1 API endpoint tests
├── test_api_v2.py          # V2 API endpoint tests
├── test_search_service.py  # Core search logic
├── test_licensing_repo.py  # Database repository
└── conftest.py             # Shared fixtures
```
### Testing Commands

```bash
# Run all tests
pytest

# Run with coverage (requires the pytest-cov plugin)
pytest --cov=lawrisk

# Run specific test file
pytest tests/test_api_v2.py -v
```
### Manual Testing with V2 Tester

1. Start the application: `python app.py` (defaults to port 8000)
2. Open `static/v2_tester.html` in your browser
3. Test queries like:
   - "我要办一家电影院" (I want to open a cinema)
   - "开办旅馆需要哪些许可" (Which permits are needed to open a hotel)
   - "公共场所卫生许可" (Public venue hygiene permit)
   - Query with region filter: `电影院&region=市级&debug=1`

The tester provides a simple UI to experiment with the V2 API and view debug information.

---
## Coding Standards & Best Practices

### Python Style Guidelines

- **Indentation**: 4 spaces (no tabs)
- **Encoding**: UTF-8 for all source files
- **Naming Conventions**:
  - Functions/variables: `snake_case`
  - Constants: `SCREAMING_SNAKE_CASE`
  - Classes: `PascalCase`
- **Type Hints**: Prefer type hints for all public functions
- **Code Formatting**: Use `black` with 100-character line length
- **Linting**: Use `ruff` with default rules

### Code Quality Guidelines

- Keep functions small and side-effect free
- Prefer pure functions where possible
- Document complex logic with comments
- Use type hints from `typing` module
- Handle errors gracefully with appropriate logging

### Documentation Files

- **PRD.md** - Product Requirements Document (specifies business logic and requirements)
- **API.md** - API documentation for V1 endpoints
- **V2_API文档.md** - Detailed API documentation for V2 endpoints
- **AGENTS.md** - Development guidelines and best practices
- **DB_GUIDE.md** - Database schema reference and query examples

---
## Key Components Deep Dive

### 1. app.py - Application Entry Point

- Creates Flask app with CORS enabled
- Registers all API routes
- Parameter extraction logic (GET/POST, JSON/form data)
- Concurrent execution using ThreadPoolExecutor
- Error handling and logging
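
The concurrent-execution bullet can be pictured as follows: two independent lookups run in a thread pool and their results are collected together. Both lookup functions below are placeholders invented for this sketch; this document does not say which calls `app.py` actually parallelizes.

```python
from concurrent.futures import ThreadPoolExecutor


def search_subjects(query: str) -> list[str]:
    """Placeholder for an I/O-bound call such as an embedding search."""
    return [f"subject for {query}"]


def list_regions() -> list[str]:
    """Placeholder for an I/O-bound call such as a database query."""
    return ["市级", "禅城区"]


def run_concurrently(query: str) -> dict[str, list[str]]:
    # Submit both calls, then block on each result; an exception raised in a
    # worker thread is re-raised here by Future.result().
    with ThreadPoolExecutor(max_workers=2) as pool:
        subjects = pool.submit(search_subjects, query)
        regions = pool.submit(list_regions)
        return {"subjects": subjects.result(), "regions": regions.result()}
```

Threads suit this workload because the calls spend most of their time waiting on network or database I/O rather than on CPU.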

### 2. lawrisk_service.py - Core Search Logic

- **EmbeddingClient**: Handles DashScope API integration
- **ChatClient**: Manages Qwen LLM interactions
- Database helpers with pg8000
- Search algorithms:
  - Embedding-based cosine similarity
  - LLM-based subject selection
  - Similarity threshold management
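
A minimal sketch of embedding-based matching with the two thresholds from `.env` follows. The decision rule used here (return the best match alone when its score is at least `LAWRISK_RETURN_IF_GE`, otherwise keep every candidate scoring above `LAWRISK_FALLBACK_GT`) is an assumption inferred from the variable names, and `cosine` and `pick_subjects` are hypothetical helpers rather than functions from `lawrisk_service.py`.

```python
import math

RETURN_IF_GE = 0.7  # LAWRISK_RETURN_IF_GE from the sample .env
FALLBACK_GT = 0.5   # LAWRISK_FALLBACK_GT from the sample .env


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_subjects(query_vec: list[float],
                  subjects: dict[str, list[float]]) -> list[str]:
    """Assumed rule: a confident top hit wins, else return weaker candidates."""
    scored = sorted(
        ((cosine(query_vec, vec), sid) for sid, vec in subjects.items()),
        reverse=True,
    )
    if scored and scored[0][0] >= RETURN_IF_GE:
        return [scored[0][1]]
    return [sid for score, sid in scored if score > FALLBACK_GT]
```

Under this assumed rule, lowering `LAWRISK_FALLBACK_GT` widens the candidate set when no single subject is a confident match, which is one reason to check the thresholds when results come back empty.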

### 3. lawrisk_v2_service.py - Enhanced API

- Structured response formatting
- Region filtering logic
- Permit direct matching by name
- Markdown formatting for legal text
- Complex query execution pipeline

### 4. licensing_repo.py - Data Repository

- Separate database connection for licensing_risks
- Query optimization for multi-table joins
- Legal text formatting helpers
- Pattern matching for Chinese legal documents
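
The pattern-matching bullet presumably covers citations of the form 《law title》 followed by 第N条 (Article N). The regular expression below is an illustrative guess at such a pattern, not the one used in `licensing_repo.py`; `find_citations` is a hypothetical helper and the sample text in the usage note is invented.

```python
import re

# 《title》 optionally followed by an article marker such as 第三十五条 or 第12条.
CITATION = re.compile(r"《([^》]+)》(第[一二三四五六七八九十百千零〇\d]+条)?")


def find_citations(text: str) -> list[tuple[str, str]]:
    """Return (law title, article) pairs; article is '' when none follows."""
    return [(m.group(1), m.group(2) or "") for m in CITATION.finditer(text)]
```

For example, `find_citations("根据《示例条例》第十二条及《示例办法》的规定")` picks up both titles, with an article attached only to the first.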

### 5. smart_cors_middleware.py - Reusable CORS

- Wildcard and exact origin matching
- Subdomain support
- Preflight OPTIONS handling
- NGINX integration mode
- Debug and logging features
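
Wildcard, exact, and subdomain origin matching can be approximated with glob patterns, as in the sketch below. This uses the standard library's `fnmatchcase` as a stand-in technique; `origin_allowed` is a hypothetical helper and the middleware's real matching rules may be stricter. With it, `https://*.example.com` accepts `https://app.example.com`, and a lone `*` accepts any origin.

```python
from fnmatch import fnmatchcase


def origin_allowed(origin: str, allowed: list[str]) -> bool:
    """Hypothetical check: exact match, '*' for any origin, or a glob such as
    'https://*.example.com' for subdomains."""
    # Normalize case and trailing slashes before comparing.
    origin = origin.rstrip("/").lower()
    return any(
        fnmatchcase(origin, pattern.rstrip("/").lower()) for pattern in allowed
    )
```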

---

## Troubleshooting Guide

### Common Issues

#### Database Connection Errors

**Symptom**: `pg8000.dbapi.Error` when starting the app

**Solutions**:

- Check `.env` file exists with correct PostgreSQL credentials
- Verify network connectivity to the database server
- Ensure PostgreSQL server is running and accessible
- Check database names: `fs_law_risk` and `licensing_risks`

#### Missing Environment Variables

**Symptom**: Key errors or default values being used

**Solutions**:

- Create `.env` file from the template (see Configuration section)
- Ensure `DASHSCOPE_API_KEY` is set for embedding/chat features
- Verify all required PG_* and LIC_* environment variables

#### LLM/Embedding API Errors

**Symptom**: API authentication failures or timeout errors

**Solutions**:

- Verify `DASHSCOPE_API_KEY` is valid and has sufficient quota
- Check `DASHSCOPE_BASE_URL` matches the API endpoint
- Ensure network access to DashScope API servers
- Review API rate limits and batch sizes

#### Empty Search Results

**Symptom**: API returns empty `risk_subject` array

**Solutions**:

- Check if database tables are populated (`fs_law_risk.law_sub`, etc.)
- Try `debug=1` parameter to see detailed execution info
- Verify similarity thresholds in `.env` (`LAWRISK_RETURN_IF_GE`, `LAWRISK_FALLBACK_GT`)
- Test with known queries like "我要办一家电影院"

#### Port Already in Use

**Symptom**: `OSError: [Errno 10048] Only one usage of each socket address`

**Solutions**:

- Change port: `PORT=8001 python app.py`
- Kill existing process using the port: `netstat -ano | findstr :8000` then `taskkill /PID <PID> /F`
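
Before starting the app, a quick check can confirm whether the port is already taken. The sketch below is a cross-platform alternative to the `netstat` command, not part of the repository; `port_in_use` is a hypothetical helper.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already accepting connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when the TCP connection succeeds.
        return sock.connect_ex((host, port)) == 0
```

If `port_in_use(8000)` returns `True`, either stop the other process or start the app on a different port.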

### Debug Mode

Enable debug logging to troubleshoot issues:

```bash
# Enable Flask debug mode
FLASK_DEBUG=1 python app.py

# Enable CORS debug mode
CORS_DEBUG=1 python app.py

# Check app logs for registered routes and errors
# Logs are printed to console when starting the app
```

### Health Checks

- **Basic health**: `GET /healthz` → `{"status": "ok"}`
- **V2 regions**: `GET /fs-ai-asistant/api/workflow/lawrisk/v2/regions`
- Check logs for registered routes on app startup

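### Data Verification

Verify database content with queries from `DB_GUIDE.md`. PostgreSQL reads a two-part name as `schema.table`, so connect to the relevant database first and query the table directly:

```sql
-- Check subject count (run while connected to the fs_law_risk database)
SELECT COUNT(*) FROM law_sub;

-- Check region-theme pairs (run while connected to the licensing_risks database)
SELECT COUNT(*) FROM region_themes;
```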