# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

---

# LawRisk Backend - Development Guide

## Project Overview

**LawRisk** is a Flask-based Python backend service for intelligent legal compliance risk retrieval. It helps users find permits, licenses, and legal risks from natural-language queries, using vector embeddings and LLM matching.

### Tech Stack
- **Framework**: Flask 2.3+
- **Database**: PostgreSQL (pg8000 driver)
- **AI Services**: Alibaba Cloud DashScope (text-embedding-v4, qwen-plus-latest)
- **Development**: Black, Ruff, Pytest

### Two-Database Architecture
1. **fs_law_risk**: Vector embeddings and subject-permit mappings
2. **licensing_risks**: Structured permit and risk data (regions, themes, compliance)

---

## Quick Reference

### Most Common Commands
```bash
# Run the application (port 8000)
python app.py

# Install dependencies
pip install -r requirements.txt

# Format and lint code
black .
ruff check .

# Run tests
pytest
pytest --cov=lawrisk

# Test API via curl (query: "I want to open a movie theater")
curl http://localhost:8000/healthz
curl -X POST "http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2" \
  -d "query=我要办一家电影院&debug=1"
```

### Key File Locations
- `app.py` - Flask application entry point
- `lawrisk/api/v2.py` - V2 API routes (current)
- `lawrisk/services/lawrisk_v2_service.py` - Enhanced V2 search logic
- `lawrisk/services/licensing_repo.py` - Database operations
- `lawrisk/api/auth.py` - Authentication endpoints
- `static/v2_tester.html` - Web-based API testing interface
- `tests/test_auth.py` - Auth system tests
- `tests/test_checkpoint_security.py` - Checkpoint system tests

---

## Architecture & Code Structure

### Request Flow
```
HTTP Request
  → lawrisk/api/ (routing layer)
    → lawrisk/services/ (business logic)
      → lawrisk/services/licensing_repo.py (database access)
      → DashScope API (embeddings & LLM)
```

### Core Modules

**1. API Layer (`lawrisk/api/`)**
- `v1.py` - Legacy API (deprecated)
- `v2.py` - Current API with structured responses + admin endpoints
- `auth.py` - Authentication (login/logout/me endpoints)

**2. Services Layer (`lawrisk/services/`)**
- `lawrisk_service.py` - Core search with embeddings (cosine similarity) + LLM matching
- `lawrisk_v2_service.py` - Enhanced V2 with structured results, region filtering, direct permit matching
- `licensing_repo.py` - PostgreSQL operations (both databases), checkpoint management
- `auth_service.py` - User authentication, password hashing, seed admin creation

**3. Middleware & Utils**
- `middleware/smart_cors_middleware.py` - Configurable CORS (wildcard, subdomains, NGINX mode)
- `utils/env_loader.py` - Environment variable loading
- `utils/export_risk_json.py` - Database export utility
- `utils/ingest_lawrisk.py` - Data ingestion with embeddings
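
To make the embedding step of the core search concrete, here is a minimal sketch of cosine-similarity ranking. It is not the code in `lawrisk_service.py`: the function names `cosine_similarity` and `rank_subjects`, the toy three-dimensional vectors, and the English subject names are all assumptions made for illustration.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def rank_subjects(
    query_vec: list[float],
    subjects: list[tuple[str, list[float]]],
    top: int = 5,
) -> list[tuple[str, float]]:
    """Score every (name, vector) pair against the query, best match first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in subjects]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]


# Hypothetical toy data; real embeddings would be much longer vectors.
subjects = [
    ("cinema", [1.0, 0.0, 0.0]),
    ("hotel", [0.0, 1.0, 0.0]),
    ("theater", [0.9, 0.1, 0.0]),
]
ranked = rank_subjects([1.0, 0.0, 0.0], subjects, top=2)
print(ranked)
```

A production service would more likely compute similarity in bulk (for example with NumPy or inside the database); the pure-Python version is only meant to show the arithmetic.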

---

## API Endpoints

### Public Endpoints

#### V2 Search (Current)
- **Path**: `/fs-ai-asistant/api/workflow/lawrisk/v2`
- **Method**: POST (recommended), GET
- **Params**:
  - `query` (required): User question
  - `region` (optional): Filter by region, e.g. `市级` (municipal level) or `禅城区` (Chancheng District)
  - `debug` (optional): Enable debug output (1/true/yes/on)
  - `top` (optional): Number of recommendations (default: 5)
- **Returns**: Structured results with regions, themes, permits, risks
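
The parameter list above can be exercised from Python as well as from curl. The sketch below only builds and inspects a form-encoded request body, so it runs without a server. The helper names `build_search_body` and `is_debug_enabled` are hypothetical, and treating the `debug` values case-insensitively is an assumption about the server, not something this guide states.

```python
from __future__ import annotations

from urllib.parse import parse_qs, urlencode

# The truthy spellings this guide lists for the `debug` parameter.
TRUTHY = {"1", "true", "yes", "on"}


def is_debug_enabled(raw: str | None) -> bool:
    """Interpret a raw `debug` value (case-insensitive by assumption)."""
    return raw is not None and raw.strip().lower() in TRUTHY


def build_search_body(
    query: str,
    region: str | None = None,
    debug: bool = False,
    top: int = 5,
) -> str:
    """Form-encode V2 search parameters, leaving out unset optional ones."""
    if not query:
        raise ValueError("query is required")
    params = {"query": query, "top": str(top)}
    if region:
        params["region"] = region
    if debug:
        params["debug"] = "1"
    # urlencode percent-encodes non-ASCII text as UTF-8.
    return urlencode(params)


body = build_search_body("我要办一家电影院", region="禅城区", debug=True)
decoded = parse_qs(body)
print(body)
```

The resulting string can be sent as the body of a POST request with the `application/x-www-form-urlencoded` content type, which is what `curl -d` does by default.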

#### Supporting Endpoints
- `GET /fs-ai-asistant/api/workflow/lawrisk/v2/regions` - List all regions
- `GET /fs-ai-asistant/api/workflow/lawrisk/getPermits` - Get permits by region
- `GET /healthz` - Health check

### Authentication Endpoints
- `GET /fs-ai-asistant/lawrisk/login` - Login page (HTML)
- `POST /auth/login` - Authenticate user
- `GET /auth/me` - Get current user
- `GET /auth/logout` - Logout

### Admin Endpoints (Protected)
- `GET /fs-ai-asistant/api/workflow/lawrisk/admin/test` - Admin test
- `GET /fs-ai-asistant/api/workflow/lawrisk/admin/regions` - Region management
- `GET /fs-ai-asistant/api/workflow/lawrisk/admin/themes` - Theme management
- `GET /fs-ai-asistant/api/workflow/lawrisk/admin/permits` - Permit management
- `GET /fs-ai-asistant/api/workflow/lawrisk/admin/checkpoints` - Checkpoint management (create/list/restore/delete)

---

## Development Workflow

### Environment Setup
```bash
# Windows PowerShell
python -m venv .venv
.venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure .env with database credentials and DashScope API key
```

### Testing
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=lawrisk --cov-report=html

# Run specific test file
pytest tests/test_auth.py -v

# Test authentication
pytest tests/test_auth.py::test_login_success -v
```

**Test Files**:
- `tests/test_auth.py` - Authentication system tests (login, logout, session management)
- `tests/test_checkpoint_security.py` - Database checkpoint security tests

### Code Quality
```bash
# Format code
black .

# Lint with Ruff
ruff check .

# Check specific file
black lawrisk/services/lawrisk_v2_service.py
ruff check lawrisk/services/lawrisk_v2_service.py
```

### Data Management
```bash
# Export data from fs_law_risk database
python lawrisk/utils/export_risk_json.py
# Output: data/risk_tables_export.json

# Ingest data with embeddings (requires DASHSCOPE_API_KEY)
python lawrisk/utils/ingest_lawrisk.py
```

---

## Configuration

### Required Environment Variables (.env)

#### DashScope AI Services
```
DASHSCOPE_API_KEY=your_api_key
DASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
DASHSCOPE_EMBED_MODEL=text-embedding-v4
DASHSCOPE_EMBED_DIM=1024
DASHSCOPE_MAX_BATCH=10
DASHSCOPE_CHAT_MODEL=qwen-plus-latest
```
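
`DASHSCOPE_MAX_BATCH` presumably caps how many texts are sent in one embedding request. Assuming that reading, a minimal sketch of splitting a list of texts into request-sized batches looks like this. The helper name `batched` and the sample texts are hypothetical, and no DashScope call is made.

```python
from __future__ import annotations

from collections.abc import Iterator


def batched(texts: list[str], max_batch: int = 10) -> Iterator[list[str]]:
    """Yield consecutive slices of `texts`, each at most `max_batch` long."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    for start in range(0, len(texts), max_batch):
        yield texts[start : start + max_batch]


# Hypothetical input: 23 subject names to embed.
texts = [f"subject {i}" for i in range(23)]
batches = list(batched(texts, max_batch=10))
print([len(batch) for batch in batches])  # [10, 10, 3]
```

Each yielded batch would then be passed to one embedding request, keeping the input order so the returned vectors can be matched back to their texts.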

#### PostgreSQL Databases
```
# fs_law_risk (embeddings database)
PG_HOST=your_host
PG_PORT=5432
PG_USER=postgres
PG_PASSWORD=your_password
PG_DATABASE=fs_law_risk
PG_ADMIN_DB=postgres

# licensing_risks (structured data)
LIC_PG_HOST=your_host
LIC_PG_PORT=5432
LIC_PG_USER=postgres
LIC_PG_PASSWORD=your_password
LIC_PG_DATABASE=licensing_risks
```

#### Authentication
```
FLASK_SECRET_KEY=your-secret-key
LAWRISK_ADMIN_USERNAME=admin
LAWRISK_ADMIN_PASSWORD=adminpassword123
# Optional: LAWRISK_ADMIN_ROLE, LAWRISK_ADMIN_GRADE, LAWRISK_ADMIN_DISPLAY_NAME
```

#### Search Thresholds
```
LAWRISK_RETURN_IF_GE=0.7    # Return results if similarity >= 0.7
LAWRISK_FALLBACK_GT=0.5     # Use fallback if similarity > 0.5
```
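
One way to read the two thresholds is as a three-way decision on the best similarity score. The sketch below follows the comments in the block above; the function name `classify_match`, the outcome labels, and the "no match" branch for scores at or below the lower threshold are assumptions, since this guide does not say what the fallback does or what happens below it.

```python
from __future__ import annotations

import os

# Defaults mirror the example values shown in this guide.
RETURN_IF_GE = float(os.environ.get("LAWRISK_RETURN_IF_GE", "0.7"))
FALLBACK_GT = float(os.environ.get("LAWRISK_FALLBACK_GT", "0.5"))


def classify_match(
    similarity: float,
    return_if_ge: float = RETURN_IF_GE,
    fallback_gt: float = FALLBACK_GT,
) -> str:
    """Map a similarity score to an outcome label (labels are hypothetical)."""
    if similarity >= return_if_ge:
        return "return"  # confident match: return results directly
    if similarity > fallback_gt:
        return "fallback"  # borderline match: hand over to the fallback path
    return "no_match"  # assumption: too weak to use at all


for score in (0.82, 0.7, 0.6, 0.5):
    print(score, classify_match(score, return_if_ge=0.7, fallback_gt=0.5))
```

Note the asymmetry in the comparison operators: a score of exactly 0.7 is returned directly, while a score of exactly 0.5 does not qualify for the fallback.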

---

## Database Schema

### fs_law_risk (Vector Embeddings)
- **`law_sub`** - Subject matter with embeddings (id, name, vector)
- **`law_sub_per`** - Subject-permit mappings (sub_id, per_ids)
- **`law_per`** - Permit information (id, name, risk_ids)
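
The three tables chain together: a matched subject leads to permits, and each permit leads to risks. The in-memory sketch below walks that chain. It assumes `per_ids` and `risk_ids` hold lists of integer IDs, which this guide does not confirm, and every row, name, and the helper `permits_for_subject` is made up for illustration.

```python
from __future__ import annotations

# Hypothetical rows standing in for the law_sub_per and law_per tables.
law_sub_per: dict[int, list[int]] = {
    1: [10, 11],  # sub_id -> per_ids
}
law_per: dict[int, dict] = {
    10: {"name": "Film screening permit", "risk_ids": [100, 101]},
    11: {"name": "Fire safety inspection", "risk_ids": [102]},
}


def permits_for_subject(sub_id: int) -> list[dict]:
    """Resolve a subject ID to its permits, skipping IDs with no permit row."""
    permits = []
    for per_id in law_sub_per.get(sub_id, []):
        row = law_per.get(per_id)
        if row is not None:
            permits.append({"id": per_id, **row})
    return permits


permits = permits_for_subject(1)
risk_ids = sorted({rid for permit in permits for rid in permit["risk_ids"]})
print([permit["name"] for permit in permits], risk_ids)
```

In the real service the same traversal would be done with SQL queries in `licensing_repo.py` rather than dictionaries.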

### licensing_risks (Structured Compliance)
- **`regions`** - Administrative areas
- **`themes`** - Legal themes/subjects
- **`permits`** - License/permit items
- **`risks`** - Risk information (content, legal_basis, document_no, summary)
- **`business_scopes`** - Business scope definitions
- **Junction tables**: region_themes, region_theme_permits, region_permit_risks

### Checkpoint System
`licensing_repo.py` implements database checkpoint management:
- `create_checkpoint()` - Create database backup
- `list_checkpoints()` - List available backups
- `restore_checkpoint()` - **DANGEROUS** - Restore from checkpoint
- `delete_checkpoint()` - Remove old checkpoints

---

## Security Guidelines

### Critical Security Notes
- **NEVER commit secrets** - Keep all credentials in `.env` or environment variables
- **Protect admin endpoints** - `/admin/*` should be restricted in production
- **Checkpoint restore is dangerous** - It is a destructive database operation guarded by a confirmation flow
- **API keys externalized** - `DASHSCOPE_API_KEY` and database passwords must be in `.env`

### Authentication System
- Session-based auth using Flask sessions
- Password hashing with `passlib`
- First admin auto-created from environment variables on startup
- Role-based access (admin, reviewer, analyst, etc.)
- Login page: `/fs-ai-asistant/lawrisk/login`
- Protected endpoints use `@login_required` decorator
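
The decorator pattern behind `@login_required` can be sketched without Flask. This is not the implementation in `lawrisk/api/auth.py`: a plain dictionary stands in for the Flask session, the `"user"` session key and the `admin_test` view are hypothetical, and the 401 status is an assumption about how unauthenticated API calls are rejected.

```python
from __future__ import annotations

from functools import wraps

# Stand-in for flask.session so the sketch runs without Flask installed.
session: dict = {}


def login_required(view):
    """Reject the call unless a user is recorded in the session."""

    @wraps(view)  # keep the wrapped view's name and docstring
    def wrapper(*args, **kwargs):
        if "user" not in session:  # hypothetical session key
            body = {"success": False, "message": "authentication required", "data": {}}
            return body, 401
        return view(*args, **kwargs)

    return wrapper


@login_required
def admin_test():
    """Hypothetical protected view returning (body, status)."""
    return {"success": True, "message": "ok", "data": {"user": session["user"]}}, 200


anonymous_status = admin_test()[1]
session["user"] = "admin"
logged_in_status = admin_test()[1]
print(anonymous_status, logged_in_status)  # 401 200
```

In a browser-facing flow the unauthenticated branch would typically redirect to the login page instead of returning JSON.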

---

## Recent Features (from git log)

### Checkpoint System (Recent)
- Database backup/restore functionality
- Timeline view of checkpoints
- Progress indicators for restore operations
- Security tests in `test_checkpoint_security.py`

### Permit Risk Snapshot
- Workflow for permit risk snapshots
- Unified snapshot and checkpoint timeline
- Enhanced batch display for snapshots

### Licensing Import Enhancement
- Optimized district/region merging during import
- Enhanced source display for permits

---

## Testing Guidelines

### Test Structure
```
tests/
├── __init__.py
├── test_auth.py                    # Auth system tests (login, session, decorators)
└── test_checkpoint_security.py     # Checkpoint security tests
```

### Running Tests
```bash
# All tests
pytest

# Verbose output
pytest -v

# Coverage report
pytest --cov=lawrisk --cov-report=term-missing

# Specific test
pytest tests/test_auth.py::test_login_success -v
```

### Manual Testing
1. Start app: `python app.py`
2. Open browser: `static/v2_tester.html`
3. Test queries:
   - "我要办一家电影院" ("I want to open a movie theater")
   - "开办旅馆需要哪些许可" ("What permits are needed to open a hotel?")
   - With region filter and debug mode

---

## Troubleshooting

### Common Issues

#### Database Connection
```bash
# Verify database is accessible
psql -h "$PG_HOST" -U "$PG_USER" -d "$PG_DATABASE"

# Check tables exist (each database has its own connection settings)
psql -h "$PG_HOST" -U "$PG_USER" -d "$PG_DATABASE" \
  -c "SELECT COUNT(*) FROM fs_law_risk.law_sub;"
psql -h "$LIC_PG_HOST" -U "$LIC_PG_USER" -d "$LIC_PG_DATABASE" \
  -c "SELECT COUNT(*) FROM licensing_risks.regions;"
```

#### API Errors
```bash
# Test health check
curl http://localhost:8000/healthz

# Test V2 API with debug (query: "cinema")
curl -X POST "http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2" \
  -d "query=电影院&debug=1"

# Check app logs for registered routes
python app.py 2>&1 | grep "Registered routes"
```

#### Missing Embeddings
```bash
# Check if embeddings exist
psql -h "$PG_HOST" -U "$PG_USER" -d "$PG_DATABASE" \
  -c "SELECT id, name FROM fs_law_risk.law_sub LIMIT 5;"

# If empty, run ingestion
python lawrisk/utils/ingest_lawrisk.py
```

---

## Documentation Files

- **README.md** - Project overview and quick start
- **AGENTS.md** - Development guidelines, coding style, testing approach
- **docs/V2_API文档.md** - Detailed V2 API documentation
- **docs/API.md** - V1 API documentation (legacy)
- **docs/DB_GUIDE.md** - Database schema and query examples
- **docs/PRD.md** - Product requirements
- **docs/CLAUDE.md** - Detailed Claude Code guidance (comprehensive)

---

## Key Components Deep Dive

### V2 Service Architecture
`lawrisk_v2_service.py` implements:
- Structured response formatting
- Region filter normalization
- Direct permit name matching
- Markdown formatting for legal text
- Complex query execution pipeline with concurrency

### Authentication Flow
`lawrisk/api/auth.py` provides:
- Login page with redirect handling
- Session management
- `@login_required` decorator for protecting endpoints
- JSON vs HTML response handling (API vs browser)

### Checkpoint Security
`test_checkpoint_security.py` tests:
- Checkpoint creation authorization
- Restore operation security
- User permission validation
- Operation audit logging

---

## Best Practices

### Code Style
- **Black**: 100-character line length, Python 3.10+
- **Type Hints**: Use PEP 604 union types (`str | None`)
- **Imports**: Ruff-compatible, group by standard library → third-party → local
- **Naming**: snake_case (functions/variables), SCREAMING_SNAKE_CASE (constants), PascalCase (classes)
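
A short snippet that follows these conventions might look like the one below. All names in it (`DEFAULT_TOP_RESULTS`, `RegionFilter`, `normalize_region`) are invented for the example and are not taken from the codebase; the `from __future__` import is there only so the annotations also parse on interpreters older than 3.10.

```python
from __future__ import annotations

# Import groups would follow in this order, separated by blank lines:
# 1. standard library, 2. third-party (e.g. flask), 3. local (lawrisk.*).

DEFAULT_TOP_RESULTS = 5  # constant: SCREAMING_SNAKE_CASE


class RegionFilter:  # class: PascalCase
    """Match candidate regions against an optional filter value."""

    def __init__(self, region_name: str | None = None) -> None:
        self.region_name = region_name

    def matches(self, candidate: str) -> bool:
        # No filter set means every region matches.
        return self.region_name is None or self.region_name == candidate


def normalize_region(raw_region: str | None) -> str | None:  # function: snake_case
    """Strip whitespace and map empty input to None (PEP 604 union types)."""
    if raw_region is None:
        return None
    cleaned = raw_region.strip()
    return cleaned or None


region_filter = RegionFilter(normalize_region("  禅城区 "))
print(region_filter.matches("禅城区"), DEFAULT_TOP_RESULTS)
```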

### Error Handling
- Graceful degradation on startup (errors surface on first request)
- Structured error responses: `{"success": false, "message": "error", "data": {}}`
- Logging to stdout with structured format
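
A small helper keeps the error envelope consistent across endpoints. The sketch below reproduces the shape shown above; the helper name `error_response` is hypothetical, and how the real code builds its responses is not documented here.

```python
from __future__ import annotations

import json


def error_response(message: str, data: dict | None = None) -> dict:
    """Build the structured error envelope described in this guide."""
    return {"success": False, "message": message, "data": data or {}}


payload = json.dumps(error_response("query is required"))
print(payload)
```

In a Flask view the dictionary would usually be returned through `jsonify` together with an appropriate HTTP status code.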

### Configuration
- Use `lawrisk.utils.env_loader` for environment variables
- Default values for non-critical configs
- Environment-specific overrides supported
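
The "defaults for non-critical configs" idea can be sketched as follows. This does not show the API of `lawrisk.utils.env_loader`, which is not documented in this guide: the helper `get_env`, its `required` flag, and the choice to treat an empty string as unset are all assumptions.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def get_env(
    name: str,
    default: str | None = None,
    required: bool = False,
    environ: Mapping[str, str] | None = None,
) -> str | None:
    """Read a setting, falling back to `default` when it is unset or empty."""
    source = os.environ if environ is None else environ
    value = source.get(name)
    if value is None or value == "":
        if required:
            raise RuntimeError(f"missing required environment variable: {name}")
        return default
    return value


# A fake environment keeps the example independent of the real process env.
fake_env = {"PG_HOST": "db.internal", "PG_PORT": ""}
pg_host = get_env("PG_HOST", required=True, environ=fake_env)
pg_port = get_env("PG_PORT", default="5432", environ=fake_env)
print(pg_host, pg_port)
```

Critical values such as `DASHSCOPE_API_KEY` would be read with `required=True` so that a missing key fails loudly instead of silently using a placeholder.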

---

## Health Checks

```bash
# Basic health
curl http://localhost:8000/healthz

# Check regions
curl http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2/regions

# Test search
curl -X POST http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2 \
  -d "query=电影院&debug=1"
```

View app startup logs to see all registered routes.