CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
LawRisk Backend - Development Guide
Project Overview
LawRisk is a Flask-based Python backend service for intelligent legal compliance risk retrieval. It helps users find permits, licenses, and legal risks based on natural language queries using vector embeddings and LLM matching.
Tech Stack
- Framework: Flask 2.3+
- Database: PostgreSQL (pg8000 driver)
- AI Services: Alibaba Cloud DashScope (text-embedding-v4, qwen-plus-latest)
- Development: Black, Ruff, Pytest
Two-Database Architecture
- fs_law_risk: Vector embeddings and subject-permit mappings
- licensing_risks: Structured permit and risk data (regions, themes, compliance)
Quick Reference
Most Common Commands
# Run the application (port 8000)
python app.py
# Install dependencies
pip install -r requirements.txt
# Format and lint code
black .
ruff check .
# Run tests
pytest
pytest --cov=lawrisk
# Test API via curl
curl http://localhost:8000/healthz
curl -X POST "http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2" \
-d "query=我要办一家电影院&debug=1"
Key File Locations
- app.py - Flask application entry point
- lawrisk/api/v2.py - V2 API routes (current)
- lawrisk/services/lawrisk_v2_service.py - Enhanced V2 search logic
- lawrisk/services/licensing_repo.py - Database operations
- lawrisk/api/auth.py - Authentication endpoints
- static/v2_tester.html - Web-based API testing interface
- tests/test_auth.py - Auth system tests
- tests/test_checkpoint_security.py - Checkpoint system tests
Architecture & Code Structure
Request Flow
HTTP Request
→ lawrisk/api/ (routing layer)
→ lawrisk/services/ (business logic)
→ lawrisk/services/licensing_repo.py (database access)
→ DashScope API (embeddings & LLM)
Core Modules
1. API Layer (lawrisk/api/)
- v1.py - Legacy API (deprecated)
- v2.py - Current API with structured responses + admin endpoints
- auth.py - Authentication (login/logout/me endpoints)
2. Services Layer (lawrisk/services/)
- lawrisk_service.py - Core search with embeddings (cosine similarity) + LLM matching
- lawrisk_v2_service.py - Enhanced V2 with structured results, region filtering, direct permit matching
- licensing_repo.py - PostgreSQL operations (both databases), checkpoint management
- auth_service.py - User authentication, password hashing, seed admin creation
3. Middleware & Utils
- middleware/smart_cors_middleware.py - Configurable CORS (wildcard, subdomains, NGINX mode)
- utils/env_loader.py - Environment variable loading
- utils/export_risk_json.py - Database export utility
- utils/ingest_lawrisk.py - Data ingestion with embeddings
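The services layer above ranks subjects by cosine similarity between a query embedding and stored embeddings. A minimal pure-Python sketch of that comparison follows. The names `cosine_similarity` and `rank_subjects` and the `(name, vector)` tuple shape are illustrative assumptions, not the actual signatures in `lawrisk_service.py`.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction; treat it as no match
    return dot / (norm_a * norm_b)


def rank_subjects(query_vec, subjects, top=5):
    """Sort hypothetical (name, vector) pairs by similarity, best first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in subjects]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]
```

In production the vectors would come from the embedding model and the `law_sub` table; the sketch only shows the scoring step.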
API Endpoints
Public Endpoints
V2 Search (Current)
- Path: /fs-ai-asistant/api/workflow/lawrisk/v2
- Method: POST (recommended), GET
- Params:
  - query (required): User question
  - region (optional): Filter by region, e.g. 市级 (municipal level), 禅城区 (Chancheng District)
  - debug (optional): Enable debug output (1/true/yes/on)
  - top (optional): Number of recommendations (default: 5)
- Returns: Structured results with regions, themes, permits, risks
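The V2 search parameters above can also be exercised from Python. The sketch below builds the form-encoded POST request with the standard library only. `BASE_URL` assumes the local dev server started by `python app.py`, and `build_search_request` is a hypothetical helper, not part of the repository.

```python
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local dev server
V2_PATH = "/fs-ai-asistant/api/workflow/lawrisk/v2"


def build_search_request(query, region=None, debug=False, top=None):
    """Build a form-encoded POST request for the V2 search endpoint."""
    if not query:
        raise ValueError("query is required")
    params = {"query": query}
    if region:
        params["region"] = region
    if debug:
        params["debug"] = "1"
    if top is not None:
        params["top"] = str(top)
    # urlencode percent-encodes non-ASCII text as UTF-8
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(BASE_URL + V2_PATH, data=body, method="POST")


# Usage (requires the app to be running):
# with urllib.request.urlopen(build_search_request("电影院", debug=True)) as resp:
#     print(resp.read().decode("utf-8"))
```

Building the request separately from sending it keeps the parameter handling testable without a live server.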
Supporting Endpoints
- GET /fs-ai-asistant/api/workflow/lawrisk/v2/regions - List all regions
- GET /fs-ai-asistant/api/workflow/lawrisk/getPermits - Get permits by region
- GET /healthz - Health check
Authentication Endpoints
- GET /fs-ai-asistant/lawrisk/login - Login page (HTML)
- POST /auth/login - Authenticate user
- GET /auth/me - Get current user
- GET /auth/logout - Logout
Admin Endpoints (Protected)
- GET /fs-ai-asistant/api/workflow/lawrisk/admin/test - Admin test
- GET /fs-ai-asistant/api/workflow/lawrisk/admin/regions - Region management
- GET /fs-ai-asistant/api/workflow/lawrisk/admin/themes - Theme management
- GET /fs-ai-asistant/api/workflow/lawrisk/admin/permits - Permit management
- GET /fs-ai-asistant/api/workflow/lawrisk/admin/checkpoints - Checkpoint management (create/list/restore/delete)
Development Workflow
Environment Setup
# Windows PowerShell
python -m venv .venv
.venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Configure .env with database credentials and DashScope API key
Testing
# Run all tests
pytest
# Run with coverage
pytest --cov=lawrisk --cov-report=html
# Run specific test file
pytest tests/test_auth.py -v
# Test authentication
pytest tests/test_auth.py::test_login_success -v
Test Files:
- tests/test_auth.py - Authentication system tests (login, logout, session management)
- tests/test_checkpoint_security.py - Database checkpoint security tests
Code Quality
# Format code
black .
# Lint with Ruff
ruff check .
# Check specific file
black lawrisk/services/lawrisk_v2_service.py
ruff check lawrisk/services/lawrisk_v2_service.py
Data Management
# Export data from fs_law_risk database
python lawrisk/utils/export_risk_json.py
# Output: data/risk_tables_export.json
# Ingest data with embeddings (requires DASHSCOPE_API_KEY)
python lawrisk/utils/ingest_lawrisk.py
Configuration
Required Environment Variables (.env)
DashScope AI Services
DASHSCOPE_API_KEY=your_api_key
DASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
DASHSCOPE_EMBED_MODEL=text-embedding-v4
DASHSCOPE_EMBED_DIM=1024
DASHSCOPE_MAX_BATCH=10
DASHSCOPE_CHAT_MODEL=qwen-plus-latest
PostgreSQL Databases
# fs_law_risk (embeddings database)
PG_HOST=your_host
PG_PORT=5432
PG_USER=postgres
PG_PASSWORD=your_password
PG_DATABASE=fs_law_risk
PG_ADMIN_DB=postgres
# licensing_risks (structured data)
LIC_PG_HOST=your_host
LIC_PG_PORT=5432
LIC_PG_USER=postgres
LIC_PG_PASSWORD=your_password
LIC_PG_DATABASE=licensing_risks
Authentication
FLASK_SECRET_KEY=your-secret-key
LAWRISK_ADMIN_USERNAME=admin
LAWRISK_ADMIN_PASSWORD=adminpassword123
# Optional: LAWRISK_ADMIN_ROLE, LAWRISK_ADMIN_GRADE, LAWRISK_ADMIN_DISPLAY_NAME
Search Thresholds
LAWRISK_RETURN_IF_GE=0.7 # Return results if similarity >= 0.7
LAWRISK_FALLBACK_GT=0.5 # Use fallback if similarity > 0.5
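The two thresholds above split similarity scores into bands. The sketch below shows one reading of them. `classify_match` is a hypothetical helper, what the fallback path actually does is not specified in this guide, and treating scores at or below 0.5 as "no_match" is an assumption.

```python
def classify_match(similarity, return_if_ge=0.7, fallback_gt=0.5):
    """Map a similarity score to a band using the two thresholds."""
    if similarity >= return_if_ge:
        return "return"  # confident enough to return results directly
    if similarity > fallback_gt:
        return "fallback"  # borderline score: hand off to the fallback path
    return "no_match"  # assumption: anything at or below the lower bound


for score in (0.82, 0.7, 0.61, 0.5):
    print(score, classify_match(score))
```

Note the asymmetry in the comparisons: the upper bound is inclusive (`>=`) and the lower bound is exclusive (`>`), matching the `_GE` and `_GT` suffixes of the variable names.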
Database Schema
fs_law_risk (Vector Embeddings)
- law_sub - Subject matter with embeddings (id, name, vector)
- law_sub_per - Subject-permit mappings (sub_id, per_ids)
- law_per - Permit information (id, name, risk_ids)
licensing_risks (Structured Compliance)
- regions - Administrative areas
- themes - Legal themes/subjects
- permits - License/permit items
- risks - Risk information (content, legal_basis, document_no, summary)
- business_scopes - Business scope definitions
- Junction tables: region_themes, region_theme_permits, region_permit_risks
Checkpoint System
licensing_repo.py implements database checkpoint management:
- create_checkpoint() - Create database backup
- list_checkpoints() - List available backups
- restore_checkpoint() - DANGEROUS - Restore from checkpoint
- delete_checkpoint() - Remove old checkpoints
Security Guidelines
Critical Security Notes
- NEVER commit secrets - All credentials in .env or environment variables
- Protect admin endpoints - /admin/* should be restricted in production
- Checkpoint restore is dangerous - Database operation with confirmation flow
- API keys externalized - DASHSCOPE_API_KEY and database passwords must be in .env
Authentication System
- Session-based auth using Flask sessions
- Password hashing with passlib
- First admin auto-created from environment variables on startup
- Role-based access (admin, reviewer, analyst, etc.)
- Login page: /fs-ai-asistant/lawrisk/login
- Protected endpoints use @login_required decorator
Recent Features (from git log)
Checkpoint System (Recent)
- Database backup/restore functionality
- Timeline view of checkpoints
- Progress indicators for restore operations
- Security tests in test_checkpoint_security.py
Permit Risk Snapshot
- Workflow for permit risk snapshots
- Unified snapshot and checkpoint timeline
- Enhanced batch display for snapshots
Licensing Import Enhancement
- Optimized district/region merging during import
- Enhanced source display for permits
Testing Guidelines
Test Structure
tests/
├── __init__.py
├── test_auth.py # Auth system tests (login, session, decorators)
└── test_checkpoint_security.py # Checkpoint security tests
Running Tests
# All tests
pytest
# Verbose output
pytest -v
# Coverage report
pytest --cov=lawrisk --cov-report=term-missing
# Specific test
pytest tests/test_auth.py::test_login_success -v
Manual Testing
- Start app: python app.py
- Open browser: static/v2_tester.html
- Test queries:
  - "我要办一家电影院" (I want to open a cinema)
  - "开办旅馆需要哪些许可" (What permits are needed to open a hotel?)
  - With region filter and debug mode
Troubleshooting
Common Issues
Database Connection
# Verify database is accessible
psql -h $PG_HOST -U $PG_USER -d $PG_DATABASE
# Check tables exist
SELECT COUNT(*) FROM fs_law_risk.law_sub;
SELECT COUNT(*) FROM licensing_risks.regions;
API Errors
# Test health check
curl http://localhost:8000/healthz
# Test V2 API with debug
curl -X POST "http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2" \
-d "query=电影院&debug=1"
# Check app logs for registered routes
python app.py 2>&1 | grep "Registered routes"
Missing Embeddings
# Check if embeddings exist
SELECT id, name FROM fs_law_risk.law_sub LIMIT 5;
# If empty, run ingestion
python lawrisk/utils/ingest_lawrisk.py
Documentation Files
- README.md - Project overview and quick start
- AGENTS.md - Development guidelines, coding style, testing approach
- docs/V2_API文档.md - Detailed V2 API documentation
- docs/API.md - V1 API documentation (legacy)
- docs/DB_GUIDE.md - Database schema and query examples
- docs/PRD.md - Product requirements
- docs/CLAUDE.md - Detailed Claude Code guidance (comprehensive)
Key Components Deep Dive
V2 Service Architecture
lawrisk_v2_service.py implements:
- Structured response formatting
- Region filter normalization
- Direct permit name matching
- Markdown formatting for legal text
- Complex query execution pipeline with concurrency
Authentication Flow
lawrisk/api/auth.py provides:
- Login page with redirect handling
- Session management
- @login_required decorator for protecting endpoints
- JSON vs HTML response handling (API vs browser)
Checkpoint Security
test_checkpoint_security.py tests:
- Checkpoint creation authorization
- Restore operation security
- User permission validation
- Operation audit logging
Best Practices
Code Style
- Black: 100-character line length, Python 3.10+
- Type Hints: Use PEP 604 union types (str | None)
- Imports: Ruff-compatible, group by standard library → third-party → local
- Naming: snake_case (functions/variables), SCREAMING_SNAKE_CASE (constants), PascalCase (classes)
Error Handling
- Graceful degradation on startup (errors surface on first request)
- Structured error responses: {"success": false, "message": "error", "data": {}}
- Logging to stdout with structured format
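The structured error shape listed above can be produced by a small helper. This is a sketch only: `error_response` is a hypothetical name, and the real services may build the envelope differently or add fields.

```python
import json


def error_response(message, data=None):
    """Build the error envelope; `data` defaults to an empty object."""
    return {
        "success": False,
        "message": message,
        "data": data if data is not None else {},
    }


print(json.dumps(error_response("query is required")))
# → {"success": false, "message": "query is required", "data": {}}
```

Returning a plain dict keeps the helper framework-independent; in a Flask view it could be passed to `jsonify` together with an appropriate status code.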
Configuration
- Use lawrisk.utils.env_loader for environment variables
- Default values for non-critical configs
- Environment-specific overrides supported
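The idea of defaults for non-critical settings and hard failures for required ones can be sketched with `os.environ` alone. The interface of `lawrisk.utils.env_loader` is not shown in this guide, so `env_str`, `env_int`, and `env_float` below are hypothetical stand-ins rather than its actual API.

```python
import os

_MISSING = object()  # sentinel: distinguishes "no default" from a None default


def env_str(name, default=_MISSING, environ=None):
    """Return a stripped env value, the default, or raise if required."""
    source = os.environ if environ is None else environ
    value = source.get(name, "").strip()
    if value:
        return value
    if default is _MISSING:
        raise RuntimeError(f"missing required environment variable: {name}")
    return default


def env_int(name, default, environ=None):
    """Read an integer setting such as a port number."""
    return int(env_str(name, str(default), environ))


def env_float(name, default, environ=None):
    """Read a float setting such as a search threshold."""
    return float(env_str(name, str(default), environ))


# Usage: non-critical values fall back, secrets must be present.
# port = env_int("PG_PORT", 5432)
# api_key = env_str("DASHSCOPE_API_KEY")  # raises RuntimeError if unset
```

Passing `environ` explicitly makes the helpers easy to test with a plain dict instead of the process environment.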
Health Checks
# Basic health
curl http://localhost:8000/healthz
# Check regions
curl http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2/regions
# Test search
curl -X POST http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2 \
-d "query=电影院&debug=1"
View app startup logs to see all registered routes.