CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
LawRisk Backend - Claude Code Analysis
Project Overview
LawRisk is a Flask-based Python backend service that provides intelligent legal compliance risk retrieval for business licensing and permit requirements. It uses vector embeddings and LLM-based matching to help users find relevant permits, licenses, and associated legal risks based on natural language queries (in Chinese).
Python Version Requirement: Python 3.10+ (uses PEP 604 union types like str | None)
Key Features
- Semantic Search: Uses Aliyun DashScope embeddings (text-embedding-v4) to find similar legal topics
- LLM-Powered Matching: Qwen (qwen-plus-latest) for intelligent subject selection
- Two Database Architecture:
  - fs_law_risk: Vector embeddings and subject-permit mappings
  - licensing_risks: Structured permit and risk data with regions, themes, and compliance information
- RESTful APIs: Clean REST endpoints for V1 (legacy) and V2 (enhanced) search
- CORS Enabled: Built-in CORS middleware for frontend integration
Architecture & Project Structure
Core Framework & Libraries
- Framework: Flask (Python web framework)
- Database Driver: pg8000 (PostgreSQL adapter)
- Vector Embeddings: Aliyun DashScope OpenAI-compatible API
- LLM: Qwen via DashScope (qwen-plus-latest)
- Dependencies: Minimal footprint - Flask and pg8000 (concurrency uses the standard-library concurrent.futures module)
Directory Structure
市监局-lawRisk-backend/
├── app.py # Flask application entry point
├── requirements.txt # Python dependencies
├── .env # Environment configuration
├── lawrisk/ # Main application package
│ ├── __init__.py
│ ├── api/ # API route handlers
│ │ ├── v1.py # V1 API (legacy)
│ │ └── v2.py # V2 API (current)
│ ├── services/ # Business logic layer
│ │ ├── lawrisk_service.py # Core search & embeddings
│ │ ├── lawrisk_v2_service.py # V2 enhanced service
│ │ └── licensing_repo.py # Data repository
│ ├── middleware/ # HTTP middleware
│ │ └── smart_cors_middleware.py
│ └── utils/ # Utility functions
│ ├── env_loader.py
│ ├── export_risk_json.py
│ └── ingest_lawrisk.py
├── static/ # Static assets
│ └── v2_tester.html # Web-based API tester
├── tests/ # Test suite (planned)
├── data/ # Data files
│ ├── risk_tables_export.json
│ └── licensing_risks_dump.sql
└── docs/ # Documentation
├── PRD.md
├── API.md
├── V2_API文档.md
├── AGENTS.md
├── DB_GUIDE.md
└── CLAUDE.md
Quick Reference
Most Common Commands
# Run the application
python app.py
# Export data from database
python lawrisk/utils/export_risk_json.py
# Ingest data with embeddings (requires DASHSCOPE_API_KEY)
python lawrisk/utils/ingest_lawrisk.py
# Format and lint code
black -l 100 .
ruff check .
# Test locally via browser
# Open static/v2_tester.html after starting the app
Key Files
- app.py - Flask application entry point
- lawrisk/ - Main application package
- lawrisk/api/v1.py - V1 API routes (legacy)
- lawrisk/api/v2.py - V2 API routes (current)
- lawrisk/services/lawrisk_service.py - Core search & embeddings
- lawrisk/services/lawrisk_v2_service.py - V2 enhanced service
- lawrisk/services/licensing_repo.py - Data repository
- lawrisk/middleware/smart_cors_middleware.py - CORS middleware
- lawrisk/utils/ - Utility functions
- static/v2_tester.html - Web-based API testing interface
- requirements.txt - Python dependencies
- .env - Environment configuration
Development Workflow
Initial Setup
# 1. Create virtual environment (Windows PowerShell)
python -m venv .venv
.venv\Scripts\activate
# 2. Install dependencies (runtime packages from requirements.txt, then dev tools)
pip install -r requirements.txt
pip install black ruff pytest pytest-cov
# 3. Configure environment
# Edit .env with your database credentials and DashScope API key
Virtual Environment Activation
# Windows PowerShell
.venv\Scripts\activate
# Windows CMD
.venv\Scripts\activate.bat
# Git Bash (Windows)
source .venv/Scripts/activate
Common Commands
Run the Application
# Development mode
python app.py
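# Custom port (bash / Git Bash syntax)
PORT=8000 python app.py
# Custom port (PowerShell syntax)
$env:PORT = "8000"; python app.py
# With debug logging (bash / Git Bash syntax)
FLASK_DEBUG=1 python app.py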
Data Management
# Export data from fs_law_risk database to JSON
python lawrisk/utils/export_risk_json.py
# Ingest data with embeddings into database
python lawrisk/utils/ingest_lawrisk.py
# Requires DASHSCOPE_API_KEY in .env
Code Quality
# Format code with Black at 100-char line length (Black's default is 88, so pass -l 100 unless pyproject.toml sets it)
black -l 100 .
# Lint with Ruff
ruff check .
# Format and lint specific file
black -l 100 lawrisk/services/lawrisk_v2_service.py
ruff check lawrisk/services/lawrisk_v2_service.py
# Run tests (when added)
pytest -q
# Run tests with coverage
pytest --cov=lawrisk
Data Management Commands
# Export data from fs_law_risk database to JSON
# Output: data/risk_tables_export.json
python lawrisk/utils/export_risk_json.py
# Ingest data with embeddings into database
# Requires DASHSCOPE_API_KEY in .env
python lawrisk/utils/ingest_lawrisk.py
# Verify exported data
ls -lh data/
cat data/risk_tables_export.json | head -50
Database Operations
# Connect to PostgreSQL
psql -h 8.138.196.105 -U postgres -d fs_law_risk
# Connect to licensing_risks database
psql -h 8.138.196.105 -U postgres -d licensing_risks
API Endpoints
V1 API (Legacy)
- Path: /fs-ai-asistant/api/workflow/lawrisk
- Methods: GET, POST
- Mode: llm (default) or embed
- Input: query (user question)
- Output: Simple array of matching subjects with permit IDs
V2 API (Current/Recommended)
- Base Path: /fs-ai-asistant/api/workflow/lawrisk/v2
- Methods: GET, POST
- Features:
  - Structured results with regions, themes, permits, and risks
  - Optional region filtering
  - Debug mode with detailed execution info
  - Direct permit matching by name
V2 Sub-endpoints
- Search Endpoint
  - Path: /fs-ai-asistant/api/workflow/lawrisk/v2
  - Parameters:
    - query (required): User question
    - region (optional): Filter by region (市级, 禅城区, etc.)
    - debug (optional): Enable debug output (1/true/yes/on)
    - top (optional): Number of recommendations (default: 5)
- Regions List
  - Path: /fs-ai-asistant/api/workflow/lawrisk/v2/regions
  - Method: GET
  - Returns: All available regions for filtering
- Get Permits
  - Path: /fs-ai-asistant/api/workflow/lawrisk/getPermits
  - Methods: GET, POST
  - Input: region (region ID or name)
  - Returns: All permits for a specific region
Health Check
- Path: /healthz
- Method: GET
- Returns: {"status": "ok"}
Database Schema
Database 1: fs_law_risk
Used for vector embeddings and semantic search.
Tables
- law_sub: Subject matter with embeddings
  - id (TEXT, PK): Subject ID
  - name (TEXT): Subject name
  - vector (JSONB): Embedding vector
- law_sub_per: Subject-permit mappings
  - sub_id (TEXT, PK): Subject ID
  - per_ids (JSONB): Array of permit IDs
- law_per: Permit information
  - id (TEXT, PK): Permit ID
  - name (TEXT): Permit name
  - risk_ids (JSONB): Array of risk IDs
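In fs_law_risk, the lookup chain goes from a subject, through the per_ids array in law_sub_per, to the permit rows in law_per, which carry their own risk_ids. The sketch below models that chain in memory. It assumes the JSONB arrays have already been decoded into Python lists of ID strings. The Permit dataclass and resolve_subject() are hypothetical illustrations, not the service's code.

```python
from dataclasses import dataclass, field


@dataclass
class Permit:
    """In-memory stand-in for a law_per row (hypothetical)."""

    id: str
    name: str
    risk_ids: list[str] = field(default_factory=list)


def resolve_subject(
    sub_id: str,
    law_sub_per: dict[str, list[str]],
    law_per: dict[str, Permit],
) -> list[Permit]:
    """Follow law_sub_per.per_ids to law_per rows, skipping IDs that have no permit row."""
    per_ids = law_sub_per.get(sub_id, [])
    return [law_per[pid] for pid in per_ids if pid in law_per]


def collect_risk_ids(permits: list[Permit]) -> list[str]:
    """Gather risk IDs across permits, keeping first-seen order and dropping duplicates."""
    seen: dict[str, None] = {}
    for permit in permits:
        for rid in permit.risk_ids:
            seen.setdefault(rid, None)
    return list(seen)
```

Silently skipping dangling permit IDs is a design choice for the sketch. A stricter version might log them, because they indicate the two tables have drifted apart.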
Database 2: licensing_risks
Used for structured compliance data.
Tables
- regions: Administrative areas
  - id (PK), name (unique)
- business_scopes: Business scope definitions
  - id (PK), description
- region_scopes: Region-scope mappings
- themes: Legal themes/subjects
  - id (PK), name
- region_themes: Region-theme mappings
- permits: License/permit items
  - id (PK), name
- region_theme_permits: Tripartite linkage
- risks: Risk information
  - id (PK), risk_content, legal_basis, document_no, summary
- region_permit_risks: Risk associations
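The licensing_risks tables are meant to be read through their linkage tables. The sketch below shows the kind of multi-table join involved, using an in-memory SQLite database so it runs anywhere. The column names of region_theme_permits and region_permit_risks (region_id, theme_id, permit_id, risk_id) are assumptions, because only the table purposes are documented here. The production code talks to PostgreSQL through pg8000, whose DB-API interface uses %s placeholders by default rather than SQLite's ?.

```python
import sqlite3

SCHEMA = """
CREATE TABLE regions (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE themes (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE permits (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE risks (id INTEGER PRIMARY KEY, risk_content TEXT);
-- Assumed column names for the linkage tables:
CREATE TABLE region_theme_permits (region_id INTEGER, theme_id INTEGER, permit_id INTEGER);
CREATE TABLE region_permit_risks (region_id INTEGER, permit_id INTEGER, risk_id INTEGER);
"""

# Themes and permits for one region, with any risks attached to each permit in that region.
QUERY = """
SELECT t.name, p.name, r.risk_content
FROM regions g
JOIN region_theme_permits rtp ON rtp.region_id = g.id
JOIN themes t ON t.id = rtp.theme_id
JOIN permits p ON p.id = rtp.permit_id
LEFT JOIN region_permit_risks rpr ON rpr.region_id = g.id AND rpr.permit_id = p.id
LEFT JOIN risks r ON r.id = rpr.risk_id
WHERE g.name = ?
ORDER BY t.name, p.name, r.id
"""


def permits_with_risks(conn: sqlite3.Connection, region_name: str) -> list[tuple]:
    return conn.execute(QUERY, (region_name,)).fetchall()


def build_demo_db() -> sqlite3.Connection:
    """Create a throwaway database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO regions VALUES (?, ?)", [(1, "市级"), (2, "禅城区")])
    conn.execute("INSERT INTO themes VALUES (1, '电影院')")
    conn.execute("INSERT INTO permits VALUES (1, '公共场所卫生许可')")
    conn.execute("INSERT INTO risks VALUES (1, 'sample risk text')")
    conn.execute("INSERT INTO region_theme_permits VALUES (1, 1, 1)")
    conn.execute("INSERT INTO region_permit_risks VALUES (1, 1, 1)")
    return conn
```

The LEFT JOINs on the risk side keep permits that have no recorded risks in a region. Inner joins would drop those permits from the result.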
Configuration
Environment Variables (.env)
DashScope (Embeddings & LLM)
DASHSCOPE_API_KEY=<your-dashscope-api-key>
DASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
DASHSCOPE_EMBED_MODEL=text-embedding-v4
DASHSCOPE_EMBED_DIM=1024
DASHSCOPE_MAX_BATCH=10
DASHSCOPE_CHAT_MODEL=qwen-plus-latest
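The DashScope settings above map onto an OpenAI-compatible embeddings call. Inputs are split into batches of at most DASHSCOPE_MAX_BATCH texts, and each batch is sent as one POST to {DASHSCOPE_BASE_URL}/embeddings. The sketch below builds such a request with the standard library but does not send it. It assumes the compatible-mode endpoint accepts the OpenAI request fields model, input and dimensions. chunked() and build_embedding_request() are hypothetical names, not the EmbeddingClient API.

```python
import json
import os
import urllib.request
from collections.abc import Iterator


def chunked(items: list[str], size: int) -> Iterator[list[str]]:
    """Split inputs into batches no larger than size (e.g. DASHSCOPE_MAX_BATCH)."""
    if size < 1:
        raise ValueError("batch size must be >= 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_embedding_request(texts: list[str]) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-compatible POST /embeddings request."""
    base_url = os.environ.get(
        "DASHSCOPE_BASE_URL", "https://dashscope.aliyuncs.com/compatible-mode/v1"
    )
    payload = {
        "model": os.environ.get("DASHSCOPE_EMBED_MODEL", "text-embedding-v4"),
        "input": texts,
        # Assumption: the compatible-mode API honors OpenAI's "dimensions" field.
        "dimensions": int(os.environ.get("DASHSCOPE_EMBED_DIM", "1024")),
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/embeddings",
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ.get('DASHSCOPE_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending a batch would be urllib.request.urlopen(req). In the OpenAI response format the vectors come back under data[i].embedding.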
PostgreSQL Configuration
# fs_law_risk database
PG_HOST=8.138.196.105
PG_PORT=5432
PG_USER=postgres
PG_PASSWORD=<your-postgres-password>
PG_DATABASE=fs_law_risk
PG_ADMIN_DB=postgres
# licensing_risks database
LIC_PG_HOST=8.138.196.105
LIC_PG_PORT=5432
LIC_PG_USER=postgres
LIC_PG_PASSWORD=<your-postgres-password>
LIC_PG_DATABASE=licensing_risks
Application Settings
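# FLASK_ENV is deprecated in Flask 2.2 and removed in 2.3; on newer Flask use FLASK_DEBUG=1 for development
FLASK_ENV=development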
# Search thresholds (tunable)
LAWRISK_RETURN_IF_GE=0.7
LAWRISK_FALLBACK_GT=0.5
Testing
Current Testing Status
- No dedicated test suite in the repository
- pytest framework is recommended but not configured
- Manual testing available via static/v2_tester.html
Testing Framework & Guidelines
When adding tests, use:
- Framework: pytest with Flask test client
- Dependencies: pytest-cov for coverage reporting
- Focus Areas:
- API endpoints (V1 and V2)
- Middleware behavior (CORS)
- Database operations
- LLM selection logic
- Region filtering
- Checkpoint operations (create, list, restore, delete)
Recommended test structure:
tests/
├── conftest.py # Shared fixtures and test configuration
├── test_api_v1.py # V1 API endpoint tests
├── test_api_v2.py # V2 API endpoint tests
├── test_admin_endpoints.py # Admin endpoint tests
├── test_search_service.py # Core search logic
├── test_licensing_repo.py # Database repository
└── test_cors_middleware.py # CORS middleware behavior
# Test naming: test_*.py (pytest discovery pattern)
# Test functions: test_<feature>_<scenario>()
Testing CORS Middleware (from AGENTS.md):
- Origin matching: wildcard, exact, subdomains
- Preflight OPTIONS handling
- X-CORS-Decision header behavior
- NGINX_CORS_MODE functionality
- Environment variable configurations (ALLOWED_ORIGINS, CORS_STRICT, etc.)
Example test case:
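# Assumes app.py exposes a module-level Flask instance named `app`
from app import app


def test_v2_search_with_debug():
    client = app.test_client()
    # Form-encoded POST, matching the curl example in Health Check Commands
    response = client.post(
        '/fs-ai-asistant/api/workflow/lawrisk/v2',
        data={'query': '电影院', 'debug': '1'},
    )
    assert response.status_code == 200
    data = response.get_json()['data']
    assert 'debug' in data
    assert 'executionTime' in data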
Testing Commands
# Run all tests
pytest
# Run with coverage
pytest --cov=lawrisk
# Run specific test file
pytest tests/test_api_v2.py -v
Manual Testing with V2 Tester
- Start the application: python app.py (defaults to port 8000)
- Open static/v2_tester.html in your browser
- Test queries like:
  - "我要办一家电影院"
  - "开办旅馆需要哪些许可"
  - "公共场所卫生许可"
  - Query with region filter: "电影院&region=市级&debug=1"
The tester provides a simple UI to experiment with the V2 API and view debug information.
Coding Standards & Best Practices
Python Style Guidelines
- Indentation: 4 spaces (no tabs)
- Encoding: UTF-8 for all source files
- Naming Conventions:
  - Functions/variables: snake_case
  - Constants: SCREAMING_SNAKE_CASE
  - Classes: PascalCase
- Type Hints: Prefer type hints for all public functions
- Code Formatting: Use black with 100-character line length
- Linting: Use ruff with default rules
Code Quality Guidelines
- Keep functions small and side-effect free
- Prefer pure functions where possible
- Document complex logic with comments
- Use type hints from the typing module
- Handle errors gracefully with appropriate logging
Documentation Files
- PRD.md (docs/PRD.md) - Product Requirements Document
  - Business logic and requirements specification
  - Feature specifications and user stories
- API.md (docs/API.md) - V1 API documentation
  - Legacy API endpoints and usage
  - Request/response formats
- V2_API文档.md (docs/V2_API文档.md) - Detailed V2 API documentation
  - Enhanced API specification
  - Admin endpoints and checkpoint operations
  - Request/response examples
- AGENTS.md (docs/AGENTS.md) - Development guidelines
  - Repository structure and module organization
  - Coding style and naming conventions
  - Commit and PR guidelines
  - Security and configuration tips
- DB_GUIDE.md (docs/DB_GUIDE.md) - Database reference
  - Schema reference for both databases
  - Query examples and optimization tips
- README.md (README.md) - Project overview and quick start
Key Components Deep Dive
1. app.py - Application Entry Point (repository root)
- Creates Flask app with CORS enabled
- Registers all API routes (v1_bp, v2_bp)
- Database initialization and schema checks on startup
- Health check endpoint at /healthz
- Logs all registered routes on startup
- Error handling doesn't block app startup (errors surface on first request)
2. lawrisk_service.py - Core Search Logic (lawrisk/services/lawrisk_service.py)
- EmbeddingClient: DashScope API integration for vector embeddings
- ChatClient: Qwen LLM interaction for intelligent subject selection
- Database helpers using pg8000 for PostgreSQL
- Search algorithms:
- Embedding-based cosine similarity search
  - LLM-based subject matching using qwen-plus-latest
- Similarity threshold management (tunable via LAWRISK_RETURN_IF_GE, LAWRISK_FALLBACK_GT)
- Concurrent execution support with ThreadPoolExecutor
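The embedding path compares the query vector with each law_sub.vector by cosine similarity and then applies the two thresholds. The sketch below implements one plausible reading of the documented rules. If any subject scores at least LAWRISK_RETURN_IF_GE (0.7), only those strong matches are returned. Otherwise, subjects scoring above LAWRISK_FALLBACK_GT (0.5) are returned as a fallback. Candidates are ranked by score and capped at top. It assumes the JSONB vectors arrive as lists of floats. select_subjects() is hypothetical, and the real service may combine these scores with the LLM mode differently.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 when either vector has zero length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_subjects(
    query_vec: list[float],
    subjects: dict[str, list[float]],
    return_if_ge: float = 0.7,
    fallback_gt: float = 0.5,
    top: int = 5,
) -> list[tuple[str, float]]:
    """Rank subjects by similarity, then apply the strong/fallback thresholds."""
    scored = sorted(
        ((sub_id, cosine(query_vec, vec)) for sub_id, vec in subjects.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    strong = [item for item in scored if item[1] >= return_if_ge]
    if strong:
        return strong[:top]
    return [item for item in scored if item[1] > fallback_gt][:top]
```

Because each query scans every subject, the cost is O(subjects × dimension). That is acceptable for a small law_sub table; a larger corpus would usually move to a vector index.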
3. lawrisk_v2_service.py - Enhanced API (lawrisk/services/lawrisk_v2_service.py)
- Structured response formatting with regions, themes, permits, and risks
- Region filtering logic with normalization
- Direct permit matching by name
- Markdown formatting for legal text
- Complex query execution pipeline
- Helper functions:
  - _compose_prompt(): Builds natural-language prompts from structured data
  - _normalize_region_filter(): Normalizes region filters for matching
4. licensing_repo.py - Data Repository (lawrisk/services/licensing_repo.py)
- Separate database connection for the licensing_risks database
- Multi-table join query optimization
- Legal text formatting helpers
- Chinese legal document pattern matching
- Checkpoint management:
  - create_checkpoint(): Database backup functionality
  - list_checkpoints(): List available backups
  - restore_checkpoint(): Restore from checkpoint (DANGEROUS!)
  - delete_checkpoint(): Remove old checkpoints
5. smart_cors_middleware.py - Reusable CORS (lawrisk/middleware/smart_cors_middleware.py)
- Wildcard and exact origin matching
- Subdomain support with flexible patterns
- Preflight OPTIONS handling
- NGINX integration mode (NGINX_CORS_MODE)
- Debug and logging features (CORS_DEBUG)
- Environment variable support:
- ALLOWED_ORIGINS, CORS_STRICT, CORS_DEBUG
- NGINX_CORS_MODE, CORS_MAX_AGE, CORS_EXPOSE_HEADERS
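Origin matching is the core of the middleware and the first thing its tests should pin down. The cases are a bare * wildcard, exact origins, and subdomain patterns, with ALLOWED_ORIGINS given as a comma-separated list. The sketch below assumes glob-style patterns such as https://*.example.com. The middleware's real pattern syntax is not documented here, and parse_allowed_origins() and origin_allowed() are hypothetical names.

```python
from fnmatch import fnmatchcase


def parse_allowed_origins(raw: str) -> list[str]:
    """Split a comma-separated ALLOWED_ORIGINS value, dropping blanks and trailing slashes."""
    return [o.strip().rstrip("/").lower() for o in raw.split(",") if o.strip()]


def origin_allowed(origin: str, allowed: list[str]) -> bool:
    """Check an Origin header against wildcard, exact and glob (subdomain) patterns."""
    if not origin:
        return False
    origin = origin.rstrip("/").lower()
    for pattern in allowed:
        if pattern == "*":
            return True
        if "*" in pattern:
            # "https://*.example.com" matches any subdomain but not the apex domain.
            if fnmatchcase(origin, pattern):
                return True
        elif origin == pattern:
            return True
    return False
```

Anchoring the glob at the end means a lookalike host such as a.example.org.evil.com does not match https://*.example.org.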
6. v2.py API Routes (lawrisk/api/v2.py)
- Public endpoints: /v2, /v2/regions, /getPermits
- Admin endpoints: /admin/test, /admin/regions, /admin/themes, /admin/permits, /admin/checkpoints
- Parameter extraction supporting GET, POST, JSON, and form data
- Concurrent execution using ThreadPoolExecutor (max_workers=2)
- Structured error responses with consistent format
7. Utility Scripts
- env_loader.py: Environment variable loading from .env file
- export_risk_json.py: PostgreSQL data export utility (outputs to data/risk_tables_export.json)
- ingest_lawrisk.py: Data ingestion with embeddings (requires DASHSCOPE_API_KEY)
Security & Best Practices
Security Guidelines
- NEVER hardcode secrets in source code
- All credentials must be in the .env file or environment variables
- API keys (DASHSCOPE_API_KEY) and database passwords must be externalized
- Admin endpoints (/admin/*) should be protected in production
- Database checkpoint restore is a DANGEROUS OPERATION and should be restricted
Configuration Best Practices
- Use the .env file for all configuration (database, API keys, thresholds)
- Environment variables supported by CORS middleware:
- ALLOWED_ORIGINS: Comma-separated list of allowed origins
- CORS_STRICT: Enable strict origin checking
- CORS_DEBUG: Enable debug logging
- NGINX_CORS_MODE: Enable NGINX integration
- CORS_MAX_AGE: Preflight cache duration
- CORS_EXPOSE_HEADERS: Headers to expose to browsers
- Regularly backup databases using checkpoint system
- Monitor DashScope API quota and rate limits
Troubleshooting Guide
Common Issues
Database Connection Errors
Symptom: pg8000.dbapi.Error when starting the app
Solutions:
- Check that the .env file exists with correct PostgreSQL credentials
- Verify network connectivity to the database server
- Ensure PostgreSQL server is running and accessible
- Check database names: fs_law_risk and licensing_risks
- Verify PG_HOST, PG_PORT, PG_USER, PG_PASSWORD (and the LIC_PG_* equivalents) are correct
Missing Environment Variables
Symptom: Key errors, default values being used, or API failures
Solutions:
- Create the .env file from the template (see Configuration section)
- Ensure DASHSCOPE_API_KEY is set for embedding/chat features
- Verify all required PG_* and LIC_PG_* environment variables
- Check that pg8000 is installed (quote the specifier so the shell does not treat > as a redirect): pip install "pg8000>=1.30.0"
LLM/Embedding API Errors
Symptom: API authentication failures, timeout errors, or embedding errors
Solutions:
- Verify DASHSCOPE_API_KEY is valid and has sufficient quota
- Check that DASHSCOPE_BASE_URL is https://dashscope.aliyuncs.com/compatible-mode/v1
- Ensure network access to DashScope API servers
- Review API rate limits and batch sizes (DASHSCOPE_MAX_BATCH)
- Test the API key: curl -H "Authorization: Bearer $DASHSCOPE_API_KEY" "$DASHSCOPE_BASE_URL/models"
Empty Search Results
Symptom: API returns empty risk_subject array
Solutions:
- Check if database tables are populated (connect to the fs_law_risk database first, e.g. psql -d fs_law_risk):
  SELECT COUNT(*) FROM law_sub;
  SELECT COUNT(*) FROM law_sub_per;
- Try the debug=1 parameter to see detailed execution info
- Verify similarity thresholds in .env:
  - LAWRISK_RETURN_IF_GE=0.7 (return if similarity >= 0.7)
  - LAWRISK_FALLBACK_GT=0.5 (fallback if similarity > 0.5)
- Test with known queries like "我要办一家电影院" or "开办旅馆"
- Check if embeddings exist: SELECT COUNT(*) FROM law_sub WHERE vector IS NOT NULL;
Port Already in Use
Symptom: OSError: [Errno 10048] Only one usage of each socket address
Solutions:
- Change port: PORT=8001 python app.py
- Kill the existing process using the port (Windows CMD/PowerShell):
  netstat -ano | findstr :8000
  taskkill /PID <PID> /F
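netstat and findstr are Windows-only. Before you start the app you can instead check from Python whether anything is already listening on the port. The sketch below tries a TCP connection to 127.0.0.1, which also reaches a server bound to 0.0.0.0. port_in_use() is a hypothetical helper, not part of the repository.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP connection to host:port succeeds (something is listening)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising.
        return s.connect_ex((host, port)) == 0
```

Usage: python -c "from check_port import port_in_use; print(port_in_use(8000))", assuming you saved the function as a hypothetical check_port.py.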
Data Export/Import Issues
Symptom: export_risk_json.py or ingest_lawrisk.py fails
Solutions:
- Verify PostgreSQL credentials in .env match database access
- Ensure the export script isn't writing outside the repository
- For ingestion, confirm DASHSCOPE_API_KEY is valid
- Check data files exist: data/risk_tables_export.json
- Run export first, then ingestion if needed
Health Check Commands
# Basic health check
curl http://localhost:8000/healthz
# Check registered routes (see app startup logs)
python app.py 2>&1 | grep "Registered routes"
# Test V2 API with debug
curl -X POST "http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2" \
-H "Content-Type: application/x-www-form-urlencoded" \
-d "query=我要办一家电影院&debug=1"
# Test regions endpoint
curl http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2/regions
# Test admin endpoints
curl http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/admin/test
Database Verification
Verify database content with queries from DB_GUIDE.md. PostgreSQL cannot query across databases, so run each query while connected to the database named in its comment (e.g. psql -d fs_law_risk):
-- fs_law_risk: check subject count
SELECT COUNT(*) FROM law_sub;
-- licensing_risks: check region-theme pairs
SELECT COUNT(*) FROM region_themes;
-- fs_law_risk: check if embeddings exist
SELECT id, name FROM law_sub WHERE vector IS NOT NULL LIMIT 5;
-- licensing_risks: list available regions
SELECT * FROM regions ORDER BY name;
Debug Mode
Enable debug logging to troubleshoot issues:
# Enable Flask debug mode
FLASK_DEBUG=1 python app.py
# Enable CORS debug mode
CORS_DEBUG=1 python app.py
# Check app logs for registered routes and errors
# Logs are printed to console when starting the app
Health Checks
- Basic health: GET /healthz → {"status": "ok"}
- V2 regions: GET /fs-ai-asistant/api/workflow/lawrisk/v2/regions
- Check logs for registered routes on app startup