CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
LawRisk Backend - Claude Code Analysis
Project Overview
LawRisk is a Flask-based Python backend service that provides intelligent legal compliance risk retrieval for business licensing and permit requirements. It uses vector embeddings and LLM-based matching to help users find relevant permits, licenses, and associated legal risks based on natural language queries (in Chinese).
Python Version Requirement: Python 3.10+ (uses PEP 604 union types like str | None)
Key Features
- Semantic Search: Uses Aliyun DashScope embeddings (text-embedding-v4) to find similar legal topics
- LLM-Powered Matching: Qwen (qwen-plus-latest) for intelligent subject selection
- Two-Database Architecture:
  - fs_law_risk: Vector embeddings and subject-permit mappings
  - licensing_risks: Structured permit and risk data with regions, themes, and compliance information
- RESTful APIs: Clean REST endpoints for V1 (legacy) and V2 (enhanced) search
- CORS Enabled: Built-in CORS middleware for frontend integration
Architecture & Project Structure
Core Framework & Libraries
- Framework: Flask (Python web framework)
- Database Driver: pg8000 (PostgreSQL adapter)
- Vector Embeddings: Aliyun DashScope OpenAI-compatible API
- LLM: Qwen via DashScope (qwen-plus-latest)
- Dependencies: Minimal footprint: Flask and pg8000 (concurrency uses the standard-library concurrent.futures)
Directory Structure
市监局-lawRisk-backend/
├── app.py # Flask application entry point
├── requirements.txt # Python dependencies
├── .env # Environment configuration
├── lawrisk/ # Main application package
│ ├── __init__.py
│ ├── api/ # API route handlers
│ │ ├── v1.py # V1 API (legacy)
│ │ └── v2.py # V2 API (current)
│ ├── services/ # Business logic layer
│ │ ├── lawrisk_service.py # Core search & embeddings
│ │ ├── lawrisk_v2_service.py # V2 enhanced service
│ │ └── licensing_repo.py # Data repository
│ ├── middleware/ # HTTP middleware
│ │ └── smart_cors_middleware.py
│ └── utils/ # Utility functions
│ ├── env_loader.py
│ ├── export_risk_json.py
│ └── ingest_lawrisk.py
├── static/ # Static assets
│ └── v2_tester.html # Web-based API tester
├── tests/ # Test suite (planned)
├── data/ # Data files
│ ├── risk_tables_export.json
│ └── licensing_risks_dump.sql
└── docs/ # Documentation
├── PRD.md
├── API.md
├── V2_API文档.md
├── AGENTS.md
├── DB_GUIDE.md
└── CLAUDE.md
Quick Reference
Most Common Commands
# Run the application
python app.py
# Export data from database
python -m lawrisk.utils.export_risk_json
# Ingest data with embeddings (requires DASHSCOPE_API_KEY)
python -m lawrisk.utils.ingest_lawrisk
# Format and lint code
black --line-length 100 .
ruff check .
# Test locally via browser
# Open static/v2_tester.html after starting the app
Key Files
- app.py: Flask application entry point
- lawrisk/: Main application package
  - api/v1.py: V1 API routes (legacy)
  - api/v2.py: V2 API routes (current)
  - services/lawrisk_service.py: Core search & embeddings
  - services/lawrisk_v2_service.py: V2 enhanced service
  - services/licensing_repo.py: Data repository
  - middleware/smart_cors_middleware.py: CORS middleware
  - utils/: Utility functions
- static/v2_tester.html: Web-based API testing interface
- requirements.txt: Python dependencies
- .env: Environment configuration
Development Workflow
Initial Setup
# 1. Create virtual environment
python -m venv .venv
# 2. Activate virtual environment
# Windows PowerShell:
.venv\Scripts\Activate.ps1
# macOS/Linux:
# source .venv/bin/activate
# 3. Install dependencies
pip install -r requirements.txt
pip install black ruff pytest  # development tools
# 4. Load environment variables
# Edit .env with your database credentials
Common Commands
Run the Application
# Development mode
python app.py
# Custom port (POSIX shells)
PORT=8001 python app.py
# Custom port (PowerShell)
$env:PORT=8001; python app.py
# With debug logging
FLASK_DEBUG=1 python app.py
Data Management
# Export data from fs_law_risk database to JSON
python -m lawrisk.utils.export_risk_json
# Ingest data with embeddings into database (requires DASHSCOPE_API_KEY in .env)
python -m lawrisk.utils.ingest_lawrisk
Code Quality
# Format code with Black (100-character line length)
black --line-length 100 .
# Lint with Ruff
ruff check .
# Run tests (when added)
pytest -q
Database Operations
# Connect to PostgreSQL
psql -h 8.138.196.105 -U postgres -d fs_law_risk
# Connect to licensing_risks database
psql -h 8.138.196.105 -U postgres -d licensing_risks
API Endpoints
V1 API (Legacy)
- Path: /fs-ai-asistant/api/workflow/lawrisk
- Methods: GET, POST
- Mode: llm (default) or embed
- Input: query (user question)
- Output: Simple array of matching subjects with permit IDs
V2 API (Current/Recommended)
- Base Path: /fs-ai-asistant/api/workflow/lawrisk/v2
- Methods: GET, POST
- Features:
- Structured results with regions, themes, permits, and risks
- Optional region filtering
- Debug mode with detailed execution info
- Direct permit matching by name
V2 Sub-endpoints
- Search Endpoint
  - Path: /fs-ai-asistant/api/workflow/lawrisk/v2
  - Parameters:
    - query (required): User question
    - region (optional): Filter by region (e.g. 市级 "city level", 禅城区 "Chancheng District")
    - debug (optional): Enable debug output (1/true/yes/on)
    - top (optional): Number of recommendations (default: 5)
- Regions List
  - Path: /fs-ai-asistant/api/workflow/lawrisk/v2/regions
  - Method: GET
  - Returns: All available regions for filtering
- Get Permits
  - Path: /fs-ai-asistant/api/workflow/lawrisk/getPermits
  - Methods: GET, POST
  - Input: region (region ID or name)
  - Returns: All permits for a specific region
Health Check
- Path: /healthz
- Method: GET
- Returns: {"status": "ok"}
Database Schema
Database 1: fs_law_risk
Used for vector embeddings and semantic search.
Tables
- law_sub: Subject matter with embeddings
  - id (TEXT, PK): Subject ID
  - name (TEXT): Subject name
  - vector (JSONB): Embedding vector
- law_sub_per: Subject-permit mappings
  - sub_id (TEXT, PK): Subject ID
  - per_ids (JSONB): Array of permit IDs
- law_per: Permit information
  - id (TEXT, PK): Permit ID
  - name (TEXT): Permit name
  - risk_ids (JSONB): Array of risk IDs
Database 2: licensing_risks
Used for structured compliance data.
Tables
- regions: Administrative areas. Columns: id (PK), name (unique)
- business_scopes: Business scope definitions. Columns: id (PK), description
- region_scopes: Region-scope mappings
- themes: Legal themes/subjects. Columns: id (PK), name
- region_themes: Region-theme mappings
- permits: License/permit items. Columns: id (PK), name
- region_theme_permits: Tripartite linkage of region, theme, and permit
- risks: Risk information. Columns: id (PK), risk_content, legal_basis, document_no, summary
- region_permit_risks: Risk associations
Configuration
Environment Variables (.env)
DashScope (Embeddings & LLM)
DASHSCOPE_API_KEY=your-dashscope-api-key
DASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
DASHSCOPE_EMBED_MODEL=text-embedding-v4
DASHSCOPE_EMBED_DIM=1024
DASHSCOPE_MAX_BATCH=10
DASHSCOPE_CHAT_MODEL=qwen-plus-latest
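DASHSCOPE_MAX_BATCH presumably caps how many texts go into a single embedding call, so ingestion has to split its input into chunks. The sketch below builds (but does not send) one request per batch against the OpenAI-compatible /embeddings route under DASHSCOPE_BASE_URL. It assumes the compatible-mode endpoint accepts the OpenAI-style model, input, dimensions, and encoding_format fields. The helper names are hypothetical and are not taken from EmbeddingClient.

```python
import json
import urllib.request
from collections.abc import Iterator

# Values mirror the .env keys above.
BASE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1"
MODEL = "text-embedding-v4"
DIM = 1024
MAX_BATCH = 10


def batched(texts: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks of at most `size` texts."""
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


def build_embedding_requests(texts: list[str], api_key: str) -> list[urllib.request.Request]:
    """Build (but do not send) one OpenAI-compatible /embeddings request per batch."""
    reqs = []
    for chunk in batched(texts, MAX_BATCH):
        payload = {"model": MODEL, "input": chunk, "dimensions": DIM, "encoding_format": "float"}
        reqs.append(
            urllib.request.Request(
                f"{BASE_URL}/embeddings",
                data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
                headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
                method="POST",
            )
        )
    return reqs
```

Sending each request with urllib.request.urlopen should return an OpenAI-style body whose data[i]["embedding"] holds one vector per input text.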
PostgreSQL Configuration
# fs_law_risk database
PG_HOST=8.138.196.105
PG_PORT=5432
PG_USER=postgres
PG_PASSWORD=your-password
PG_DATABASE=fs_law_risk
PG_ADMIN_DB=postgres
# licensing_risks database
LIC_PG_HOST=8.138.196.105
LIC_PG_PORT=5432
LIC_PG_USER=postgres
LIC_PG_PASSWORD=your-password
LIC_PG_DATABASE=licensing_risks
Application Settings
FLASK_ENV=development  # deprecated in Flask 2.2 and removed in 2.3; prefer FLASK_DEBUG=1
# Search thresholds (tunable)
LAWRISK_RETURN_IF_GE=0.7
LAWRISK_FALLBACK_GT=0.5
Testing
Current Testing Status
- No dedicated test suite in the repository
- pytest framework is recommended but not configured
- Manual testing available via static/v2_tester.html
Testing Framework & Guidelines
When adding tests, use:
- Framework: pytest with Flask test client
- Focus Areas:
- API endpoints (V1 and V2)
- Middleware behavior (CORS)
- Database operations
- LLM selection logic
- Region filtering
Recommended test structure:
tests/
├── test_api_v1.py # V1 API endpoint tests
├── test_api_v2.py # V2 API endpoint tests
├── test_search_service.py # Core search logic
├── test_licensing_repo.py # Database repository
└── conftest.py # Shared fixtures
Testing Commands
# Run all tests
pytest
# Run with coverage (requires the pytest-cov plugin)
pytest --cov=lawrisk
# Run specific test file
pytest tests/test_api_v2.py -v
Manual Testing with V2 Tester
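- Start the application: python app.py (defaults to port 8000)
- Open static/v2_tester.html in your browser
- Test queries like:
  - "我要办一家电影院" (I want to open a cinema)
  - "开办旅馆需要哪些许可" (What permits do I need to open a hotel?)
  - "公共场所卫生许可" (public place hygiene permit)
  - Query with region filter: query=电影院&region=市级&debug=1 (cinema, city level, debug on)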
The tester provides a simple UI to experiment with the V2 API and view debug information.
Coding Standards & Best Practices
Python Style Guidelines
- Indentation: 4 spaces (no tabs)
- Encoding: UTF-8 for all source files
- Naming Conventions:
  - Functions/variables: snake_case
  - Constants: SCREAMING_SNAKE_CASE
  - Classes: PascalCase
- Type Hints: Prefer type hints for all public functions
- Code Formatting: Use black with 100-character line length
- Linting: Use ruff with default rules
Code Quality Guidelines
- Keep functions small; prefer pure, side-effect-free functions where possible
- Document complex logic with comments
- Use type hints (from the typing module where needed)
- Handle errors gracefully with appropriate logging
Documentation Files
- PRD.md - Product Requirements Document (specifies business logic and requirements)
- API.md - API documentation for V1 endpoints
- V2_API文档.md - Detailed API documentation for V2 endpoints
- AGENTS.md - Development guidelines and best practices
- DB_GUIDE.md - Database schema reference and query examples
Key Components Deep Dive
1. app.py - Application Entry Point
- Creates Flask app with CORS enabled
- Registers all API routes
- Parameter extraction logic (GET/POST, JSON/form data)
- Concurrent execution using ThreadPoolExecutor
- Error handling and logging
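app.py uses ThreadPoolExecutor for concurrent execution, though which calls it parallelizes is not documented here. The sketch below assumes two independent I/O-bound steps. embed_and_search and load_regions are hypothetical stand-ins that only sleep and return fixed data.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def embed_and_search(query: str) -> list[str]:
    """Stand-in for an I/O-bound call (e.g. embedding + vector search)."""
    time.sleep(0.1)
    return [f"subject-for:{query}"]


def load_regions() -> list[str]:
    """Stand-in for an independent database query."""
    time.sleep(0.1)
    return ["市级", "禅城区"]


def run_concurrently(query: str, timeout: float = 10.0) -> dict[str, list[str]]:
    """Run independent I/O-bound tasks in parallel and collect their results."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        subjects_future = pool.submit(embed_and_search, query)
        regions_future = pool.submit(load_regions)
        # result() blocks until done and re-raises any exception from the worker thread.
        return {
            "subjects": subjects_future.result(timeout=timeout),
            "regions": regions_future.result(timeout=timeout),
        }
```

Threads suit this workload because the waiting happens on network and database I/O rather than on CPU-bound Python code.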
2. lawrisk_service.py - Core Search Logic
- EmbeddingClient: Handles DashScope API integration
- ChatClient: Manages Qwen LLM interactions
- Database helpers with pg8000
- Search algorithms:
- Embedding-based cosine similarity
- LLM-based subject selection
- Similarity threshold management
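Because law_sub.vector is stored as JSONB, embedding-based search can be expressed as cosine similarity computed in Python over the stored vectors. This is a minimal sketch under two assumptions: rows arrive as (id, name, vector) tuples, and vector may be either a JSON string or an already-decoded list (pg8000 can decode JSONB for you). The repository's actual ranking code may differ.

```python
import json
import math
from typing import Any


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_subjects(
    query_vec: list[float], rows: list[tuple[str, str, Any]], top: int = 5
) -> list[tuple[str, str, float]]:
    """Rank (id, name, vector) rows from law_sub by cosine similarity, best first."""
    scored = []
    for sub_id, name, vector in rows:
        vec = json.loads(vector) if isinstance(vector, str) else vector
        scored.append((sub_id, name, cosine(query_vec, vec)))
    scored.sort(key=lambda item: item[2], reverse=True)
    return scored[:top]
```

With 1024-dimensional vectors (DASHSCOPE_EMBED_DIM) and a modest law_sub table, a linear scan like this is simple. A larger table would favor an index-backed approach instead.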
3. lawrisk_v2_service.py - Enhanced API
- Structured response formatting
- Region filtering logic
- Permit direct matching by name
- Markdown formatting for legal text
- Complex query execution pipeline
4. licensing_repo.py - Data Repository
- Separate database connection for licensing_risks
- Query optimization for multi-table joins
- Legal text formatting helpers
- Pattern matching for Chinese legal documents
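Chinese legal texts mark law titles with 《》 brackets and cite articles as 第…条 ("Article …"), written in either Chinese or Arabic numerals. The sketch below extracts both patterns and bolds them for Markdown output. The regular expressions and helper names are illustrative assumptions, not the patterns used in licensing_repo.py.

```python
import re

# 《...》 encloses a law or regulation title, e.g. 《食品安全法》 (Food Safety Law).
LAW_TITLE = re.compile(r"《[^《》]+》")
# Article references such as 第三十五条 or 第35条 (Chinese or Arabic numerals).
ARTICLE_REF = re.compile(r"第[0-9零〇一二三四五六七八九十百千]+条")


def extract_citations(text: str) -> dict[str, list[str]]:
    """Return law titles and article references found in the text, in order."""
    return {
        "laws": LAW_TITLE.findall(text),
        "articles": ARTICLE_REF.findall(text),
    }


def to_markdown(text: str) -> str:
    """Bold law titles and article references for Markdown rendering."""
    text = LAW_TITLE.sub(lambda m: f"**{m.group(0)}**", text)
    return ARTICLE_REF.sub(lambda m: f"**{m.group(0)}**", text)
```

For example, to_markdown("依据《食品安全法》的第三十五条") bolds both the title and the article reference.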
5. smart_cors_middleware.py - Reusable CORS
- Wildcard and exact origin matching
- Subdomain support
- Preflight OPTIONS handling
- NGINX integration mode
- Debug and logging features
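The middleware supports exact, wildcard, and subdomain origin matching. The sketch below shows the idea. The pattern syntax ("*" for any origin, "https://*.example.org" for subdomains) is an assumption, not the middleware's actual configuration format, and the example.* hosts are placeholders. Subdomain matching requires a leading dot, so look-alike hosts such as evil-example.org are rejected.

```python
from urllib.parse import urlsplit


def origin_allowed(origin: str, allowed: list[str]) -> bool:
    """Check an Origin header against exact, '*' and 'scheme://*.domain' patterns."""
    if not origin:
        return False
    parts = urlsplit(origin)
    for pattern in allowed:
        if pattern == "*" or pattern == origin:
            return True
        pat = urlsplit(pattern)
        if pat.scheme != parts.scheme:
            continue  # scheme must match exactly
        if pat.netloc.startswith("*."):
            suffix = pat.netloc[1:]  # e.g. ".example.org"
            # Matches a.example.org but not example.org itself or evil-example.org.
            if parts.netloc.endswith(suffix) and len(parts.netloc) > len(suffix):
                return True
    return False
```

A preflight OPTIONS handler would echo the matched origin in Access-Control-Allow-Origin only when this check passes.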
Troubleshooting Guide
Common Issues
Database Connection Errors
Symptom: pg8000.dbapi.Error when starting the app
Solutions:
- Check that the .env file exists with correct PostgreSQL credentials
- Verify network connectivity to the database server
- Ensure the PostgreSQL server is running and accessible
- Check the database names: fs_law_risk and licensing_risks
Missing Environment Variables
Symptom: KeyError exceptions, or default values being used unexpectedly
Solutions:
- Create a .env file from the template (see Configuration section)
- Ensure DASHSCOPE_API_KEY is set for embedding/chat features
- Verify all required PG_* and LIC_PG_* environment variables
LLM/Embedding API Errors
Symptom: API authentication failures or timeout errors
Solutions:
- Verify DASHSCOPE_API_KEY is valid and has sufficient quota
- Check that DASHSCOPE_BASE_URL matches the API endpoint
- Ensure network access to DashScope API servers
- Review API rate limits and batch sizes
Empty Search Results
Symptom: API returns empty risk_subject array
Solutions:
- Check whether the database tables are populated (e.g. law_sub in fs_law_risk)
- Add the debug=1 parameter to see detailed execution info
- Verify the similarity thresholds in .env (LAWRISK_RETURN_IF_GE, LAWRISK_FALLBACK_GT)
- Test with known queries like "我要办一家电影院" (I want to open a cinema)
Port Already in Use
Symptom: OSError: [Errno 10048] Only one usage of each socket address (Windows; on Linux/macOS the message is "Address already in use")
Solutions:
- Change port: PORT=8001 python app.py
- Kill the process holding the port (Windows): run netstat -ano | findstr :8000, then taskkill /PID <PID> /F
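As a cross-platform alternative to netstat, a few lines of standard-library Python can tell whether something is already listening on the port. This sketch checks IPv4 localhost only and assumes the app binds to an address reachable through 127.0.0.1 (true for both 127.0.0.1 and 0.0.0.0). port_in_use is a hypothetical helper.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something is already listening on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 when the connection succeeds, i.e. the port is taken.
        return sock.connect_ex((host, port)) == 0
```

Calling port_in_use(8000) before starting app.py shows whether you need a different PORT.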
Debug Mode
Enable debug logging to troubleshoot issues:
# Enable Flask debug mode
FLASK_DEBUG=1 python app.py
# Enable CORS debug mode
CORS_DEBUG=1 python app.py
# Check app logs for registered routes and errors
# Logs are printed to console when starting the app
Health Checks
- Basic health: GET /healthz → {"status": "ok"}
- V2 regions: GET /fs-ai-asistant/api/workflow/lawrisk/v2/regions
- Check logs for registered routes on app startup
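For scripted checks (e.g. after a deploy), the health endpoint can be probed with the standard library alone. This is a minimal sketch assuming the app listens on http://localhost:8000. check_health is a hypothetical helper that treats any network, HTTP, or JSON error as unhealthy.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumed: app started with the default port


def check_health(base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Return True if GET /healthz answers 200 with {"status": "ok"}."""
    try:
        with urllib.request.urlopen(f"{base_url}/healthz", timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
            return resp.status == 200 and isinstance(body, dict) and body.get("status") == "ok"
    except (OSError, ValueError):
        # URLError/HTTPError and connection errors are OSError subclasses;
        # malformed JSON raises json.JSONDecodeError, a ValueError subclass.
        return False
```

Running check_health() from a shell loop or CI step gives a simple pass/fail signal without installing extra packages.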
Data Verification
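Verify database content with queries from DB_GUIDE.md. PostgreSQL cannot query across databases in a single statement, so connect to each database first (shown with psql's \c):
\c fs_law_risk
-- Check subject count
SELECT COUNT(*) FROM law_sub;
\c licensing_risks
-- Check region-theme pairs
SELECT COUNT(*) FROM region_themes;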