# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

---

# LawRisk Backend - Claude Code Analysis

## Project Overview

**LawRisk** is a Flask-based Python backend service for legal compliance risk retrieval in business licensing and permitting. It combines vector embeddings with LLM-based matching so users can ask natural-language questions (in Chinese) and get back the relevant permits, licenses, and associated legal risks.

**Python Version Requirement**: Python 3.10+ (the code uses PEP 604 union types such as `str | None`)

### Key Features

- **Semantic Search**: Uses Aliyun DashScope embeddings (text-embedding-v4) to find similar legal topics
- **LLM-Powered Matching**: Uses Qwen (qwen-plus-latest) to select the best-matching subjects
- **Two-Database Architecture**:
  - `fs_law_risk`: Vector embeddings and subject-permit mappings
  - `licensing_risks`: Structured permit and risk data with regions, themes, and compliance information
- **RESTful APIs**: REST endpoints for V1 (legacy) and V2 (enhanced) search
- **CORS Enabled**: Built-in CORS middleware for frontend integration

---

## Architecture & Project Structure

### Core Framework & Libraries

- **Framework**: Flask (Python web framework)
- **Database Driver**: pg8000 (pure-Python PostgreSQL driver)
- **Vector Embeddings**: Aliyun DashScope OpenAI-compatible API
- **LLM**: Qwen via DashScope (qwen-plus-latest)
- **Dependencies**: Minimal footprint: Flask and pg8000, plus the standard-library `concurrent.futures`

### Directory Structure

```
市监局-lawRisk-backend/
├── app.py                        # Flask application entry point
├── requirements.txt              # Python dependencies
├── .env                          # Environment configuration
├── lawrisk/                      # Main application package
│   ├── __init__.py
│   ├── api/                      # API route handlers
│   │   ├── v1.py                 # V1 API (legacy)
│   │   └── v2.py                 # V2 API (current)
│   ├── services/                 # Business logic layer
│   │   ├── lawrisk_service.py    # Core search & embeddings
│   │   ├── lawrisk_v2_service.py # V2 enhanced service
│   │   └── licensing_repo.py     # Data repository
│   ├── middleware/               # HTTP middleware
│   │   └── smart_cors_middleware.py
│   └── utils/                    # Utility functions
│       ├── env_loader.py
│       ├── export_risk_json.py
│       └── ingest_lawrisk.py
├── static/                       # Static assets
│   └── v2_tester.html            # Web-based API tester
├── tests/                        # Test suite (planned)
├── data/                         # Data files
│   ├── risk_tables_export.json
│   └── licensing_risks_dump.sql
└── docs/                         # Documentation
    ├── PRD.md
    ├── API.md
    ├── V2_API文档.md
    ├── AGENTS.md
    ├── DB_GUIDE.md
    └── CLAUDE.md
```

---

## Quick Reference

### Most Common Commands

```bash
# Run the application
python app.py

# Export data from database
python export_risk_json.py

# Ingest data with embeddings (requires DASHSCOPE_API_KEY)
python ingest_lawrisk.py

# Format and lint code (100-character lines per the style guide)
black --line-length 100 .
ruff check .

# Test locally via browser
# Open static/v2_tester.html after starting the app
```

### Key Files

- `app.py` - Flask application entry point
- `lawrisk/` - Main application package
  - `api/v1.py` - V1 API routes (legacy)
  - `api/v2.py` - V2 API routes (current)
  - `services/lawrisk_service.py` - Core search & embeddings
  - `services/lawrisk_v2_service.py` - V2 enhanced service
  - `services/licensing_repo.py` - Data repository
  - `middleware/smart_cors_middleware.py` - CORS middleware
  - `utils/` - Utility functions
- `static/v2_tester.html` - Web-based API testing interface
- `requirements.txt` - Python dependencies
- `.env` - Environment configuration

---

## Development Workflow

### Initial Setup

```bash
# 1. Create virtual environment
python -m venv .venv

# 2. Activate virtual environment (Windows PowerShell)
.venv\Scripts\activate
# (macOS/Linux: source .venv/bin/activate)

# 3. Install dependencies, then the development tools
pip install -r requirements.txt
pip install black ruff pytest

# 4. Configure environment variables
# Create or edit .env with your PostgreSQL and DashScope credentials (see Configuration)
```

### Common Commands

#### Run the Application

```bash
# Development mode (default port: 8000)
python app.py

# Custom port
PORT=8001 python app.py
# (PowerShell: $env:PORT = "8001"; python app.py)

# With debug logging
FLASK_DEBUG=1 python app.py
```

#### Data Management

```bash
# Export data from fs_law_risk database to JSON
python export_risk_json.py

# Ingest data with embeddings into database
python ingest_lawrisk.py  # Requires DASHSCOPE_API_KEY in .env
```

#### Code Quality

```bash
# Format code with Black (100-character line length)
black --line-length 100 .

# Lint with Ruff
ruff check .

# Run tests (when added)
pytest -q
```

#### Database Operations

```bash
# Connect to the fs_law_risk database
psql -h 8.138.196.105 -U postgres -d fs_law_risk

# Connect to the licensing_risks database
psql -h 8.138.196.105 -U postgres -d licensing_risks
```

---

## API Endpoints

### V1 API (Legacy)

- **Path**: `/fs-ai-asistant/api/workflow/lawrisk`
- **Methods**: GET, POST
- **Mode**: `llm` (default) or `embed`
- **Input**: `query` (user question)
- **Output**: Simple array of matching subjects with permit IDs

### V2 API (Current/Recommended)

- **Base Path**: `/fs-ai-asistant/api/workflow/lawrisk/v2`
- **Methods**: GET, POST
- **Features**:
  - Structured results with regions, themes, permits, and risks
  - Optional region filtering
  - Debug mode with detailed execution info
  - Direct permit matching by name

#### V2 Sub-endpoints

1. **Search Endpoint**
   - Path: `/fs-ai-asistant/api/workflow/lawrisk/v2`
   - Parameters:
     - `query` (required): User question
     - `region` (optional): Filter by region, e.g. `市级` (city level), `禅城区` (Chancheng District)
     - `debug` (optional): Enable debug output (`1`/`true`/`yes`/`on`)
     - `top` (optional): Number of recommendations (default: 5)
2. **Regions List**
   - Path: `/fs-ai-asistant/api/workflow/lawrisk/v2/regions`
   - Method: GET
   - Returns: All available regions for filtering
3. **Get Permits**
   - Path: `/fs-ai-asistant/api/workflow/lawrisk/getPermits`
   - Methods: GET, POST
   - Input: `region` (region ID or name)
   - Returns: All permits for the given region

### Health Check

- Path: `/healthz`
- Method: GET
- Returns: `{"status": "ok"}`

---

## Database Schema

### Database 1: fs_law_risk

Used for vector embeddings and semantic search.

#### Tables

- **`law_sub`**: Subjects with embeddings
  - `id` (TEXT, PK): Subject ID
  - `name` (TEXT): Subject name
  - `vector` (JSONB): Embedding vector
- **`law_sub_per`**: Subject-permit mappings
  - `sub_id` (TEXT, PK): Subject ID
  - `per_ids` (JSONB): Array of permit IDs
- **`law_per`**: Permit information
  - `id` (TEXT, PK): Permit ID
  - `name` (TEXT): Permit name
  - `risk_ids` (JSONB): Array of risk IDs

### Database 2: licensing_risks

Used for structured compliance data.

#### Tables

- **`regions`**: Administrative areas
  - `id` (PK), `name` (unique)
- **`business_scopes`**: Business scope definitions
  - `id` (PK), `description`
- **`region_scopes`**: Region-scope mappings
- **`themes`**: Legal themes/subjects
  - `id` (PK), `name`
- **`region_themes`**: Region-theme mappings
- **`permits`**: License/permit items
  - `id` (PK), `name`
- **`region_theme_permits`**: Three-way region-theme-permit linkage
- **`risks`**: Risk information
  - `id` (PK), `risk_content`, `legal_basis`, `document_no`, `summary`
- **`region_permit_risks`**: Risk associations per region and permit

---

## Configuration

### Environment Variables (.env)

Never commit real credentials; the values below are placeholders.

#### DashScope (Embeddings & LLM)

```
DASHSCOPE_API_KEY=<your-dashscope-api-key>
DASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
DASHSCOPE_EMBED_MODEL=text-embedding-v4
DASHSCOPE_EMBED_DIM=1024
DASHSCOPE_MAX_BATCH=10
DASHSCOPE_CHAT_MODEL=qwen-plus-latest
```

#### PostgreSQL Configuration

```
# fs_law_risk database
PG_HOST=8.138.196.105
PG_PORT=5432
PG_USER=postgres
PG_PASSWORD=<your-password>
PG_DATABASE=fs_law_risk
PG_ADMIN_DB=postgres

# licensing_risks database
LIC_PG_HOST=8.138.196.105
LIC_PG_PORT=5432
LIC_PG_USER=postgres
LIC_PG_PASSWORD=<your-password>
LIC_PG_DATABASE=licensing_risks
```

#### Application Settings

```
# FLASK_ENV is deprecated since Flask 2.2; FLASK_DEBUG=1 is the replacement
FLASK_ENV=development

# Search thresholds (tunable)
LAWRISK_RETURN_IF_GE=0.7
LAWRISK_FALLBACK_GT=0.5
```

---

## Testing

### Current Testing Status

- **No dedicated test suite** in the repository yet
- pytest is the recommended framework but is not yet configured
- Manual testing is available via `static/v2_tester.html`

### Testing Framework & Guidelines

When adding tests, use:

- **Framework**: pytest with the Flask test client
- **Focus Areas**:
  - API endpoints (V1 and V2)
  - Middleware behavior (CORS)
  - Database operations
  - LLM selection logic
  - Region filtering

**Recommended test structure:**

```
tests/
├── test_api_v1.py          # V1 API endpoint tests
├── test_api_v2.py          # V2 API endpoint tests
├── test_search_service.py  # Core search logic
├── test_licensing_repo.py  # Database repository
└── conftest.py             # Shared fixtures
```

### Testing Commands

```bash
# Run all tests
pytest

# Run with coverage for the lawrisk package (requires pytest-cov)
pytest --cov=lawrisk

# Run specific test file
pytest tests/test_api_v2.py -v
```

### Manual Testing with V2 Tester

1. Start the application: `python app.py` (defaults to port 8000)
2. Open `static/v2_tester.html` in your browser
3. Try queries such as:
   - "我要办一家电影院" ("I want to open a cinema")
   - "开办旅馆需要哪些许可" ("Which permits are needed to open a hotel?")
   - "公共场所卫生许可" ("Public-place hygiene permit")
   - A query with a region filter and debug output: `query=电影院&region=市级&debug=1`

The tester provides a simple UI for experimenting with the V2 API and viewing debug information.
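The V2 search endpoint can also be exercised from a script instead of the HTML tester. The following is a minimal standard-library sketch, assuming the app runs locally on the default port 8000 and that the search endpoint accepts `query`, `region`, `debug`, and `top` as GET query-string parameters (as listed under V2 Sub-endpoints). The helper names `build_v2_url` and `fetch_v2` are hypothetical, and no particular response shape is assumed beyond "JSON".

```python
from __future__ import annotations

import json
from typing import Any
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: local dev server on the default port; change if PORT is set.
BASE_URL = "http://localhost:8000"
V2_PATH = "/fs-ai-asistant/api/workflow/lawrisk/v2"


def build_v2_url(
    query: str,
    region: str | None = None,
    debug: bool = False,
    top: int | None = None,
    base_url: str = BASE_URL,
) -> str:
    """Build a GET URL for the V2 search endpoint (hypothetical helper)."""
    params: dict[str, str] = {"query": query}
    if region:
        params["region"] = region
    if debug:
        params["debug"] = "1"  # the API accepts 1/true/yes/on
    if top is not None:
        params["top"] = str(top)
    # urlencode percent-encodes non-ASCII text as UTF-8 and joins pairs with '&'.
    return f"{base_url}{V2_PATH}?{urlencode(params)}"


def fetch_v2(query: str, **kwargs: Any) -> Any:
    """Call the running service and return the decoded JSON body."""
    with urlopen(build_v2_url(query, **kwargs), timeout=30) as resp:
        return json.load(resp)


if __name__ == "__main__":
    print(build_v2_url("电影院", region="市级", debug=True))
    # With `python app.py` running:
    # print(fetch_v2("我要办一家电影院", top=3))
```

Building the query string with `urlencode` rather than by hand keeps the Chinese text correctly percent-encoded and the `&region=` separator intact.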
---

## Coding Standards & Best Practices

### Python Style Guidelines

- **Indentation**: 4 spaces (no tabs)
- **Encoding**: UTF-8 for all source files
- **Naming Conventions**:
  - Functions/variables: `snake_case`
  - Constants: `SCREAMING_SNAKE_CASE`
  - Classes: `PascalCase`
- **Type Hints**: Add type hints to all public functions
- **Code Formatting**: Use `black` with a 100-character line length (`black --line-length 100`)
- **Linting**: Use `ruff` with default rules

### Code Quality Guidelines

- Keep functions small; prefer pure, side-effect-free functions where possible
- Document complex logic with comments
- Use type hints; on Python 3.10+, built-in generics and PEP 604 unions (`list[str]`, `str | None`) are preferred over `typing.List` and `typing.Optional`
- Handle errors gracefully with appropriate logging

### Documentation Files

- **PRD.md** - Product Requirements Document (business logic and requirements)
- **API.md** - API documentation for V1 endpoints
- **V2_API文档.md** - Detailed API documentation for V2 endpoints
- **AGENTS.md** - Development guidelines and best practices
- **DB_GUIDE.md** - Database schema reference and query examples

---

## Key Components Deep Dive

### 1. app.py - Application Entry Point

- Creates the Flask app with CORS enabled
- Registers all API routes
- Extracts request parameters (GET/POST, JSON/form data)
- Runs work concurrently using `ThreadPoolExecutor`
- Handles errors and logging

### 2. lawrisk_service.py - Core Search Logic

- **EmbeddingClient**: Handles DashScope API integration
- **ChatClient**: Manages Qwen LLM interactions
- Database helpers built on pg8000
- Search algorithms:
  - Embedding-based cosine similarity
  - LLM-based subject selection
  - Similarity threshold management

### 3. lawrisk_v2_service.py - Enhanced API

- Structured response formatting
- Region filtering logic
- Direct permit matching by name
- Markdown formatting for legal text
- Multi-step query execution pipeline

### 4. licensing_repo.py - Data Repository

- Separate database connection for `licensing_risks`
- Optimized queries for multi-table joins
- Legal text formatting helpers
- Pattern matching for Chinese legal documents

### 5. smart_cors_middleware.py - Reusable CORS

- Wildcard and exact origin matching
- Subdomain support
- Preflight OPTIONS handling
- NGINX integration mode
- Debug and logging features

---

## Troubleshooting Guide

### Common Issues

#### Database Connection Errors

**Symptom**: `pg8000.dbapi.Error` when starting the app

**Solutions**:
- Check that the `.env` file exists and contains correct PostgreSQL credentials
- Verify network connectivity to the database server
- Ensure the PostgreSQL server is running and accessible
- Check the database names: `fs_law_risk` and `licensing_risks`

#### Missing Environment Variables

**Symptom**: `KeyError` exceptions, or default values being used silently

**Solutions**:
- Create the `.env` file from the template (see Configuration section)
- Ensure `DASHSCOPE_API_KEY` is set for embedding/chat features
- Verify all required `PG_*` and `LIC_*` environment variables

#### LLM/Embedding API Errors

**Symptom**: API authentication failures or timeout errors

**Solutions**:
- Verify `DASHSCOPE_API_KEY` is valid and has sufficient quota
- Check that `DASHSCOPE_BASE_URL` matches the API endpoint
- Ensure network access to DashScope API servers
- Review API rate limits and batch sizes (`DASHSCOPE_MAX_BATCH`)

#### Empty Search Results

**Symptom**: API returns an empty `risk_subject` array

**Solutions**:
- Check if database tables are populated (`fs_law_risk.law_sub`, etc.)
- Pass `debug=1` to see detailed execution info
- Verify the similarity thresholds in `.env` (`LAWRISK_RETURN_IF_GE`, `LAWRISK_FALLBACK_GT`)
- Test with known queries such as "我要办一家电影院" ("I want to open a cinema")

#### Port Already in Use

**Symptom** (Windows): `OSError: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted`

**Solutions**:
- Change the port: `PORT=8001 python app.py`
- Kill the process using the port: find its PID with `netstat -ano | findstr :8000`, then run `taskkill /PID <PID> /F`

### Debug Mode

Enable debug logging to troubleshoot issues:

```bash
# Enable Flask debug mode
FLASK_DEBUG=1 python app.py

# Enable CORS debug mode
CORS_DEBUG=1 python app.py

# Registered routes and errors are logged to the console on startup
```

### Health Checks

- **Basic health**: `GET /healthz` → `{"status": "ok"}`
- **V2 regions**: `GET /fs-ai-asistant/api/workflow/lawrisk/v2/regions`
- Check the startup logs for registered routes

### Data Verification

Verify database content with queries from `DB_GUIDE.md`. PostgreSQL does not support cross-database references, so run each query while connected to the matching database:

```sql
-- Connected to fs_law_risk (psql: \c fs_law_risk): check subject count
SELECT COUNT(*) FROM law_sub;

-- Connected to licensing_risks (psql: \c licensing_risks): check region-theme pairs
SELECT COUNT(*) FROM region_themes;
```
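When the data is present but `risk_subject` is still empty, it can help to reproduce the threshold step offline with a few vectors. The sketch below is an assumption based only on the variable names `LAWRISK_RETURN_IF_GE` and `LAWRISK_FALLBACK_GT`: return subjects scoring at or above the first threshold, and if none qualify, fall back to subjects scoring strictly above the second. The real logic in `lawrisk_service.py` may differ. Vectors are plain lists of floats (assumed to be what the `law_sub.vector` JSONB column decodes to), and `select_subjects` and `load_thresholds` are hypothetical names.

```python
from __future__ import annotations

import math
import os


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def load_thresholds() -> tuple[float, float]:
    """Read the tunable thresholds from the environment, with the documented defaults."""
    return (
        float(os.environ.get("LAWRISK_RETURN_IF_GE", "0.7")),
        float(os.environ.get("LAWRISK_FALLBACK_GT", "0.5")),
    )


def select_subjects(
    query_vec: list[float],
    subjects: dict[str, list[float]],
    return_if_ge: float,
    fallback_gt: float,
    top: int = 5,
) -> list[tuple[str, float]]:
    """Hypothetical two-tier selection: strong matches first, else a weaker fallback band."""
    scored = sorted(
        ((sid, cosine_similarity(query_vec, vec)) for sid, vec in subjects.items()),
        key=lambda item: item[1],
        reverse=True,
    )
    strong = [(sid, s) for sid, s in scored if s >= return_if_ge]
    if strong:
        return strong[:top]
    # No strong match: accept weaker matches strictly above the fallback threshold.
    return [(sid, s) for sid, s in scored if s > fallback_gt][:top]


if __name__ == "__main__":
    ge, gt = load_thresholds()
    toy = {"cinema": [1.0, 0.0], "hotel": [0.0, 1.0]}
    print(select_subjects([0.6, 0.8], toy, ge, gt))
```

If real query scores cluster just below `LAWRISK_FALLBACK_GT`, both tiers come back empty, which matches the empty-array symptom; lowering the thresholds in `.env` is the first knob to try.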