# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

---

# LawRisk Backend - Development Guide

## Project Overview

**LawRisk** is a Flask-based Python backend service for intelligent legal compliance risk retrieval. It helps users find permits, licenses, and legal risks from natural language queries, using vector embeddings and LLM matching.

### Tech Stack
- **Framework**: Flask 2.3+
- **Database**: PostgreSQL (pg8000 driver)
- **AI Services**: Alibaba Cloud DashScope (text-embedding-v4, qwen-plus-latest)
- **Development**: Black, Ruff, Pytest

### Two-Database Architecture
1. **fs_law_risk**: Vector embeddings and subject-permit mappings
2. **licensing_risks**: Structured permit and risk data (regions, themes, compliance)

---

## Quick Reference

### Most Common Commands
```bash
# Run the application (port 8000)
python app.py

# Install dependencies
pip install -r requirements.txt

# Format and lint code
black .
ruff check .

# Run tests
pytest
pytest --cov=lawrisk

# Test API via curl
curl http://localhost:8000/healthz
curl -X POST "http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2" \
  -d "query=我要办一家电影院&debug=1"
```

### Key File Locations
- `app.py` - Flask application entry point
- `lawrisk/api/v2.py` - V2 API routes (current)
- `lawrisk/services/lawrisk_v2_service.py` - Enhanced V2 search logic
- `lawrisk/services/licensing_repo.py` - Database operations
- `lawrisk/api/auth.py` - Authentication endpoints
- `static/v2_tester.html` - Web-based API testing interface
- `tests/test_auth.py` - Auth system tests
- `tests/test_checkpoint_security.py` - Checkpoint system tests

---

## Architecture & Code Structure

### Request Flow
```
HTTP Request
    → lawrisk/api/ (routing layer)
    → lawrisk/services/ (business logic)
    → lawrisk/services/licensing_repo.py (database access)
    → DashScope API (embeddings & LLM)
```

### Core Modules

**1. API Layer (`lawrisk/api/`)**
- `v1.py` - Legacy API (deprecated)
- `v2.py` - Current API with structured responses + admin endpoints
- `auth.py` - Authentication (login/logout/me endpoints)

**2. Services Layer (`lawrisk/services/`)**
- `lawrisk_service.py` - Core search with embeddings (cosine similarity) + LLM matching
- `lawrisk_v2_service.py` - Enhanced V2 with structured results, region filtering, direct permit matching
- `licensing_repo.py` - PostgreSQL operations (both databases), checkpoint management
- `auth_service.py` - User authentication, password hashing, seed admin creation

**3. Middleware & Utils**
- `middleware/smart_cors_middleware.py` - Configurable CORS (wildcard, subdomains, NGINX mode)
- `utils/env_loader.py` - Environment variable loading
- `utils/export_risk_json.py` - Database export utility
- `utils/ingest_lawrisk.py` - Data ingestion with embeddings

---

## API Endpoints

### Public Endpoints

#### V2 Search (Current)
- **Path**: `/fs-ai-asistant/api/workflow/lawrisk/v2`
- **Method**: POST (recommended), GET
- **Params**:
  - `query` (required): User question
  - `region` (optional): Filter by region (市级, 禅城区, etc.)
  - `debug` (optional): Enable debug output (1/true/yes/on)
  - `top` (optional): Number of recommendations (default: 5)
- **Returns**: Structured results with regions, themes, permits, risks

#### Supporting Endpoints
- `GET /fs-ai-asistant/api/workflow/lawrisk/v2/regions` - List all regions
- `GET /fs-ai-asistant/api/workflow/lawrisk/getPermits` - Get permits by region
- `GET /healthz` - Health check

### Authentication Endpoints
- `GET /fs-ai-asistant/lawrisk/login` - Login page (HTML)
- `POST /auth/login` - Authenticate user
- `GET /auth/me` - Get current user
- `GET /auth/logout` - Logout

### Admin Endpoints (Protected)
- `GET /fs-ai-asistant/api/workflow/lawrisk/admin/test` - Admin test
- `GET /fs-ai-asistant/api/workflow/lawrisk/admin/regions` - Region management
- `GET /fs-ai-asistant/api/workflow/lawrisk/admin/themes` - Theme management
- `GET /fs-ai-asistant/api/workflow/lawrisk/admin/permits` - Permit management
- `GET /fs-ai-asistant/api/workflow/lawrisk/admin/checkpoints` - Checkpoint management (create/list/restore/delete)

---

## Development Workflow

### Environment Setup
```bash
# Windows PowerShell
python -m venv .venv
.venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure .env with database credentials and DashScope API key
```

### Testing
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=lawrisk --cov-report=html

# Run specific test file
pytest tests/test_auth.py -v

# Test authentication
pytest tests/test_auth.py::test_login_success -v
```

**Test Files**:
- `tests/test_auth.py` - Authentication system tests (login, logout, session management)
- `tests/test_checkpoint_security.py` - Database checkpoint security tests

### Code Quality
```bash
# Format code
black .

# Lint with Ruff
ruff check .

# Check specific file
black lawrisk/services/lawrisk_v2_service.py
ruff check lawrisk/services/lawrisk_v2_service.py
```

### Data Management
```bash
# Export data from fs_law_risk database
python lawrisk/utils/export_risk_json.py
# Output: data/risk_tables_export.json

# Ingest data with embeddings (requires DASHSCOPE_API_KEY)
python lawrisk/utils/ingest_lawrisk.py
```

---

## Configuration

### Required Environment Variables (.env)

#### DashScope AI Services
```
DASHSCOPE_API_KEY=your_api_key
DASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
DASHSCOPE_EMBED_MODEL=text-embedding-v4
DASHSCOPE_EMBED_DIM=1024
DASHSCOPE_MAX_BATCH=10
DASHSCOPE_CHAT_MODEL=qwen-plus-latest
```

#### PostgreSQL Databases
```
# fs_law_risk (embeddings database)
PG_HOST=your_host
PG_PORT=5432
PG_USER=postgres
PG_PASSWORD=your_password
PG_DATABASE=fs_law_risk
PG_ADMIN_DB=postgres

# licensing_risks (structured data)
LIC_PG_HOST=your_host
LIC_PG_PORT=5432
LIC_PG_USER=postgres
LIC_PG_PASSWORD=your_password
LIC_PG_DATABASE=licensing_risks
```

#### Authentication
```
FLASK_SECRET_KEY=your-secret-key
LAWRISK_ADMIN_USERNAME=admin
LAWRISK_ADMIN_PASSWORD=adminpassword123
# Optional: LAWRISK_ADMIN_ROLE, LAWRISK_ADMIN_GRADE, LAWRISK_ADMIN_DISPLAY_NAME
```

#### Search Thresholds
```
# Return results if similarity >= 0.7
LAWRISK_RETURN_IF_GE=0.7
# Use fallback if similarity > 0.5
LAWRISK_FALLBACK_GT=0.5
```

---

## Database Schema

### fs_law_risk (Vector Embeddings)
- **`law_sub`** - Subject matter with embeddings (id, name, vector)
- **`law_sub_per`** - Subject-permit mappings (sub_id, per_ids)
- **`law_per`** - Permit information (id, name, risk_ids)

### licensing_risks (Structured Compliance)
- **`regions`** - Administrative areas
- **`themes`** - Legal themes/subjects
- **`permits`** - License/permit items
- **`risks`** - Risk information (content, legal_basis, document_no, summary)
- **`business_scopes`** - Business scope definitions
- **Junction tables**: region_themes, region_theme_permits,
  region_permit_risks

### Checkpoint System
`licensing_repo.py` implements database checkpoint management:
- `create_checkpoint()` - Create database backup
- `list_checkpoints()` - List available backups
- `restore_checkpoint()` - **DANGEROUS** - Restore from checkpoint
- `delete_checkpoint()` - Remove old checkpoints

---

## Security Guidelines

### Critical Security Notes
- **NEVER commit secrets** - Keep all credentials in `.env` or environment variables
- **Protect admin endpoints** - `/admin/*` should be restricted in production
- **Checkpoint restore is dangerous** - It overwrites database state and goes through a confirmation flow
- **API keys externalized** - `DASHSCOPE_API_KEY` and database passwords must be in `.env`

### Authentication System
- Session-based auth using Flask sessions
- Password hashing with `passlib`
- First admin auto-created from environment variables on startup
- Role-based access (admin, reviewer, analyst, etc.)
- Login page: `/fs-ai-asistant/lawrisk/login`
- Protected endpoints use `@login_required` decorator

---

## Recent Features (from git log)

### Checkpoint System (Recent)
- Database backup/restore functionality
- Timeline view of checkpoints
- Progress indicators for restore operations
- Security tests in `test_checkpoint_security.py`

### Permit Risk Snapshot
- Workflow for permit risk snapshots
- Unified snapshot and checkpoint timeline
- Enhanced batch display for snapshots

### Licensing Import Enhancement
- Optimized district/region merging during import
- Enhanced source display for permits

---

## Testing Guidelines

### Test Structure
```
tests/
├── __init__.py
├── test_auth.py                  # Auth system tests (login, session, decorators)
└── test_checkpoint_security.py   # Checkpoint security tests
```

### Running Tests
```bash
# All tests
pytest

# Verbose output
pytest -v

# Coverage report
pytest --cov=lawrisk --cov-report=term-missing

# Specific test
pytest tests/test_auth.py::test_login_success -v
```

### Manual Testing
1. Start app: `python app.py`
2. Open browser: `static/v2_tester.html`
3. Test queries:
   - "我要办一家电影院"
   - "开办旅馆需要哪些许可"
   - With region filter and debug mode

---

## Troubleshooting

### Common Issues

#### Database Connection
```bash
# Verify database is accessible
psql -h $PG_HOST -U $PG_USER -d $PG_DATABASE

# Check tables exist (run these statements inside psql)
SELECT COUNT(*) FROM fs_law_risk.law_sub;
SELECT COUNT(*) FROM licensing_risks.regions;
```

#### API Errors
```bash
# Test health check
curl http://localhost:8000/healthz

# Test V2 API with debug
curl -X POST "http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2" \
  -d "query=电影院&debug=1"

# Check app logs for registered routes
python app.py 2>&1 | grep "Registered routes"
```

#### Missing Embeddings
```bash
# Check if embeddings exist (run this statement inside psql)
SELECT id, name FROM fs_law_risk.law_sub LIMIT 5;

# If empty, run ingestion (from the shell)
python lawrisk/utils/ingest_lawrisk.py
```

---

## Documentation Files

- **README.md** - Project overview and quick start
- **AGENTS.md** - Development guidelines, coding style, testing approach
- **docs/V2_API文档.md** - Detailed V2 API documentation
- **docs/API.md** - V1 API documentation (legacy)
- **docs/DB_GUIDE.md** - Database schema and query examples
- **docs/PRD.md** - Product requirements
- **docs/CLAUDE.md** - Detailed Claude Code guidance (comprehensive)

---

## Key Components Deep Dive

### V2 Service Architecture
`lawrisk_v2_service.py` implements:
- Structured response formatting
- Region filter normalization
- Direct permit name matching
- Markdown formatting for legal text
- Complex query execution pipeline with concurrency

### Authentication Flow
`lawrisk/api/auth.py` provides:
- Login page with redirect handling
- Session management
- `@login_required` decorator for protecting endpoints
- JSON vs HTML response handling (API vs browser)

### Checkpoint Security
`test_checkpoint_security.py` tests:
- Checkpoint creation authorization
- Restore operation security
- User permission validation
- Operation audit logging

---

## Best Practices

### Code Style
- **Black**: 100-character line length, Python 3.10+
- **Type Hints**: Use PEP 604 union types (`str | None`)
- **Imports**: Ruff-compatible, group by standard library → third-party → local
- **Naming**: snake_case (functions/variables), SCREAMING_SNAKE_CASE (constants), PascalCase (classes)

### Error Handling
- Graceful degradation on startup (errors surface on first request)
- Structured error responses: `{"success": false, "message": "error", "data": {}}`
- Logging to stdout with structured format

### Configuration
- Use `lawrisk.utils.env_loader` for environment variables
- Default values for non-critical configs
- Environment-specific overrides supported

---

## Health Checks

```bash
# Basic health
curl http://localhost:8000/healthz

# Check regions
curl http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2/regions

# Test search
curl -X POST http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2 \
  -d "query=电影院&debug=1"
```

View app startup logs to see all registered routes.
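
The three health-check requests can also be scripted. The following is a minimal sketch using only the Python standard library, not part of the repository: the names `build_checks` and `run_checks` are hypothetical, and it assumes the app is running on `localhost:8000` and that a healthy endpoint answers with HTTP 200.

```python
"""Smoke-check sketch for a locally running LawRisk backend (illustrative only)."""
from __future__ import annotations

import urllib.error
import urllib.parse
import urllib.request

# Assumption: same host, port, and path prefix as the curl examples.
BASE_URL = "http://localhost:8000"
API_PREFIX = "/fs-ai-asistant/api/workflow/lawrisk"


def build_checks(base_url: str = BASE_URL) -> list[tuple[str, str, bytes | None]]:
    """Return (method, url, body) tuples mirroring the three curl commands."""
    # Form-encode the search parameters, as `curl -d` sends a form body.
    form = urllib.parse.urlencode({"query": "电影院", "debug": "1"}).encode("utf-8")
    return [
        ("GET", f"{base_url}/healthz", None),
        ("GET", f"{base_url}{API_PREFIX}/v2/regions", None),
        ("POST", f"{base_url}{API_PREFIX}/v2", form),
    ]


def run_checks(checks, opener=urllib.request.urlopen, timeout: float = 10.0):
    """Send each request; return (url, status) pairs, with None if unreachable."""
    results = []
    for method, url, body in checks:
        request = urllib.request.Request(url, data=body, method=method)
        try:
            with opener(request, timeout=timeout) as response:
                results.append((url, response.status))
        except urllib.error.HTTPError as exc:
            # The server answered, but with a 4xx/5xx status.
            results.append((url, exc.code))
        except urllib.error.URLError:
            # Nothing is listening, or the host is unreachable.
            results.append((url, None))
    return results


if __name__ == "__main__":
    for url, status in run_checks(build_checks()):
        label = "OK  " if status == 200 else "FAIL"
        print(f"{label} {status} {url}")
```

Passing the opener as a parameter keeps `run_checks` testable without a live server; a status of `None` usually means the app is not running or the port is wrong.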