CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.


LawRisk Backend - Claude Code Analysis

Project Overview

LawRisk is a Flask-based Python backend service that provides intelligent legal compliance risk retrieval for business licensing and permit requirements. It uses vector embeddings and LLM-based matching to help users find relevant permits, licenses, and associated legal risks based on natural language queries (in Chinese).

Python Version Requirement: Python 3.10+ (uses PEP 604 union types like str | None)

Key Features

  • Semantic Search: Uses Aliyun DashScope embeddings (text-embedding-v4) to find similar legal topics
  • LLM-Powered Matching: Qwen (qwen-plus-latest) for intelligent subject selection
  • Two-Database Architecture:
    • fs_law_risk: Vector embeddings and subject-permit mappings
    • licensing_risks: Structured permit and risk data with regions, themes, and compliance information
  • RESTful APIs: Clean REST endpoints for V1 (legacy) and V2 (enhanced) search
  • CORS Enabled: Built-in CORS middleware for frontend integration

Architecture & Project Structure

Core Framework & Libraries

  • Framework: Flask (Python web framework)
  • Database Driver: pg8000 (PostgreSQL adapter)
  • Vector Embeddings: Aliyun DashScope OpenAI-compatible API
  • LLM: Qwen via DashScope (qwen-plus-latest)
  • Dependencies: Minimal footprint - Flask and pg8000, plus the standard-library concurrent.futures module

Directory Structure

市监局-lawRisk-backend/
├── app.py                          # Flask application entry point
├── requirements.txt                # Python dependencies
├── .env                            # Environment configuration
├── lawrisk/                        # Main application package
│   ├── __init__.py
│   ├── api/                        # API route handlers
│   │   ├── v1.py                   # V1 API (legacy)
│   │   └── v2.py                   # V2 API (current)
│   ├── services/                   # Business logic layer
│   │   ├── lawrisk_service.py      # Core search & embeddings
│   │   ├── lawrisk_v2_service.py   # V2 enhanced service
│   │   └── licensing_repo.py       # Data repository
│   ├── middleware/                 # HTTP middleware
│   │   └── smart_cors_middleware.py
│   └── utils/                      # Utility functions
│       ├── env_loader.py
│       ├── export_risk_json.py
│       └── ingest_lawrisk.py
├── static/                         # Static assets
│   └── v2_tester.html              # Web-based API tester
├── tests/                          # Test suite (planned)
├── data/                           # Data files
│   ├── risk_tables_export.json
│   └── licensing_risks_dump.sql
└── docs/                           # Documentation
    ├── PRD.md
    ├── API.md
    ├── V2_API文档.md
    ├── AGENTS.md
    ├── DB_GUIDE.md
    └── CLAUDE.md

Quick Reference

Most Common Commands

# Run the application
python app.py

# Export data from database
python lawrisk/utils/export_risk_json.py

# Ingest data with embeddings (requires DASHSCOPE_API_KEY)
python lawrisk/utils/ingest_lawrisk.py

# Format and lint code
black .
ruff check .

# Test locally via browser
# Open static/v2_tester.html after starting the app

Key Files

  • app.py - Flask application entry point
  • lawrisk/ - Main application package
    • api/v1.py - V1 API routes (legacy)
    • api/v2.py - V2 API routes (current)
    • services/lawrisk_service.py - Core search & embeddings
    • services/lawrisk_v2_service.py - V2 enhanced service
    • services/licensing_repo.py - Data repository
    • middleware/smart_cors_middleware.py - CORS middleware
    • utils/ - Utility functions
  • static/v2_tester.html - Web-based API testing interface
  • requirements.txt - Python dependencies
  • .env - Environment configuration

Development Workflow

Initial Setup

# 1. Create virtual environment (Windows PowerShell)
python -m venv .venv
.venv\Scripts\activate

# 2. Install dependencies
pip install Flask pg8000 black ruff pytest pytest-cov

# 3. Configure environment
# Edit .env with your database credentials and DashScope API key

Virtual Environment Activation

# Windows PowerShell
.venv\Scripts\activate

# Windows CMD
.venv\Scripts\activate.bat

# Git Bash (Windows)
source .venv/Scripts/activate

Common Commands

Run the Application

# Development mode
python app.py

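# Custom port (Bash syntax; in PowerShell: $env:PORT = "8000"; python app.py)
PORT=8000 python app.py

# With debug logging (Bash syntax; in PowerShell: $env:FLASK_DEBUG = "1"; python app.py)
FLASK_DEBUG=1 python app.py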

Data Management

# Export data from fs_law_risk database to JSON
python lawrisk/utils/export_risk_json.py

# Ingest data with embeddings into database
python lawrisk/utils/ingest_lawrisk.py
# Requires DASHSCOPE_API_KEY in .env

Code Quality

# Format code with Black (100 char line length)
black .

# Lint with Ruff
ruff check .

# Format and lint specific file
black lawrisk/services/lawrisk_v2_service.py
ruff check lawrisk/services/lawrisk_v2_service.py

# Run tests (when added)
pytest -q

# Run tests with coverage
pytest --cov=lawrisk

Data Management Commands

# Export data from fs_law_risk database to JSON
# Output: data/risk_tables_export.json
python lawrisk/utils/export_risk_json.py

# Ingest data with embeddings into database
# Requires DASHSCOPE_API_KEY in .env
python lawrisk/utils/ingest_lawrisk.py

# Verify exported data
ls -lh data/
head -50 data/risk_tables_export.json

Database Operations

# Connect to PostgreSQL
psql -h 8.138.196.105 -U postgres -d fs_law_risk

# Connect to licensing_risks database
psql -h 8.138.196.105 -U postgres -d licensing_risks

API Endpoints

V1 API (Legacy)

  • Path: /fs-ai-asistant/api/workflow/lawrisk
  • Methods: GET, POST
  • Mode: llm (default) or embed
  • Input: query (user question)
  • Output: Simple array of matching subjects with permit IDs

V2 API (Current/Recommended)

  • Base Path: /fs-ai-asistant/api/workflow/lawrisk/v2
  • Methods: GET, POST
  • Features:
    • Structured results with regions, themes, permits, and risks
    • Optional region filtering
    • Debug mode with detailed execution info
    • Direct permit matching by name

V2 Sub-endpoints

  1. Search Endpoint

    • Path: /fs-ai-asistant/api/workflow/lawrisk/v2
    • Parameters:
      • query (required): User question
      • region (optional): Filter by region (市级, 禅城区, etc.)
      • debug (optional): Enable debug output (1/true/yes/on)
      • top (optional): Number of recommendations (default: 5)
  2. Regions List

    • Path: /fs-ai-asistant/api/workflow/lawrisk/v2/regions
    • Method: GET
    • Returns: All available regions for filtering
  3. Get Permits

    • Path: /fs-ai-asistant/api/workflow/lawrisk/getPermits
    • Methods: GET, POST
    • Input: region (region ID or name)
    • Returns: All permits for a specific region
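The search parameters listed above could be normalized with a small helper before they reach the service layer. The following is a minimal sketch, not the repository's code: the function name `parse_search_params` and its return shape are hypothetical, and the fallback to 5 on invalid `top` values is an assumption. Only the accepted truthy values for `debug` (1/true/yes/on) and the default of 5 come from the parameter list.

```python
from typing import Any, Mapping

# Truthy values documented for the `debug` parameter.
TRUTHY = {"1", "true", "yes", "on"}


def parse_search_params(raw: Mapping[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: validate and normalize V2 search parameters."""
    query = str(raw.get("query") or "").strip()
    if not query:
        raise ValueError("query is required")

    # An empty or missing region means "no region filter".
    region = str(raw.get("region") or "").strip() or None
    debug = str(raw.get("debug") or "").strip().lower() in TRUTHY

    try:
        top = int(raw.get("top", 5))
    except (TypeError, ValueError):
        top = 5  # assumption: fall back to the documented default
    if top < 1:
        top = 5

    return {"query": query, "region": region, "debug": debug, "top": top}
```

The same dictionary could be built from query-string, form, or JSON input, so the endpoint logic only deals with one normalized shape.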

Health Check

  • Path: /healthz
  • Method: GET
  • Returns: {"status": "ok"}

Database Schema

Database 1: fs_law_risk

Used for vector embeddings and semantic search.

Tables

  • law_sub: Subject matter with embeddings

    • id (TEXT, PK): Subject ID
    • name (TEXT): Subject name
    • vector (JSONB): Embedding vector
  • law_sub_per: Subject-permit mappings

    • sub_id (TEXT, PK): Subject ID
    • per_ids (JSONB): Array of permit IDs
  • law_per: Permit information

    • id (TEXT, PK): Permit ID
    • name (TEXT): Permit name
    • risk_ids (JSONB): Array of risk IDs
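The three tables chain together: a subject maps to permit IDs, and each permit carries its risk IDs. The sketch below illustrates that traversal with in-memory stand-ins for the rows. The sample data and the function `risk_ids_for_subject` are hypothetical, and JSONB values are modelled as JSON text here (a database driver may already return them decoded).

```python
import json

# In-memory stand-ins for rows in law_sub_per and law_per (illustrative data only).
LAW_SUB_PER = {"S1": json.dumps(["P1", "P2"])}
LAW_PER = {
    "P1": {"name": "permit one", "risk_ids": json.dumps(["R1", "R2"])},
    "P2": {"name": "permit two", "risk_ids": json.dumps(["R2", "R3"])},
}


def risk_ids_for_subject(sub_id: str) -> list[str]:
    """Follow subject -> permits -> risks, de-duplicating while keeping order."""
    per_ids = json.loads(LAW_SUB_PER.get(sub_id, "[]"))
    seen: dict[str, None] = {}
    for per_id in per_ids:
        permit = LAW_PER.get(per_id)
        if permit is None:
            continue  # mapping points at a permit that is missing from law_per
        for risk_id in json.loads(permit["risk_ids"]):
            seen.setdefault(risk_id, None)
    return list(seen)
```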

Database 2: licensing_risks

Used for structured compliance data.

Tables

  • regions: Administrative areas

    • id (PK), name (unique)
  • business_scopes: Business scope definitions

    • id (PK), description
  • region_scopes: Region-scope mappings

  • themes: Legal themes/subjects

    • id (PK), name
  • region_themes: Region-theme mappings

  • permits: License/permit items

    • id (PK), name
  • region_theme_permits: Tripartite linkage

  • risks: Risk information

    • id (PK), risk_content, legal_basis, document_no, summary
  • region_permit_risks: Risk associations


Configuration

Environment Variables (.env)

DashScope (Embeddings & LLM)

DASHSCOPE_API_KEY=<your-dashscope-api-key>
DASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
DASHSCOPE_EMBED_MODEL=text-embedding-v4
DASHSCOPE_EMBED_DIM=1024
DASHSCOPE_MAX_BATCH=10
DASHSCOPE_CHAT_MODEL=qwen-plus-latest
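DASHSCOPE_MAX_BATCH caps how many texts go into one embedding request. A minimal sketch of the batching step, assuming the ingestion code splits its input list before calling the API; the helper name `batched` is hypothetical and no network call is made here.

```python
import os
from typing import Iterator, Sequence


def batched(texts: Sequence[str], max_batch: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `texts`, each with at most `max_batch` items."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    for start in range(0, len(texts), max_batch):
        yield list(texts[start:start + max_batch])


# Read the limit from the environment, defaulting to the value shown above.
MAX_BATCH = int(os.getenv("DASHSCOPE_MAX_BATCH", "10"))
```

Each yielded list would then be sent as one embedding request, which keeps every request within the configured batch size.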

PostgreSQL Configuration

# fs_law_risk database
PG_HOST=8.138.196.105
PG_PORT=5432
PG_USER=postgres
PG_PASSWORD=<your-postgres-password>
PG_DATABASE=fs_law_risk
PG_ADMIN_DB=postgres

# licensing_risks database
LIC_PG_HOST=8.138.196.105
LIC_PG_PORT=5432
LIC_PG_USER=postgres
LIC_PG_PASSWORD=<your-postgres-password>
LIC_PG_DATABASE=licensing_risks

Application Settings

FLASK_ENV=development

# Search thresholds (tunable)
LAWRISK_RETURN_IF_GE=0.7
LAWRISK_FALLBACK_GT=0.5
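The two thresholds are plain floats read from the environment. The sketch below shows one way to load and sanity-check them. The function `load_thresholds` is hypothetical, the defaults mirror the values shown above, and the ordering check (fallback not above the return threshold) is an assumption about how the values are meant to relate.

```python
import os
from typing import Mapping


def load_thresholds(env: Mapping[str, str] = os.environ) -> tuple[float, float]:
    """Return (return_if_ge, fallback_gt) parsed from environment variables."""
    return_if_ge = float(env.get("LAWRISK_RETURN_IF_GE", "0.7"))
    fallback_gt = float(env.get("LAWRISK_FALLBACK_GT", "0.5"))
    # Assumption: both are cosine-similarity cut-offs in [0, 1], and the
    # fallback threshold should not exceed the primary one.
    if not 0.0 <= fallback_gt <= return_if_ge <= 1.0:
        raise ValueError(
            f"invalid thresholds: return_if_ge={return_if_ge}, fallback_gt={fallback_gt}"
        )
    return return_if_ge, fallback_gt
```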

Testing

Current Testing Status

  • No dedicated test suite in the repository
  • pytest framework is recommended but not configured
  • Manual testing available via static/v2_tester.html

Testing Framework & Guidelines

When adding tests, use:

  • Framework: pytest with Flask test client
  • Dependencies: pytest-cov for coverage reporting
  • Focus Areas:
    • API endpoints (V1 and V2)
    • Middleware behavior (CORS)
    • Database operations
    • LLM selection logic
    • Region filtering
    • Checkpoint operations (create, list, restore, delete)

Recommended test structure:

tests/
├── conftest.py                  # Shared fixtures and test configuration
├── test_api_v1.py               # V1 API endpoint tests
├── test_api_v2.py               # V2 API endpoint tests
├── test_admin_endpoints.py      # Admin endpoint tests
├── test_search_service.py       # Core search logic
├── test_licensing_repo.py       # Database repository
└── test_cors_middleware.py      # CORS middleware behavior

# Test naming: test_*.py (pytest discovery pattern)
# Test functions: test_<feature>_<scenario>()

Testing CORS Middleware (from AGENTS.md):

  • Origin matching: wildcard, exact, subdomains
  • Preflight OPTIONS handling
  • X-CORS-Decision header behavior
  • NGINX_CORS_MODE functionality
  • Environment variable configurations (ALLOWED_ORIGINS, CORS_STRICT, etc.)

Example test case:

from app import app  # assumes app.py exposes the Flask instance as `app`


def test_v2_search_with_debug():
    client = app.test_client()
    # Form-encoded POST with debug output enabled
    response = client.post(
        "/fs-ai-asistant/api/workflow/lawrisk/v2",
        data={"query": "电影院", "debug": "1"},
    )
    assert response.status_code == 200
    data = response.get_json()["data"]
    assert "debug" in data
    assert "executionTime" in data

Testing Commands

# Run all tests
pytest

# Run with coverage
pytest --cov=lawrisk

# Run specific test file
pytest tests/test_api_v2.py -v

Manual Testing with V2 Tester

  1. Start the application: python app.py (defaults to port 8000)
  2. Open static/v2_tester.html in your browser
  3. Test queries like:
    • "我要办一家电影院"
    • "开办旅馆需要哪些许可"
    • "公共场所卫生许可"
    • Query with region filter: "电影院&region=市级&debug=1"

The tester provides a simple UI to experiment with the V2 API and view debug information.


Coding Standards & Best Practices

Python Style Guidelines

  • Indentation: 4 spaces (no tabs)
  • Encoding: UTF-8 for all source files
  • Naming Conventions:
    • Functions/variables: snake_case
    • Constants: SCREAMING_SNAKE_CASE
    • Classes: PascalCase
  • Type Hints: Prefer type hints for all public functions
  • Code Formatting: Use black with 100-character line length
  • Linting: Use ruff with default rules

Code Quality Guidelines

  • Keep functions small and side-effect free
  • Prefer pure functions where possible
  • Document complex logic with comments
  • Use built-in generics and PEP 604 unions (str | None); import from the typing module only for constructs that have no built-in form
  • Handle errors gracefully with appropriate logging

Documentation Files

  • PRD.md (docs/PRD.md) - Product Requirements Document

    • Business logic and requirements specification
    • Feature specifications and user stories
  • API.md (docs/API.md) - V1 API documentation

    • Legacy API endpoints and usage
    • Request/response formats
  • V2_API文档.md (docs/V2_API文档.md) - Detailed V2 API documentation

    • Enhanced API specification
    • Admin endpoints and checkpoint operations
    • Request/response examples
  • AGENTS.md (docs/AGENTS.md) - Development guidelines

    • Repository structure and module organization
    • Coding style and naming conventions
    • Commit and PR guidelines
    • Security and configuration tips
  • DB_GUIDE.md (docs/DB_GUIDE.md) - Database reference

    • Schema reference for both databases
    • Query examples and optimization tips
  • README.md (README.md) - Project overview and quick start


Key Components Deep Dive

1. app.py - Application Entry Point

  • Creates Flask app with CORS enabled
  • Registers all API routes (v1_bp, v2_bp)
  • Database initialization and schema checks on startup
  • Health check endpoint at /healthz
  • Logs all registered routes on startup
  • Error handling doesn't block app startup (errors surface on first request)

2. lawrisk_service.py - Core Search Logic (lawrisk/services/lawrisk_service.py)

  • EmbeddingClient: DashScope API integration for vector embeddings
  • ChatClient: Qwen LLM interaction for intelligent subject selection
  • Database helpers using pg8000 for PostgreSQL
  • Search algorithms:
    • Embedding-based cosine similarity search
    • LLM-based subject matching using qwen-plus-latest
  • Similarity threshold management (tunable via LAWRISK_RETURN_IF_GE, LAWRISK_FALLBACK_GT)
  • Concurrent execution support with ThreadPoolExecutor
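The embedding search and its two thresholds can be illustrated as follows. This is a minimal sketch under stated assumptions, not the service's implementation: `cosine_similarity` and `select_subjects` are hypothetical names, and the fallback rule (return matches above LAWRISK_FALLBACK_GT only when nothing reaches LAWRISK_RETURN_IF_GE) is one plausible reading of the threshold descriptions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_subjects(
    query_vec: Sequence[float],
    subjects: dict[str, Sequence[float]],
    return_if_ge: float = 0.7,
    fallback_gt: float = 0.5,
    top: int = 5,
) -> list[tuple[str, float]]:
    """Rank subjects by similarity and apply the two-tier threshold rule."""
    scored = sorted(
        ((sub_id, cosine_similarity(query_vec, vec)) for sub_id, vec in subjects.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    strong = [pair for pair in scored if pair[1] >= return_if_ge]
    if strong:
        return strong[:top]
    # Assumed fallback: weaker matches are returned only when no strong match exists.
    return [pair for pair in scored if pair[1] > fallback_gt][:top]
```

Raising LAWRISK_RETURN_IF_GE makes the primary tier stricter, while lowering LAWRISK_FALLBACK_GT lets more weak matches through when the primary tier is empty.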

3. lawrisk_v2_service.py - Enhanced API (lawrisk/services/lawrisk_v2_service.py)

  • Structured response formatting with regions, themes, permits, and risks
  • Region filtering logic with normalization
  • Direct permit matching by name
  • Markdown formatting for legal text
  • Complex query execution pipeline
  • Helper functions:
    • _compose_prompt(): Builds natural-language prompts from structured data
    • _normalize_region_filter(): Normalizes region filters for matching

4. licensing_repo.py - Data Repository (lawrisk/services/licensing_repo.py)

  • Separate database connection for licensing_risks database
  • Multi-table join query optimization
  • Legal text formatting helpers
  • Chinese legal document pattern matching
  • Checkpoint management:
    • create_checkpoint(): Database backup functionality
    • list_checkpoints(): List available backups
    • restore_checkpoint(): Restore from checkpoint (DANGEROUS!)
    • delete_checkpoint(): Remove old checkpoints

5. smart_cors_middleware.py - Reusable CORS (lawrisk/middleware/smart_cors_middleware.py)

  • Wildcard and exact origin matching
  • Subdomain support with flexible patterns
  • Preflight OPTIONS handling
  • NGINX integration mode (NGINX_CORS_MODE)
  • Debug and logging features (CORS_DEBUG)
  • Environment variable support:
    • ALLOWED_ORIGINS, CORS_STRICT, CORS_DEBUG
    • NGINX_CORS_MODE, CORS_MAX_AGE, CORS_EXPOSE_HEADERS
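Wildcard, exact, and subdomain origin matching can be sketched with the standard library. The functions `parse_allowed_origins` and `origin_allowed` below are hypothetical and do not reflect the middleware's actual code. The pattern syntax (`*` for any origin, `https://*.example.org` for subdomains) is an assumption about how ALLOWED_ORIGINS entries are written.

```python
from fnmatch import fnmatchcase


def parse_allowed_origins(value: str) -> list[str]:
    """Split a comma-separated ALLOWED_ORIGINS value into clean entries."""
    return [item.strip() for item in value.split(",") if item.strip()]


def origin_allowed(origin: str, allowed: list[str]) -> bool:
    """Return True if `origin` matches any exact, wildcard, or subdomain pattern."""
    origin = origin.strip().lower().rstrip("/")
    for pattern in allowed:
        pattern = pattern.strip().lower().rstrip("/")
        if pattern == "*" or pattern == origin:
            return True  # global wildcard or exact match
        if "*" in pattern and fnmatchcase(origin, pattern):
            return True  # glob pattern such as https://*.example.org
    return False
```

Note that a subdomain pattern in this sketch does not match the bare domain itself, so `https://example.org` would need its own entry.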

6. v2.py API Routes (lawrisk/api/v2.py)

  • Public endpoints: /v2, /v2/regions, /getPermits
  • Admin endpoints: /admin/test, /admin/regions, /admin/themes, /admin/permits, /admin/checkpoints
  • Parameter extraction supporting GET, POST, JSON, and form data
  • Concurrent execution using ThreadPoolExecutor (max_workers=2)
  • Structured error responses with consistent format
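Running two lookups side by side with ThreadPoolExecutor (max_workers=2) might look like the sketch below. Which two tasks the route actually runs concurrently is an assumption. `embed_search`, `direct_permit_match`, and `run_search` are hypothetical placeholders that only simulate I/O latency.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def embed_search(query: str) -> list[str]:
    """Placeholder for the embedding-based subject lookup (hypothetical)."""
    time.sleep(0.05)  # stands in for network and database latency
    return [f"embed:{query}"]


def direct_permit_match(query: str) -> list[str]:
    """Placeholder for direct permit matching by name (hypothetical)."""
    time.sleep(0.05)
    return [f"permit:{query}"]


def run_search(query: str) -> dict[str, list[str]]:
    """Run both lookups concurrently and combine their results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        subjects_future = pool.submit(embed_search, query)
        permits_future = pool.submit(direct_permit_match, query)
        # result() blocks until each task finishes and re-raises its exception, if any.
        return {
            "subjects": subjects_future.result(),
            "permits": permits_future.result(),
        }
```

Because both tasks are I/O-bound, threads are sufficient here; the total wait is roughly the slower of the two calls rather than their sum.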

7. Utility Scripts

  • env_loader.py: Environment variable loading from .env file
  • export_risk_json.py: PostgreSQL data export utility (outputs to data/risk_tables_export.json)
  • ingest_lawrisk.py: Data ingestion with embeddings (requires DASHSCOPE_API_KEY)
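A .env loader in the spirit of env_loader.py can be written in a few lines. This is a sketch under assumptions, not the repository's implementation: the function name `load_env_file`, the `override` flag, and the handling of quotes and comments are all hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env", override: bool = False) -> dict[str, str]:
    """Parse KEY=VALUE lines and export them; return everything that was parsed."""
    parsed: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return parsed  # a missing file is not an error in this sketch
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments, and malformed lines
        key, _, value = line.partition("=")
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        if not key:
            continue
        # Existing process variables win unless override is requested.
        if override or key not in os.environ:
            os.environ[key] = value
        parsed[key] = value
    return parsed
```

Letting real environment variables take precedence over the file keeps deployment overrides simple, which is why `override` defaults to False here.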

Security & Best Practices

Security Guidelines

  • NEVER hardcode secrets in source code
  • All credentials must be in .env file or environment variables
  • API keys (DASHSCOPE_API_KEY) and database passwords must be externalized
  • Admin endpoints (/admin/*) should be protected in production
  • Database checkpoint restore is a DANGEROUS OPERATION and should be restricted

Configuration Best Practices

  • Use .env file for all configuration (database, API keys, thresholds)
  • Environment variables supported by CORS middleware:
    • ALLOWED_ORIGINS: Comma-separated list of allowed origins
    • CORS_STRICT: Enable strict origin checking
    • CORS_DEBUG: Enable debug logging
    • NGINX_CORS_MODE: Enable NGINX integration
    • CORS_MAX_AGE: Preflight cache duration
    • CORS_EXPOSE_HEADERS: Headers to expose to browsers
  • Regularly backup databases using checkpoint system
  • Monitor DashScope API quota and rate limits

Troubleshooting Guide

Common Issues

Database Connection Errors

Symptom: pg8000.dbapi.Error when starting the app

Solutions:

  • Check .env file exists with correct PostgreSQL credentials
  • Verify network connectivity to the database server
  • Ensure PostgreSQL server is running and accessible
  • Check database names: fs_law_risk and licensing_risks
  • Verify PG_HOST, PG_PORT, PG_USER, PG_PASSWORD are correct

Missing Environment Variables

Symptom: Key errors, default values being used, or API failures

Solutions:

  • Create .env file from the template (see Configuration section)
  • Ensure DASHSCOPE_API_KEY is set for embedding/chat features
  • Verify all required PG_* and LIC_* environment variables
  • Check that pg8000 is installed: pip install "pg8000>=1.30.0" (the quotes stop the shell from treating >= as a redirect)

LLM/Embedding API Errors

Symptom: API authentication failures, timeout errors, or embedding errors

Solutions:

  • Verify DASHSCOPE_API_KEY is valid and has sufficient quota
  • Check DASHSCOPE_BASE_URL matches: https://dashscope.aliyuncs.com/compatible-mode/v1
  • Ensure network access to DashScope API servers
  • Review API rate limits and batch sizes (DASHSCOPE_MAX_BATCH)
  • Test API key: curl -H "Authorization: Bearer $DASHSCOPE_API_KEY" "$DASHSCOPE_BASE_URL/models"

Empty Search Results

Symptom: API returns empty risk_subject array

Solutions:

  • Check if database tables are populated:
    SELECT COUNT(*) FROM fs_law_risk.law_sub;
    SELECT COUNT(*) FROM fs_law_risk.law_sub_per;
    
  • Try debug=1 parameter to see detailed execution info
  • Verify similarity thresholds in .env:
    • LAWRISK_RETURN_IF_GE=0.7 (return if similarity >= 0.7)
    • LAWRISK_FALLBACK_GT=0.5 (fallback if similarity > 0.5)
  • Test with known queries like "我要办一家电影院" or "开办旅馆"
  • Check if embeddings exist: SELECT COUNT(*) FROM fs_law_risk.law_sub WHERE vector IS NOT NULL;

Port Already in Use

Symptom: OSError: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted

Solutions:

  • Change port: PORT=8001 python app.py
  • Kill existing process using the port:
    netstat -ano | findstr :8000
    taskkill /PID <PID> /F
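Before restarting the app, it can help to check whether the port is actually free. The following standard-library sketch tries to bind the port and reports the outcome; the helper name `is_port_free` is hypothetical, and it only checks the given host address.

```python
import socket


def is_port_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False  # another process already holds the port
    return True
```

For example, `is_port_free(8000)` returning False suggests another instance of the app (or a different service) is still listening on port 8000.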
    

Data Export/Import Issues

Symptom: export_risk_json.py or ingest_lawrisk.py fails

Solutions:

  • Verify PostgreSQL credentials in .env match database access
  • Ensure export script isn't writing outside the repository
  • For ingestion, confirm DASHSCOPE_API_KEY is valid
  • Check data files exist: data/risk_tables_export.json
  • Run export first, then ingestion if needed

Health Check Commands

# Basic health check
curl http://localhost:8000/healthz

# Check registered routes (see app startup logs)
python app.py 2>&1 | grep "Registered routes"

# Test V2 API with debug
curl -X POST "http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "query=我要办一家电影院&debug=1"

# Test regions endpoint
curl http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2/regions

# Test admin endpoints
curl http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/admin/test

Database Verification

Verify database content with queries from DB_GUIDE.md:

-- Check subject count
SELECT COUNT(*) FROM fs_law_risk.law_sub;

-- Check region-theme pairs
SELECT COUNT(*) FROM licensing_risks.region_themes;

-- Check if embeddings exist
SELECT id, name FROM fs_law_risk.law_sub WHERE vector IS NOT NULL LIMIT 5;

-- List available regions
SELECT * FROM licensing_risks.regions ORDER BY name;

Debug Mode

Enable debug logging to troubleshoot issues:

# Enable Flask debug mode
FLASK_DEBUG=1 python app.py

# Enable CORS debug mode
CORS_DEBUG=1 python app.py

# Check app logs for registered routes and errors
# Logs are printed to console when starting the app

Health Checks

  • Basic health: GET /healthz returns {"status": "ok"}
  • V2 regions: GET /fs-ai-asistant/api/workflow/lawrisk/v2/regions
  • Check logs for registered routes on app startup
