CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

LawRisk Backend - Claude Code Analysis

Project Overview

LawRisk is a Flask-based Python backend service that provides intelligent legal compliance risk retrieval for business licensing and permit requirements. It uses vector embeddings and LLM-based matching to help users find relevant permits, licenses, and associated legal risks based on natural language queries (in Chinese).

Python Version Requirement: Python 3.10+ (uses PEP 604 union types like str | None)

Key Features

Semantic Search: Uses Aliyun DashScope embeddings (text-embedding-v4) to find similar legal topics
LLM-Powered Matching: Qwen (qwen-plus-latest) for intelligent subject selection
Two-Database Architecture:
- fs_law_risk: Vector embeddings and subject-permit mappings
- licensing_risks: Structured permit and risk data with regions, themes, and compliance information
RESTful APIs: Clean REST endpoints for V1 (legacy) and V2 (enhanced) search
CORS Enabled: Built-in CORS middleware for frontend integration

Architecture & Project Structure

Core Framework & Libraries

Framework: Flask (Python web framework)
Database Driver: pg8000 (PostgreSQL adapter)
Vector Embeddings: Aliyun DashScope OpenAI-compatible API
LLM: Qwen via DashScope (qwen-plus-latest)
Dependencies: Minimal footprint - Flask and pg8000 (concurrent.futures is part of the Python standard library)

Directory Structure

市监局-lawRisk-backend/
├── app.py                          # Flask application entry point
├── requirements.txt                # Python dependencies
├── .env                            # Environment configuration
├── lawrisk/                        # Main application package
│   ├── __init__.py
│   ├── api/                        # API route handlers
│   │   ├── v1.py                   # V1 API (legacy)
│   │   └── v2.py                   # V2 API (current)
│   ├── services/                   # Business logic layer
│   │   ├── lawrisk_service.py      # Core search & embeddings
│   │   ├── lawrisk_v2_service.py   # V2 enhanced service
│   │   └── licensing_repo.py       # Data repository
│   ├── middleware/                 # HTTP middleware
│   │   └── smart_cors_middleware.py
│   └── utils/                      # Utility functions
│       ├── env_loader.py
│       ├── export_risk_json.py
│       └── ingest_lawrisk.py
├── static/                         # Static assets
│   └── v2_tester.html              # Web-based API tester
├── tests/                          # Test suite (planned)
├── data/                           # Data files
│   ├── risk_tables_export.json
│   └── licensing_risks_dump.sql
└── docs/                           # Documentation
    ├── PRD.md
    ├── API.md
    ├── V2_API文档.md
    ├── AGENTS.md
    ├── DB_GUIDE.md
    └── CLAUDE.md

Quick Reference

Most Common Commands

# Run the application
python app.py

# Export data from database
python lawrisk/utils/export_risk_json.py

# Ingest data with embeddings (requires DASHSCOPE_API_KEY)
python lawrisk/utils/ingest_lawrisk.py

# Format and lint code
black .
ruff check .

# Test locally via browser
# Open static/v2_tester.html after starting the app

Key Files

app.py - Flask application entry point
lawrisk/ - Main application package
- api/v1.py - V1 API routes (legacy)
- api/v2.py - V2 API routes (current)
- services/lawrisk_service.py - Core search & embeddings
- services/lawrisk_v2_service.py - V2 enhanced service
- services/licensing_repo.py - Data repository
- middleware/smart_cors_middleware.py - CORS middleware
- utils/ - Utility functions
static/v2_tester.html - Web-based API testing interface
requirements.txt - Python dependencies
.env - Environment configuration

Development Workflow

Initial Setup

# 1. Create virtual environment (Windows PowerShell)
python -m venv .venv
.venv\Scripts\activate

# 2. Install dependencies
pip install Flask pg8000 black ruff pytest

# 3. Configure environment
# Edit .env with your database credentials and DashScope API key

Virtual Environment Activation

# Windows PowerShell
.venv\Scripts\activate

# Windows CMD
.venv\Scripts\activate.bat

# Git Bash (Windows)
source .venv/Scripts/activate

Common Commands

Run the Application

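# Development mode
python app.py

# Custom port (Git Bash syntax; in PowerShell use: $env:PORT=8000; python app.py)
PORT=8000 python app.py

# With debug logging (Git Bash syntax; in PowerShell use: $env:FLASK_DEBUG=1; python app.py)
FLASK_DEBUG=1 python app.py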

Data Management

# Export data from fs_law_risk database to JSON
python lawrisk/utils/export_risk_json.py

# Ingest data with embeddings into database
python lawrisk/utils/ingest_lawrisk.py
# Requires DASHSCOPE_API_KEY in .env

Code Quality

# Format code with Black (100 char line length)
black .

# Lint with Ruff
ruff check .

# Format and lint specific file
black lawrisk/services/lawrisk_v2_service.py
ruff check lawrisk/services/lawrisk_v2_service.py

# Run tests (when added)
pytest -q

# Run tests with coverage
pytest --cov=lawrisk

Data Management Commands

# Export data from fs_law_risk database to JSON
# Output: data/risk_tables_export.json
python lawrisk/utils/export_risk_json.py

# Ingest data with embeddings into database
# Requires DASHSCOPE_API_KEY in .env
python lawrisk/utils/ingest_lawrisk.py

# Verify exported data
ls -lh data/
cat data/risk_tables_export.json | head -50

Database Operations

# Connect to PostgreSQL
psql -h 8.138.196.105 -U postgres -d fs_law_risk

# Connect to licensing_risks database
psql -h 8.138.196.105 -U postgres -d licensing_risks

API Endpoints

V1 API (Legacy)

Path: /fs-ai-asistant/api/workflow/lawrisk
Methods: GET, POST
Mode: llm (default) or embed
Input: query (user question)
Output: Simple array of matching subjects with permit IDs
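
As an illustration, a V1 request URL could be assembled as follows. This is a minimal sketch, not the project's client code: the parameter name `mode`, the helper `build_v1_url`, and the base URL `http://localhost:8000` are assumptions (the listing above gives the mode values but not the parameter that carries them); the path and the `query` parameter come from the listing.

```python
from urllib.parse import urlencode

V1_PATH = "/fs-ai-asistant/api/workflow/lawrisk"  # path as documented above


def build_v1_url(query: str, mode: str = "llm", base: str = "http://localhost:8000") -> str:
    """Build a GET URL for the V1 endpoint (hypothetical helper)."""
    if mode not in ("llm", "embed"):
        raise ValueError(f"unsupported mode: {mode!r}")
    # urlencode percent-encodes the UTF-8 bytes of Chinese query text.
    # The parameter name "mode" is an assumption, not confirmed by the docs.
    return f"{base}{V1_PATH}?{urlencode({'query': query, 'mode': mode})}"
```

Percent-encoding the query here avoids relying on the HTTP client to handle raw Chinese characters in the URL.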

V2 API (Current/Recommended)

Base Path: /fs-ai-asistant/api/workflow/lawrisk/v2
Methods: GET, POST
Features:
- Structured results with regions, themes, permits, and risks
- Optional region filtering
- Debug mode with detailed execution info
- Direct permit matching by name

V2 Sub-endpoints

Search Endpoint
- Path: /fs-ai-asistant/api/workflow/lawrisk/v2
- Parameters:
  - query (required): User question
  - region (optional): Filter by region (e.g. 市级 "municipal level", 禅城区 "Chancheng District")
  - debug (optional): Enable debug output (1/true/yes/on)
  - top (optional): Number of recommendations (default: 5)

Regions List
- Path: /fs-ai-asistant/api/workflow/lawrisk/v2/regions
- Method: GET
- Returns: All available regions for filtering

Get Permits
- Path: /fs-ai-asistant/api/workflow/lawrisk/getPermits
- Methods: GET, POST
- Input: region (region ID or name)
- Returns: All permits for a specific region

Health Check

Path: /healthz
Method: GET
Returns: {"status": "ok"}

Database Schema

Database 1: fs_law_risk

Used for vector embeddings and semantic search.

Tables

law_sub: Subject matter with embeddings
- id (TEXT, PK): Subject ID
- name (TEXT): Subject name
- vector (JSONB): Embedding vector
law_sub_per: Subject-permit mappings
- sub_id (TEXT, PK): Subject ID
- per_ids (JSONB): Array of permit IDs
law_per: Permit information
- id (TEXT, PK): Permit ID
- name (TEXT): Permit name
- risk_ids (JSONB): Array of risk IDs
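
To make the relationship between the three tables concrete, the sketch below walks from a subject to its permits and their risk IDs. It uses in-memory dictionaries in place of database rows; all IDs and names are made-up placeholders, and `permits_for_subject` is a hypothetical helper rather than a function from the repository.

```python
import json

# Rows shaped like law_sub_per and law_per; JSONB columns are shown as JSON text.
law_sub_per = {"S1": json.dumps(["P1", "P2"])}
law_per = {
    "P1": {"name": "permit one", "risk_ids": json.dumps(["R1"])},
    "P2": {"name": "permit two", "risk_ids": json.dumps(["R2", "R3"])},
}


def permits_for_subject(sub_id: str) -> list:
    """Resolve a subject ID to its permits, each with its list of risk IDs."""
    per_ids = json.loads(law_sub_per.get(sub_id, "[]"))
    result = []
    for per_id in per_ids:
        row = law_per.get(per_id)
        if row is None:
            continue  # mapping points at a permit that does not exist
        result.append(
            {"id": per_id, "name": row["name"], "risk_ids": json.loads(row["risk_ids"])}
        )
    return result
```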

Database 2: licensing_risks

Used for structured compliance data.

Tables

regions: Administrative areas
- id (PK), name (unique)
business_scopes: Business scope definitions
- id (PK), description
region_scopes: Region-scope mappings
themes: Legal themes/subjects
- id (PK), name
region_themes: Region-theme mappings
permits: License/permit items
- id (PK), name
region_theme_permits: Tripartite linkage
risks: Risk information
- id (PK), risk_content, legal_basis, document_no, summary
region_permit_risks: Risk associations

Configuration

Environment Variables (.env)

DashScope (Embeddings & LLM)

DASHSCOPE_API_KEY=<your-dashscope-api-key>
DASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
DASHSCOPE_EMBED_MODEL=text-embedding-v4
DASHSCOPE_EMBED_DIM=1024
DASHSCOPE_MAX_BATCH=10
DASHSCOPE_CHAT_MODEL=qwen-plus-latest
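
DASHSCOPE_MAX_BATCH caps how many texts go into one embedding request. A minimal sketch of the batching step, assuming the ingestion code splits its inputs this way (the helper name `batched` is hypothetical, and the actual embedding call is omitted):

```python
from __future__ import annotations

import os
from typing import Iterator, Sequence


def batched(texts: Sequence[str], max_batch: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `texts`, each with at most `max_batch` items."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    for start in range(0, len(texts), max_batch):
        yield list(texts[start:start + max_batch])


# Read the limit from the environment, defaulting to the documented value of 10.
MAX_BATCH = int(os.environ.get("DASHSCOPE_MAX_BATCH", "10"))
```

Each yielded batch would then be sent as one embedding request.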

PostgreSQL Configuration

# fs_law_risk database
PG_HOST=8.138.196.105
PG_PORT=5432
PG_USER=postgres
PG_PASSWORD=<your-postgres-password>
PG_DATABASE=fs_law_risk
PG_ADMIN_DB=postgres

# licensing_risks database
LIC_PG_HOST=8.138.196.105
LIC_PG_PORT=5432
LIC_PG_USER=postgres
LIC_PG_PASSWORD=<your-postgres-password>
LIC_PG_DATABASE=licensing_risks

Application Settings

FLASK_ENV=development

# Search thresholds (tunable)
LAWRISK_RETURN_IF_GE=0.7
LAWRISK_FALLBACK_GT=0.5
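
One plausible reading of the two thresholds is sketched below: candidates scoring at or above LAWRISK_RETURN_IF_GE are returned directly, and otherwise anything scoring above LAWRISK_FALLBACK_GT is offered as a fallback. This is an interpretation, not the logic in lawrisk_service.py; `select_matches` is a hypothetical name.

```python
from __future__ import annotations

import os

RETURN_IF_GE = float(os.environ.get("LAWRISK_RETURN_IF_GE", "0.7"))
FALLBACK_GT = float(os.environ.get("LAWRISK_FALLBACK_GT", "0.5"))


def select_matches(
    scored: list[tuple[str, float]],
    return_if_ge: float = RETURN_IF_GE,
    fallback_gt: float = FALLBACK_GT,
) -> list[tuple[str, float]]:
    """Pick (subject_id, similarity) pairs using the two documented thresholds."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    strong = [pair for pair in ranked if pair[1] >= return_if_ge]
    if strong:
        return strong
    # Assumed fallback behaviour: weaker matches are used only when no strong match exists.
    return [pair for pair in ranked if pair[1] > fallback_gt]
```

Under this reading, raising LAWRISK_FALLBACK_GT makes empty results more likely, which is worth keeping in mind when tuning.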

Testing

Current Testing Status

No dedicated test suite in the repository
pytest framework is recommended but not configured
Manual testing available via static/v2_tester.html

Testing Framework & Guidelines

When adding tests, use:

Framework: pytest with Flask test client
Dependencies: pytest-cov for coverage reporting
Focus Areas:
- API endpoints (V1 and V2)
- Middleware behavior (CORS)
- Database operations
- LLM selection logic
- Region filtering
- Checkpoint operations (create, list, restore, delete)

Recommended test structure:

tests/
├── conftest.py                  # Shared fixtures and test configuration
├── test_api_v1.py               # V1 API endpoint tests
├── test_api_v2.py               # V2 API endpoint tests
├── test_admin_endpoints.py      # Admin endpoint tests
├── test_search_service.py       # Core search logic
├── test_licensing_repo.py       # Database repository
└── test_cors_middleware.py      # CORS middleware behavior

# Test naming: test_*.py (pytest discovery pattern)
# Test functions: test_<feature>_<scenario>()

Testing CORS Middleware (from AGENTS.md):

Origin matching: wildcard, exact, subdomains
Preflight OPTIONS handling
X-CORS-Decision header behavior
NGINX_CORS_MODE functionality
Environment variable configurations (ALLOWED_ORIGINS, CORS_STRICT, etc.)

Example test case:

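from app import app  # assumes app.py exposes the Flask instance as `app`

def test_v2_search_with_debug():
    # Flask's test client issues the request without starting a server.
    client = app.test_client()
    response = client.post(
        "/fs-ai-asistant/api/workflow/lawrisk/v2",
        data={"query": "电影院", "debug": "1"},  # form-encoded; 电影院 means "cinema"
    )
    assert response.status_code == 200
    data = response.get_json()["data"]
    # Debug mode is expected to add these fields to the payload.
    assert "debug" in data
    assert "executionTime" in data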

Testing Commands

# Run all tests
pytest

# Run with coverage
pytest --cov=lawrisk

# Run specific test file
pytest tests/test_api_v2.py -v

Manual Testing with V2 Tester

Start the application: python app.py (defaults to port 8000)
Open static/v2_tester.html in your browser
Test queries like:
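- "我要办一家电影院" ("I want to open a cinema")
- "开办旅馆需要哪些许可" ("What permits are needed to open a hotel")
- "公共场所卫生许可" ("public place hygiene permit")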
- Query with region filter: "电影院&region=市级&debug=1"

The tester provides a simple UI to experiment with the V2 API and view debug information.

Coding Standards & Best Practices

Python Style Guidelines

Indentation: 4 spaces (no tabs)
Encoding: UTF-8 for all source files
Naming Conventions:
- Functions/variables: snake_case
- Constants: SCREAMING_SNAKE_CASE
- Classes: PascalCase
Type Hints: Prefer type hints for all public functions
Code Formatting: Use black with 100-character line length
Linting: Use ruff with default rules

Code Quality Guidelines

Keep functions small and side-effect free
Prefer pure functions where possible
Document complex logic with comments
Use type hints from typing module
Handle errors gracefully with appropriate logging

Documentation Files

PRD.md (docs/PRD.md) - Product Requirements Document
- Business logic and requirements specification
- Feature specifications and user stories
API.md (docs/API.md) - V1 API documentation
- Legacy API endpoints and usage
- Request/response formats
V2_API文档.md (docs/V2_API文档.md) - Detailed V2 API documentation
- Enhanced API specification
- Admin endpoints and checkpoint operations
- Request/response examples
AGENTS.md (docs/AGENTS.md) - Development guidelines
- Repository structure and module organization
- Coding style and naming conventions
- Commit and PR guidelines
- Security and configuration tips
DB_GUIDE.md (docs/DB_GUIDE.md) - Database reference
- Schema reference for both databases
- Query examples and optimization tips
README.md (README.md) - Project overview and quick start

Key Components Deep Dive

1. app.py - Application Entry Point (app.py)

Creates Flask app with CORS enabled
Registers all API routes (v1_bp, v2_bp)
Database initialization and schema checks on startup
Health check endpoint at /healthz
Logs all registered routes on startup
Error handling doesn't block app startup (errors surface on first request)

2. lawrisk_service.py - Core Search Logic (lawrisk/services/lawrisk_service.py)

EmbeddingClient: DashScope API integration for vector embeddings
ChatClient: Qwen LLM interaction for intelligent subject selection
Database helpers using pg8000 for PostgreSQL
Search algorithms:
- Embedding-based cosine similarity search
- LLM-based subject matching using qwen-plus-latest
Similarity threshold management (tunable via LAWRISK_RETURN_IF_GE, LAWRISK_FALLBACK_GT)
Concurrent execution support with ThreadPoolExecutor
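
The embedding-based search mentioned above rests on cosine similarity between the query vector and each stored subject vector. A minimal pure-Python sketch follows; the function names are hypothetical, and the real service may well compute this differently (for example in batches or with a numeric library).

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], subjects: dict[str, list[float]], k: int = 5):
    """Rank subject IDs by similarity to the query vector and keep the best k."""
    scored = [(sub_id, cosine_similarity(query_vec, vec)) for sub_id, vec in subjects.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

The resulting scores are what the LAWRISK_RETURN_IF_GE and LAWRISK_FALLBACK_GT thresholds would be compared against.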

3. lawrisk_v2_service.py - Enhanced API (lawrisk/services/lawrisk_v2_service.py)

Structured response formatting with regions, themes, permits, and risks
Region filtering logic with normalization
Direct permit matching by name
Markdown formatting for legal text
Complex query execution pipeline
Helper functions:
- _compose_prompt(): Builds natural-language prompts from structured data
- _normalize_region_filter(): Normalizes region filters for matching

4. licensing_repo.py - Data Repository (lawrisk/services/licensing_repo.py)

Separate database connection for licensing_risks database
Multi-table join query optimization
Legal text formatting helpers
Chinese legal document pattern matching
Checkpoint management:
- create_checkpoint(): Database backup functionality
- list_checkpoints(): List available backups
- restore_checkpoint(): Restore from checkpoint (DANGEROUS!)
- delete_checkpoint(): Remove old checkpoints

5. smart_cors_middleware.py - Reusable CORS (lawrisk/middleware/smart_cors_middleware.py)

Wildcard and exact origin matching
Subdomain support with flexible patterns
Preflight OPTIONS handling
NGINX integration mode (NGINX_CORS_MODE)
Debug and logging features (CORS_DEBUG)
Environment variable support:
- ALLOWED_ORIGINS, CORS_STRICT, CORS_DEBUG
- NGINX_CORS_MODE, CORS_MAX_AGE, CORS_EXPOSE_HEADERS
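
Wildcard, exact, and subdomain origin matching can be illustrated with shell-style patterns. The sketch below is not the middleware's implementation: the pattern syntax (`*` as a glob), the case-insensitive comparison, and both helper names are assumptions; only the comma-separated ALLOWED_ORIGINS format comes from this document.

```python
from __future__ import annotations

from fnmatch import fnmatchcase


def parse_allowed_origins(raw: str) -> list[str]:
    """Split a comma-separated ALLOWED_ORIGINS value into clean patterns."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def origin_allowed(origin: str, allowed: list[str]) -> bool:
    """Return True if the origin matches any pattern (exact, '*', or '*.domain' glob)."""
    candidate = origin.rstrip("/").lower()
    for pattern in allowed:
        normalized = pattern.rstrip("/").lower()
        # A bare "*" allows everything; otherwise use glob matching.
        if normalized == "*" or fnmatchcase(candidate, normalized):
            return True
    return False
```

For example, `https://*.example.com` would accept `https://app.example.com` but not `https://example.com.evil.com`, since the glob must match the whole origin.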

6. v2.py API Routes (lawrisk/api/v2.py)

Public endpoints: /v2, /v2/regions, /getPermits
Admin endpoints: /admin/test, /admin/regions, /admin/themes, /admin/permits, /admin/checkpoints
Parameter extraction supporting GET, POST, JSON, and form data
Concurrent execution using ThreadPoolExecutor (max_workers=2)
Structured error responses with consistent format
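
The two-worker concurrency noted above might look roughly like this. Which two tasks v2.py actually runs in parallel is not stated here, so the pairing of a subject search with direct permit matching is an assumption, and both worker functions are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def search_subjects(query: str) -> list:
    """Placeholder for the embedding/LLM subject search."""
    return [f"subject-for:{query}"]


def match_permits_by_name(query: str) -> list:
    """Placeholder for direct permit matching by name."""
    return [f"permit-for:{query}"]


def run_search(query: str) -> dict:
    """Run both lookups concurrently and combine their results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        subjects_future = pool.submit(search_subjects, query)
        permits_future = pool.submit(match_permits_by_name, query)
        # result() blocks until each task finishes and re-raises any exception.
        return {
            "subjects": subjects_future.result(),
            "permits": permits_future.result(),
        }
```

Threads suit this case because both tasks would spend most of their time waiting on network or database I/O.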

7. Utility Scripts

env_loader.py: Environment variable loading from .env file
export_risk_json.py: PostgreSQL data export utility (outputs to data/risk_tables_export.json)
ingest_lawrisk.py: Data ingestion with embeddings (requires DASHSCOPE_API_KEY)
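
For orientation, a .env loader can be as small as the sketch below. It is not the code in env_loader.py: the function names, the quote stripping, and the rule that existing environment variables win unless `override` is set are all assumptions.

```python
from __future__ import annotations

import os
from typing import Iterable


def parse_env_lines(lines: Iterable[str]) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comments, and lines without '='."""
    values: dict[str, str] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env(path: str = ".env", override: bool = False) -> dict[str, str]:
    """Load a .env file into os.environ; existing variables win unless override=True."""
    with open(path, encoding="utf-8") as handle:
        values = parse_env_lines(handle)
    for key, value in values.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```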

Security & Best Practices

Security Guidelines

NEVER hardcode secrets in source code
All credentials must be in .env file or environment variables
API keys (DASHSCOPE_API_KEY) and database passwords must be externalized
Admin endpoints (/admin/*) should be protected in production
Database checkpoint restore is a DANGEROUS OPERATION and should be restricted

Configuration Best Practices

Use .env file for all configuration (database, API keys, thresholds)
Environment variables supported by CORS middleware:
- ALLOWED_ORIGINS: Comma-separated list of allowed origins
- CORS_STRICT: Enable strict origin checking
- CORS_DEBUG: Enable debug logging
- NGINX_CORS_MODE: Enable NGINX integration
- CORS_MAX_AGE: Preflight cache duration
- CORS_EXPOSE_HEADERS: Headers to expose to browsers
Regularly backup databases using checkpoint system
Monitor DashScope API quota and rate limits

Troubleshooting Guide

Common Issues

Database Connection Errors

Symptom: pg8000.dbapi.Error when starting the app

Solutions:

Check .env file exists with correct PostgreSQL credentials
Verify network connectivity to the database server
Ensure PostgreSQL server is running and accessible
Check database names: fs_law_risk and licensing_risks
Verify PG_HOST, PG_PORT, PG_USER, PG_PASSWORD are correct

Missing Environment Variables

Symptom: Key errors, default values being used, or API failures

Solutions:

Create .env file from the template (see Configuration section)
Ensure DASHSCOPE_API_KEY is set for embedding/chat features
Verify all required PG_* and LIC_* environment variables
Check that pg8000 is installed: pip install "pg8000>=1.30.0" (the quotes stop the shell from treating > as a redirect)

LLM/Embedding API Errors

Symptom: API authentication failures, timeout errors, or embedding errors

Solutions:

Verify DASHSCOPE_API_KEY is valid and has sufficient quota
Check DASHSCOPE_BASE_URL matches: https://dashscope.aliyuncs.com/compatible-mode/v1
Ensure network access to DashScope API servers
Review API rate limits and batch sizes (DASHSCOPE_MAX_BATCH)
Test API key: curl -H "Authorization: Bearer $DASHSCOPE_API_KEY" "$DASHSCOPE_BASE_URL/models"

Empty Search Results

Symptom: API returns empty risk_subject array

Solutions:

Check if database tables are populated (run while connected to the fs_law_risk database; PostgreSQL reads the fs_law_risk. prefix as a schema name, so drop it if the tables live in the default public schema):

SELECT COUNT(*) FROM fs_law_risk.law_sub;
SELECT COUNT(*) FROM fs_law_risk.law_sub_per;

Try debug=1 parameter to see detailed execution info
Verify similarity thresholds in .env:
- LAWRISK_RETURN_IF_GE=0.7 (return if similarity >= 0.7)
- LAWRISK_FALLBACK_GT=0.5 (fallback if similarity > 0.5)
Test with known queries like "我要办一家电影院" or "开办旅馆"
Check if embeddings exist: SELECT COUNT(*) FROM fs_law_risk.law_sub WHERE vector IS NOT NULL;

Port Already in Use

Symptom: OSError: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted

Solutions:

Change port: PORT=8001 python app.py

Kill existing process using the port:

netstat -ano | findstr :8000
taskkill /PID <PID> /F

Data Export/Import Issues

Symptom: export_risk_json.py or ingest_lawrisk.py fails

Solutions:

Verify PostgreSQL credentials in .env match database access
Ensure export script isn't writing outside the repository
For ingestion, confirm DASHSCOPE_API_KEY is valid
Check data files exist: data/risk_tables_export.json
Run export first, then ingestion if needed

Health Check Commands

# Basic health check
curl http://localhost:8000/healthz

# Check registered routes (see app startup logs)
python app.py 2>&1 | grep "Registered routes"

# Test V2 API with debug
curl -X POST "http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2" \
  -H "Content-Type: application/x-www-form-urlencoded" \
  -d "query=我要办一家电影院&debug=1"

# Test regions endpoint
curl http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2/regions

# Test admin endpoints
curl http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/admin/test

Database Verification

Verify database content with queries from DB_GUIDE.md:

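-- Run the fs_law_risk queries while connected to fs_law_risk and the
-- licensing_risks queries while connected to licensing_risks; a PostgreSQL
-- session cannot query across databases. The prefix is read as a schema name,
-- so drop it if the tables live in the default public schema.

-- Check subject count
SELECT COUNT(*) FROM fs_law_risk.law_sub;

-- Check region-theme pairs
SELECT COUNT(*) FROM licensing_risks.region_themes;

-- Check if embeddings exist
SELECT id, name, vector IS NOT NULL AS has_embedding FROM fs_law_risk.law_sub LIMIT 5;

-- List available regions
SELECT * FROM licensing_risks.regions ORDER BY name;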

Debug Mode

Enable debug logging to troubleshoot issues:

# Enable Flask debug mode
FLASK_DEBUG=1 python app.py

# Enable CORS debug mode
CORS_DEBUG=1 python app.py

# Check app logs for registered routes and errors
# Logs are printed to console when starting the app

Health Checks

Basic health: GET /healthz → {"status": "ok"}
V2 regions: GET /fs-ai-asistant/api/workflow/lawrisk/v2/regions
Check logs for registered routes on app startup
