CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

LawRisk Backend - Claude Code Analysis

Project Overview

LawRisk is a Flask-based Python backend service that provides intelligent legal compliance risk retrieval for business licensing and permit requirements. It uses vector embeddings and LLM-based matching to help users find relevant permits, licenses, and associated legal risks based on natural language queries (in Chinese).

Python Version Requirement: Python 3.10+ (uses PEP 604 union types like str | None)

Key Features

Semantic Search: Uses Aliyun DashScope embeddings (text-embedding-v4) to find similar legal topics
LLM-Powered Matching: Qwen (qwen-plus-latest) for intelligent subject selection
Two-Database Architecture:
- fs_law_risk: Vector embeddings and subject-permit mappings
- licensing_risks: Structured permit and risk data with regions, themes, and compliance information
RESTful APIs: Clean REST endpoints for V1 (legacy) and V2 (enhanced) search
CORS Enabled: Built-in CORS middleware for frontend integration

Architecture & Project Structure

Core Framework & Libraries

Framework: Flask (Python web framework)
Database Driver: pg8000 (PostgreSQL adapter)
Vector Embeddings: Aliyun DashScope OpenAI-compatible API
LLM: Qwen via DashScope (qwen-plus-latest)
Dependencies: Minimal footprint - Flask and pg8000, plus concurrent.futures from the Python standard library

Directory Structure

市监局-lawRisk-backend/
├── app.py                          # Flask application entry point
├── requirements.txt                # Python dependencies
├── .env                            # Environment configuration
├── lawrisk/                        # Main application package
│   ├── __init__.py
│   ├── api/                        # API route handlers
│   │   ├── v1.py                   # V1 API (legacy)
│   │   └── v2.py                   # V2 API (current)
│   ├── services/                   # Business logic layer
│   │   ├── lawrisk_service.py      # Core search & embeddings
│   │   ├── lawrisk_v2_service.py   # V2 enhanced service
│   │   └── licensing_repo.py       # Data repository
│   ├── middleware/                 # HTTP middleware
│   │   └── smart_cors_middleware.py
│   └── utils/                      # Utility functions
│       ├── env_loader.py
│       ├── export_risk_json.py
│       └── ingest_lawrisk.py
├── static/                         # Static assets
│   └── v2_tester.html              # Web-based API tester
├── tests/                          # Test suite (planned)
├── data/                           # Data files
│   ├── risk_tables_export.json
│   └── licensing_risks_dump.sql
└── docs/                           # Documentation
    ├── PRD.md
    ├── API.md
    ├── V2_API文档.md
    ├── AGENTS.md
    ├── DB_GUIDE.md
    └── CLAUDE.md

Quick Reference

Most Common Commands

# Run the application
python app.py

# Export data from database
python export_risk_json.py

# Ingest data with embeddings (requires DASHSCOPE_API_KEY)
python ingest_lawrisk.py

# Format and lint code
black .
ruff check .

# Test locally via browser
# Open static/v2_tester.html after starting the app

Key Files

app.py - Flask application entry point
lawrisk/ - Main application package
- api/v1.py - V1 API routes (legacy)
- api/v2.py - V2 API routes (current)
- services/lawrisk_service.py - Core search & embeddings
- services/lawrisk_v2_service.py - V2 enhanced service
- services/licensing_repo.py - Data repository
- middleware/smart_cors_middleware.py - CORS middleware
- utils/ - Utility functions
static/v2_tester.html - Web-based API testing interface
requirements.txt - Python dependencies
.env - Environment configuration

Development Workflow

Initial Setup

# 1. Create virtual environment
python -m venv .venv

# 2. Activate virtual environment (Windows PowerShell)
.venv\Scripts\activate

# 3. Install dependencies
pip install Flask pg8000 black ruff pytest

# 4. Load environment variables
# Edit .env with your database credentials

Common Commands

Run the Application

# Development mode
python app.py

# Custom port
PORT=8000 python app.py

# With debug logging
FLASK_DEBUG=1 python app.py

Data Management

# Export data from fs_law_risk database to JSON
python export_risk_json.py

# Ingest data with embeddings into database
python ingest_lawrisk.py
# Requires DASHSCOPE_API_KEY in .env

Code Quality

# Format code with Black (100 char line length)
black .

# Lint with Ruff
ruff check .

# Run tests (when added)
pytest -q

Database Operations

# Connect to PostgreSQL
psql -h 8.138.196.105 -U postgres -d fs_law_risk

# Connect to licensing_risks database
psql -h 8.138.196.105 -U postgres -d licensing_risks

API Endpoints

V1 API (Legacy)

Path: /fs-ai-asistant/api/workflow/lawrisk
Methods: GET, POST
Mode: llm (default) or embed
Input: query (user question)
Output: Simple array of matching subjects with permit IDs
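
A minimal sketch of how a client might build a V1 request URL from the fields above. Only the path and the `query` input come from this document; the base URL `http://localhost:8000`, the parameter name `mode`, and the helper name `build_v1_url` are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

V1_PATH = "/fs-ai-asistant/api/workflow/lawrisk"  # path as documented above


def build_v1_url(query: str, mode: str = "llm", base: str = "http://localhost:8000") -> str:
    """Build a GET URL for the V1 endpoint (base URL and `mode` name are assumptions)."""
    if mode not in ("llm", "embed"):
        raise ValueError("mode must be 'llm' or 'embed'")
    # urlencode percent-encodes the Chinese query text as UTF-8
    query_string = urlencode({"query": query, "mode": mode})
    return f"{base}{V1_PATH}?{query_string}"


url = build_v1_url("我要办一家电影院", mode="embed")
print(url)
# Round-trip check: decoding the query string recovers the original text
print(parse_qs(urlsplit(url).query))
```

Encoding the query with `urlencode` avoids hand-escaping Chinese characters in the URL.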

V2 API (Current/Recommended)

Base Path: /fs-ai-asistant/api/workflow/lawrisk/v2
Methods: GET, POST
Features:
- Structured results with regions, themes, permits, and risks
- Optional region filtering
- Debug mode with detailed execution info
- Direct permit matching by name

V2 Sub-endpoints

Search Endpoint
- Path: /fs-ai-asistant/api/workflow/lawrisk/v2
- Parameters:
  - query (required): User question
  - region (optional): Filter by region, e.g. 市级 (municipal level) or 禅城区 (Chancheng District)
  - debug (optional): Enable debug output (1/true/yes/on)
  - top (optional): Number of recommendations (default: 5)

Regions List
- Path: /fs-ai-asistant/api/workflow/lawrisk/v2/regions
- Method: GET
- Returns: All available regions for filtering

Get Permits
- Path: /fs-ai-asistant/api/workflow/lawrisk/getPermits
- Methods: GET, POST
- Input: region (region ID or name)
- Returns: All permits for a specific region
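
The V2 search parameters (query, region, debug, top) could be validated along the following lines. This is a sketch, not the project's actual code: `SearchParams` and `parse_search_params` are hypothetical names, and the handling of blank values is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

TRUTHY = {"1", "true", "yes", "on"}  # debug values accepted per the parameter list


@dataclass
class SearchParams:
    query: str
    region: Optional[str] = None
    debug: bool = False
    top: int = 5  # documented default


def parse_search_params(raw: dict) -> SearchParams:
    """Validate raw request parameters (hypothetical helper)."""
    query = (raw.get("query") or "").strip()
    if not query:
        raise ValueError("query is required")
    # Treat a missing or blank region as "no filter" (assumption)
    region = (raw.get("region") or "").strip() or None
    debug = (raw.get("debug") or "").strip().lower() in TRUTHY
    top = int(raw.get("top") or 5)
    return SearchParams(query=query, region=region, debug=debug, top=top)


params = parse_search_params({"query": "电影院", "region": "市级", "debug": "on"})
print(params)
```

Keeping the parsing in one pure function makes it easy to cover with unit tests without starting the Flask app.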

Health Check

Path: /healthz
Method: GET
Returns: {"status": "ok"}

Database Schema

Database 1: fs_law_risk

Used for vector embeddings and semantic search.

Tables

law_sub: Subject matter with embeddings
- id (TEXT, PK): Subject ID
- name (TEXT): Subject name
- vector (JSONB): Embedding vector
law_sub_per: Subject-permit mappings
- sub_id (TEXT, PK): Subject ID
- per_ids (JSONB): Array of permit IDs
law_per: Permit information
- id (TEXT, PK): Permit ID
- name (TEXT): Permit name
- risk_ids (JSONB): Array of risk IDs
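
To illustrate how the three tables relate, the sketch below resolves a subject to its permits and risk IDs using in-memory stand-ins for the rows. The sample data and the helper name `permits_for_subject` are invented for illustration; the real service reads these rows from PostgreSQL.

```python
import json

# Hypothetical sample rows; JSONB columns are represented as JSON strings
law_sub_per = {"S1": json.dumps(["P1", "P2"])}
law_per = {
    "P1": {"name": "Permit A", "risk_ids": json.dumps(["R1"])},
    "P2": {"name": "Permit B", "risk_ids": json.dumps(["R2", "R3"])},
}


def permits_for_subject(sub_id: str) -> list:
    """Follow law_sub_per.per_ids into law_per and decode each permit's risk_ids."""
    per_ids = json.loads(law_sub_per.get(sub_id, "[]"))
    permits = []
    for per_id in per_ids:
        row = law_per.get(per_id)
        if row is None:
            continue  # skip dangling permit IDs
        permits.append(
            {"id": per_id, "name": row["name"], "risk_ids": json.loads(row["risk_ids"])}
        )
    return permits


print(permits_for_subject("S1"))
```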

Database 2: licensing_risks

Used for structured compliance data.

Tables

regions: Administrative areas
- id (PK), name (unique)
business_scopes: Business scope definitions
- id (PK), description
region_scopes: Region-scope mappings
themes: Legal themes/subjects
- id (PK), name
region_themes: Region-theme mappings
permits: License/permit items
- id (PK), name
region_theme_permits: Tripartite linkage
risks: Risk information
- id (PK), risk_content, legal_basis, document_no, summary
region_permit_risks: Risk associations
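
The region-theme-permit linkage can be pictured as a three-way join. The sketch below uses an in-memory SQLite database as a stand-in for PostgreSQL so it runs anywhere. The link-table column names (`region_id`, `theme_id`, `permit_id`) and the sample rows are assumptions, since this document does not list them; see DB_GUIDE.md for the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE regions (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE themes (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE permits (id INTEGER PRIMARY KEY, name TEXT);
    -- Link-table columns are assumed for illustration
    CREATE TABLE region_theme_permits (region_id INTEGER, theme_id INTEGER, permit_id INTEGER);
    INSERT INTO regions VALUES (1, '市级'), (2, '禅城区');
    INSERT INTO themes VALUES (1, 'Cinema');
    INSERT INTO permits VALUES (1, 'Permit A'), (2, 'Permit B');
    INSERT INTO region_theme_permits VALUES (1, 1, 1), (1, 1, 2), (2, 1, 1);
    """
)


def permits_in_region(region_name: str) -> list:
    """Return (theme name, permit name) pairs linked to the given region."""
    return conn.execute(
        """
        SELECT t.name, p.name
        FROM region_theme_permits AS rtp
        JOIN regions AS r ON r.id = rtp.region_id
        JOIN themes AS t ON t.id = rtp.theme_id
        JOIN permits AS p ON p.id = rtp.permit_id
        WHERE r.name = ?
        ORDER BY p.id
        """,
        (region_name,),
    ).fetchall()


print(permits_in_region("市级"))
```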

Configuration

Environment Variables (.env)

DashScope (Embeddings & LLM)

DASHSCOPE_API_KEY=sk-288824ef003e4e02bb963b8b3024b06a
DASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
DASHSCOPE_EMBED_MODEL=text-embedding-v4
DASHSCOPE_EMBED_DIM=1024
DASHSCOPE_MAX_BATCH=10
DASHSCOPE_CHAT_MODEL=qwen-plus-latest
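
`DASHSCOPE_MAX_BATCH` suggests that texts are sent to the embedding API in groups of at most that size. A minimal sketch of such chunking follows; the helper name `batched` and the sample texts are hypothetical, and the real ingestion script may batch differently.

```python
import os
from typing import Iterator, List


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Fall back to the documented value when the variable is not set
max_batch = int(os.environ.get("DASHSCOPE_MAX_BATCH", "10"))
texts = [f"subject {i}" for i in range(23)]
for batch in batched(texts, max_batch):
    # Each batch would become one embedding request
    print(len(batch), batch[0])
```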

PostgreSQL Configuration

# fs_law_risk database
PG_HOST=8.138.196.105
PG_PORT=5432
PG_USER=postgres
PG_PASSWORD=difyai123456
PG_DATABASE=fs_law_risk
PG_ADMIN_DB=postgres

# licensing_risks database
LIC_PG_HOST=8.138.196.105
LIC_PG_PORT=5432
LIC_PG_USER=postgres
LIC_PG_PASSWORD=difyai123456
LIC_PG_DATABASE=licensing_risks

Application Settings

FLASK_ENV=development

# Search thresholds (tunable)
LAWRISK_RETURN_IF_GE=0.7
LAWRISK_FALLBACK_GT=0.5
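
One plausible reading of the two thresholds is sketched below: return every candidate scoring at least `LAWRISK_RETURN_IF_GE`, and if none qualifies, fall back to candidates scoring above `LAWRISK_FALLBACK_GT`. This interpretation and the helper names are assumptions; check lawrisk_service.py for the actual behavior.

```python
import os
from typing import List, Mapping, Tuple

Scored = List[Tuple[str, float]]


def load_thresholds(env: Mapping[str, str] = os.environ) -> Tuple[float, float]:
    """Read both thresholds, falling back to the values documented above."""
    return_if_ge = float(env.get("LAWRISK_RETURN_IF_GE", "0.7"))
    fallback_gt = float(env.get("LAWRISK_FALLBACK_GT", "0.5"))
    return return_if_ge, fallback_gt


def select_matches(scored: Scored, return_if_ge: float, fallback_gt: float) -> Scored:
    """Prefer strong matches; otherwise accept weaker ones above the fallback (assumed logic)."""
    strong = [(name, score) for name, score in scored if score >= return_if_ge]
    if strong:
        return strong
    return [(name, score) for name, score in scored if score > fallback_gt]


return_if_ge, fallback_gt = load_thresholds({})  # empty mapping, so defaults apply
candidates = [("cinema", 0.82), ("hotel", 0.61), ("bar", 0.40)]
print(select_matches(candidates, return_if_ge, fallback_gt))
```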

Testing

Current Testing Status

No dedicated test suite in the repository
pytest framework is recommended but not configured
Manual testing available via static/v2_tester.html

Testing Framework & Guidelines

When adding tests, use:

Framework: pytest with Flask test client
Focus Areas:
- API endpoints (V1 and V2)
- Middleware behavior (CORS)
- Database operations
- LLM selection logic
- Region filtering

Recommended test structure:

tests/
├── test_api_v1.py          # V1 API endpoint tests
├── test_api_v2.py          # V2 API endpoint tests
├── test_search_service.py  # Core search logic
├── test_licensing_repo.py  # Database repository
└── conftest.py             # Shared fixtures

Testing Commands

# Run all tests
pytest

# Run with coverage (requires the pytest-cov plugin)
pytest --cov=lawrisk

# Run specific test file
pytest tests/test_api_v2.py -v

Manual Testing with V2 Tester

Start the application: python app.py (defaults to port 8000)
Open static/v2_tester.html in your browser
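Test queries like:
- "我要办一家电影院" (I want to open a cinema)
- "开办旅馆需要哪些许可" (Which permits are needed to open a hotel?)
- "公共场所卫生许可" (public venue hygiene permit)
- Query with region filter: "电影院&region=市级&debug=1" (query "cinema", region "municipal level", debug on)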

The tester provides a simple UI to experiment with the V2 API and view debug information.

Coding Standards & Best Practices

Python Style Guidelines

Indentation: 4 spaces (no tabs)
Encoding: UTF-8 for all source files
Naming Conventions:
- Functions/variables: snake_case
- Constants: SCREAMING_SNAKE_CASE
- Classes: PascalCase
Type Hints: Prefer type hints for all public functions
Code Formatting: Use black with 100-character line length
Linting: Use ruff with default rules

Code Quality Guidelines

Keep functions small and side-effect free
Prefer pure functions where possible
Document complex logic with comments
Use type hints from typing module
Handle errors gracefully with appropriate logging

Documentation Files

PRD.md - Product Requirements Document (specifies business logic and requirements)
API.md - API documentation for V1 endpoints
V2_API文档.md - Detailed API documentation for V2 endpoints
AGENTS.md - Development guidelines and best practices
DB_GUIDE.md - Database schema reference and query examples

Key Components Deep Dive

1. app.py - Application Entry Point

Creates Flask app with CORS enabled
Registers all API routes
Parameter extraction logic (GET/POST, JSON/form data)
Concurrent execution using ThreadPoolExecutor
Error handling and logging
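
The concurrent execution mentioned above might look roughly like this. Which tasks the application actually runs in parallel is not stated here, so `find_subjects` and `find_permits_by_name` are hypothetical stand-ins for independent lookups.

```python
from concurrent.futures import ThreadPoolExecutor


def find_subjects(query: str) -> list:
    """Hypothetical stand-in for the semantic subject search."""
    return [f"subject for {query}"]


def find_permits_by_name(query: str) -> list:
    """Hypothetical stand-in for direct permit matching by name."""
    return [f"permit named {query}"]


def run_lookups(query: str) -> dict:
    """Run both lookups in parallel threads and combine their results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        subjects_future = pool.submit(find_subjects, query)
        permits_future = pool.submit(find_permits_by_name, query)
        # result() blocks until the task finishes and re-raises any exception
        return {
            "subjects": subjects_future.result(),
            "permits": permits_future.result(),
        }


print(run_lookups("cinema"))
```

Threads suit this workload because database and HTTP calls spend most of their time waiting on I/O.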

2. lawrisk_service.py - Core Search Logic

EmbeddingClient: Handles DashScope API integration
ChatClient: Manages Qwen LLM interactions
Database helpers with pg8000
Search algorithms:
- Embedding-based cosine similarity
- LLM-based subject selection
Similarity threshold management
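
As an illustration of embedding-based cosine similarity, the sketch below ranks stored subject vectors against a query vector in pure Python. The toy two-dimensional vectors and the function names are hypothetical; the real vectors have `DASHSCOPE_EMBED_DIM` components and come from the `law_sub.vector` column.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_subjects(
    query_vec: List[float], subjects: Dict[str, List[float]], top: int = 5
) -> List[Tuple[str, float]]:
    """Score every subject against the query and keep the `top` best."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in subjects.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]


subjects = {"cinema": [1.0, 0.0], "hotel": [0.0, 1.0], "theatre": [1.0, 1.0]}
ranked = rank_subjects([1.0, 0.0], subjects)
print(ranked)
```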

3. lawrisk_v2_service.py - Enhanced API

Structured response formatting
Region filtering logic
Permit direct matching by name
Markdown formatting for legal text
Complex query execution pipeline

4. licensing_repo.py - Data Repository

Separate database connection for licensing_risks
Query optimization for multi-table joins
Legal text formatting helpers
Pattern matching for Chinese legal documents
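
The exact patterns used by licensing_repo.py are not described here. As a rough idea of what pattern matching on Chinese legal text can involve, the sketch below pulls out law titles in 《…》 brackets and article references of the form 第…条. Both regular expressions and the helper name are assumptions.

```python
import re

# Titles of laws and regulations are conventionally wrapped in 《》
TITLE_RE = re.compile(r"《([^》]+)》")
# Article references such as 第八条 or 第14条 (Chinese or Arabic numerals)
ARTICLE_RE = re.compile(r"第[零〇一二三四五六七八九十百千\d]+条")


def extract_citations(text: str) -> dict:
    """Collect law titles and article references found in a legal-basis string."""
    return {
        "titles": TITLE_RE.findall(text),
        "articles": ARTICLE_RE.findall(text),
    }


sample = "依据《公共场所卫生管理条例》第八条、第十四条的规定"
print(extract_citations(sample))
```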

5. smart_cors_middleware.py - Reusable CORS

Wildcard and exact origin matching
Subdomain support
Preflight OPTIONS handling
NGINX integration mode
Debug and logging features
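
Wildcard, exact, and subdomain origin matching can be approximated with shell-style patterns. This sketch is not the middleware's implementation; the pattern syntax, the example origins, and the name `is_origin_allowed` are assumptions.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def is_origin_allowed(origin: str, allowed_patterns: Iterable[str]) -> bool:
    """Return True if the origin matches any exact or wildcard pattern."""
    normalized = origin.rstrip("/").lower()
    return any(fnmatchcase(normalized, pattern.lower()) for pattern in allowed_patterns)


ALLOWED = [
    "https://app.example.com",  # exact origin
    "https://*.example.org",    # any subdomain of example.org
]

for candidate in ("https://app.example.com", "https://api.example.org", "https://evil.test"):
    print(candidate, is_origin_allowed(candidate, ALLOWED))
```

Note that `https://*.example.org` does not match the bare domain `https://example.org`; list it separately if it should be allowed.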

Troubleshooting Guide

Common Issues

Database Connection Errors

Symptom: pg8000.dbapi.Error when starting the app
Solutions:

Check .env file exists with correct PostgreSQL credentials
Verify network connectivity to the database server
Ensure PostgreSQL server is running and accessible
Check database names: fs_law_risk and licensing_risks

Missing Environment Variables

Symptom: Key errors or default values being used
Solutions:

Create .env file from the template (see Configuration section)
Ensure DASHSCOPE_API_KEY is set for embedding/chat features
Verify all required PG_* and LIC_* environment variables

LLM/Embedding API Errors

Symptom: API authentication failures or timeout errors
Solutions:

Verify DASHSCOPE_API_KEY is valid and has sufficient quota
Check DASHSCOPE_BASE_URL matches the API endpoint
Ensure network access to DashScope API servers
Review API rate limits and batch sizes

Empty Search Results

Symptom: API returns empty risk_subject array
Solutions:

Check if database tables are populated (fs_law_risk.law_sub, etc.)
Try debug=1 parameter to see detailed execution info
Verify similarity thresholds in .env (LAWRISK_RETURN_IF_GE, LAWRISK_FALLBACK_GT)
Test with known queries like "我要办一家电影院"

Port Already in Use

Symptom: OSError: [Errno 10048] Only one usage of each socket address
Solutions:

Change port: PORT=8001 python app.py
Kill existing process using the port: netstat -ano | findstr :8000 then taskkill /PID <PID> /F

Debug Mode

Enable debug logging to troubleshoot issues:

# Enable Flask debug mode
FLASK_DEBUG=1 python app.py

# Enable CORS debug mode
CORS_DEBUG=1 python app.py

# Check app logs for registered routes and errors
# Logs are printed to console when starting the app

Health Checks

Basic health: GET /healthz → {"status": "ok"}
V2 regions: GET /fs-ai-asistant/api/workflow/lawrisk/v2/regions
Check logs for registered routes on app startup

Data Verification

Verify database content with queries from DB_GUIDE.md:

-- Check subject count (connect to the fs_law_risk database first)
SELECT COUNT(*) FROM law_sub;

-- Check region-theme pairs (connect to the licensing_risks database first)
SELECT COUNT(*) FROM region_themes;
