CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.


LawRisk Backend - Claude Code Analysis

Project Overview

LawRisk is a Flask-based Python backend service that provides intelligent legal compliance risk retrieval for business licensing and permit requirements. It uses vector embeddings and LLM-based matching to help users find relevant permits, licenses, and associated legal risks based on natural language queries (in Chinese).

Python Version Requirement: Python 3.10+ (uses PEP 604 union types like str | None)

Key Features

  • Semantic Search: Uses Aliyun DashScope embeddings (text-embedding-v4) to find similar legal topics
  • LLM-Powered Matching: Qwen (qwen-plus-latest) for intelligent subject selection
  • Two-Database Architecture:
    • fs_law_risk: Vector embeddings and subject-permit mappings
    • licensing_risks: Structured permit and risk data with regions, themes, and compliance information
  • RESTful APIs: Clean REST endpoints for V1 (legacy) and V2 (enhanced) search
  • CORS Enabled: Built-in CORS middleware for frontend integration

Architecture & Project Structure

Core Framework & Libraries

  • Framework: Flask (Python web framework)
  • Database Driver: pg8000 (PostgreSQL adapter)
  • Vector Embeddings: Aliyun DashScope OpenAI-compatible API
  • LLM: Qwen via DashScope (qwen-plus-latest)
  • Dependencies: Minimal footprint - Flask and pg8000 (concurrent.futures comes from the Python standard library)

Directory Structure

市监局-lawRisk-backend/
├── app.py                          # Flask application entry point
├── requirements.txt                # Python dependencies
├── .env                            # Environment configuration
├── lawrisk/                        # Main application package
│   ├── __init__.py
│   ├── api/                        # API route handlers
│   │   ├── v1.py                   # V1 API (legacy)
│   │   └── v2.py                   # V2 API (current)
│   ├── services/                   # Business logic layer
│   │   ├── lawrisk_service.py      # Core search & embeddings
│   │   ├── lawrisk_v2_service.py   # V2 enhanced service
│   │   └── licensing_repo.py       # Data repository
│   ├── middleware/                 # HTTP middleware
│   │   └── smart_cors_middleware.py
│   └── utils/                      # Utility functions
│       ├── env_loader.py
│       ├── export_risk_json.py
│       └── ingest_lawrisk.py
├── static/                         # Static assets
│   └── v2_tester.html              # Web-based API tester
├── tests/                          # Test suite (planned)
├── data/                           # Data files
│   ├── risk_tables_export.json
│   └── licensing_risks_dump.sql
└── docs/                           # Documentation
    ├── PRD.md
    ├── API.md
    ├── V2_API文档.md
    ├── AGENTS.md
    ├── DB_GUIDE.md
    └── CLAUDE.md

Quick Reference

Most Common Commands

# Run the application
python app.py

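# Export data from database
# (the layout above places this script in lawrisk/utils/; adjust the path to match)
python export_risk_json.py

# Ingest data with embeddings (requires DASHSCOPE_API_KEY)
# (also under lawrisk/utils/ in the layout above)
python ingest_lawrisk.py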

# Format and lint code
black .
ruff check .

# Test locally via browser
# Open static/v2_tester.html after starting the app

Key Files

  • app.py - Flask application entry point
  • lawrisk/ - Main application package
    • api/v1.py - V1 API routes (legacy)
    • api/v2.py - V2 API routes (current)
    • services/lawrisk_service.py - Core search & embeddings
    • services/lawrisk_v2_service.py - V2 enhanced service
    • services/licensing_repo.py - Data repository
    • middleware/smart_cors_middleware.py - CORS middleware
    • utils/ - Utility functions
  • static/v2_tester.html - Web-based API testing interface
  • requirements.txt - Python dependencies
  • .env - Environment configuration

Development Workflow

Initial Setup

# 1. Create virtual environment
python -m venv .venv

# 2. Activate virtual environment (Windows PowerShell)
.venv\Scripts\activate

# 3. Install dependencies
pip install Flask pg8000 black ruff pytest

# 4. Load environment variables
# Edit .env with your database credentials

Common Commands

Run the Application

# Development mode
python app.py

# Custom port (POSIX shell syntax; in PowerShell: $env:PORT = "8001"; python app.py)
PORT=8001 python app.py

# With debug logging
FLASK_DEBUG=1 python app.py

Data Management

# Export data from fs_law_risk database to JSON
python export_risk_json.py

# Ingest data with embeddings into database
python ingest_lawrisk.py
# Requires DASHSCOPE_API_KEY in .env

Code Quality

# Format code with Black (100-character line length)
black --line-length 100 .

# Lint with Ruff
ruff check .

# Run tests (when added)
pytest -q

Database Operations

# Connect to PostgreSQL
psql -h 8.138.196.105 -U postgres -d fs_law_risk

# Connect to licensing_risks database
psql -h 8.138.196.105 -U postgres -d licensing_risks

API Endpoints

V1 API (Legacy)

  • Path: /fs-ai-asistant/api/workflow/lawrisk
  • Methods: GET, POST
  • Mode: llm (default) or embed
  • Input: query (user question)
  • Output: Simple array of matching subjects with permit IDs

V2 API (Current/Recommended)

  • Base Path: /fs-ai-asistant/api/workflow/lawrisk/v2
  • Methods: GET, POST
  • Features:
    • Structured results with regions, themes, permits, and risks
    • Optional region filtering
    • Debug mode with detailed execution info
    • Direct permit matching by name

V2 Sub-endpoints

  1. Search Endpoint

    • Path: /fs-ai-asistant/api/workflow/lawrisk/v2
    • Parameters:
      • query (required): User question
      • region (optional): Filter by region, e.g. 市级 (city level) or 禅城区 (Chancheng District)
      • debug (optional): Enable debug output (1/true/yes/on)
      • top (optional): Number of recommendations (default: 5)
  2. Regions List

    • Path: /fs-ai-asistant/api/workflow/lawrisk/v2/regions
    • Method: GET
    • Returns: All available regions for filtering
  3. Get Permits

    • Path: /fs-ai-asistant/api/workflow/lawrisk/getPermits
    • Method: GET, POST
    • Input: region (region ID or name)
    • Returns: All permits for a specific region
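
A client-side sketch of how a V2 search request URL could be assembled from the parameters listed above. The helper name build_v2_search_url and the localhost base URL are assumptions for illustration; only the path and parameter names come from this document.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Path copied from the endpoint list above; the helper itself is hypothetical.
V2_SEARCH_PATH = "/fs-ai-asistant/api/workflow/lawrisk/v2"


def build_v2_search_url(base_url, query, region=None, debug=False, top=None):
    """Build a GET URL for the V2 search endpoint with optional filters."""
    params = {"query": query}
    if region:
        params["region"] = region
    if debug:
        params["debug"] = "1"
    if top is not None:
        params["top"] = str(top)
    # urlencode percent-encodes Chinese text as UTF-8.
    return base_url.rstrip("/") + V2_SEARCH_PATH + "?" + urlencode(params)


url = build_v2_search_url("http://localhost:8000", "电影院", region="市级", debug=True)
# Round-trip the query string to confirm the Chinese text survives encoding.
decoded = parse_qs(urlsplit(url).query)
```

Letting urlencode handle the encoding avoids hand-escaping Chinese query text when testing from scripts.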

Health Check

  • Path: /healthz
  • Method: GET
  • Returns: {"status": "ok"}

Database Schema

Database 1: fs_law_risk

Used for vector embeddings and semantic search.

Tables

  • law_sub: Subject matter with embeddings

    • id (TEXT, PK): Subject ID
    • name (TEXT): Subject name
    • vector (JSONB): Embedding vector
  • law_sub_per: Subject-permit mappings

    • sub_id (TEXT, PK): Subject ID
    • per_ids (JSONB): Array of permit IDs
  • law_per: Permit information

    • id (TEXT, PK): Permit ID
    • name (TEXT): Permit name
    • risk_ids (JSONB): Array of risk IDs

Database 2: licensing_risks

Used for structured compliance data.

Tables

  • regions: Administrative areas

    • id (PK), name (unique)
  • business_scopes: Business scope definitions

    • id (PK), description
  • region_scopes: Region-scope mappings

  • themes: Legal themes/subjects

    • id (PK), name
  • region_themes: Region-theme mappings

  • permits: License/permit items

    • id (PK), name
  • region_theme_permits: Tripartite linkage

  • risks: Risk information

    • id (PK), risk_content, legal_basis, document_no, summary
  • region_permit_risks: Risk associations
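
To illustrate the tripartite linkage, here is a sketch that resolves permits for a region and theme through region_theme_permits. It uses an in-memory SQLite database as a stand-in for PostgreSQL, the linking column names (region_id, theme_id, permit_id) are assumptions, and the rows are made-up samples; DB_GUIDE.md remains the reference for the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the licensing_risks database
conn.executescript(
    """
    CREATE TABLE regions (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
    CREATE TABLE themes  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE permits (id INTEGER PRIMARY KEY, name TEXT);
    -- Assumed linking columns; the real table may differ.
    CREATE TABLE region_theme_permits (region_id INTEGER, theme_id INTEGER, permit_id INTEGER);
    -- Made-up sample rows for the demo.
    INSERT INTO regions VALUES (1, '市级');
    INSERT INTO themes  VALUES (1, '电影院');
    INSERT INTO permits VALUES (1, '公共场所卫生许可');
    INSERT INTO region_theme_permits VALUES (1, 1, 1);
    """
)


def permits_for(conn, region_name, theme_name):
    """Return permit names linked to the given region and theme."""
    rows = conn.execute(
        """
        SELECT p.name
        FROM region_theme_permits rtp
        JOIN regions r ON r.id = rtp.region_id
        JOIN themes  t ON t.id = rtp.theme_id
        JOIN permits p ON p.id = rtp.permit_id
        WHERE r.name = ? AND t.name = ?
        ORDER BY p.id
        """,
        (region_name, theme_name),
    ).fetchall()
    return [name for (name,) in rows]
```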


Configuration

Environment Variables (.env)

DashScope (Embeddings & LLM)

DASHSCOPE_API_KEY=sk-288824ef003e4e02bb963b8b3024b06a
DASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
DASHSCOPE_EMBED_MODEL=text-embedding-v4
DASHSCOPE_EMBED_DIM=1024
DASHSCOPE_MAX_BATCH=10
DASHSCOPE_CHAT_MODEL=qwen-plus-latest
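
Assuming DASHSCOPE_MAX_BATCH caps the number of texts sent per embedding request, ingestion code would need to split its input into chunks of at most that size. A minimal sketch (the function name batched is hypothetical):

```python
def batched(texts, max_batch=10):
    """Yield consecutive slices of texts, each holding at most max_batch items."""
    if max_batch < 1:
        raise ValueError("max_batch must be at least 1")
    for start in range(0, len(texts), max_batch):
        yield texts[start:start + max_batch]


# 23 texts with a batch size of 10 produce chunks of 10, 10, and 3.
sizes = [len(chunk) for chunk in batched([f"text {i}" for i in range(23)], 10)]
```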

PostgreSQL Configuration

# fs_law_risk database
PG_HOST=8.138.196.105
PG_PORT=5432
PG_USER=postgres
PG_PASSWORD=difyai123456
PG_DATABASE=fs_law_risk
PG_ADMIN_DB=postgres

# licensing_risks database
LIC_PG_HOST=8.138.196.105
LIC_PG_PORT=5432
LIC_PG_USER=postgres
LIC_PG_PASSWORD=difyai123456
LIC_PG_DATABASE=licensing_risks
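
The two variable groups differ only by prefix, so one helper could build connection settings for either database. This is a sketch under that assumption; pg_settings is a hypothetical name and the sample values are placeholders, not real credentials. The resulting keys match keyword arguments accepted by pg8000.connect.

```python
def pg_settings(env, prefix):
    """Collect connection settings from variables that share a prefix."""
    return {
        "host": env[f"{prefix}HOST"],
        "port": int(env.get(f"{prefix}PORT", "5432")),
        "user": env[f"{prefix}USER"],
        "password": env[f"{prefix}PASSWORD"],
        "database": env[f"{prefix}DATABASE"],
    }


# Placeholder environment for the demo; real code would pass os.environ.
sample_env = {
    "PG_HOST": "db.example.internal", "PG_USER": "app",
    "PG_PASSWORD": "change-me", "PG_DATABASE": "fs_law_risk",
    "LIC_PG_HOST": "db.example.internal", "LIC_PG_PORT": "5433",
    "LIC_PG_USER": "app", "LIC_PG_PASSWORD": "change-me",
    "LIC_PG_DATABASE": "licensing_risks",
}
vector_db = pg_settings(sample_env, "PG_")
licensing_db = pg_settings(sample_env, "LIC_PG_")
```

A missing required variable raises KeyError immediately, which surfaces configuration gaps at startup rather than at the first query.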

Application Settings

FLASK_ENV=development

# Search thresholds (tunable)
LAWRISK_RETURN_IF_GE=0.7
LAWRISK_FALLBACK_GT=0.5
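
This document does not spell out how the two thresholds interact. One plausible reading, stated here as an assumption, is that candidates scoring at or above LAWRISK_RETURN_IF_GE are returned directly, and otherwise candidates scoring above LAWRISK_FALLBACK_GT are kept as a fallback. A sketch of that reading (function names are hypothetical):

```python
import os


def load_thresholds(env=os.environ):
    """Read both thresholds, defaulting to the values shown above."""
    return (
        float(env.get("LAWRISK_RETURN_IF_GE", "0.7")),
        float(env.get("LAWRISK_FALLBACK_GT", "0.5")),
    )


def select_matches(scored, return_if_ge, fallback_gt):
    """scored is a list of (subject_id, similarity) pairs.

    Assumed policy: prefer strong matches; fall back to weaker ones.
    """
    strong = [item for item in scored if item[1] >= return_if_ge]
    if strong:
        return strong
    return [item for item in scored if item[1] > fallback_gt]
```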

Testing

Current Testing Status

  • No dedicated test suite in the repository
  • pytest framework is recommended but not configured
  • Manual testing available via static/v2_tester.html

Testing Framework & Guidelines

When adding tests, use:

  • Framework: pytest with Flask test client
  • Focus Areas:
    • API endpoints (V1 and V2)
    • Middleware behavior (CORS)
    • Database operations
    • LLM selection logic
    • Region filtering

Recommended test structure:

tests/
├── test_api_v1.py          # V1 API endpoint tests
├── test_api_v2.py          # V2 API endpoint tests
├── test_search_service.py  # Core search logic
├── test_licensing_repo.py  # Database repository
└── conftest.py             # Shared fixtures

Testing Commands

# Run all tests
pytest

# Run with coverage (requires the pytest-cov plugin)
pytest --cov=lawrisk

# Run specific test file
pytest tests/test_api_v2.py -v

Manual Testing with V2 Tester

  1. Start the application: python app.py (defaults to port 8000)
  2. Open static/v2_tester.html in your browser
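  3. Test queries like:
    • "我要办一家电影院" ("I want to open a cinema")
    • "开办旅馆需要哪些许可" ("What permits are needed to open a hotel?")
    • "公共场所卫生许可" ("public-place hygiene permit")
    • Query with region filter: "电影院&region=市级&debug=1" ("cinema", region "city level", debug on)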

The tester provides a simple UI to experiment with the V2 API and view debug information.


Coding Standards & Best Practices

Python Style Guidelines

  • Indentation: 4 spaces (no tabs)
  • Encoding: UTF-8 for all source files
  • Naming Conventions:
    • Functions/variables: snake_case
    • Constants: SCREAMING_SNAKE_CASE
    • Classes: PascalCase
  • Type Hints: Prefer type hints for all public functions
  • Code Formatting: Use black with 100-character line length
  • Linting: Use ruff with default rules

Code Quality Guidelines

  • Keep functions small and side-effect free
  • Prefer pure functions where possible
  • Document complex logic with comments
  • Use type hints consistently: PEP 604 unions (str | None) where possible, typing module constructs otherwise
  • Handle errors gracefully with appropriate logging

Documentation Files

  • PRD.md - Product Requirements Document (specifies business logic and requirements)
  • API.md - API documentation for V1 endpoints
  • V2_API文档.md - Detailed API documentation for V2 endpoints
  • AGENTS.md - Development guidelines and best practices
  • DB_GUIDE.md - Database schema reference and query examples

Key Components Deep Dive

1. app.py - Application Entry Point

  • Creates Flask app with CORS enabled
  • Registers all API routes
  • Parameter extraction logic (GET/POST, JSON/form data)
  • Concurrent execution using ThreadPoolExecutor
  • Error handling and logging
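
The concurrent execution mentioned above could look roughly like this: independent lookups are submitted to a ThreadPoolExecutor and collected by name. Which calls app.py actually runs in parallel is not stated here, so the task names in the demo are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def run_concurrently(tasks):
    """Run zero-argument callables in parallel and return results keyed by name.

    An exception raised inside a task propagates from future.result().
    """
    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        futures = {name: pool.submit(fn) for name, fn in tasks.items()}
        return {name: future.result() for name, future in futures.items()}


# Placeholder tasks standing in for I/O-bound calls such as database queries.
results = run_concurrently({
    "subjects": lambda: ["s1", "s2"],
    "regions": lambda: ["市级"],
})
```

Threads suit this workload because the slow parts are network and database I/O rather than CPU-bound work.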

2. lawrisk_service.py - Core Search Logic

  • EmbeddingClient: Handles DashScope API integration
  • ChatClient: Manages Qwen LLM interactions
  • Database helpers with pg8000
  • Search algorithms:
    • Embedding-based cosine similarity
    • LLM-based subject selection
  • Similarity threshold management
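
A sketch of embedding-based ranking by cosine similarity, assuming rows shaped like (id, name, vector) from law_sub. The vector may arrive as JSON text or as an already decoded list depending on the driver, so both are handled; the function names are hypothetical.

```python
import json
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_subjects(query_vec, rows, top=5):
    """Score each (id, name, vector) row against query_vec, best first."""
    scored = []
    for subject_id, name, vec in rows:
        vector = vec if isinstance(vec, list) else json.loads(vec)
        scored.append((subject_id, name, cosine_similarity(query_vec, vector)))
    scored.sort(key=lambda row: row[2], reverse=True)
    return scored[:top]


# Tiny 2-dimensional demo; real vectors have DASHSCOPE_EMBED_DIM entries.
demo_rows = [("s1", "A", "[1, 0]"), ("s2", "B", "[0, 1]"), ("s3", "C", [1, 1])]
ranked = rank_subjects([1, 0], demo_rows)
```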

3. lawrisk_v2_service.py - Enhanced API

  • Structured response formatting
  • Region filtering logic
  • Permit direct matching by name
  • Markdown formatting for legal text
  • Complex query execution pipeline
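
Direct permit matching by name could be as simple as a substring check of known permit names against the query. The sketch below assumes that approach, which may differ from what lawrisk_v2_service.py does; the function name and sample permits are illustrative only.

```python
def match_permits_by_name(query, permits):
    """Return (permit_id, name) pairs whose name occurs verbatim in the query.

    Longer names come first so the most specific permit leads the list.
    """
    text = query.strip()
    hits = [(pid, name) for pid, name in permits if name and name in text]
    hits.sort(key=lambda item: len(item[1]), reverse=True)
    return hits


# Made-up sample permits for the demo.
sample_permits = [(1, "公共场所卫生许可"), (2, "卫生许可"), (3, "食品经营许可")]
matches = match_permits_by_name("办理公共场所卫生许可需要什么", sample_permits)
```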

4. licensing_repo.py - Data Repository

  • Separate database connection for licensing_risks
  • Query optimization for multi-table joins
  • Legal text formatting helpers
  • Pattern matching for Chinese legal documents
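
As an illustration of pattern matching on Chinese legal text, the sketch below finds article references of the form 第…条 ("Article N") and bolds them in Markdown. The regular expression and the function name are assumptions and are not taken from licensing_repo.py.

```python
import re

# Matches references such as 第十二条 or 第3条 (Chinese or Arabic numerals).
ARTICLE_REF = re.compile(r"第[零〇一二三四五六七八九十百千0-9]+条")


def bold_article_refs(text):
    """Wrap each article reference in Markdown bold markers."""
    return ARTICLE_REF.sub(r"**\g<0>**", text)


sample = "根据第十二条和第3条的规定"
formatted = bold_article_refs(sample)
```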

5. smart_cors_middleware.py - Reusable CORS

  • Wildcard and exact origin matching
  • Subdomain support
  • Preflight OPTIONS handling
  • NGINX integration mode
  • Debug and logging features
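
Origin matching with wildcard, exact, and subdomain rules can be sketched as follows. The rule syntax ("*", a full origin, or "*.example.com") is an assumption for illustration and may not match the configuration format of smart_cors_middleware.py.

```python
from urllib.parse import urlsplit


def origin_allowed(origin, allowed_rules):
    """Check an Origin header value against a list of rules.

    Assumed rule forms: "*" allows everything, a full origin must match
    exactly, and "*.example.com" allows any subdomain of example.com.
    """
    host = urlsplit(origin).hostname or ""
    for rule in allowed_rules:
        if rule == "*" or rule == origin:
            return True
        # rule[1:] keeps the leading dot, so "evilexample.com" cannot match.
        if rule.startswith("*.") and host.endswith(rule[1:]):
            return True
    return False


rules = ["https://admin.example.org", "*.example.com"]
```

Keeping the leading dot in the suffix comparison is what prevents look-alike domains from passing the subdomain rule.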

Troubleshooting Guide

Common Issues

Database Connection Errors

Symptom: pg8000.dbapi.Error when starting the app

Solutions:

  • Check .env file exists with correct PostgreSQL credentials
  • Verify network connectivity to the database server
  • Ensure PostgreSQL server is running and accessible
  • Check database names: fs_law_risk and licensing_risks

Missing Environment Variables

Symptom: Key errors or default values being used

Solutions:

  • Create .env file from the template (see Configuration section)
  • Ensure DASHSCOPE_API_KEY is set for embedding/chat features
  • Verify all required PG_* and LIC_* environment variables

LLM/Embedding API Errors

Symptom: API authentication failures or timeout errors

Solutions:

  • Verify DASHSCOPE_API_KEY is valid and has sufficient quota
  • Check DASHSCOPE_BASE_URL matches the API endpoint
  • Ensure network access to DashScope API servers
  • Review API rate limits and batch sizes

Empty Search Results

Symptom: API returns an empty risk_subject array

Solutions:

  • Check that the database tables are populated (law_sub in the fs_law_risk database, etc.)
  • Try debug=1 parameter to see detailed execution info
  • Verify similarity thresholds in .env (LAWRISK_RETURN_IF_GE, LAWRISK_FALLBACK_GT)
  • Test with known queries like "我要办一家电影院" ("I want to open a cinema")

Port Already in Use

Symptom: OSError: [WinError 10048] Only one usage of each socket address (protocol/network address/port) is normally permitted

Solutions:

  • Change port: PORT=8001 python app.py
  • Kill existing process using the port: netstat -ano | findstr :8000 then taskkill /PID <PID> /F
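
Before starting the app, a quick way to check whether a port is already taken is to try binding to it. A minimal sketch (port_is_free is a hypothetical helper, not part of the repository):

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can bind to host:port right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


# Demo: occupy an OS-assigned port, then confirm the check reports it as busy.
holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
holder.bind(("127.0.0.1", 0))
holder.listen(1)
busy_port = holder.getsockname()[1]
busy_port_reported_free = port_is_free(busy_port)
holder.close()
```

The result is only a snapshot, since another process can claim the port between the check and the app's own bind.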

Debug Mode

Enable debug logging to troubleshoot issues:

# Enable Flask debug mode
FLASK_DEBUG=1 python app.py

# Enable CORS debug mode
CORS_DEBUG=1 python app.py

# Check app logs for registered routes and errors
# Logs are printed to console when starting the app

Health Checks

  • Basic health: GET /healthz returns {"status": "ok"}
  • V2 regions: GET /fs-ai-asistant/api/workflow/lawrisk/v2/regions
  • Check logs for registered routes on app startup

Data Verification

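Verify database content with queries from DB_GUIDE.md. Run each query while connected to the named database, because PostgreSQL reads a prefix such as fs_law_risk.law_sub as schema.table rather than database.table:

-- Connected to fs_law_risk: check subject count
SELECT COUNT(*) FROM law_sub;

-- Connected to licensing_risks: check region-theme pairs
SELECT COUNT(*) FROM region_themes;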