CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

LawRisk Backend - Development Guide

Project Overview

LawRisk is a Flask-based Python backend service for intelligent legal compliance risk retrieval. It helps users find permits, licenses, and legal risks based on natural language queries using vector embeddings and LLM matching.

Tech Stack

Framework: Flask 2.3+
Database: PostgreSQL (pg8000 driver)
AI Services: Alibaba Cloud DashScope (text-embedding-v4, qwen-plus-latest)
Development: Black, Ruff, Pytest

Two-Database Architecture

fs_law_risk: Vector embeddings and subject-permit mappings
licensing_risks: Structured permit and risk data (regions, themes, compliance)

Quick Reference

Most Common Commands

# Run the application (port 8000)
python app.py

# Install dependencies
pip install -r requirements.txt

# Format and lint code
black .
ruff check .

# Run tests
pytest
pytest --cov=lawrisk

# Test API via curl (the sample query means "I want to open a cinema")
curl http://localhost:8000/healthz
curl -X POST "http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2" \
  -d "query=我要办一家电影院&debug=1"

Key File Locations

app.py - Flask application entry point
lawrisk/api/v2.py - V2 API routes (current)
lawrisk/services/lawrisk_v2_service.py - Enhanced V2 search logic
lawrisk/services/licensing_repo.py - Database operations
lawrisk/api/auth.py - Authentication endpoints
static/v2_tester.html - Web-based API testing interface
tests/test_auth.py - Auth system tests
tests/test_checkpoint_security.py - Checkpoint system tests

Architecture & Code Structure

Request Flow

HTTP Request
  → lawrisk/api/ (routing layer)
    → lawrisk/services/ (business logic)
      → lawrisk/services/licensing_repo.py (database access)
      → DashScope API (embeddings & LLM)
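
The layering above can be sketched without Flask as three plain functions, one per layer. This is an illustration only: `InMemoryRepo`, `search_permits`, and `handle_search` are hypothetical names that do not come from the repository, the substring match stands in for the DashScope embedding call, and the shape of the success response is an assumption modeled on the error format described under Error Handling.

```python
from dataclasses import dataclass, field


@dataclass
class InMemoryRepo:
    """Hypothetical stand-in for the database access layer."""

    permits_by_theme: dict[str, list[str]] = field(default_factory=dict)

    def permits_for(self, theme: str) -> list[str]:
        return self.permits_by_theme.get(theme, [])


def search_permits(repo: InMemoryRepo, query: str) -> list[str]:
    """Service layer: business logic only, no HTTP or SQL details."""
    # Placeholder matching: a real service would use embeddings and an LLM.
    hits: list[str] = []
    for theme in repo.permits_by_theme:
        if theme in query:
            hits.extend(repo.permits_for(theme))
    return hits


def handle_search(repo: InMemoryRepo, params: dict[str, str]) -> dict:
    """Routing layer: validate input, delegate, shape the response."""
    query = params.get("query", "").strip()
    if not query:
        return {"success": False, "message": "query is required", "data": {}}
    permits = search_permits(repo, query)
    return {"success": True, "message": "ok", "data": {"permits": permits}}
```

Keeping the routing function free of business logic means the service function can be tested with a fake repository and no HTTP client.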

Core Modules

1. API Layer (lawrisk/api/)

v1.py - Legacy API (deprecated)
v2.py - Current API with structured responses + admin endpoints
auth.py - Authentication (login/logout/me endpoints)

2. Services Layer (lawrisk/services/)

lawrisk_service.py - Core search with embeddings (cosine similarity) + LLM matching
lawrisk_v2_service.py - Enhanced V2 with structured results, region filtering, direct permit matching
licensing_repo.py - PostgreSQL operations (both databases), checkpoint management
auth_service.py - User authentication, password hashing, seed admin creation
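
As a rough illustration of the cosine-similarity ranking that lawrisk_service.py is described as using, the sketch below scores a query vector against stored subject vectors in pure Python. The function names and the in-memory `subjects` mapping are assumptions made for this example; the actual service may compute similarity differently (for example in SQL or with NumPy).

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_subjects(
    query_vec: list[float], subjects: dict[str, list[float]], top: int = 5
) -> list[tuple[str, float]]:
    """Return the `top` (name, score) pairs, highest score first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in subjects.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top]
```

The default of `top=5` mirrors the default of the `top` request parameter documented for the V2 search endpoint.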

3. Middleware & Utils

middleware/smart_cors_middleware.py - Configurable CORS (wildcard, subdomains, NGINX mode)
utils/env_loader.py - Environment variable loading
utils/export_risk_json.py - Database export utility
utils/ingest_lawrisk.py - Data ingestion with embeddings
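
To make the wildcard and subdomain CORS modes more concrete, here is a minimal origin-matching sketch. It is not the code in smart_cors_middleware.py: the function name, the pattern syntax (`*` and `*.example.com`), and the decision to reject origins without a hostname are all assumptions, and the NGINX mode is not covered.

```python
from urllib.parse import urlsplit


def is_origin_allowed(origin: str, allowed: list[str]) -> bool:
    """Check an Origin header value against a list of allowed patterns."""
    host = urlsplit(origin).hostname
    if host is None:
        return False  # e.g. the literal "null" origin has no hostname
    for pattern in allowed:
        if pattern == "*":
            return True
        if pattern.startswith("*."):
            # "*.example.com" matches sub.example.com but not example.com itself
            if host.endswith(pattern[1:]):
                return True
        elif pattern in (origin, host):
            return True
    return False
```

Matching on the suffix `.example.com` (with the leading dot) avoids accepting look-alike hosts such as `evilexample.com`.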

API Endpoints

Public Endpoints

V2 Search (Current)

Path: /fs-ai-asistant/api/workflow/lawrisk/v2
Method: POST (recommended), GET
Params:
- query (required): User question
- region (optional): Filter by region (e.g., 市级 "municipal level", 禅城区 "Chancheng District")
- debug (optional): Enable debug output (1/true/yes/on)
- top (optional): Number of recommendations (default: 5)
Returns: Structured results with regions, themes, permits, risks

Supporting Endpoints

GET /fs-ai-asistant/api/workflow/lawrisk/v2/regions - List all regions
GET /fs-ai-asistant/api/workflow/lawrisk/getPermits - Get permits by region
GET /healthz - Health check

Authentication Endpoints

GET /fs-ai-asistant/lawrisk/login - Login page (HTML)
POST /auth/login - Authenticate user
GET /auth/me - Get current user
GET /auth/logout - Logout

Admin Endpoints (Protected)

GET /fs-ai-asistant/api/workflow/lawrisk/admin/test - Admin test
GET /fs-ai-asistant/api/workflow/lawrisk/admin/regions - Region management
GET /fs-ai-asistant/api/workflow/lawrisk/admin/themes - Theme management
GET /fs-ai-asistant/api/workflow/lawrisk/admin/permits - Permit management
GET /fs-ai-asistant/api/workflow/lawrisk/admin/checkpoints - Checkpoint management (create/list/restore/delete)

Development Workflow

Environment Setup

# Windows PowerShell
python -m venv .venv
.venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure .env with database credentials and DashScope API key

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=lawrisk --cov-report=html

# Run specific test file
pytest tests/test_auth.py -v

# Test authentication
pytest tests/test_auth.py::test_login_success -v

Test Files:

tests/test_auth.py - Authentication system tests (login, logout, session management)
tests/test_checkpoint_security.py - Database checkpoint security tests

Code Quality

# Format code
black .

# Lint with Ruff
ruff check .

# Check specific file
black lawrisk/services/lawrisk_v2_service.py
ruff check lawrisk/services/lawrisk_v2_service.py

Data Management

# Export data from fs_law_risk database
python lawrisk/utils/export_risk_json.py
# Output: data/risk_tables_export.json

# Ingest data with embeddings (requires DASHSCOPE_API_KEY)
python lawrisk/utils/ingest_lawrisk.py

Configuration

Required Environment Variables (.env)

DashScope AI Services

DASHSCOPE_API_KEY=your_api_key
DASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
DASHSCOPE_EMBED_MODEL=text-embedding-v4
DASHSCOPE_EMBED_DIM=1024
DASHSCOPE_MAX_BATCH=10
DASHSCOPE_CHAT_MODEL=qwen-plus-latest

PostgreSQL Databases

# fs_law_risk (embeddings database)
PG_HOST=your_host
PG_PORT=5432
PG_USER=postgres
PG_PASSWORD=your_password
PG_DATABASE=fs_law_risk
PG_ADMIN_DB=postgres

# licensing_risks (structured data)
LIC_PG_HOST=your_host
LIC_PG_PORT=5432
LIC_PG_USER=postgres
LIC_PG_PASSWORD=your_password
LIC_PG_DATABASE=licensing_risks

Authentication

FLASK_SECRET_KEY=your-secret-key
LAWRISK_ADMIN_USERNAME=admin
LAWRISK_ADMIN_PASSWORD=your_admin_password
# Optional: LAWRISK_ADMIN_ROLE, LAWRISK_ADMIN_GRADE, LAWRISK_ADMIN_DISPLAY_NAME

Search Thresholds

LAWRISK_RETURN_IF_GE=0.7    # Return results if similarity >= 0.7
LAWRISK_FALLBACK_GT=0.5     # Use fallback if similarity > 0.5

Database Schema

fs_law_risk (Vector Embeddings)

law_sub - Subject matter with embeddings (id, name, vector)
law_sub_per - Subject-permit mappings (sub_id, per_ids)
law_per - Permit information (id, name, risk_ids)
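
The three tables form a chain from subject to permits to risks. The sketch below walks that chain over in-memory dictionaries to show the relationship; it assumes `per_ids` and `risk_ids` hold lists of identifiers, which the column names suggest but the guide does not confirm, and `resolve_subject` is a hypothetical helper.

```python
def resolve_subject(
    sub_id: int,
    law_sub_per: dict[int, list[int]],
    law_per: dict[int, dict],
) -> list[dict]:
    """Follow law_sub_per -> law_per for one subject and collect its permits."""
    permits = []
    for per_id in law_sub_per.get(sub_id, []):
        permit = law_per.get(per_id)
        if permit is None:
            continue  # dangling reference: skip rather than fail
        permits.append(
            {"id": per_id, "name": permit["name"], "risk_ids": list(permit["risk_ids"])}
        )
    return permits
```

In the running service this lookup would be a database query in licensing_repo.py rather than a dictionary walk.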

licensing_risks (Structured Compliance)

regions - Administrative areas
themes - Legal themes/subjects
permits - License/permit items
risks - Risk information (content, legal_basis, document_no, summary)
business_scopes - Business scope definitions
Junction tables: region_themes, region_theme_permits, region_permit_risks

Checkpoint System

licensing_repo.py implements database checkpoint management:

create_checkpoint() - Create database backup
list_checkpoints() - List available backups
restore_checkpoint() - DANGEROUS - Restore from checkpoint
delete_checkpoint() - Remove old checkpoints
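
Since a restore is destructive, a common safeguard is a two-step confirmation with a single-use token. The sketch below shows one way such a guard could work; `RestoreGuard` and its methods are hypothetical and are not taken from licensing_repo.py, whose actual confirmation mechanism is not described here.

```python
import hmac
import secrets


class RestoreGuard:
    """Issue a one-time token that must be echoed back before a restore runs."""

    def __init__(self) -> None:
        self._pending: dict[str, str] = {}

    def request_restore(self, checkpoint_id: str) -> str:
        """Step 1: register intent and hand back a confirmation token."""
        token = secrets.token_urlsafe(16)
        self._pending[checkpoint_id] = token
        return token

    def confirm_restore(self, checkpoint_id: str, token: str) -> bool:
        """Step 2: accept only the matching token; any attempt consumes it."""
        expected = self._pending.pop(checkpoint_id, None)
        if expected is None:
            return False
        # Constant-time comparison avoids leaking the token via timing.
        return hmac.compare_digest(expected.encode(), token.encode())
```

Consuming the token on every attempt, including failed ones, means a wrong guess forces the caller to start the confirmation again.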

Security Guidelines

Critical Security Notes

NEVER commit secrets - All credentials in .env or environment variables
Protect admin endpoints - /admin/* should be restricted in production
Checkpoint restore is dangerous - It replaces current database contents, so it goes through a confirmation flow
API keys externalized - DASHSCOPE_API_KEY and database passwords must be in .env

Authentication System

Session-based auth using Flask sessions
Password hashing with passlib
First admin auto-created from environment variables on startup
Role-based access (admin, reviewer, analyst, etc.)
Login page: /fs-ai-asistant/lawrisk/login
Protected endpoints use @login_required decorator
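
The decorator pattern behind `@login_required` can be shown without Flask. In the sketch below the session is passed in explicitly as a dictionary, whereas Flask exposes it as a request-scoped global; the `"user"` session key, the 401 status, and the response body are assumptions made for illustration, not the behavior of lawrisk/api/auth.py.

```python
from functools import wraps


def login_required(view):
    """Reject the call unless the session carries a logged-in user."""

    @wraps(view)
    def wrapper(session: dict, *args, **kwargs):
        if not session.get("user"):
            body = {"success": False, "message": "authentication required", "data": {}}
            return body, 401
        return view(session, *args, **kwargs)

    return wrapper


@login_required
def admin_regions(session: dict):
    """Hypothetical protected view used only to exercise the decorator."""
    return {"success": True, "message": "ok", "data": {"user": session["user"]}}, 200
```

`functools.wraps` preserves the wrapped function's name and docstring, which matters for frameworks that register views by function name.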

Recent Features (from git log)

Checkpoint System (Recent)

Database backup/restore functionality
Timeline view of checkpoints
Progress indicators for restore operations
Security tests in test_checkpoint_security.py

Permit Risk Snapshot

Workflow for permit risk snapshots
Unified snapshot and checkpoint timeline
Enhanced batch display for snapshots

Licensing Import Enhancement

Optimized district/region merging during import
Enhanced source display for permits

Testing Guidelines

Test Structure

tests/
├── __init__.py
├── test_auth.py              # Auth system tests (login, session, decorators)
└── test_checkpoint_security.py  # Checkpoint security tests

Running Tests

# All tests
pytest

# Verbose output
pytest -v

# Coverage report
pytest --cov=lawrisk --cov-report=term-missing

# Specific test
pytest tests/test_auth.py::test_login_success -v

Manual Testing

Start app: python app.py
Open browser: static/v2_tester.html
Test queries:
- "我要办一家电影院" ("I want to open a cinema")
- "开办旅馆需要哪些许可" ("Which permits are needed to open a hotel?")
- With region filter and debug mode

Troubleshooting

Common Issues

Database Connection

# Verify database is accessible
psql -h $PG_HOST -U $PG_USER -d $PG_DATABASE

# Check tables exist (SQL; run each query inside a psql session connected to the matching database)
SELECT COUNT(*) FROM fs_law_risk.law_sub;
SELECT COUNT(*) FROM licensing_risks.regions;

API Errors

# Test health check
curl http://localhost:8000/healthz

# Test V2 API with debug
curl -X POST "http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2" \
  -d "query=电影院&debug=1"

# Check app logs for registered routes
python app.py 2>&1 | grep "Registered routes"

Missing Embeddings

# Check if embeddings exist (SQL; run inside a psql session)
SELECT id, name FROM fs_law_risk.law_sub WHERE vector IS NOT NULL LIMIT 5;

# If empty, run ingestion
python lawrisk/utils/ingest_lawrisk.py

Documentation Files

README.md - Project overview and quick start
AGENTS.md - Development guidelines, coding style, testing approach
docs/V2_API文档.md - Detailed V2 API documentation
docs/API.md - V1 API documentation (legacy)
docs/DB_GUIDE.md - Database schema and query examples
docs/PRD.md - Product requirements
docs/CLAUDE.md - Detailed Claude Code guidance (comprehensive)

Key Components Deep Dive

V2 Service Architecture

lawrisk_v2_service.py implements:

Structured response formatting
Region filter normalization
Direct permit name matching
Markdown formatting for legal text
Complex query execution pipeline with concurrency
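
A query pipeline with concurrency can be as simple as submitting independent lookup steps to a thread pool and collecting the results by name. The sketch below shows that pattern only; `run_pipeline` and the step names are hypothetical, and how lawrisk_v2_service.py actually schedules its work is not documented here.

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(query: str, steps: dict[str, Callable[[str], list[str]]]) -> dict[str, list[str]]:
    """Run independent lookup steps concurrently and key the results by step name."""
    with ThreadPoolExecutor(max_workers=max(1, len(steps))) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in steps.items()}
        # result() re-raises any exception from the step in the calling thread.
        return {name: future.result() for name, future in futures.items()}
```

Threads suit this kind of work because the steps are I/O-bound (database queries and HTTP calls to an embedding or chat API), so they spend most of their time waiting.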

Authentication Flow

lawrisk/api/auth.py provides:

Login page with redirect handling
Session management
@login_required decorator for protecting endpoints
JSON vs HTML response handling (API vs browser)

Checkpoint Security

test_checkpoint_security.py tests:

Checkpoint creation authorization
Restore operation security
User permission validation
Operation audit logging

Best Practices

Code Style

Black: 100-character line length, Python 3.10+
Type Hints: Use PEP 604 union types (str | None)
Imports: Ruff-compatible, group by standard library → third-party → local
Naming: snake_case (functions/variables), SCREAMING_SNAKE_CASE (constants), PascalCase (classes)

Error Handling

Graceful degradation on startup (errors surface on first request)
Structured error responses: {"success": false, "message": "error", "data": {}}
Logging to stdout with structured format

Configuration

Use lawrisk.utils.env_loader for environment variables
Default values for non-critical configs
Environment-specific overrides supported

Health Checks

# Basic health
curl http://localhost:8000/healthz

# Check regions
curl http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2/regions

# Test search
curl -X POST http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2 \
  -d "query=电影院&debug=1"

View app startup logs to see all registered routes.
