CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.


LawRisk Backend - Development Guide

Project Overview

LawRisk is a Flask-based Python backend service for legal compliance risk retrieval. Given a natural-language query, it finds the relevant permits, licenses, and legal risks using vector embeddings and LLM matching.

Tech Stack

  • Framework: Flask 2.3+
  • Database: PostgreSQL (pg8000 driver)
  • AI Services: Alibaba Cloud DashScope (text-embedding-v4, qwen-plus-latest)
  • Development: Black, Ruff, Pytest

Two-Database Architecture

  1. fs_law_risk: Vector embeddings and subject-permit mappings
  2. licensing_risks: Structured permit and risk data (regions, themes, compliance)

Quick Reference

Most Common Commands

# Run the application (port 8000)
python app.py

# Install dependencies
pip install -r requirements.txt

# Format and lint code
black .
ruff check .

# Run tests
pytest
pytest --cov=lawrisk

# Test API via curl (the sample query means "I want to open a cinema")
curl http://localhost:8000/healthz
curl -X POST "http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2" \
  -d "query=我要办一家电影院&debug=1"

Key File Locations

  • app.py - Flask application entry point
  • lawrisk/api/v2.py - V2 API routes (current)
  • lawrisk/services/lawrisk_v2_service.py - Enhanced V2 search logic
  • lawrisk/services/licensing_repo.py - Database operations
  • lawrisk/api/auth.py - Authentication endpoints
  • static/v2_tester.html - Web-based API testing interface
  • tests/test_auth.py - Auth system tests
  • tests/test_checkpoint_security.py - Checkpoint system tests

Architecture & Code Structure

Request Flow

HTTP Request
  → lawrisk/api/ (routing layer)
    → lawrisk/services/ (business logic)
      → lawrisk/services/licensing_repo.py (database access)
      → DashScope API (embeddings & LLM)

Core Modules

1. API Layer (lawrisk/api/)

  • v1.py - Legacy API (deprecated)
  • v2.py - Current API with structured responses + admin endpoints
  • auth.py - Authentication (login/logout/me endpoints)

2. Services Layer (lawrisk/services/)

  • lawrisk_service.py - Core search with embeddings (cosine similarity) + LLM matching
  • lawrisk_v2_service.py - Enhanced V2 with structured results, region filtering, direct permit matching
  • licensing_repo.py - PostgreSQL operations (both databases), checkpoint management
  • auth_service.py - User authentication, password hashing, seed admin creation
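
The embedding search in lawrisk_service.py is described as cosine similarity over stored vectors. A minimal sketch of that ranking step follows; the function names and the in-memory (name, vector) pairs are illustrative assumptions, not the repository's actual code, which reads vectors from the law_sub table.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def rank_subjects(
    query_vec: list[float],
    subjects: list[tuple[str, list[float]]],
    top: int = 5,
) -> list[tuple[str, float]]:
    """Score every (name, vector) pair against query_vec, best match first."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in subjects]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]
```

The best score from a ranking like this is what the search thresholds in the Configuration section would be compared against.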

3. Middleware & Utils

  • middleware/smart_cors_middleware.py - Configurable CORS (wildcard, subdomains, NGINX mode)
  • utils/env_loader.py - Environment variable loading
  • utils/export_risk_json.py - Database export utility
  • utils/ingest_lawrisk.py - Data ingestion with embeddings
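
To illustrate the wildcard and subdomain modes mentioned for smart_cors_middleware.py, here is a rough sketch of origin matching. The pattern syntax ("*" and "*.example.com") and the function name are assumptions made for this example; the real middleware's configuration format may differ, and NGINX mode is not covered.

```python
from urllib.parse import urlsplit


def is_origin_allowed(origin: str, allowed: list[str]) -> bool:
    """Check an Origin header value against a list of allowed patterns.

    Supported patterns (hypothetical syntax):
      "*"                  any origin
      "*.example.com"      any subdomain of example.com, but not example.com itself
      "https://app.test"   exact origin match
    """
    host = urlsplit(origin).hostname
    if host is None:
        return False  # not a parseable origin such as "https://host"
    for pattern in allowed:
        if pattern == "*":
            return True
        if pattern.startswith("*."):
            # "*.example.com" -> host must end with ".example.com"
            if host.endswith(pattern[1:]):
                return True
        elif origin == pattern:
            return True
    return False
```

Matching on the parsed hostname rather than on the raw string keeps a look-alike such as https://evilexample.com from passing a "*.example.com" rule.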

API Endpoints

Public Endpoints

V2 Search (Current)

  • Path: /fs-ai-asistant/api/workflow/lawrisk/v2
  • Method: POST (recommended), GET
  • Params:
    • query (required): User question
    • region (optional): Filter by region, e.g. 市级 (city level) or 禅城区 (Chancheng District)
    • debug (optional): Enable debug output (1/true/yes/on)
    • top (optional): Number of recommendations (default: 5)
  • Returns: Structured results with regions, themes, permits, risks
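
The parameter rules above can be sketched as a small normalization function. This is an illustration only: the function name, the returned dict shape, and the rejection of non-positive top values are assumptions, not the behavior of lawrisk/api/v2.py.

```python
TRUTHY = {"1", "true", "yes", "on"}  # accepted values for the debug flag
DEFAULT_TOP = 5  # default number of recommendations


def parse_search_params(params: dict[str, str]) -> dict:
    """Normalize raw request parameters for a V2 search (hypothetical helper)."""
    query = (params.get("query") or "").strip()
    if not query:
        raise ValueError("query is required")

    # An empty or missing region means "no region filter".
    region = (params.get("region") or "").strip() or None

    debug = (params.get("debug") or "").strip().lower() in TRUTHY

    raw_top = (params.get("top") or "").strip()
    top = int(raw_top) if raw_top else DEFAULT_TOP  # int() raises ValueError on junk
    if top < 1:
        raise ValueError("top must be a positive integer")

    return {"query": query, "region": region, "debug": debug, "top": top}
```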

Supporting Endpoints

  • GET /fs-ai-asistant/api/workflow/lawrisk/v2/regions - List all regions
  • GET /fs-ai-asistant/api/workflow/lawrisk/getPermits - Get permits by region
  • GET /healthz - Health check

Authentication Endpoints

  • GET /fs-ai-asistant/lawrisk/login - Login page (HTML)
  • POST /auth/login - Authenticate user
  • GET /auth/me - Get current user
  • GET /auth/logout - Logout

Admin Endpoints (Protected)

  • GET /fs-ai-asistant/api/workflow/lawrisk/admin/test - Admin test
  • GET /fs-ai-asistant/api/workflow/lawrisk/admin/regions - Region management
  • GET /fs-ai-asistant/api/workflow/lawrisk/admin/themes - Theme management
  • GET /fs-ai-asistant/api/workflow/lawrisk/admin/permits - Permit management
  • GET /fs-ai-asistant/api/workflow/lawrisk/admin/checkpoints - Checkpoint management (create/list/restore/delete)

Development Workflow

Environment Setup

# Windows PowerShell
python -m venv .venv
.venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure .env with database credentials and DashScope API key

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=lawrisk --cov-report=html

# Run specific test file
pytest tests/test_auth.py -v

# Test authentication
pytest tests/test_auth.py::test_login_success -v

Test Files:

  • tests/test_auth.py - Authentication system tests (login, logout, session management)
  • tests/test_checkpoint_security.py - Database checkpoint security tests

Code Quality

# Format code
black .

# Lint with Ruff
ruff check .

# Check specific file
black lawrisk/services/lawrisk_v2_service.py
ruff check lawrisk/services/lawrisk_v2_service.py

Data Management

# Export data from fs_law_risk database
python lawrisk/utils/export_risk_json.py
# Output: data/risk_tables_export.json

# Ingest data with embeddings (requires DASHSCOPE_API_KEY)
python lawrisk/utils/ingest_lawrisk.py

Configuration

Required Environment Variables (.env)

DashScope AI Services

DASHSCOPE_API_KEY=your_api_key
DASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
DASHSCOPE_EMBED_MODEL=text-embedding-v4
DASHSCOPE_EMBED_DIM=1024
DASHSCOPE_MAX_BATCH=10
DASHSCOPE_CHAT_MODEL=qwen-plus-latest
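
Assuming DASHSCOPE_MAX_BATCH caps how many texts go into one embedding request (the name suggests this, but this guide does not say so explicitly), ingestion code would need to split its input into chunks. A minimal sketch, with hypothetical helper names:

```python
import os
from collections.abc import Iterator


def max_batch_from_env(default: int = 10) -> int:
    """Read DASHSCOPE_MAX_BATCH, falling back to the value shown above."""
    return int(os.environ.get("DASHSCOPE_MAX_BATCH", str(default)))


def batched(texts: list[str], size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of texts, each with at most size items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(texts), size):
        yield texts[start:start + size]
```

Each yielded chunk would then be sent to the embedding endpoint as one request, for example `for chunk in batched(names, max_batch_from_env()): ...`.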

PostgreSQL Databases

# fs_law_risk (embeddings database)
PG_HOST=your_host
PG_PORT=5432
PG_USER=postgres
PG_PASSWORD=your_password
PG_DATABASE=fs_law_risk
PG_ADMIN_DB=postgres

# licensing_risks (structured data)
LIC_PG_HOST=your_host
LIC_PG_PORT=5432
LIC_PG_USER=postgres
LIC_PG_PASSWORD=your_password
LIC_PG_DATABASE=licensing_risks
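
Because the two databases share the same variable names apart from the LIC_ prefix, one loader can serve both. The sketch below is an assumption about how this could be organized; PgSettings and load_pg_settings are hypothetical names, and the project's own loading goes through lawrisk.utils.env_loader.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class PgSettings:
    host: str
    port: int
    user: str
    password: str
    database: str


def load_pg_settings(prefix: str, env: Mapping[str, str] = os.environ) -> PgSettings:
    """Build connection settings from variables named <prefix>HOST, <prefix>PORT, ..."""

    def require(name: str) -> str:
        key = f"{prefix}{name}"
        value = env.get(key)
        if not value:
            raise KeyError(f"missing environment variable {key}")
        return value

    return PgSettings(
        host=require("HOST"),
        port=int(env.get(f"{prefix}PORT", "5432")),  # PostgreSQL default port
        user=require("USER"),
        password=require("PASSWORD"),
        database=require("DATABASE"),
    )
```

With this shape, `load_pg_settings("PG_")` would describe fs_law_risk and `load_pg_settings("LIC_PG_")` would describe licensing_risks.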

Authentication

FLASK_SECRET_KEY=your-secret-key
LAWRISK_ADMIN_USERNAME=admin
LAWRISK_ADMIN_PASSWORD=adminpassword123
# Optional: LAWRISK_ADMIN_ROLE, LAWRISK_ADMIN_GRADE, LAWRISK_ADMIN_DISPLAY_NAME

Search Thresholds

LAWRISK_RETURN_IF_GE=0.7    # Return results if similarity >= 0.7
LAWRISK_FALLBACK_GT=0.5     # Use fallback if similarity > 0.5
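
Read literally, the two thresholds split similarity scores into three bands. The sketch below shows that decision; the band labels, the function names, and the treatment of scores at or below the fallback threshold as "no match" are assumptions, since this guide does not describe what the fallback path does.

```python
from collections.abc import Mapping


def load_thresholds(env: Mapping[str, str]) -> tuple[float, float]:
    """Return (return_if_ge, fallback_gt), defaulting to the sample values above."""
    return_if_ge = float(env.get("LAWRISK_RETURN_IF_GE", "0.7"))
    fallback_gt = float(env.get("LAWRISK_FALLBACK_GT", "0.5"))
    return return_if_ge, fallback_gt


def classify_match(best_similarity: float, return_if_ge: float, fallback_gt: float) -> str:
    """Map the best similarity score to one of three hypothetical outcomes."""
    if best_similarity >= return_if_ge:
        return "return"  # confident match: return results directly
    if best_similarity > fallback_gt:
        return "fallback"  # borderline match: hand over to the fallback path
    return "no_match"  # too dissimilar to use
```

Note the asymmetry in the comparisons: a score of exactly 0.7 is returned, while a score of exactly 0.5 does not trigger the fallback.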

Database Schema

fs_law_risk (Vector Embeddings)

  • law_sub - Subject matter with embeddings (id, name, vector)
  • law_sub_per - Subject-permit mappings (sub_id, per_ids)
  • law_per - Permit information (id, name, risk_ids)

licensing_risks (Structured Compliance)

  • regions - Administrative areas
  • themes - Legal themes/subjects
  • permits - License/permit items
  • risks - Risk information (content, legal_basis, document_no, summary)
  • business_scopes - Business scope definitions
  • Junction tables: region_themes, region_theme_permits, region_permit_risks

Checkpoint System

licensing_repo.py implements database checkpoint management:

  • create_checkpoint() - Create database backup
  • list_checkpoints() - List available backups
  • restore_checkpoint() - DANGEROUS - Restore from checkpoint
  • delete_checkpoint() - Remove old checkpoints

Security Guidelines

Critical Security Notes

  • NEVER commit secrets - All credentials in .env or environment variables
  • Protect admin endpoints - /admin/* should be restricted in production
  • Checkpoint restore is dangerous - It replaces current database contents, so it goes through a confirmation flow
  • API keys externalized - DASHSCOPE_API_KEY and database passwords must be in .env

Authentication System

  • Session-based auth using Flask sessions
  • Password hashing with passlib
  • First admin auto-created from environment variables on startup
  • Role-based access (admin, reviewer, analyst, etc.)
  • Login page: /fs-ai-asistant/lawrisk/login
  • Protected endpoints use @login_required decorator

Recent Features (from git log)

Checkpoint System (Recent)

  • Database backup/restore functionality
  • Timeline view of checkpoints
  • Progress indicators for restore operations
  • Security tests in test_checkpoint_security.py

Permit Risk Snapshot

  • Workflow for permit risk snapshots
  • Unified snapshot and checkpoint timeline
  • Enhanced batch display for snapshots

Licensing Import Enhancement

  • Optimized district/region merging during import
  • Enhanced source display for permits

Testing Guidelines

Test Structure

tests/
├── __init__.py
├── test_auth.py              # Auth system tests (login, session, decorators)
└── test_checkpoint_security.py  # Checkpoint security tests

Running Tests

# All tests
pytest

# Verbose output
pytest -v

# Coverage report
pytest --cov=lawrisk --cov-report=term-missing

# Specific test
pytest tests/test_auth.py::test_login_success -v

Manual Testing

  1. Start app: python app.py
  2. Open browser: static/v2_tester.html
  3. Test queries:
    • "我要办一家电影院" (I want to open a cinema)
    • "开办旅馆需要哪些许可" (Which permits are needed to open a hotel?)
    • With region filter and debug mode

Troubleshooting

Common Issues

Database Connection

# Verify database is accessible
psql -h $PG_HOST -U $PG_USER -d $PG_DATABASE

# Check tables exist (run these statements inside the psql session)
SELECT COUNT(*) FROM fs_law_risk.law_sub;
SELECT COUNT(*) FROM licensing_risks.regions;

API Errors

# Test health check
curl http://localhost:8000/healthz

# Test V2 API with debug
curl -X POST "http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2" \
  -d "query=电影院&debug=1"

# Check app logs for registered routes
python app.py 2>&1 | grep "Registered routes"

Missing Embeddings

# Check if embeddings exist (run inside a psql session)
SELECT id, name FROM fs_law_risk.law_sub LIMIT 5;

# If empty, run ingestion
python lawrisk/utils/ingest_lawrisk.py

Documentation Files

  • README.md - Project overview and quick start
  • AGENTS.md - Development guidelines, coding style, testing approach
  • docs/V2_API文档.md - Detailed V2 API documentation
  • docs/API.md - V1 API documentation (legacy)
  • docs/DB_GUIDE.md - Database schema and query examples
  • docs/PRD.md - Product requirements
  • docs/CLAUDE.md - Detailed Claude Code guidance (comprehensive)

Key Components Deep Dive

V2 Service Architecture

lawrisk_v2_service.py implements:

  • Structured response formatting
  • Region filter normalization
  • Direct permit name matching
  • Markdown formatting for legal text
  • Complex query execution pipeline with concurrency

Authentication Flow

lawrisk/api/auth.py provides:

  • Login page with redirect handling
  • Session management
  • @login_required decorator for protecting endpoints
  • JSON vs HTML response handling (API vs browser)
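
The JSON versus HTML distinction could look roughly like the sketch below when a request arrives without a session. Everything here except the login page path is an assumption made for illustration: the rule for detecting API callers, the 401 and 302 status codes, and the `next` query parameter are not confirmed by this guide.

```python
from urllib.parse import quote

LOGIN_PAGE = "/fs-ai-asistant/lawrisk/login"  # login page path from this guide


def unauthenticated_response(path: str, accept: str = "") -> tuple[int, dict]:
    """Decide how to answer an unauthenticated request (hypothetical helper).

    API callers get a JSON error; browsers get redirected to the login page.
    """
    wants_json = "/api/" in path or "application/json" in accept.lower()
    if wants_json:
        body = {"success": False, "message": "authentication required", "data": {}}
        return 401, {"body": body}
    # Remember where the user was going so login can send them back.
    return 302, {"location": f"{LOGIN_PAGE}?next={quote(path, safe='/')}"}
```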

Checkpoint Security

test_checkpoint_security.py tests:

  • Checkpoint creation authorization
  • Restore operation security
  • User permission validation
  • Operation audit logging

Best Practices

Code Style

  • Black: 100-character line length, Python 3.10+
  • Type Hints: Use PEP 604 union types (str | None)
  • Imports: Ruff-compatible, group by standard library → third-party → local
  • Naming: snake_case (functions/variables), SCREAMING_SNAKE_CASE (constants), PascalCase (classes)

Error Handling

  • Graceful degradation on startup: initialization errors do not stop the app and surface on the first request instead
  • Structured error responses: {"success": false, "message": "error", "data": {}}
  • Logging to stdout with structured format

Configuration

  • Use lawrisk.utils.env_loader for environment variables
  • Default values for non-critical configs
  • Environment-specific overrides supported

Health Checks

# Basic health
curl http://localhost:8000/healthz

# Check regions
curl http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2/regions

# Test search
curl -X POST http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2 \
  -d "query=电影院&debug=1"

View app startup logs to see all registered routes.