# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

---

# LawRisk Backend - Development Guide

## Project Overview

**LawRisk** is a Flask-based Python backend service for intelligent legal compliance risk retrieval. It helps users find permits, licenses, and legal risks based on natural language queries using vector embeddings and LLM matching.

### Tech Stack
- **Framework**: Flask 2.3+
- **Database**: PostgreSQL (pg8000 driver)
- **AI Services**: Alibaba Cloud DashScope (text-embedding-v4, qwen-plus-latest)
- **Development**: Black, Ruff, Pytest

### Two-Database Architecture
1. **fs_law_risk**: Vector embeddings and subject-permit mappings
2. **licensing_risks**: Structured permit and risk data (regions, themes, compliance)

---

## Quick Reference

### Most Common Commands
```bash
# Run the application (port 8000)
python app.py

# Install dependencies
pip install -r requirements.txt

# Format and lint code
black .
ruff check .

# Run tests
pytest
pytest --cov=lawrisk

# Test API via curl
curl http://localhost:8000/healthz
curl -X POST "http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2" \
  -d "query=我要办一家电影院&debug=1"
```

### Key File Locations
- `app.py` - Flask application entry point
- `lawrisk/api/v2.py` - V2 API routes (current)
- `lawrisk/services/lawrisk_v2_service.py` - Enhanced V2 search logic
- `lawrisk/services/licensing_repo.py` - Database operations
- `lawrisk/api/auth.py` - Authentication endpoints
- `static/v2_tester.html` - Web-based API testing interface
- `tests/test_auth.py` - Auth system tests
- `tests/test_checkpoint_security.py` - Checkpoint system tests

---

## Architecture & Code Structure

### Request Flow
```
HTTP Request
  → lawrisk/api/ (routing layer)
    → lawrisk/services/ (business logic)
      → lawrisk/services/licensing_repo.py (database access)
      → DashScope API (embeddings & LLM)
```

### Core Modules

**1. API Layer (`lawrisk/api/`)**
- `v1.py` - Legacy API (deprecated)
- `v2.py` - Current API with structured responses + admin endpoints
- `auth.py` - Authentication (login/logout/me endpoints)

**2. Services Layer (`lawrisk/services/`)**
- `lawrisk_service.py` - Core search with embeddings (cosine similarity) + LLM matching
- `lawrisk_v2_service.py` - Enhanced V2 with structured results, region filtering, direct permit matching
- `licensing_repo.py` - PostgreSQL operations (both databases), checkpoint management
- `auth_service.py` - User authentication, password hashing, seed admin creation

**3. Middleware & Utils**
- `middleware/smart_cors_middleware.py` - Configurable CORS (wildcard, subdomains, NGINX mode)
- `utils/env_loader.py` - Environment variable loading
- `utils/export_risk_json.py` - Database export utility
- `utils/ingest_lawrisk.py` - Data ingestion with embeddings
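
As a rough illustration of the wildcard and subdomain CORS modes, the sketch below checks an `Origin` header against a list of allowed patterns. The function name `is_origin_allowed` and the pattern syntax (`*`, `*.example.com`) are assumptions made for this example and may differ from what `smart_cors_middleware.py` actually accepts.

```python
from urllib.parse import urlparse


def is_origin_allowed(origin: str, allowed: list[str]) -> bool:
    """Check an Origin header against exact hosts, '*.domain' patterns, or '*'."""
    host = urlparse(origin).hostname
    if host is None:
        return False  # not a parseable origin such as https://host
    for pattern in allowed:
        if pattern == "*":
            return True  # wildcard mode: accept any origin
        if pattern.startswith("*."):
            # subdomain mode: accept the bare domain and anything under it
            domain = pattern[2:]
            if host == domain or host.endswith("." + domain):
                return True
        elif host == pattern:
            return True
    return False


print(is_origin_allowed("https://app.example.com", ["*.example.com"]))
```

Matching on the parsed hostname with a leading dot avoids accepting look-alike hosts such as `evilexample.com`.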

---

## API Endpoints

### Public Endpoints

#### V2 Search (Current)
- **Path**: `/fs-ai-asistant/api/workflow/lawrisk/v2`
- **Method**: POST (recommended), GET
- **Params**:
  - `query` (required): User question
  - `region` (optional): Filter by region, e.g. `市级` (city level) or `禅城区` (Chancheng District)
  - `debug` (optional): Enable debug output (1/true/yes/on)
  - `top` (optional): Number of recommendations (default: 5)
- **Returns**: Structured results with regions, themes, permits, risks

#### Supporting Endpoints
- `GET /fs-ai-asistant/api/workflow/lawrisk/v2/regions` - List all regions
- `GET /fs-ai-asistant/api/workflow/lawrisk/getPermits` - Get permits by region
- `GET /healthz` - Health check

### Authentication Endpoints
- `GET /fs-ai-asistant/lawrisk/login` - Login page (HTML)
- `POST /auth/login` - Authenticate user
- `GET /auth/me` - Get current user
- `GET /auth/logout` - Logout

### Admin Endpoints (Protected)
- `GET /fs-ai-asistant/api/workflow/lawrisk/admin/test` - Admin test
- `GET /fs-ai-asistant/api/workflow/lawrisk/admin/regions` - Region management
- `GET /fs-ai-asistant/api/workflow/lawrisk/admin/themes` - Theme management
- `GET /fs-ai-asistant/api/workflow/lawrisk/admin/permits` - Permit management
- `GET /fs-ai-asistant/api/workflow/lawrisk/admin/checkpoints` - Checkpoint management (create/list/restore/delete)

---

## Development Workflow

### Environment Setup
```bash
# Windows PowerShell
python -m venv .venv
.venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Configure .env with database credentials and DashScope API key
```

### Testing
```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=lawrisk --cov-report=html

# Run specific test file
pytest tests/test_auth.py -v

# Run a single authentication test
pytest tests/test_auth.py::test_login_success -v
```

**Test Files**:
- `tests/test_auth.py` - Authentication system tests (login, logout, session management)
- `tests/test_checkpoint_security.py` - Database checkpoint security tests

### Code Quality
```bash
# Format code
black .

# Lint with Ruff
ruff check .

# Check specific file
black lawrisk/services/lawrisk_v2_service.py
ruff check lawrisk/services/lawrisk_v2_service.py
```

### Data Management
```bash
# Export data from fs_law_risk database
python lawrisk/utils/export_risk_json.py
# Output: data/risk_tables_export.json

# Ingest data with embeddings (requires DASHSCOPE_API_KEY)
python lawrisk/utils/ingest_lawrisk.py
```

---

## Configuration

### Required Environment Variables (.env)

#### DashScope AI Services
```
DASHSCOPE_API_KEY=your_api_key
DASHSCOPE_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
DASHSCOPE_EMBED_MODEL=text-embedding-v4
DASHSCOPE_EMBED_DIM=1024
DASHSCOPE_MAX_BATCH=10
DASHSCOPE_CHAT_MODEL=qwen-plus-latest
```

#### PostgreSQL Databases
```
# fs_law_risk (embeddings database)
PG_HOST=your_host
PG_PORT=5432
PG_USER=postgres
PG_PASSWORD=your_password
PG_DATABASE=fs_law_risk
PG_ADMIN_DB=postgres

# licensing_risks (structured data)
LIC_PG_HOST=your_host
LIC_PG_PORT=5432
LIC_PG_USER=postgres
LIC_PG_PASSWORD=your_password
LIC_PG_DATABASE=licensing_risks
```
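
Because the two databases use parallel variable names that differ only in prefix (`PG_` and `LIC_PG_`), one loader can serve both. The sketch below is an assumption about how this could be done, not a description of `lawrisk.utils.env_loader`; `PgSettings` and `load_pg_settings` are hypothetical names, and defaulting the port to 5432 is a choice made for the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class PgSettings:
    host: str
    port: int
    user: str
    password: str
    database: str


def load_pg_settings(prefix: str, env: Mapping[str, str] = os.environ) -> PgSettings:
    """Read <prefix>HOST, <prefix>PORT, ... and fail early on missing values."""

    def need(name: str) -> str:
        value = env.get(prefix + name)
        if not value:
            raise KeyError(f"missing environment variable {prefix + name}")
        return value

    return PgSettings(
        host=need("HOST"),
        port=int(env.get(prefix + "PORT", "5432")),  # assumed default
        user=need("USER"),
        password=need("PASSWORD"),
        database=need("DATABASE"),
    )


# Example with an explicit mapping instead of the real process environment.
sample_env = {
    "LIC_PG_HOST": "db.internal",
    "LIC_PG_USER": "postgres",
    "LIC_PG_PASSWORD": "secret",
    "LIC_PG_DATABASE": "licensing_risks",
}
settings = load_pg_settings("LIC_PG_", sample_env)
print(settings.host, settings.port, settings.database)
```

Calling `load_pg_settings("PG_")` and `load_pg_settings("LIC_PG_")` at startup would surface a missing credential immediately instead of on the first query.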

#### Authentication
```
FLASK_SECRET_KEY=your-secret-key
LAWRISK_ADMIN_USERNAME=admin
LAWRISK_ADMIN_PASSWORD=your_admin_password
# Optional: LAWRISK_ADMIN_ROLE, LAWRISK_ADMIN_GRADE, LAWRISK_ADMIN_DISPLAY_NAME
```

#### Search Thresholds
```
# Return results if similarity >= 0.7
LAWRISK_RETURN_IF_GE=0.7
# Use fallback if similarity > 0.5
LAWRISK_FALLBACK_GT=0.5
```
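
Read together, the two thresholds suggest a three-way decision on the best similarity score. The sketch below encodes that reading; `classify_match` and the `"no_match"` outcome are assumptions, and what the fallback path actually does is not specified in this guide.

```python
import os

# Defaults mirror the example values shown in the .env snippet.
RETURN_IF_GE = float(os.environ.get("LAWRISK_RETURN_IF_GE", "0.7"))
FALLBACK_GT = float(os.environ.get("LAWRISK_FALLBACK_GT", "0.5"))


def classify_match(score: float,
                   return_if_ge: float = RETURN_IF_GE,
                   fallback_gt: float = FALLBACK_GT) -> str:
    """Map a similarity score to 'return', 'fallback', or 'no_match'."""
    if score >= return_if_ge:
        return "return"      # confident enough to return results directly
    if score > fallback_gt:
        return "fallback"    # borderline: hand over to the fallback path
    return "no_match"        # assumed outcome for scores at or below the floor


for score in (0.82, 0.6, 0.5):
    print(score, classify_match(score, 0.7, 0.5))
```

Note the asymmetry in the comparisons: a score of exactly 0.7 is returned, while a score of exactly 0.5 does not reach the fallback.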

---

## Database Schema

### fs_law_risk (Vector Embeddings)
- **`law_sub`** - Subject matter with embeddings (id, name, vector)
- **`law_sub_per`** - Subject-permit mappings (sub_id, per_ids)
- **`law_per`** - Permit information (id, name, risk_ids)
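
The three tables chain together as subject → permit ids → permit rows. The in-memory sketch below shows that traversal only; the sample rows are made up, `permits_for_subject` is a hypothetical helper, and storing `per_ids` as a list of integers is an assumption about the column format.

```python
# In-memory stand-ins for two of the tables; the real rows live in PostgreSQL.
law_sub_per = {1: [10, 11]}  # sub_id -> per_ids
law_per = {
    10: {"name": "Film screening licence", "risk_ids": [100]},
    11: {"name": "Fire safety inspection", "risk_ids": [101, 102]},
}


def permits_for_subject(sub_id: int) -> list[dict]:
    """Follow sub_id -> per_ids -> permit rows, skipping dangling ids."""
    result = []
    for per_id in law_sub_per.get(sub_id, []):
        permit = law_per.get(per_id)
        if permit is not None:
            result.append({"id": per_id, **permit})
    return result


for permit in permits_for_subject(1):
    print(permit["id"], permit["name"], permit["risk_ids"])
```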

### licensing_risks (Structured Compliance)
- **`regions`** - Administrative areas
- **`themes`** - Legal themes/subjects
- **`permits`** - License/permit items
- **`risks`** - Risk information (content, legal_basis, document_no, summary)
- **`business_scopes`** - Business scope definitions
- **Junction tables**: region_themes, region_theme_permits, region_permit_risks

### Checkpoint System
`licensing_repo.py` implements database checkpoint management:
- `create_checkpoint()` - Create database backup
- `list_checkpoints()` - List available backups
- `restore_checkpoint()` - **DANGEROUS** - Restore from checkpoint
- `delete_checkpoint()` - Remove old checkpoints

---

## Security Guidelines

### Critical Security Notes
- **NEVER commit secrets** - Keep all credentials in `.env` or environment variables
- **Protect admin endpoints** - `/admin/*` should be restricted in production
- **Checkpoint restore is dangerous** - It is a destructive database operation guarded by a confirmation flow
- **API keys externalized** - `DASHSCOPE_API_KEY` and database passwords must be in `.env`

### Authentication System
- Session-based auth using Flask sessions
- Password hashing with `passlib`
- First admin auto-created from environment variables on startup
- Role-based access (admin, reviewer, analyst, etc.)
- Login page: `/fs-ai-asistant/lawrisk/login`
- Protected endpoints use `@login_required` decorator

---

## Recent Features (from git log)

### Checkpoint System (Recent)
- Database backup/restore functionality
- Timeline view of checkpoints
- Progress indicators for restore operations
- Security tests in `test_checkpoint_security.py`

### Permit Risk Snapshot
- Workflow for permit risk snapshots
- Unified snapshot and checkpoint timeline
- Enhanced batch display for snapshots

### Licensing Import Enhancement
- Optimized district/region merging during import
- Enhanced source display for permits

---

## Testing Guidelines

### Test Structure
```
tests/
├── __init__.py
├── test_auth.py              # Auth system tests (login, session, decorators)
└── test_checkpoint_security.py  # Checkpoint security tests
```

### Running Tests
```bash
# All tests
pytest

# Verbose output
pytest -v

# Coverage report
pytest --cov=lawrisk --cov-report=term-missing

# Specific test
pytest tests/test_auth.py::test_login_success -v
```

### Manual Testing
1. Start app: `python app.py`
2. Open browser: `static/v2_tester.html`
3. Test queries:
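   - "我要办一家电影院" ("I want to open a cinema")
   - "开办旅馆需要哪些许可" ("Which permits are needed to open a hotel?")
   - Repeat each query with a region filter and with debug mode enabled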

---

## Troubleshooting

### Common Issues

#### Database Connection
```bash
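# Verify the embeddings database is accessible and law_sub exists
# (table names assume the default schema)
psql -h "$PG_HOST" -U "$PG_USER" -d "$PG_DATABASE" -c "SELECT COUNT(*) FROM law_sub;"

# licensing_risks is a separate database, so connect with the LIC_PG_* settings
psql -h "$LIC_PG_HOST" -U "$LIC_PG_USER" -d "$LIC_PG_DATABASE" -c "SELECT COUNT(*) FROM regions;"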
```

#### API Errors
```bash
# Test health check
curl http://localhost:8000/healthz

# Test V2 API with debug
curl -X POST "http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2" \
  -d "query=电影院&debug=1"

# Check app logs for registered routes
python app.py 2>&1 | grep "Registered routes"
```

#### Missing Embeddings
```bash
# Check if embeddings exist (table name assumes the default schema)
psql -h "$PG_HOST" -U "$PG_USER" -d "$PG_DATABASE" -c "SELECT id, name FROM law_sub LIMIT 5;"

# If empty, run ingestion
python lawrisk/utils/ingest_lawrisk.py
```

---

## Documentation Files

- **README.md** - Project overview and quick start
- **AGENTS.md** - Development guidelines, coding style, testing approach
- **docs/V2_API文档.md** - Detailed V2 API documentation
- **docs/API.md** - V1 API documentation (legacy)
- **docs/DB_GUIDE.md** - Database schema and query examples
- **docs/PRD.md** - Product requirements
- **docs/CLAUDE.md** - Detailed Claude Code guidance (comprehensive)

---

## Key Components Deep Dive

### V2 Service Architecture
`lawrisk_v2_service.py` implements:
- Structured response formatting
- Region filter normalization
- Direct permit name matching
- Markdown formatting for legal text
- Complex query execution pipeline with concurrency
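
To give a feel for a concurrent query pipeline, the sketch below runs two independent lookups in parallel threads and merges their results. It is an illustration only: `embedding_search`, `direct_permit_match`, and `run_pipeline` are hypothetical stand-ins with canned results, not functions from `lawrisk_v2_service.py`.

```python
from concurrent.futures import ThreadPoolExecutor


def embedding_search(query: str) -> list[str]:
    """Placeholder for the vector-similarity lookup."""
    return ["cinema operation"] if "cinema" in query else []


def direct_permit_match(query: str) -> list[str]:
    """Placeholder for matching the query against permit names."""
    return ["film screening licence"] if "cinema" in query else []


def run_pipeline(query: str) -> dict[str, list[str]]:
    """Run both lookups in parallel threads and collect their results."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        emb_future = pool.submit(embedding_search, query)
        direct_future = pool.submit(direct_permit_match, query)
        return {
            "embedding": emb_future.result(),
            "direct": direct_future.result(),
        }


print(run_pipeline("open a cinema"))
```

Threads suit this workload because the real lookups would spend most of their time waiting on network calls to DashScope and PostgreSQL.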

### Authentication Flow
`lawrisk/api/auth.py` provides:
- Login page with redirect handling
- Session management
- `@login_required` decorator for protecting endpoints
- JSON vs HTML response handling (API vs browser)

### Checkpoint Security
`test_checkpoint_security.py` tests:
- Checkpoint creation authorization
- Restore operation security
- User permission validation
- Operation audit logging

---

## Best Practices

### Code Style
- **Black**: 100-character line length, Python 3.10+
- **Type Hints**: Use PEP 604 union types (`str | None`)
- **Imports**: Ruff-compatible, group by standard library → third-party → local
- **Naming**: snake_case (functions/variables), SCREAMING_SNAKE_CASE (constants), PascalCase (classes)

### Error Handling
- Graceful degradation on startup (errors surface on first request)
- Structured error responses: `{"success": false, "message": "error", "data": {}}`
- Logging to stdout with structured format

### Configuration
- Use `lawrisk.utils.env_loader` for environment variables
- Default values for non-critical configs
- Environment-specific overrides supported

---

## Health Checks

```bash
# Basic health
curl http://localhost:8000/healthz

# Check regions
curl http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2/regions

# Test search
curl -X POST http://localhost:8000/fs-ai-asistant/api/workflow/lawrisk/v2 \
  -d "query=电影院&debug=1"
```

View app startup logs to see all registered routes.