diff --git a/AGENTS.md b/AGENTS.md
index cdcb6d0..4e341d3 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -3,9 +3,14 @@
 Guidelines for AI coding agents working on BillAI - a microservices bill analysis system.
 
 ## Architecture
-- `web/` - SvelteKit 5 + TailwindCSS 4 + TypeScript (Frontend Proxy & UI)
-- `server/` - Go 1.21 + Gin + MongoDB (Main API & Data Storage)
-- `analyzer/` - Python 3.12 + FastAPI (Data Cleaning & Analysis Service)
+- `web/` - SvelteKit 5 + TailwindCSS 4 + TypeScript (Frontend Proxy & UI, port 3000)
+- `server/` - Go 1.21 + Gin + MongoDB (Main API & Data Storage, port 8080)
+- `analyzer/` - Python 3.12 + FastAPI (Data Cleaning & Analysis Service, port 8001)
+
+The SvelteKit frontend acts as a **proxy**: all `/api/*` browser requests are forwarded by
+`web/src/routes/api/[...path]/+server.ts` to the Go backend. The browser never contacts Go
+directly. `API_URL` env var controls the target (`http://server:8080` in Docker,
+`http://localhost:8080` in local dev).
 
 ## Build/Lint/Test Commands
 
@@ -13,14 +18,16 @@ Guidelines for AI coding agents working on BillAI - a microservices bill analysi
 **Working Directory:** `/Users/clz/Projects/BillAI/web`
 
 ```bash
-npm run dev         # Start dev server
-npm run build       # Production build
-npm run check       # TypeScript check (svelte-check)
-npm run lint        # Prettier + ESLint
-npm run format      # Format code (Prettier)
-npm run test:unit   # Run all unit tests (Vitest)
+npm run dev           # Start Vite dev server
+npm run build         # Production build (adapter-node)
+npm run preview       # Preview production build
+npm run check         # TypeScript check (svelte-check)
+npm run lint          # Prettier --check + ESLint
+npm run format        # Format with Prettier
+npm run test          # Run all unit tests once (CI mode)
+npm run test:unit     # Run unit tests in watch mode
 npx vitest run src/routes/+page.spec.ts  # Run single test file
-npx vitest run -t "test name"            # Run test by name pattern
+npx vitest run -t "test name pattern"    # Run tests by name pattern
 ```
 
 ### Backend (server/)
@@ -31,9 +38,9 @@ go run . # Start server
 go build -o server .       # Build binary
 go mod tidy                # Clean dependencies
 go test ./...              # Run all tests
-go test ./handler/...      # Run package tests
-go test -run TestName      # Run single test function
-go test -v ./handler/...   # Run tests with verbose output
+go test ./handler/...        # Run handler package tests
+go test -run TestName ./...  # Run single test function
+go test -v ./handler/...     # Verbose test output
 ```
 
 ### Analyzer (analyzer/)
@@ -43,9 +50,9 @@ go test -v ./handler/... # Run tests with verbose output
 python server.py                 # Start FastAPI server directly
 uvicorn server:app --reload      # Start with hot reload
 pytest                           # Run all tests
-pytest test_file.py             # Run single test file
-pytest -k "test_name"           # Run test by name pattern
-pip install -r requirements.txt # Install dependencies
+pytest test_jd_cleaner.py        # Run single test file
+pytest -k "test_name"            # Run test by name pattern
+pip install -r requirements.txt  # Install dependencies
 ```
 
 ### Docker
@@ -60,68 +67,144 @@ docker-compose down # Stop all services
 ## Code Style
 
 ### General
-- **Comments:** Existing comments often use Chinese for business logic explanations. Maintain this style where appropriate, but English is also acceptable for technical explanations.
-- **Conventions:** Follow existing patterns strictly. Do not introduce new frameworks or libraries without checking `package.json`/`go.mod`/`requirements.txt`.
+- **Comments:** Existing comments often use Chinese for business logic explanations. Maintain this
+  style where appropriate; English is also acceptable for technical explanations.
+- **Conventions:** Follow existing patterns strictly. Do not introduce new frameworks or libraries
+  without checking `package.json` / `go.mod` / `requirements.txt`.
 
 ### TypeScript/Svelte (web/)
-- **Formatting:** Prettier (Tabs, single quotes, no trailing commas, printWidth 100).
-- **Naming:** `PascalCase` for types/components/interfaces, `camelCase` for variables/functions.
-- **Imports:** Use `$lib` alias for internal imports.
+- **Formatting:** Prettier — tabs, single quotes, no trailing commas, printWidth 100,
+  `prettier-plugin-svelte`.
+- **Naming:** `PascalCase` for types/interfaces/components, `camelCase` for variables/functions.
+- **Imports:** Use `$lib` alias for internal imports and `$app/*` for SvelteKit builtins. Never
+  use relative paths for lib-level modules.
   ```typescript
-  import { browser } from '$app/environment';
-  import { auth } from '$lib/stores/auth';
-  import type { UIBill } from '$lib/models/bill';
+  import { browser } from '$app/environment'
+  import { goto } from '$app/navigation'
+  import { auth } from '$lib/stores/auth'
+  import type { UIBill } from '$lib/models/bill'
+  import Upload from '@lucide/svelte/icons/upload'
   ```
-- **Types:** Define interfaces for data models. Use `export interface`.
-- **Error Handling:** Check `response.ok`. Throw `Error` with status for UI to catch.
+- **Svelte 5 runes:** Use the new runes API — `$state`, `$derived`, `$effect`, `$props`. Event
+  handlers use `onclick={fn}` syntax (not legacy `on:click`).
+- **Types:** Define `export interface` for all data models. Frontend models use `camelCase` fields
+  (`UIBill`); API responses use `snake_case` (`CleanedBill`). Provide explicit converter functions
+  (e.g., `cleanedBillToUIBill`, `uiBillToUpdateBillRequest`) in `web/src/lib/models/bill.ts`.
+- **Error Handling:** Check `response.ok`; throw `new Error(\`HTTP ${response.status}\`)` for the
+  UI to catch. On 401, call `auth.logout()` and redirect to `/login`.
+- **Auth pattern:** `createAuthStore()` factory in `$lib/stores/auth.ts`. Token stored in
+  `localStorage` under key `auth`. All API calls go through `apiFetch()` in `$lib/api.ts`, which
+  injects `Authorization: Bearer <token>` and handles 401 centrally.
+- **Testing:** Vitest + `vitest-browser-svelte` + Playwright. Test files co-located with routes
+  as `*.spec.ts`. Use `describe` / `it` / `expect` from vitest, `render` from
+  `vitest-browser-svelte`.
 
 ### Go Backend (server/)
-- **Structure:** `handler` (HTTP) → `service` (Logic) → `repository` (DB) → `model` (Structs).
-- **Tags:** Use `json` (snake_case) and `form` tags. Use `omitempty` for optional fields.
+- **Layer structure:** `handler` (HTTP) → `service` (logic) → `adapter` (external Python service)
+  and `repository` (DB) → `model` (structs). Handlers must not contain business logic.
+- **Struct tags:** JSON uses `snake_case`. `omitempty` on optional response fields. Use `form` tags
+  for query/form binding. Use pointer fields (`*string`) for optional patch request fields. Sensitive
+  fields get `json:"-"`.
   ```go
+  type CleanedBill struct {
+      ID       primitive.ObjectID `bson:"_id,omitempty" json:"id,omitempty"`
+      BillType string             `bson:"bill_type" json:"bill_type"`
+  }
   type UpdateBillRequest struct {
-      Category *string `json:"category,omitempty" form:"category"`
+      Category *string `json:"category,omitempty"`
   }
   ```
-- **Error Handling:** Return `500` for DB errors, `400` for bad requests. Wrap errors with context.
+- **Error Handling:** Return `500` for DB/internal errors, `400` for bad requests, `404` for not
+  found. Wrap errors with context using `fmt.Errorf("context: %w", err)`. Check
+  `err == repository.ErrNotFound` for 404 disambiguation. Use `Result bool` (not `Success`) in
+  response envelopes.
   ```go
   if err != nil {
       c.JSON(http.StatusInternalServerError, Response{Result: false, Message: err.Error()})
       return
   }
   ```
+- **Response envelope:** Most endpoints: `Result bool`, `Message string`, `Data *T`. Auth endpoints
+  use `success bool`, `error string`, `data interface{}`.
+- **Interfaces:** Use `adapter.Cleaner` and `repository.BillRepository` interfaces. Access global
+  singletons via `adapter.GetCleaner()` and `repository.GetRepository()`.
+- **Time:** Use the custom `LocalTime` type (wraps `time.Time`) for all timestamp fields. It
+  serializes as `"2006-01-02 15:04:05"` in both JSON and BSON, preserving local time.
+- **Soft delete:** Bills are never hard-deleted. All queries must filter `is_deleted: false`.
 
 ### Python Analyzer (analyzer/)
-- **Style:** PEP 8. Use `snake_case` for variables/functions.
-- **Type Hints:** Mandatory for function arguments and return types.
-- **Models:** Use `pydantic.BaseModel` for API schemas.
+- **Style:** PEP 8. `snake_case` for variables, functions, and filenames. `UPPER_CASE` for
+  module-level constants. Prefix private module globals with `_`.
+- **Type Hints:** Mandatory for all function arguments and return types. Use `Optional[str]` from
+  `typing` or `str | None` (Python 3.10+ union syntax).
+- **Models:** Use `pydantic.BaseModel` for all API request/response schemas.
   ```python
   class CleanRequest(BaseModel):
       input_path: str
+      output_path: str
+      year: Optional[str] = None
       bill_type: Optional[str] = "auto"
   ```
-- **Docstrings:** Use triple quotes. Chinese descriptions are common for API docs.
+- **FastAPI patterns:** Use `HTTPException(status_code=400, detail=message)` for user errors.
+  Manage temporary files with `tempfile.NamedTemporaryFile` + `os.unlink` in `finally` blocks.
+- **Cleaner classes:** Extend `BaseCleaner(ABC)` from `cleaners/base.py`. Implement `clean()` and
+  optionally `reclassify()`. Category inference reads rules from `config/category.yaml` via
+  `yaml.safe_load`.
+- **Docstrings:** Triple-quoted. Chinese descriptions are common for API endpoint docs.
 
 ## Key Patterns
 
-- **API Flow:**
-  - Frontend talks to `server` (Go) via `/api` proxy.
-  - `server` handles auth, DB operations, and delegates complex file processing to `analyzer` (Python).
-  - `analyzer` cleanses CSV/Excel files and returns structured JSON/CSV to `server`.
+### API Flow
+```
+Browser → SvelteKit proxy (/api/[...path]/+server.ts)
+  → Go server (Gin, AuthRequired middleware)
+    → handler → service → adapter.GetCleaner() → HTTP POST to Python FastAPI
+                        → repository.GetRepository() → MongoDB
+```
 
-- **Authentication:**
-  - JWT based. Token stored in frontend.
-  - Header: `Authorization: Bearer <token>`.
-  - Backend middleware checks token. 401 triggers logout/redirect.
+### Authentication
+- JWT (HS256). Token in `localStorage` under key `auth`.
+- Header: `Authorization: Bearer <token>`.
+- `middleware.AuthRequired()` wraps all `/api/*` routes except `/api/auth/*`.
+- Passwords in `config.yaml` support plaintext or SHA-256 hashed values.
+- 401 anywhere → `auth.logout()` + redirect to `/login`.
 
-- **File Processing:**
-  - Flow: Upload (ZIP/XLSX) -> Extract/Convert (to UTF-8 CSV) -> Clean (normalize columns) -> Import to DB.
-  - `analyzer` uses `openpyxl` for Excel and regex for cleaning text.
+### File Processing
+Upload flow: Upload (ZIP/XLSX) → Extract → Convert to UTF-8 CSV (Python `/convert`) →
+Auto-detect bill type → Deduplicate against DB → Clean/normalize (Python `/clean/upload`) →
+Save raw + cleaned bills to MongoDB.
+
+Deduplication: raw bills check `transaction_id`; cleaned bills check
+`transaction_id + merchant_order_no`. JD bills trigger soft-deletion of overlapping records in
+other sources to prevent double-counting.
+
+### Adapter (Go ↔ Python)
+`adapter.Cleaner` interface has two implementations: HTTP-based (`adapter/http`, default) and
+subprocess-based (`adapter/python`, legacy). Controlled by `ANALYZER_MODE` env var.
 
 ## Important Files
-- `web/src/lib/api.ts` - Centralized API client methods.
-- `web/src/lib/models/*.ts` - Frontend data models (should match backend JSON).
-- `server/handler/*.go` - HTTP endpoint definitions.
-- `server/repository/mongo.go` - MongoDB connection and queries.
-- `analyzer/server.py` - FastAPI entry point and routing.
-- `analyzer/cleaners/*.py` - Specific logic for Alipay/Wechat/JD bills.
+| File | Role |
+|---|---|
+| `web/src/lib/api.ts` | Central API client; `apiFetch()` injects auth and handles 401 |
+| `web/src/lib/stores/auth.ts` | Auth state; JWT in localStorage; login/logout/validate |
+| `web/src/lib/models/bill.ts` | `UIBill` model + converters to/from API `CleanedBill` shape |
+| `web/src/routes/api/[...path]/+server.ts` | SvelteKit proxy to Go backend |
+| `server/main.go` | Entry point; wires config, adapters, repository, router |
+| `server/config/config.go` | YAML + env config; priority: defaults → config.yaml → env vars |
+| `server/router/router.go` | All route definitions and middleware assignment |
+| `server/middleware/auth.go` | JWT validation + user context injection |
+| `server/handler/upload.go` | Full upload pipeline (extract → convert → clean → store) |
+| `server/handler/bills.go` | List/filter bills with pagination and monthly stats |
+| `server/model/bill.go` | `RawBill`, `CleanedBill`, `MonthlyStat`; custom `LocalTime` type |
+| `server/adapter/adapter.go` | `Cleaner` interface definition |
+| `server/repository/repository.go` | `BillRepository` interface (14 persistence methods) |
+| `server/repository/mongo/repository.go` | MongoDB implementation with aggregation pipelines |
+| `analyzer/server.py` | FastAPI entry point; `/health`, `/clean`, `/convert`, `/detect` routes |
+| `analyzer/cleaners/base.py` | `BaseCleaner` ABC; shared filtering and output logic |
+| `analyzer/cleaners/alipay.py` | Alipay-specific normalization |
+| `analyzer/cleaners/wechat.py` | WeChat-specific normalization |
+| `analyzer/cleaners/jd.py` | JD (京东) normalization and 3-level review scoring |
+| `analyzer/category.py` | `infer_category()` using YAML keyword rules |
+| `analyzer/converter.py` | xlsx→csv (openpyxl), GBK→UTF-8 re-encoding, type detection |
+| `server/config.yaml` | Server port, MongoDB URI, JWT settings, user list |
+| `docker-compose.yaml` | 5 services: web, server, analyzer, mongodb, mongo-express |
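
Note on the deduplication rule added under "File Processing": the diff states that raw bills are checked on `transaction_id` and cleaned bills on `transaction_id + merchant_order_no`. A minimal in-memory sketch of that keying is shown below. It is only an illustration: the real check presumably runs in the Go server against MongoDB, the helper names `dedup_key` and `filter_new_bills` are hypothetical, and treating a missing `merchant_order_no` as an empty string is an assumption. Only the two field names come from the diff.

```python
from typing import Iterable


def dedup_key(bill: dict, cleaned: bool) -> tuple:
    """Build the identity used to spot duplicates (hypothetical helper).

    Raw bills are keyed by transaction_id alone; cleaned bills by
    transaction_id plus merchant_order_no (assumed "" when absent).
    """
    if cleaned:
        return (bill["transaction_id"], bill.get("merchant_order_no") or "")
    return (bill["transaction_id"],)


def filter_new_bills(
    incoming: Iterable[dict], existing: Iterable[dict], cleaned: bool
) -> list[dict]:
    """Return incoming bills whose key is neither stored nor repeated in the batch."""
    seen = {dedup_key(b, cleaned) for b in existing}
    fresh = []
    for bill in incoming:
        key = dedup_key(bill, cleaned)
        if key in seen:
            continue  # duplicate of a stored bill or of an earlier row in this upload
        seen.add(key)
        fresh.append(bill)
    return fresh
```

With this keying, two cleaned rows that share a `transaction_id` but differ in `merchant_order_no` are both kept, while the same two rows would collapse to one under the raw-bill rule.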