QuotaCheap Playbook

Cost observability: what to log to find out which workflow is burning money


If you do not measure cost per workflow, you only know that you spent money this month, not where the money went.

A guide to cost observability for AI agents: request id, workflow id, user/workspace, model, tokens, latency, tool calls, retries, estimated cost, and an operations dashboard.

The total AI bill only tells you how much you spent.

It does not tell you which workflow drives the cost, which users consume the most, which model is being called in the wrong place, or which tool keeps triggering retries.

Cost observability is how you tie usage and cost back to real workflows.

Minimum log for each model request

Each request should record:

- request id
- user id
- workspace id
- workflow id
- session id
- model
- input tokens
- output tokens
- cache read tokens, if any
- latency
- status/error
- estimated cost

Log tool calls

Model cost is only part of the picture.

Agent workflows also need:

- tool name
- tool call count
- failed tool calls
- retry count
- tool latency
- output size

Long tool output usually inflates the input tokens of the next model step.
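The request and tool-call fields above can be sketched as a single structured log record. This is a minimal sketch, not a QuotaCheap schema: the class names, the `PRICE_PER_MTOK` table, the model names, and the prices are all hypothetical, and cache read tokens are logged but not priced because providers bill them differently.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List, Optional

# Hypothetical prices in USD per 1M tokens; replace with your provider's real rates.
PRICE_PER_MTOK = {
    "small-model": {"input": 0.50, "output": 1.50},
    "large-model": {"input": 5.00, "output": 15.00},
}


@dataclass
class ToolCall:
    tool_name: str
    latency_ms: int
    output_chars: int      # output size; long output inflates the next model step's input
    failed: bool = False
    retry_count: int = 0


@dataclass
class ModelRequestLog:
    request_id: str
    user_id: str
    workspace_id: str
    workflow_id: str
    session_id: str
    model: str
    input_tokens: int
    output_tokens: int
    latency_ms: int
    status: str                      # e.g. "ok" or "error"
    cache_read_tokens: int = 0       # logged only; not priced in this sketch
    error: Optional[str] = None
    tool_calls: List[ToolCall] = field(default_factory=list)

    def estimated_cost(self) -> float:
        """Estimate cost in USD from token counts and the assumed price table."""
        price = PRICE_PER_MTOK[self.model]
        total = self.input_tokens * price["input"] + self.output_tokens * price["output"]
        return total / 1_000_000

    def to_json(self) -> str:
        """Serialize one log line, adding derived cost and tool-call counters."""
        record = asdict(self)
        record["estimated_cost"] = round(self.estimated_cost(), 6)
        record["tool_call_count"] = len(self.tool_calls)
        record["failed_tool_calls"] = sum(t.failed for t in self.tool_calls)
        record["retry_count"] = sum(t.retry_count for t in self.tool_calls)
        return json.dumps(record)
```

Emitting one such JSON line per model request keeps model calls and their tool calls under the same workflow id, which is what makes per-workflow aggregation possible later.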

Cost per successful task

The metric that matters is not cost per individual request but cost per outcome:

- cost per answered ticket
- cost per generated post
- cost per successful code review
- cost per completed workflow

If a workflow is cheap per request but fails often, it can end up more expensive than a workflow that uses a stronger model and succeeds on the first try.
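A short worked example may make the comparison concrete. The function name and all numbers below are hypothetical and chosen only for illustration: a cheap workflow at $0.01 per attempt with 20 successes out of 100, against a stronger model at $0.03 per attempt with 95 successes out of 100.

```python
def cost_per_successful_task(attempts: int, cost_per_attempt: float, successes: int) -> float:
    """Total spend across all attempts divided by the number of successful outcomes."""
    if successes == 0:
        return float("inf")  # money was spent but nothing succeeded
    return attempts * cost_per_attempt / successes


# Hypothetical numbers, for illustration only.
cheap = cost_per_successful_task(attempts=100, cost_per_attempt=0.01, successes=20)
strong = cost_per_successful_task(attempts=100, cost_per_attempt=0.03, successes=95)

print(f"cheap model:  ${cheap:.4f} per successful task")
print(f"strong model: ${strong:.4f} per successful task")
```

With these assumed figures the cheap workflow comes out at roughly $0.05 per success and the stronger one at roughly $0.03, even though each individual request on the stronger model costs three times as much.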

What the dashboard needs

A minimal dashboard:

- top workflows by cost
- top users/workspaces by usage
- top models by spend
- high-retry workflows
- high input token requests
- high latency requests
- error rate by model/tool

Alerts to set up

- daily spend exceeds a threshold
- request cost exceeds a threshold
- retry spike
- unusual increase in input tokens
- unusual increase in model fallbacks
- cron/background jobs using more than usual

QuotaCheap angle

QuotaCheap fits as an OpenAI-compatible gateway layer that lets the app capture request logs, token usage, latency, and billing visibility.

From there you have the data to route models, set quotas, enforce plan limits, and optimize unit economics.

Conclusion

Without cost observability, every cost optimization is guesswork.

Log per workflow, link model calls to tool calls, and optimize based on cost per successful task.

Applying this in a real product

Cost optimization should not start with swapping models across the board.

A safer approach is to add visibility first, then optimize one point at a time where you have data.

A pragmatic rollout:

1. Measure first: log model, input tokens, output tokens, latency, workflow id, and user/workspace id.

2. Find the top waste: see which workflow costs the most, which requests carry the longest context, and which jobs retry the most.

3. Cut low-risk excess: prune tool output, limit history, and split prompts by task type.

4. Route models with control: move simple tasks to a cheaper or faster model, but keep a quality gate.

5. Set guardrails: quota, retry cap, per-workflow budget, and alerts on spend spikes.
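The guardrails in the last step can be sketched as a small in-process check. This is a minimal sketch under stated assumptions: `WorkflowGuardrail` and its methods are hypothetical names, spend is tracked in memory only, and a real deployment would likely persist the counters and reset them on a schedule.

```python
from collections import defaultdict


class WorkflowGuardrail:
    """Hypothetical per-workflow daily budget and retry cap, held in memory."""

    def __init__(self, daily_budget_usd: float, max_retries: int):
        self.daily_budget_usd = daily_budget_usd
        self.max_retries = max_retries
        self.spent_today = defaultdict(float)  # workflow_id -> USD spent today

    def record_spend(self, workflow_id: str, cost_usd: float) -> None:
        """Add the estimated cost of a finished request to the workflow's total."""
        self.spent_today[workflow_id] += cost_usd

    def can_run(self, workflow_id: str) -> bool:
        """Allow a new request only while the workflow is under its daily budget."""
        return self.spent_today.get(workflow_id, 0.0) < self.daily_budget_usd

    def can_retry(self, retries_so_far: int) -> bool:
        """Allow another retry only while the retry cap has not been reached."""
        return retries_so_far < self.max_retries

    def reset_day(self) -> None:
        """Clear all counters; the caller decides when a new day starts."""
        self.spent_today.clear()


guard = WorkflowGuardrail(daily_budget_usd=1.0, max_retries=2)
guard.record_spend("ticket-answer", 0.6)
guard.record_spend("ticket-answer", 0.5)
if not guard.can_run("ticket-answer"):
    print("ticket-answer is over its daily budget; block the request or raise an alert")
```

The check runs before each model call, so an over-budget workflow stops spending instead of only showing up on the next invoice.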

The key point: do not optimize by gut feeling.

Optimize by cost per successful task.

Example minimal dashboard

A cost dashboard for an AI agent does not need to be complex from the start.

The first version should answer:

- How much credit has been spent today?
- Which workflow costs the most?
- Which model is called the most?
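These three questions can be answered with a plain aggregation over the request logs. The sketch below assumes each record is a dict with hypothetical `workflow_id`, `model`, and `estimated_cost` keys, and that the caller has already filtered the records to today's date.

```python
from collections import Counter, defaultdict


def summarize(records):
    """Aggregate request logs into the three numbers a first dashboard needs."""
    total_cost = 0.0
    cost_by_workflow = defaultdict(float)
    calls_by_model = Counter()

    for r in records:
        total_cost += r["estimated_cost"]
        cost_by_workflow[r["workflow_id"]] += r["estimated_cost"]
        calls_by_model[r["model"]] += 1

    return {
        "total_cost": total_cost,
        # Workflows sorted from most to least expensive.
        "top_workflows": sorted(cost_by_workflow.items(), key=lambda kv: kv[1], reverse=True),
        # Models sorted from most to least called.
        "top_models": calls_by_model.most_common(),
    }


# Hypothetical sample records, for illustration only.
sample = [
    {"workflow_id": "ticket-answer", "model": "small-model", "estimated_cost": 0.5},
    {"workflow_id": "ticket-answer", "model": "large-model", "estimated_cost": 0.25},
    {"workflow_id": "code-review", "model": "small-model", "estimated_cost": 0.125},
]
summary = summarize(sample)
print(summary["total_cost"], summary["top_workflows"][0], summary["top_models"][0])
```

Once the logs live in a database, the same grouping by workflow id and by model would move into queries; the in-memory version is enough to validate the log schema first.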