Track LLM Costs

This guide walks you through setting up cost tracking for Large Language Model (LLM) API calls in your Python application. By the end you will see per-request token counts and costs appear in Beakpoint Insights automatically.

Prerequisites

Before you begin, ensure you have:

  • A Python application that calls the OpenAI API
  • A Beakpoint Insights account and API key
  • An OpenAI API key available in your environment

Install Dependencies

Install the OpenTelemetry SDK, the Beakpoint exporter, and the GenAI instrumentation package for your provider.

pip install opentelemetry-sdk opentelemetry-exporter-otlp-proto-http opentelemetry-instrumentation-openai-v2

Configure the Exporter

Set your Beakpoint API key and OTLP endpoint as environment variables before running your application:

export OTEL_EXPORTER_OTLP_ENDPOINT="https://ingest.beakpointinsights.com"
export OTEL_EXPORTER_OTLP_HEADERS="x-api-key=YOUR_API_KEY"
export OTEL_SERVICE_NAME="my-llm-service"

Replace YOUR_API_KEY with your Beakpoint Insights API key.
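If you can't set shell variables (for example in a notebook), the same configuration can be applied from Python, since the exporter reads these variables when it is constructed. A minimal sketch with placeholder values:

```python
import os

# The OTLP exporter reads these variables at construction time,
# so set them before any OpenTelemetry setup code runs.
os.environ.setdefault("OTEL_EXPORTER_OTLP_ENDPOINT", "https://ingest.beakpointinsights.com")
os.environ.setdefault("OTEL_EXPORTER_OTLP_HEADERS", "x-api-key=YOUR_API_KEY")
os.environ.setdefault("OTEL_SERVICE_NAME", "my-llm-service")
```

Using setdefault means values already exported in the shell take precedence.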

Instrument Your Application

Add the following setup code once at application startup, before you create any API clients:

from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.instrumentation.openai_v2 import OpenAIInstrumentor
from openai import OpenAI

# 1. Configure the tracer provider
provider = TracerProvider()
exporter = OTLPSpanExporter() # reads OTEL_EXPORTER_OTLP_* env vars
provider.add_span_processor(BatchSpanProcessor(exporter))
trace.set_tracer_provider(provider)

# 2. Instrument the OpenAI client
OpenAIInstrumentor().instrument()

# 3. Use the client as normal — all calls are now traced
client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[{"role": "user", "content": "Summarise the key points of the Beakpoint docs."}],
)

print(response.choices[0].message.content)

The instrumentation automatically attaches the following attributes to each span:

  • gen_ai.system — identifies the provider (openai)
  • gen_ai.request.model — the model you requested
  • gen_ai.usage.input_tokens / gen_ai.usage.output_tokens — token counts used for cost calculation
  • gen_ai.usage.input_tokens.cached — cached input tokens (when prompt caching is active)
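Beakpoint multiplies these token counts by the model's per-token pricing to produce the cost figures you see in the dashboard. As a rough illustration of the arithmetic only (the prices below are placeholders, not Beakpoint's actual rate card):

```python
# Hypothetical per-million-token prices; real rates vary by model and provider.
PRICE_PER_M_INPUT = 0.40   # USD per 1M input tokens
PRICE_PER_M_OUTPUT = 1.60  # USD per 1M output tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate a request's cost in USD from its span token attributes."""
    return (input_tokens * PRICE_PER_M_INPUT
            + output_tokens * PRICE_PER_M_OUTPUT) / 1_000_000

# e.g. a request with 1,200 input and 350 output tokens
cost = estimate_cost(input_tokens=1_200, output_tokens=350)  # 0.00104 USD
```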

Verify Traces in Beakpoint

  1. Run your instrumented application and make at least one LLM API call.
  2. Log in to Beakpoint Insights.
  3. Navigate to Traces and search for your service name (the value you set in OTEL_SERVICE_NAME).
  4. Open a trace and confirm you can see a span with gen_ai.system = openai and non-zero token counts.
  5. Navigate to Costs to see the calculated spend broken down by model and request.
Tip: If no traces appear within a minute of running your application, check that your OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS environment variables are set correctly and that outbound HTTPS traffic to the Beakpoint ingest endpoint is not blocked by a firewall.

Next Steps