Langfuse Tracing and Prompt Management for Azure OpenAI and Langchain
This cookbook demonstrates how to use Langfuse with Azure OpenAI and Langchain for tracing, prompt versioning, and evaluations.
Setup
%pip install --quiet langfuse langchain langchain-openai --upgrade
import os
# get keys for your project from https://cloud.langfuse.com
os.environ["LANGFUSE_PUBLIC_KEY"] = "pk-lf-***"
os.environ["LANGFUSE_SECRET_KEY"] = "sk-lf-***"
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com" # for EU data region
# os.environ["LANGFUSE_HOST"] = "https://us.cloud.langfuse.com" # for US data region
# your azure openai configuration
os.environ["AZURE_OPENAI_ENDPOINT"] = "your Azure OpenAI endpoint"
os.environ["AZURE_OPENAI_API_KEY"] = "your Azure OpenAI API key"
os.environ["OPENAI_API_TYPE"] = "azure"
os.environ["OPENAI_API_VERSION"] = "2023-09-01-preview"
We’ll use the native Langfuse integration for Langchain. Learn more about it in the documentation.
from langfuse.callback import CallbackHandler
langfuse_handler = CallbackHandler()
# optional, verify your Langfuse credentials
langfuse_handler.auth_check()
Langchain imports
from langchain_openai import AzureChatOpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.prompts.chat import (
    ChatPromptTemplate,
    SystemMessagePromptTemplate,
    HumanMessagePromptTemplate,
)
from langchain.schema import HumanMessage
Simple example
from langchain_openai import AzureChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langfuse.callback import CallbackHandler
langfuse_handler = CallbackHandler()
prompt = ChatPromptTemplate.from_template("what is the city {person} is from?")
model = AzureChatOpenAI(
    deployment_name="gpt-35-turbo",
    model_name="gpt-3.5-turbo",
)
chain = prompt | model
chain.invoke({"person": "Satya Nadella"}, config={"callbacks":[langfuse_handler]})
✨ Done. Go to the Langfuse Dashboard to explore the trace of this run.
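Optionally, you can print a direct link to the trace of this run via the callback handler's get_trace_url() method:
# Print a direct link to the trace of the last run
print(langfuse_handler.get_trace_url())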
Example using Langfuse Prompt Management and Langchain
Learn more about Langfuse Prompt Management in the docs.
# Initialize the Langfuse Client
from langfuse import Langfuse
langfuse = Langfuse()
template = """
You are an AI travel assistant that provides vacation recommendations to users.
You should also be able to provide information about the weather, local customs, and travel restrictions.
"""
# Push the prompt to Langfuse and immediately promote it to production
langfuse.create_prompt(
    name="travel_consultant",
    prompt=template,
    is_active=True,
)
In your production environment, you can then fetch the production version of the prompt. The Langfuse client caches the prompt to improve performance. You can configure this behavior via a custom TTL or disable it completely.
# Get the prompt from Langfuse, cache it for 5 minutes
langfuse_prompt = langfuse.get_prompt("travel_consultant", cache_ttl_seconds=300)
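If you always want the latest version instead, caching can be disabled by setting the TTL to zero (a minimal sketch):
# Disable caching entirely: each call fetches the prompt from the Langfuse API
langfuse_prompt_uncached = langfuse.get_prompt("travel_consultant", cache_ttl_seconds=0)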
We do not use the native Langfuse prompt.compile() here but rather the raw prompt.prompt, as Langchain will insert the prompt variables (if any).
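For illustration, a minimal sketch of the difference (our travel_consultant prompt has no variables, so compile() returns the template unchanged):
# prompt.compile(**variables) returns a string with Langfuse-managed {{variable}}
# placeholders filled in; prompt.prompt is the raw template, leaving {variable}
# placeholders for Langchain to fill at invocation time.
compiled_prompt = langfuse_prompt.compile()
raw_prompt = langfuse_prompt.prompt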
system_message_prompt = SystemMessagePromptTemplate.from_template(langfuse_prompt.prompt)
llm = AzureChatOpenAI(
    deployment_name="gpt-35-turbo",
    model_name="gpt-3.5-turbo",
)
human_message_prompt = HumanMessagePromptTemplate.from_template("{text}")
chat_prompt = ChatPromptTemplate.from_messages(
    [system_message_prompt, human_message_prompt]
)
chain = LLMChain(llm=llm, prompt=chat_prompt)
result = chain.run(
    "Where should I go on vacation in December for warm weather and beaches?",
    callbacks=[langfuse_handler],
)
print(result)
Multiple Langchain runs in same Langfuse trace
Langchain setup
from langchain_openai import AzureChatOpenAI
from langchain.prompts import ChatPromptTemplate
from langchain.schema import StrOutputParser
from operator import itemgetter
prompt1 = ChatPromptTemplate.from_template(
    "What {type} is easiest to learn but hardest to master? Give a step-by-step explanation of your thoughts, ending with your answer."
)
prompt2 = ChatPromptTemplate.from_template(
    "How can {type} be learned in 21 days? Respond in {language}."
)
model = AzureChatOpenAI(
    deployment_name="gpt-35-turbo",
    model_name="gpt-3.5-turbo",
)
chain1 = prompt1 | model | StrOutputParser()
chain2 = (
    {"type": chain1, "language": itemgetter("language")}
    | prompt2
    | model
    | StrOutputParser()
)
Run the chain multiple times within the same Langfuse trace.
# Create trace using Langfuse Client
langfuse = Langfuse()
trace = langfuse.trace(name="chain_of_thought_example", user_id="user-1234")
# Create a handler scoped to this trace
langfuse_handler = trace.get_langchain_handler()
# First run
chain2.invoke(
    {"type": "business", "language": "german"}, config={"callbacks": [langfuse_handler]}
)
# Second run
chain2.invoke(
    {"type": "business", "language": "english"}, config={"callbacks": [langfuse_handler]}
)
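Both runs are now nested under the same trace. You can print a direct link to it via the trace client's get_trace_url() method:
# Link to the shared trace containing both runs
print(trace.get_trace_url())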
Adding scores
When evaluating traces of your LLM application in Langfuse, you need to add scores to the trace. For simplicity, we’ll add a mocked score. Check out the docs for more information on complex score types.
Get the trace_id. We use the previous run, where we created the trace using langfuse.trace(). You can also get the trace_id via langfuse_handler.get_trace_id().
trace_id = trace.id
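# Alternatively, the trace-scoped handler exposes the same id:
# trace_id = langfuse_handler.get_trace_id()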
# Add score to the trace via the Langfuse Python Client
langfuse = Langfuse()
langfuse.score(
    trace_id=trace_id,
    name="user-explicit-feedback",
    value=1,
    comment="I like how personalized the response is",
)
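The Langfuse client sends events asynchronously in batches. In short-lived environments such as scripts or serverless functions, flush the queue before the process exits:
# Ensure all queued events are sent to Langfuse
langfuse.flush()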