If you find yourself typing out long assertions, use inline-snapshot
Similarly, dirty-equals can be useful for comparing large data structures
Use TestModel or FunctionModel in place of your actual model to avoid the usage, latency and variability of real LLM calls
Use Agent.override to replace your model inside your application logic
Set ALLOW_MODEL_REQUESTS=False globally to block any requests from being made to non-test models accidentally
Unit testing with TestModel
The simplest and fastest way to exercise most of your application code is to use TestModel. By default, it calls all tools in the agent, then returns either plain text or a structured response, depending on the return type of the agent.
TestModel is not magic
The "clever" (but not too clever) part of TestModel is that it will attempt to generate valid structured data for function tools and result types based on the schema of the registered tools.
There's no ML or AI in TestModel; it's just plain old procedural Python code that tries to generate data that satisfies the JSON schema of a tool.
The resulting data won't look pretty or relevant, but it should pass Pydantic's validation in most cases.
If you want something more sophisticated, use FunctionModel and write your own data generation logic.
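To make the idea of "procedural code that satisfies a JSON schema" concrete, here is a toy sketch using only the standard library. It is not TestModel's actual implementation: the function name generate_from_schema and its rules are hypothetical, and it handles only a small subset of JSON Schema. The placeholder values ('a' for strings, '2024-01-01' for dates) were chosen to match the tool-call arguments that appear in the test later in this section.

```python
from typing import Any


def generate_from_schema(schema: dict[str, Any]) -> Any:
    """Return a minimal value that fits a small subset of JSON Schema.

    Hypothetical illustration only; the real TestModel logic may differ.
    """
    if 'enum' in schema:
        # Any listed value is valid, so take the first one.
        return schema['enum'][0]
    schema_type = schema.get('type')
    if schema_type == 'object':
        properties = schema.get('properties', {})
        required = schema.get('required', [])
        # Only fill in required properties to keep the output minimal.
        return {
            name: generate_from_schema(properties.get(name, {}))
            for name in required
        }
    if schema_type == 'array':
        return [generate_from_schema(schema.get('items', {}))]
    if schema_type == 'string':
        # Assumed rule: honour the "date" format, otherwise use a short string.
        return '2024-01-01' if schema.get('format') == 'date' else 'a'
    if schema_type == 'integer':
        return 0
    if schema_type == 'number':
        return 0.0
    if schema_type == 'boolean':
        return False
    return None


# A schema shaped like the arguments of a tool taking a location and a date.
tool_schema = {
    'type': 'object',
    'properties': {
        'location': {'type': 'string'},
        'forecast_date': {'type': 'string', 'format': 'date'},
    },
    'required': ['location', 'forecast_date'],
}

args = generate_from_schema(tool_schema)
print(args)
```

The output is valid but meaningless, which is the trade-off described above: good enough to pass validation, not good enough to look like a real request.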
Let's write unit tests for the following application code:
weather_app.py
import asyncio
from datetime import date

from pydantic_ai import Agent, RunContext

from fake_database import DatabaseConn
from weather_service import WeatherService

weather_agent = Agent(
    'openai:gpt-4o',
    deps_type=WeatherService,
    system_prompt='Providing a weather forecast at the locations the user provides.',
)


@weather_agent.tool
def weather_forecast(
    ctx: RunContext[WeatherService], location: str, forecast_date: date
) -> str:
    if forecast_date < date.today():
        return ctx.deps.get_historic_weather(location, forecast_date)
    else:
        return ctx.deps.get_forecast(location, forecast_date)


async def run_weather_forecast(
    user_prompts: list[tuple[str, int]], conn: DatabaseConn
):
    """Run weather forecast for a list of user prompts and save."""
    async with WeatherService() as weather_service:

        async def run_forecast(prompt: str, user_id: int):
            result = await weather_agent.run(prompt, deps=weather_service)
            await conn.store_forecast(user_id, result.data)

        # run all prompts in parallel
        await asyncio.gather(
            *(run_forecast(prompt, user_id) for (prompt, user_id) in user_prompts)
        )
Here we have a function that takes a list of (user_prompt,user_id) tuples, gets a weather forecast for each prompt, and stores the result in the database.
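The fake_database and weather_service modules imported by weather_app.py are not shown here. The following is a minimal sketch of what they might contain, assuming an in-memory store and canned weather strings. The class bodies are hypothetical; only the method names and the two return strings are taken from how the application code and the tests in this section use them.

```python
import asyncio
from datetime import date


class WeatherService:
    """Hypothetical stand-in for weather_service.WeatherService."""

    async def __aenter__(self) -> 'WeatherService':
        # A real service might open an HTTP session here.
        return self

    async def __aexit__(self, *exc_info: object) -> None:
        return None

    def get_historic_weather(self, location: str, forecast_date: date) -> str:
        return 'Sunny with a chance of rain'

    def get_forecast(self, location: str, forecast_date: date) -> str:
        return 'Rainy with a chance of sun'


class DatabaseConn:
    """Hypothetical stand-in for fake_database.DatabaseConn (in-memory)."""

    def __init__(self) -> None:
        self._forecasts: dict[int, str] = {}

    async def store_forecast(self, user_id: int, forecast: str) -> None:
        self._forecasts[user_id] = forecast

    async def get_forecast(self, user_id: int) -> str:
        return self._forecasts[user_id]


async def demo() -> str:
    """Store one forecast and read it back, mirroring the application flow."""
    conn = DatabaseConn()
    async with WeatherService() as service:
        text = service.get_forecast('London', date(2032, 1, 1))
        await conn.store_forecast(1, text)
    return await conn.get_forecast(1)


demo_result = asyncio.run(demo())
print(demo_result)
```

In a real project the two classes would live in separate modules; they are combined here only to keep the sketch self-contained.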
We want to test this code without having to mock certain objects or modify our code so we can pass test objects in.
from datetime import timezone

import pytest
from dirty_equals import IsNow, IsStr

from pydantic_ai import models, capture_run_messages
from pydantic_ai.models.test import TestModel
from pydantic_ai.messages import (
    ModelResponse,
    SystemPromptPart,
    TextPart,
    ToolCallPart,
    ToolReturnPart,
    UserPromptPart,
    ModelRequest,
)

from fake_database import DatabaseConn
from weather_app import run_weather_forecast, weather_agent

pytestmark = pytest.mark.anyio
models.ALLOW_MODEL_REQUESTS = False


async def test_forecast():
    conn = DatabaseConn()
    user_id = 1
    with capture_run_messages() as messages:
        with weather_agent.override(model=TestModel()):
            prompt = 'What will the weather be like in London on 2024-11-28?'
            await run_weather_forecast([(prompt, user_id)], conn)

    forecast = await conn.get_forecast(user_id)
    assert forecast == '{"weather_forecast":"Sunny with a chance of rain"}'

    assert messages == [
        ModelRequest(
            parts=[
                SystemPromptPart(
                    content='Providing a weather forecast at the locations the user provides.',
                    timestamp=IsNow(tz=timezone.utc),
                ),
                UserPromptPart(
                    content='What will the weather be like in London on 2024-11-28?',
                    timestamp=IsNow(tz=timezone.utc),
                ),
            ]
        ),
        ModelResponse(
            parts=[
                ToolCallPart(
                    tool_name='weather_forecast',
                    args={
                        'location': 'a',
                        'forecast_date': '2024-01-01',
                    },
                    tool_call_id=IsStr(),
                )
            ],
            model_name='test',
            timestamp=IsNow(tz=timezone.utc),
        ),
        ModelRequest(
            parts=[
                ToolReturnPart(
                    tool_name='weather_forecast',
                    content='Sunny with a chance of rain',
                    tool_call_id=IsStr(),
                    timestamp=IsNow(tz=timezone.utc),
                ),
            ],
        ),
        ModelResponse(
            parts=[
                TextPart(
                    content='{"weather_forecast":"Sunny with a chance of rain"}',
                )
            ],
            model_name='test',
            timestamp=IsNow(tz=timezone.utc),
        ),
    ]
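The test above never passes a model into run_weather_forecast; it relies on override swapping the model for the duration of a with block. The pattern can be illustrated with a toy stand-in. ToyAgent below is hypothetical and is not how pydantic_ai.Agent is implemented; it only shows why application code that reads the model from the agent sees the replacement while the block is active and the original afterwards.

```python
from contextlib import contextmanager
from typing import Iterator, Optional


class ToyAgent:
    """Hypothetical agent-like object; not pydantic_ai.Agent."""

    def __init__(self, model: str) -> None:
        self._model = model
        self._override: Optional[str] = None

    @property
    def model(self) -> str:
        # The override, when set, wins over the configured model.
        return self._override if self._override is not None else self._model

    @contextmanager
    def override(self, *, model: str) -> Iterator[None]:
        previous = self._override
        self._override = model
        try:
            yield
        finally:
            # Restore the earlier state even if the block raised.
            self._override = previous


agent = ToyAgent('openai:gpt-4o')


def application_code() -> str:
    # Application logic refers to the module-level agent, as weather_app.py does.
    return f'running with {agent.model}'


before = application_code()
with agent.override(model='test'):
    during = application_code()
after = application_code()

print(before, during, after, sep='\n')
```

Because the swap happens on the agent itself, the function under test needs no extra parameter for injecting a test model.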
Unit testing with FunctionModel
The above tests are a great start, but careful readers will notice that WeatherService.get_forecast is never called, since TestModel calls weather_forecast with a date in the past.
To fully exercise weather_forecast, we need to use FunctionModel to customise how the tool is called.
Here's an example of using FunctionModel to test the weather_forecast tool with custom inputs:
test_weather_app2.py
import re

import pytest

from pydantic_ai import models
from pydantic_ai.messages import (
    ModelMessage,
    ModelResponse,
    TextPart,
    ToolCallPart,
)
from pydantic_ai.models.function import AgentInfo, FunctionModel

from fake_database import DatabaseConn
from weather_app import run_weather_forecast, weather_agent

pytestmark = pytest.mark.anyio
models.ALLOW_MODEL_REQUESTS = False


def call_weather_forecast(
    messages: list[ModelMessage], info: AgentInfo
) -> ModelResponse:
    if len(messages) == 1:
        # first call, call the weather forecast tool
        user_prompt = messages[0].parts[-1]
        m = re.search(r'\d{4}-\d{2}-\d{2}', user_prompt.content)
        assert m is not None
        args = {'location': 'London', 'forecast_date': m.group()}
        return ModelResponse(parts=[ToolCallPart('weather_forecast', args)])
    else:
        # second call, return the forecast
        msg = messages[-1].parts[0]
        assert msg.part_kind == 'tool-return'
        return ModelResponse(parts=[TextPart(f'The forecast is: {msg.content}')])


async def test_forecast_future():
    conn = DatabaseConn()
    user_id = 1
    with weather_agent.override(model=FunctionModel(call_weather_forecast)):
        prompt = 'What will the weather be like in London on 2032-01-01?'
        await run_weather_forecast([(prompt, user_id)], conn)

    forecast = await conn.get_forecast(user_id)
    assert forecast == 'The forecast is: Rainy with a chance of sun'
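The model function above pulls the date out of the user prompt with a regular expression, and that date decides which branch of weather_forecast runs. The sketch below isolates those two steps using only the standard library. The helper names extract_date and expected_branch are hypothetical, and today is passed in explicitly so the result does not depend on when the code is run.

```python
import re
from datetime import date
from typing import Optional

# Same pattern as in call_weather_forecast: an ISO-style YYYY-MM-DD date.
DATE_PATTERN = re.compile(r'\d{4}-\d{2}-\d{2}')


def extract_date(prompt: str) -> Optional[date]:
    """Return the first YYYY-MM-DD date found in the prompt, if any."""
    m = DATE_PATTERN.search(prompt)
    return date.fromisoformat(m.group()) if m else None


def expected_branch(forecast_date: date, today: date) -> str:
    """Mirror the comparison in weather_forecast and name the method it picks."""
    return 'get_historic_weather' if forecast_date < today else 'get_forecast'


prompt = 'What will the weather be like in London on 2032-01-01?'
found = extract_date(prompt)
assert found is not None

reference_day = date(2024, 11, 28)  # arbitrary fixed "today" for the example
print(expected_branch(found, reference_day))
print(expected_branch(date(2024, 1, 1), reference_day))
```

With a fixed reference day, the date from the FunctionModel prompt selects get_forecast, while the '2024-01-01' placeholder produced by TestModel selects get_historic_weather.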
Overriding model via pytest fixtures
If you're writing lots of tests that all require the model to be overridden, you can use pytest fixtures to override the model with TestModel or FunctionModel in a reusable way.
Here's an example of a fixture that overrides the model with TestModel:
tests.py
import pytest
from weather_app import weather_agent

from pydantic_ai.models.test import TestModel


@pytest.fixture
def override_weather_agent():
    with weather_agent.override(model=TestModel()):
        yield


async def test_forecast(override_weather_agent: None):
    ...
    # test code here