
Experimenting with the GPT-5 Responses API Web Search Tool

An experimental record of implementing web search with OpenAI's GPT-5 Responses API, focused on tool-support differences between models and how parameters shape responses. The analysis centers on web search tool compatibility between gpt-5 and gpt-5-chat-latest.


OpenAI's GPT-5 Responses API includes a built-in web search capability, and these experiments confirmed that using it requires a specific model identifier and a particular parameter configuration. This document records a series of API calls performed in Google Colab to examine which model identifiers expose the web search tool and how each parameter influences response generation.

Setting Up the Experiment

The experiments used the OpenAI Python library, with authentication handled by reading the API key from Google Colab's userdata and exporting it as an environment variable.

import os
from google.colab import userdata
os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')

from openai import OpenAI
client = OpenAI()

This setup keeps the API key out of source code and provides a convenient way to run repeated experiments inside a Colab notebook.

Web Search with the GPT-5 Model

The first experiment specified "gpt-5" as the model identifier and enabled the web search tool. The call succeeded as expected.

response = client.responses.create(
  model="gpt-5",
  input=[{"role": "user", "content": "한국의 대표적인 DB 모니터링 솔루션 업체는?"}],
  tools=[{"type": "web_search", "search_context_size": "medium"}],
  text={"verbosity": "medium"},
  reasoning={"effort": "medium"},
)

print(response.output_text)

The response returned concrete vendor names and product information, as shown below.

The following vendors are the ones most commonly used:
- Exem – best known for MaxGauge, a specialized DB performance monitoring solution; also offers a product for integrated monitoring of cloud databases.
- 셀파소프트 – provides DB performance monitoring and post-hoc analysis solutions such as Sherpa for Oracle/HANA.
- TmaxTibero – offers SysMasterDB, a performance monitoring solution for its own DBMS.
- Warevalley – the Orange product line covers DB operations, development, and performance management with real-time monitoring features.

The recommended solution differs depending on the specific DBMS and environment (on-premises vs. cloud, Oracle, PostgreSQL, etc.).

This output suggests that the model used web search to gather current information rather than relying solely on its training data, demonstrating that the API can reach beyond static knowledge when needed.
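One way to check whether search actually ran, rather than inferring it from the answer text, is to inspect the response's output items. The sketch below assumes the item and annotation type names ("web_search_call", "url_citation") that appear in Responses API payloads; treat the exact field shapes as assumptions, and the helper name as hypothetical.

```python
def summarize_search_usage(output_items):
    """Count web search calls and collect cited URLs from response output items.

    Expects each item as a plain dict, e.g. response.model_dump()["output"].
    """
    search_calls = 0
    cited_urls = []
    for item in output_items:
        if item.get("type") == "web_search_call":
            # One entry per search the model performed during the response.
            search_calls += 1
        elif item.get("type") == "message":
            # Message parts may carry url_citation annotations for sources.
            for part in item.get("content", []):
                for ann in part.get("annotations", []):
                    if ann.get("type") == "url_citation":
                        cited_urls.append(ann.get("url"))
    return search_calls, cited_urls
```

Called as `summarize_search_usage(response.model_dump()["output"])`, this makes it easy to log how often the tool fired across repeated experiments.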

Analyzing the Parameter Configuration

Each parameter used in the call shapes a different aspect of the response. The configuration breaks down as follows.

model parameter

The "gpt-5" identifier designates an API-oriented model that is compatible with the web search tool. Selecting this identifier turned out to be a prerequisite for invoking web search at all.

tools parameter

The web search tool is specified as a list of dictionaries with the following form.

tools=[{"type": "web_search", "search_context_size": "medium"}]

The type field names the tool to invoke ("web_search"), while search_context_size controls how much retrieved context the model considers. The accepted values are low, medium, and high.
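Since an invalid context size only surfaces as an API error at call time, a small validating helper can catch typos earlier. This is a sketch under the assumption that the accepted values are low, medium, and high (per the OpenAI API reference); the helper name is hypothetical.

```python
# Accepted search_context_size values, per the API reference (assumption).
VALID_CONTEXT_SIZES = {"low", "medium", "high"}

def make_web_search_tool(search_context_size="medium"):
    """Build the tools-list entry for web search, validating the size up front."""
    if search_context_size not in VALID_CONTEXT_SIZES:
        raise ValueError(
            f"search_context_size must be one of {sorted(VALID_CONTEXT_SIZES)}, "
            f"got {search_context_size!r}"
        )
    return {"type": "web_search", "search_context_size": search_context_size}
```

It slots directly into the call above as `tools=[make_web_search_tool("low")]`.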

text parameter

This parameter controls how detailed the response text is. The verbosity field accepts low, medium, or high.

text={"verbosity": "medium"}

The experiments used medium, which produced responses that were neither overly terse nor unnecessarily long.

reasoning parameter

This parameter governs how much reasoning effort the model spends while producing its response.

reasoning={"effort": "medium"}

The effort field accepts low, medium, or high. Higher values lead to longer response times in exchange for more elaborate reasoning.
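Because higher effort mainly shows up as latency, it helps to measure wall-clock time alongside each call when comparing settings. A minimal timing wrapper might look like the following; the function name is hypothetical and the wrapper simply forwards whatever keyword arguments the call needs.

```python
import time

def timed_response(client, **kwargs):
    """Call the Responses API and return (response, elapsed_seconds)."""
    start = time.perf_counter()
    response = client.responses.create(**kwargs)
    return response, time.perf_counter() - start
```

Running the same prompt through `timed_response` with effort set to low, medium, and high makes the latency trade-off described above directly measurable.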

Trying gpt-5-chat-latest and the Resulting Error

In the second experiment, the model identifier was changed to "gpt-5-chat-latest" while keeping the same web search configuration. The result was not what was expected.

response = client.responses.create(
  model="gpt-5-chat-latest",
  input=[{"role": "user", "content": "한국의 대표적인 DB 모니터링 솔루션 업체는?"}],
  tools=[{"type": "web_search", "search_context_size": "medium"}],
  text={"verbosity": "medium"},
  reasoning={"effort": "medium"},
)

The call failed with the following BadRequestError.

BadRequestError: Error code: 400 - {'error': {'message': "Hosted tool 'web_search_preview' is not supported with gpt-5-chat-latest.", 'type': 'invalid_request_error', 'param': 'tools', 'code': None}}

The error message indicates that the gpt-5-chat-latest model does not support the hosted tool internally identified as web_search_preview. This is not a parameter mistake but a capability constraint of the model itself. The gpt-5-chat-latest identifier appears to track the model used by the ChatGPT product, where the tool-use pattern is implemented differently from the API surface.
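For application code that cannot guarantee which model identifier it will be handed, one defensive pattern is to attempt the call with web search enabled and fall back to a plain call when the model rejects hosted tools. The sketch below matches on the error text seen above rather than a specific exception class (in practice one would catch openai.BadRequestError); the helper name is hypothetical.

```python
def create_with_search_fallback(client, model, user_input, tools):
    """Try a tool-enabled call; retry without tools if the model rejects them."""
    try:
        return client.responses.create(model=model, input=user_input, tools=tools)
    except Exception as exc:  # openai.BadRequestError in practice
        if "not supported" in str(exc):
            # The model refused the hosted tool; answer from training data instead.
            return client.responses.create(model=model, input=user_input)
        raise
```

This keeps the happy path identical for gpt-5 while letting gpt-5-chat-latest degrade gracefully instead of surfacing a 400 to the caller.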

Comparing the Two Models

The differences between the two models observed during the experiments can be summarized as follows.

- gpt-5: web search supported. An API-oriented model that is fully compatible with the web search tool.
- gpt-5-chat-latest: web search not supported. Aligned with the ChatGPT product; raises an error when web search is requested via the API.

The takeaway is that choosing a model identifier is not just a question of performance: it directly determines which capabilities are available.
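Since capability support is only discoverable by trying, a one-time probe at startup can record which candidate models accept the tool. This is a sketch that interprets a "not supported" failure as a capability gap; in real code the except clause would target openai.BadRequestError, and the function name is hypothetical.

```python
def supports_web_search(client, model):
    """Probe whether a model accepts the web search tool via a minimal request."""
    try:
        client.responses.create(
            model=model,
            input=[{"role": "user", "content": "ping"}],
            tools=[{"type": "web_search"}],
        )
        return True
    except Exception as exc:  # openai.BadRequestError in practice
        if "not supported" in str(exc):
            return False
        raise  # Unrelated failures (auth, network) should not be masked.
```

Note that a successful probe spends a small number of tokens, so this is best cached rather than run per request.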

Observations on Parameter Behavior

The search_context_size parameter appears to control how much retrieved information is supplied to the model: low references a narrower set of results, medium a moderate amount, and high a wider context. Simple factual questions worked well with low, while queries that called for synthesizing multiple facts produced more comprehensive answers at medium or above.

For verbosity, the low setting yielded concise responses containing only the essentials, whereas high produced detailed answers with additional background and supporting context. The medium setting sat between the two extremes and offered an appropriate level of detail for most queries.

The effect of reasoning.effort was less obvious than that of verbosity. At high, response times grew noticeably, and answers to questions requiring multi-step reasoning appeared more logically structured. At low, responses came back faster, but for simple lookups the difference from medium was negligible.

What the Experiments Surfaced

These experiments confirmed that the availability of web search in the GPT-5 Responses API depends on the chosen model identifier, and that only gpt-5 fully supports the tool. The gpt-5-chat-latest model seems to prioritize consistency with the ChatGPT interface, and its integration with hosted API tools is limited.

Parameter configuration provides fine-grained control over response characteristics, but the exact mechanism by which each parameter shapes the output is not entirely transparent. How search_context_size, verbosity, and reasoning.effort interact to determine final response quality is still an area where more systematic experimentation would be useful.

API calls that invoke the web search tool produce qualitatively different responses from calls that rely on training data alone, and the feature looks valuable for applications that need current information. Because model selection has a direct effect on which capabilities are available, it is worth picking the right model identifier early during development.