Using Free AI Models with the GitHub Models Inference API
How to easily call modern AI models such as GPT-4.1 and DeepSeek R1 through GitHub's free model inference API
Introducing the GitHub Models Inference API
GitHub provides a service called GitHub Models that lets developers test and use a variety of modern AI models for free. Built on GitHub's infrastructure, it offers access to leading models from providers such as OpenAI, Meta, DeepSeek, and Microsoft.
In this post, we will walk through, step by step, how to call AI models and receive responses using the GitHub Models Inference API.
Key Features
- Free to use: Any GitHub account can access the AI models at no cost
- Wide model selection: Supports recent models including GPT-4.1, DeepSeek R1, Llama, Phi, and more
- OpenAI compatible: Works with the OpenAI Python SDK as-is
- Simple authentication: Authenticates with a GitHub Personal Access Token
Prerequisites
1. Create a GitHub Personal Access Token
To use the GitHub Models API, you first need a Personal Access Token:
- Go to GitHub Settings > Developer settings > Personal access tokens
- Click "Generate new token"
- Select the required permissions (for a fine-grained token, grant read access to Models)
- Generate the token and store it safely
2. Set Up the Python Environment
Install the OpenAI Python library:
pip install openai

Required library version:

openai>=1.52.2
Listing Available Models
You can first check the list of models offered by GitHub Models:
curl -L \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer YOUR_GITHUB_TOKEN" \
-H "X-GitHub-Api-Version: 2022-11-28" \
  https://models.github.ai/catalog/models | jq

This command returns metadata for every available model in JSON format.
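The same catalog can be fetched from Python with only the standard library. A sketch, assuming each catalog entry is a JSON object with an id field such as "openai/gpt-4.1" (the helper names are illustrative, not part of any SDK):

```python
import json
import urllib.request

CATALOG_URL = "https://models.github.ai/catalog/models"

def fetch_catalog(token: str) -> list:
    """Fetch the GitHub Models catalog as a list of model entries."""
    req = urllib.request.Request(
        CATALOG_URL,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def ids_by_publisher(catalog: list, publisher: str) -> list:
    """Filter catalog entries down to model IDs from one publisher."""
    return [m["id"] for m in catalog if m["id"].startswith(f"{publisher}/")]
```

For example, `ids_by_publisher(fetch_catalog(token), "openai")` would return only the OpenAI model IDs.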
Environment Configuration
In Google Colab or a local environment, set the GitHub token as an environment variable:
import os
# When using Colab
from google.colab import userdata
os.environ['GITHUB_TOKEN'] = userdata.get('GITHUB_PERSONAL_ACCESS_TOKEN')
# In a local environment, set it directly
# os.environ['GITHUB_TOKEN'] = 'your_token_here'

Security note: do not hardcode the token in your source code. Use environment variables or a secret management system.
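Whichever environment you use, it helps to fail fast when the variable is missing. A small sketch (the helper name is ours, not part of any library):

```python
import os

def require_github_token() -> str:
    """Read the PAT from the environment, failing with a clear message if absent."""
    token = os.environ.get("GITHUB_TOKEN", "")
    if not token:
        raise RuntimeError(
            "GITHUB_TOKEN is not set; export your Personal Access Token first."
        )
    return token
```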
Hands-On Example 1: Using GPT-4.1
Here is an example of using OpenAI's latest GPT-4.1 model through GitHub Models:
import os
from openai import OpenAI

# GitHub Models configuration
GITHUB_PERSONAL_ACCESS_TOKEN = os.environ['GITHUB_TOKEN']
GITHUB_INFERENCE_URL = "https://models.github.ai/inference/"
GITHUB_MODEL = "openai/gpt-4.1"

# Create the OpenAI client
client = OpenAI(
    api_key=GITHUB_PERSONAL_ACCESS_TOKEN,
    base_url=GITHUB_INFERENCE_URL,
)

# Request a chat completion
completion = client.chat.completions.create(
    model=GITHUB_MODEL,
    messages=[
        {"role": "system", "content": "한국어로 대답해"},  # "Answer in Korean"
        {"role": "user", "content": "에어컨 여름철 적정 온도는? 한줄로 답변해줘"},  # "What is the right summer AC temperature? Answer in one line"
    ],
)
print(completion.choices[0].message.content)

Output:

여름철 에어컨 적정 온도는 26~28도입니다.
("The appropriate summer air-conditioner temperature is 26-28°C.")

Code Walkthrough
- API endpoint: set https://models.github.ai/inference/ as the base_url
- Model selection: specify the model name in the openai/gpt-4.1 format
- Authentication: use the GitHub Personal Access Token as the API key
- Message structure: same format as the OpenAI API
  - system: the AI's role and behavioral guidelines
  - user: the user's question
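Because the request body follows the standard OpenAI chat format, it can also be assembled as plain data. A minimal sketch (the helper is illustrative, not part of either SDK):

```python
def build_chat_request(model: str, system: str, user: str) -> dict:
    """Assemble the OpenAI-style chat completion payload described above."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }
```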
Hands-On Example 2: Using DeepSeek R1
DeepSeek R1 is a unique AI model that exposes its reasoning process:
import os
from openai import OpenAI

# DeepSeek R1 model configuration
GITHUB_PERSONAL_ACCESS_TOKEN = os.environ['GITHUB_TOKEN']
GITHUB_INFERENCE_URL = "https://models.github.ai/inference/"
GITHUB_MODEL = "deepseek/deepseek-r1"

client = OpenAI(
    api_key=GITHUB_PERSONAL_ACCESS_TOKEN,
    base_url=GITHUB_INFERENCE_URL,
)

completion = client.chat.completions.create(
    model=GITHUB_MODEL,
    messages=[
        {"role": "system", "content": "한국어로 대답해"},  # "Answer in Korean"
        {"role": "user", "content": "에어컨 여름철 적정 온도는? 한줄로 답변해줘"},  # same question as above
    ],
)
print(completion.choices[0].message.content)

Output:
<think>
Okay, the user is asking about the appropriate temperature for an air conditioner
during summer, and they want a one-line answer in Korean. Let me start by recalling
the standard recommendations. I remember that energy efficiency guidelines often
recommend around 24-26 degrees Celsius. But I should also consider comfort and
health aspects. Setting it too low can cause issues like increased energy bills
and potential health problems from the temperature difference between indoors and
outdoors. Also, in Korea, there might be specific guidelines or common practices.
Let me verify if there's a commonly cited temperature in Korean sources. Yes, I
think the Korean government often suggests 26 degrees Celsius as the ideal setting
for energy conservation. So combining both efficiency and comfort, 26 degrees seems
right. I should present it concisely in one line.
</think>
여름철 에어컨 적정 온도는 에너지 효율과 쾌적함을 고려하여 26℃로 설정하는 것이 좋습니다.
("Considering energy efficiency and comfort, 26°C is a good summer air-conditioner setting.")

What Makes DeepSeek R1 Special
The DeepSeek R1 model embeds its Chain of Thought reasoning inside <think> tags:
- Understanding the question
- Recalling relevant information
- Considering multiple angles (energy efficiency, comfort, health, etc.)
- Arriving at the final answer
This transparently shows how the AI thinks and reaches its conclusion.
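If you only want the final answer, the reasoning block can be stripped out. A sketch, assuming the response wraps its reasoning in a single <think>...</think> pair (the helper name is ours):

```python
import re

def split_reasoning(response: str) -> tuple:
    """Separate a DeepSeek R1 response into (reasoning, final_answer)."""
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if match is None:
        return "", response.strip()  # no reasoning block present
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer
```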
API Usage Patterns
Basic Pattern
from openai import OpenAI

def create_github_client(token: str) -> OpenAI:
    """Create an OpenAI client pointed at the GitHub Models endpoint."""
    return OpenAI(
        api_key=token,
        base_url="https://models.github.ai/inference/",
    )

def chat_completion(client: OpenAI, model: str, messages: list) -> str:
    """Send a chat completion request and return the response text."""
    completion = client.chat.completions.create(
        model=model,
        messages=messages,
    )
    return completion.choices[0].message.content

Streaming Responses
For long responses, you can use streaming:
completion = client.chat.completions.create(
    model=GITHUB_MODEL,
    messages=[
        {"role": "user", "content": "인공지능의 역사에 대해 설명해줘"},  # "Explain the history of AI"
    ],
    stream=True,  # enable streaming
)
for chunk in completion:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Tuning Parameters
Various parameters let you control the quality of the response:
completion = client.chat.completions.create(
    model=GITHUB_MODEL,
    messages=messages,
    temperature=0.7,      # creativity (0.0-2.0)
    max_tokens=1000,      # maximum number of output tokens
    top_p=0.9,            # nucleus sampling
    frequency_penalty=0,  # penalize repetition
    presence_penalty=0,   # encourage topic diversity
)

Examples of Supported Models
Some of the major models available through GitHub Models:
OpenAI Family
- openai/gpt-4.1 - The latest GPT-4 model
- openai/gpt-4-turbo - Faster GPT-4
- openai/gpt-3.5-turbo - Lightweight model
Meta Family
- meta/llama-3.1-405b-instruct - Ultra-large Llama model
- meta/llama-3.1-70b-instruct - Large Llama model
- meta/llama-3.1-8b-instruct - Lightweight Llama model
DeepSeek Family
- deepseek/deepseek-r1 - Reasoning-exposing model
- deepseek/deepseek-v3 - The latest DeepSeek model
Microsoft Family
- microsoft/phi-4 - Efficient small model
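Because every model sits behind the same OpenAI-compatible endpoint, comparing them is just a loop over model IDs. A sketch (the compare_models helper is ours, not part of any SDK):

```python
def compare_models(client, model_ids, prompt):
    """Send the same prompt to several models and collect their answers by model ID."""
    answers = {}
    for model_id in model_ids:
        completion = client.chat.completions.create(
            model=model_id,
            messages=[{"role": "user", "content": prompt}],
        )
        answers[model_id] = completion.choices[0].message.content
    return answers
```

This is handy for side-by-side evaluation of, say, openai/gpt-4.1 against deepseek/deepseek-r1 on the same question.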
Practical Use Cases
1. Code Review Assistant
def code_review(code: str) -> str:
    messages = [
        {
            "role": "system",
            "content": "You are an experienced senior developer. Review the code and suggest improvements.",
        },
        {
            "role": "user",
            "content": f"Please review the following code:\n\n{code}",
        },
    ]
    completion = client.chat.completions.create(
        model="openai/gpt-4.1",
        messages=messages,
    )
    return completion.choices[0].message.content

2. Document Summarization
def summarize_document(text: str, lang: str = "ko") -> str:
    messages = [
        {
            "role": "system",
            "content": f"Summarize the key points in 3-5 bullet points, in {lang}.",
        },
        {
            "role": "user",
            "content": text,
        },
    ]
    completion = client.chat.completions.create(
        model="openai/gpt-4.1",
        messages=messages,
        temperature=0.3,  # low temperature for consistent summaries
    )
    return completion.choices[0].message.content

3. Multilingual Translation
def translate(text: str, source_lang: str, target_lang: str) -> str:
    messages = [
        {
            "role": "system",
            "content": f"You are a professional translator. Translate naturally from {source_lang} to {target_lang}.",
        },
        {
            "role": "user",
            "content": text,
        },
    ]
    completion = client.chat.completions.create(
        model="openai/gpt-4.1",
        messages=messages,
        temperature=0.3,
    )
    return completion.choices[0].message.content

Cost and Limitations
Free Tier Limits
GitHub Models is free to use, but comes with the following constraints:
- Rate limiting: A cap on requests per minute
- Token limits: A maximum number of tokens per day
- Per-model limits: Different models may have different limits
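One way to stay under a per-minute cap on the client side is a sliding-window throttle. A sketch (the class and the limits shown are placeholders, not GitHub's actual quotas):

```python
import time

class RateLimiter:
    """Allow at most max_calls within any rolling window of `period` seconds."""

    def __init__(self, max_calls: int, period: float = 60.0):
        self.max_calls = max_calls
        self.period = period
        self.calls = []  # monotonic timestamps of recent calls

    def wait(self) -> None:
        """Block until another call is allowed, then record it."""
        now = time.monotonic()
        # Keep only timestamps still inside the sliding window.
        self.calls = [t for t in self.calls if now - t < self.period]
        if len(self.calls) >= self.max_calls:
            time.sleep(self.period - (now - self.calls[0]))
        self.calls.append(time.monotonic())
```

Call `limiter.wait()` before each API request; it is a best-effort client-side guard, not a replacement for handling 429 responses.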
Production Use
For real production workloads:
- Treat GitHub Models as a development/testing tool
- Prefer each model provider's official API for actual services
- Consider paid services when high throughput is required
Best Practices
1. Error Handling
from openai import OpenAI, OpenAIError
import time

def safe_completion(client, model, messages, max_retries=3):
    """Safe API call with retry logic."""
    for attempt in range(max_retries):
        try:
            completion = client.chat.completions.create(
                model=model,
                messages=messages,
            )
            return completion.choices[0].message.content
        except OpenAIError:
            if attempt == max_retries - 1:
                raise
            print(f"Error occurred, retrying... ({attempt + 1}/{max_retries})")
            time.sleep(2 ** attempt)  # exponential backoff

2. Token Management
def count_tokens_approx(text: str) -> int:
    """Rough token estimate (in practice, prefer the tiktoken library)."""
    return int(len(text.split()) * 1.3)  # Korean tends to use more tokens than English

def truncate_text(text: str, max_tokens: int = 4000) -> str:
    """Trim text to fit within a maximum token budget."""
    words = text.split()
    if count_tokens_approx(text) > max_tokens:
        # Over budget: reduce the number of words
        target_words = int(max_tokens / 1.3)
        return ' '.join(words[:target_words])
    return text

3. Prompt Engineering
def create_structured_prompt(task: str, context: str, constraints: str) -> str:
    """Build a structured prompt."""
    return f"""
## Task
{task}

## Context
{context}

## Constraints
{constraints}

Please respond based on the above.
"""

# Usage example
prompt = create_structured_prompt(
    task="Write a Python function",
    context="Remove duplicates from a list and return the sorted result",
    constraints="Include type hints, write a docstring, O(n log n) time complexity",
)
Closing Thoughts
The GitHub Models Inference API is an excellent platform for testing and experimenting with a wide range of modern AI models for free. Because it is compatible with the OpenAI SDK, you can easily port existing code over and start using current models such as GPT-4.1 and DeepSeek R1 right away.
It is especially useful for comparing and evaluating different models during development, and it works well for prototyping or learning purposes. For production environments, however, it is generally recommended to use each model provider's official API.
Go ahead and build AI-powered applications more easily by tapping into the GitHub Models API!