Using Free AI Models with the GitHub Models Inference API
How to easily call modern AI models such as GPT-4.1 and DeepSeek R1 through GitHub's free model inference API
Introducing the GitHub Models Inference API
GitHub provides a service called GitHub Models that lets developers test and use a variety of modern AI models for free. Built on GitHub's infrastructure, it offers access to leading models from providers such as OpenAI, Meta, DeepSeek, and Microsoft.
In this post, we will walk through, step by step, how to call AI models and receive responses using the GitHub Models Inference API.
Key Features
- Free to use: Any GitHub account can access the AI models at no cost
- Wide model selection: Supports recent models including GPT-4.1, DeepSeek R1, Llama, Phi, and more
- OpenAI compatible: Works with the OpenAI Python SDK as-is
- Simple authentication: Authenticates with a GitHub Personal Access Token
Prerequisites
1. Create a GitHub Personal Access Token
To use the GitHub Models API, you first need a Personal Access Token:
- Go to GitHub Settings > Developer settings > Personal access tokens
- Click "Generate new token"
- Select the required permissions (for a fine-grained token, grant read access to Models)
- Generate the token and store it safely
2. Set Up the Python Environment
Install the OpenAI Python library:
pip install openai

Required library version:

openai>=1.52.2
Listing Available Models
You can first check the list of models offered by GitHub Models:
curl -L \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer YOUR_GITHUB_TOKEN" \
-H "X-GitHub-Api-Version: 2022-11-28" \
  https://models.github.ai/catalog/models | jq

This command returns metadata for every available model in JSON format.
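The same catalog can be fetched from Python with only the standard library. A sketch, assuming each catalog entry is a JSON object with an id field such as "openai/gpt-4.1" (the helper names are illustrative, not part of any SDK):

```python
import json
import urllib.request

CATALOG_URL = "https://models.github.ai/catalog/models"

def fetch_catalog(token: str) -> list:
    """Fetch the GitHub Models catalog as a list of model entries."""
    req = urllib.request.Request(
        CATALOG_URL,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "X-GitHub-Api-Version": "2022-11-28",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def ids_by_publisher(catalog: list, publisher: str) -> list:
    """Filter catalog entries down to model IDs from one publisher."""
    return [m["id"] for m in catalog if m["id"].startswith(f"{publisher}/")]
```

For example, `ids_by_publisher(fetch_catalog(token), "openai")` would return only the OpenAI model IDs.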
Environment Configuration
In Google Colab or a local environment, set the GitHub token as an environment variable:
import os
# When using Colab
from google.colab import userdata
os.environ['GITHUB_TOKEN'] = userdata.get('GITHUB_PERSONAL_ACCESS_TOKEN')
# In a local environment, set it directly
# os.environ['GITHUB_TOKEN'] = 'your_token_here'

Security note: do not hardcode the token in your source code. Use environment variables or a secret management system.
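Whichever environment you use, it helps to fail fast when the variable is missing. A small sketch (the helper name is ours, not part of any library):

```python
import os

def require_github_token() -> str:
    """Read the PAT from the environment, failing with a clear message if absent."""
    token = os.environ.get("GITHUB_TOKEN", "")
    if not token:
        raise RuntimeError(
            "GITHUB_TOKEN is not set; export your Personal Access Token first."
        )
    return token
```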
Hands-On Example 1: Using GPT-4.1
Here is an example of using OpenAI's latest GPT-4.1 model through GitHub Models:
import os
from openai import OpenAI

# GitHub Models configuration
GITHUB_PERSONAL_ACCESS_TOKEN = os.environ['GITHUB_TOKEN']
GITHUB_INFERENCE_URL = "https://models.github.ai/inference/"
GITHUB_MODEL = "openai/gpt-4.1"

# Create the OpenAI client
client = OpenAI(
    api_key=GITHUB_PERSONAL_ACCESS_TOKEN,
    base_url=GITHUB_INFERENCE_URL,
)

# Request a chat completion
completion = client.chat.completions.create(
    model=GITHUB_MODEL,
    messages=[
        {"role": "system", "content": "한국어로 대답해"},  # "Answer in Korean"
        {"role": "user", "content": "에어컨 여름철 적정 온도는? 한줄로 답변해줘"},  # "What is the right summer AC temperature? Answer in one line"
    ],
)
print(completion.choices[0].message.content)

Output:

여름철 에어컨 적정 온도는 26~28도입니다.
("The appropriate summer air-conditioner temperature is 26-28°C.")

Code Walkthrough
- API endpoint: set https://models.github.ai/inference/ as the base_url
- Model selection: specify the model name in the openai/gpt-4.1 format
- Authentication: use the GitHub Personal Access Token as the API key
- Message structure: same format as the OpenAI API
  - system: the AI's role and behavioral guidelines
  - user: the user's question
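Because the request body follows the standard OpenAI chat format, it can also be assembled as plain data. A minimal sketch (the helper is illustrative, not part of either SDK):

```python
def build_chat_request(model: str, system: str, user: str) -> dict:
    """Assemble the OpenAI-style chat completion payload described above."""
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    }
```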
Hands-On Example 2: Using DeepSeek R1
DeepSeek R1 is a unique AI model that exposes its reasoning process:
import os
from openai import OpenAI

# DeepSeek R1 model configuration
GITHUB_PERSONAL_ACCESS_TOKEN = os.environ['GITHUB_TOKEN']
GITHUB_INFERENCE_URL = "https://models.github.ai/inference/"
GITHUB_MODEL = "deepseek/deepseek-r1"

client = OpenAI(
    api_key=GITHUB_PERSONAL_ACCESS_TOKEN,
    base_url=GITHUB_INFERENCE_URL,
)

completion = client.chat.completions.create(
    model=GITHUB_MODEL,
    messages=[
        {"role": "system", "content": "한국어로 대답해"},  # "Answer in Korean"
        {"role": "user", "content": "에어컨 여름철 적정 온도는? 한줄로 답변해줘"},  # same question as above
    ],
)
print(completion.choices[0].message.content)

Output:
<think>
Okay, the user is asking about the appropriate temperature for an air conditioner
during summer, and they want a one-line answer in Korean. Let me start by recalling
the standard recommendations. I remember that energy efficiency guidelines often
recommend around 24-26 degrees Celsius. But I should also consider comfort and
health aspects. Setting it too low can cause issues like increased energy bills
and potential health problems from the temperature difference between indoors and
outdoors. Also, in Korea, there might be specific guidelines or common practices.
Let me verify if there's a commonly cited temperature in Korean sources. Yes, I
think the Korean government often suggests 26 degrees Celsius as the ideal setting
for energy conservation. So combining both efficiency and comfort, 26 degrees seems
right. I should present it concisely in one line.
</think>
여름철 에어컨 적정 온도는 에너지 효율과 쾌적함을 고려하여 26℃로 설정하는 것이 좋습니다.
("Considering energy efficiency and comfort, 26°C is a good summer air-conditioner setting.")

What Makes DeepSeek R1 Special
The DeepSeek R1 model embeds its Chain of Thought reasoning inside <think> tags:
- Understanding the question
- Recalling relevant information
- Considering multiple angles (energy efficiency, comfort, health, etc.)
- Arriving at the final answer
This transparently shows how the AI thinks and reaches its conclusion.
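If you only want the final answer, the reasoning block can be stripped out. A sketch, assuming the response wraps its reasoning in a single <think>...</think> pair (the helper name is ours):

```python
import re

def split_reasoning(response: str) -> tuple:
    """Separate a DeepSeek R1 response into (reasoning, final_answer)."""
    match = re.search(r"<think>(.*?)</think>", response, re.DOTALL)
    if match is None:
        return "", response.strip()  # no reasoning block present
    reasoning = match.group(1).strip()
    answer = response[match.end():].strip()
    return reasoning, answer
```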
API Usage Patterns
Basic Pattern
from openai import OpenAI

def create_github_client(token: str) -> OpenAI:
    """Create an OpenAI client pointed at the GitHub Models endpoint."""
    return OpenAI(
        api_key=token,
        base_url="https://models.github.ai/inference/",
    )

def chat_completion(client: OpenAI, model: str, messages: list) -> str:
    """Send a chat completion request and return the response text."""
    completion = client.chat.completions.create(
        model=model,
        messages=messages,
    )
    return completion.choices[0].message.content

Streaming Responses
For long responses, you can use streaming:
completion = client.chat.completions.create(
    model=GITHUB_MODEL,
    messages=[
        {"role": "user", "content": "인공지능의 역사에 대해 설명해줘"},  # "Explain the history of AI"
    ],
    stream=True,  # enable streaming
)
for chunk in completion:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="")

Tuning Parameters
Various parameters let you control the quality of the response:
completion = client.chat.completions.create(
    model=GITHUB_MODEL,
    messages=messages,
    temperature=0.7,      # creativity (0.0-2.0)
    max_tokens=1000,      # maximum number of output tokens
    top_p=0.9,            # nucleus sampling
    frequency_penalty=0,  # penalize repetition
    presence_penalty=0,   # encourage topic diversity
)

Examples of Supported Models
Some of the major models available through GitHub Models:
OpenAI Family
- openai/gpt-4.1 - The latest GPT-4 model
- openai/gpt-4-turbo - Faster GPT-4
- openai/gpt-3.5-turbo - Lightweight model
Meta Family
- meta/llama-3.1-405b-instruct - Ultra-large Llama model
- meta/llama-3.1-70b-instruct - Large Llama model
- meta/llama-3.1-8b-instruct - Lightweight Llama model
DeepSeek Family
- deepseek/deepseek-r1 - Reasoning-exposing model
- deepseek/deepseek-v3 - The latest DeepSeek model
Microsoft Family
- microsoft/phi-4 - Efficient small model
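Because every model sits behind the same OpenAI-compatible endpoint, comparing them is just a loop over model IDs. A sketch (the compare_models helper is ours, not part of any SDK):

```python
def compare_models(client, model_ids, prompt):
    """Send the same prompt to several models and collect their answers by model ID."""
    answers = {}
    for model_id in model_ids:
        completion = client.chat.completions.create(
            model=model_id,
            messages=[{"role": "user", "content": prompt}],
        )
        answers[model_id] = completion.choices[0].message.content
    return answers
```

This is handy for side-by-side evaluation of, say, openai/gpt-4.1 against deepseek/deepseek-r1 on the same question.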
Practical Use Cases
1. Code Review Assistant
def code_review(code: str) -> str:
    messages = [
        {
            "role": "system",
            "content": "You are an experienced senior developer. Review the code and suggest improvements.",
        },
        {
            "role": "user",
            "content": f"Please review the following code:\n\n{code}",
        },
    ]
    completion = client.chat.completions.create(
        model="openai/gpt-4.1",
        messages=messages,
    )
    return completion.choices[0].message.content

2. Document Summarization
def summarize_document(text: str, lang: str = "ko") -> str:
    messages = [
        {
            "role": "system",
            "content": f"Summarize the key points in 3-5 bullet points, in {lang}.",
        },
        {
            "role": "user",
            "content": text,
        },
    ]
    completion = client.chat.completions.create(
        model="openai/gpt-4.1",
        messages=messages,
        temperature=0.3,  # low temperature for consistent summaries
    )
    return completion.choices[0].message.content

3. Multilingual Translation
def translate(text: str, source_lang: str, target_lang: str) -> str:
    messages = [
        {
            "role": "system",
            "content": f"You are a professional translator. Translate naturally from {source_lang} to {target_lang}.",
        },
        {
            "role": "user",
            "content": text,
        },
    ]
    completion = client.chat.completions.create(
        model="openai/gpt-4.1",
        messages=messages,
        temperature=0.3,
    )
    return completion.choices[0].message.content

Cost and Limitations
Free Tier Limits
GitHub Models is free to use, but comes with the following constraints:
- Rate limiting: A cap on requests per minute
- Token limits: A maximum number of tokens per day
- Per-model limits: Different models may have different limits
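One way to stay under a per-minute cap on the client side is a sliding-window throttle. A sketch (the class and the limits shown are placeholders, not GitHub's actual quotas):

```python
import time

class RateLimiter:
    """Allow at most max_calls within any rolling window of `period` seconds."""

    def __init__(self, max_calls: int, period: float = 60.0):
        self.max_calls = max_calls
        self.period = period
        self.calls = []  # monotonic timestamps of recent calls

    def wait(self) -> None:
        """Block until another call is allowed, then record it."""
        now = time.monotonic()
        # Keep only timestamps still inside the sliding window.
        self.calls = [t for t in self.calls if now - t < self.period]
        if len(self.calls) >= self.max_calls:
            time.sleep(self.period - (now - self.calls[0]))
        self.calls.append(time.monotonic())
```

Call `limiter.wait()` before each API request; it is a best-effort client-side guard, not a replacement for handling 429 responses.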
Production Use
For real production workloads:
- Treat GitHub Models as a development/testing tool
- Prefer each model provider's official API for actual services
- Consider paid services when high throughput is required
Best Practices
1. Error Handling
from openai import OpenAI, OpenAIError
import time

def safe_completion(client, model, messages, max_retries=3):
    """Safe API call with retry logic."""
    for attempt in range(max_retries):
        try:
            completion = client.chat.completions.create(
                model=model,
                messages=messages,
            )
            return completion.choices[0].message.content
        except OpenAIError:
            if attempt == max_retries - 1:
                raise
            print(f"Error occurred, retrying... ({attempt + 1}/{max_retries})")
            time.sleep(2 ** attempt)  # exponential backoff

2. Token Management
def count_tokens_approx(text: str) -> int:
    """Rough token estimate (in practice, prefer the tiktoken library)."""
    return int(len(text.split()) * 1.3)  # Korean tends to use more tokens than English

def truncate_text(text: str, max_tokens: int = 4000) -> str:
    """Trim text to fit within a maximum token budget."""
    words = text.split()
    if count_tokens_approx(text) > max_tokens:
        # Over budget: reduce the number of words
        target_words = int(max_tokens / 1.3)
        return ' '.join(words[:target_words])
    return text

3. Prompt Engineering
def create_structured_prompt(task: str, context: str, constraints: str) -> str:
    """Build a structured prompt."""
    return f"""
## Task
{task}

## Context
{context}

## Constraints
{constraints}

Please respond based on the above.
"""

# Usage example
prompt = create_structured_prompt(
    task="Write a Python function",
    context="Remove duplicates from a list and return the sorted result",
    constraints="Include type hints, write a docstring, O(n log n) time complexity",
)
Closing Thoughts
The GitHub Models Inference API is an excellent platform for testing and experimenting with a wide range of modern AI models for free. Because it is compatible with the OpenAI SDK, you can easily port existing code over and start using current models such as GPT-4.1 and DeepSeek R1 right away.
It is especially useful for comparing and evaluating different models during development, and it works well for prototyping or learning purposes. For production environments, however, it is generally recommended to use each model provider's official API.
Go ahead and build AI-powered applications more easily by tapping into the GitHub Models API!