A wide-format tech blog header image titled "Transitioning to Native Real-Time Audio." The visual features a futuristic workstation on the left connected to a glowing cloud brain icon on the right via a dynamic, flowing wave of audio frequencies and data streams. Icons for 'Low Latency,' 'End-to-End Processing,' and 'Scalable Cloud' are integrated into the data flow, representing the Azure OpenAI gpt-realtime-1.5 architecture.

Hands-On with GPT-Realtime-1.5 on Azure

26.02.2026 5 Min Read

The transition from purely text-based chats to fluid, real-time audio conversations is the next massive leap in human-computer interaction. When I saw that Microsoft had moved the new gpt-realtime-1.5 model out of preview and into General Availability (GA) in Microsoft Foundry, I had to test it immediately.

Historically, developers had to rely on “chained” architectures for voice bots (Speech-to-Text -> Text-LLM -> Text-to-Speech), which inevitably caused unnatural latency. The Realtime API solves this by natively processing audio inputs and outputs end-to-end.

A technical diagram contrasting a cascading AI pipeline with a unified, native real-time audio large language model. On the left, 'Old: Chained Pipeline' shows sequential steps with microphone input processing through distinct 'Speech-to-Text,' 'Text-LLM,' and 'Text-to-Speech' stages, with explicit red clock delays between them, leading to a speaker output. On the right, 'New: Native Real-Time' depicts a 'Native Audio LLM' with high-speed horizontal data flow arrows and a lightning bolt icon, directly connected to microphone and speaker, demonstrating an end-to-end, low-latency approach. — From chained STT/TTS pipelines to native real-time audio

To demonstrate how this works in practice, I created a Python-based reference project on GitHub. In this post, I will walk you through what the new 1.5 model brings to the table, the critical breaking changes you need to know for your Azure integration, and how you can run the code locally.

What’s New?

As noted by Microsoft’s Naomi Moneypenny, the upgrade to the 1.5 variant introduces several crucial improvements for production-grade voice agents:

Massive Accuracy Boosts: OpenAI and Microsoft report a +10.23% improvement in alphanumeric transcription accuracy. This is a game-changer for enterprise use cases where the agent needs to accurately capture spelled-out serial numbers, VINs, or passwords.
Smarter Reasoning & Instruction Following: The model shows a +5% lift on Big Bench Audio (which tests logical reasoning on audio inputs) and a +7% improvement in strictly following developer instructions.
More Natural Voices: The audio output is noticeably smoother with better pacing, and introduces two new standard voices, Marin and Cedar.

Setup & Critical Breaking Changes in Azure

To test the model, you need to deploy gpt-realtime-1.5-2026-02-23 in your Azure portal (currently available for global deployments in East US 2 and Sweden Central).

Retrieve your Azure Credentials

The navigation for credential retrieval has changed in the new Microsoft Foundry portal. Do not look for global project settings. Instead, follow this direct path:

Click on Start building and select Browse models from the dropdown menu.
Search the model catalog and select gpt-realtime-1.5.
Open the Details tab of your deployed model.

Here you will find two crucial values: The Target URI and the Key.

Architecture Note: Microsoft labels the endpoint as “Target URI” in the UI. This exact value must be mapped to your AZURE_OPENAI_ENDPOINT environment variable in your Python script. Your Deployment Name is the exact custom name you assigned during the initial provisioning.

A dark mode screenshot of the Microsoft Foundry portal showing the details page for a deployed gpt-realtime-1.5 model. The interface displays the Target URI, the API Key, and a deployment information section confirming a successful provisioning state. — Locating your Target URI (Endpoint) and API Key within the model details page

CRITICAL WARNING: The GA Endpoint Migration

Moving to General Availability (GA) introduces a major breaking change to the Realtime API endpoint. Microsoft has set a strict migration deadline for April 30, 2026. If you are using custom scripts, you must update your connection logic:

New Path: Your endpoint URL must now include /openai/v1. Example: https://<YOUR_RESOURCE>.openai.azure.com/openai/v1.
Drop the API Version: The previously required api-version query parameter (e.g., ?api-version=2024-10-01-preview) must be entirely removed for GA endpoints.
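The two migration rules above can be sketched as a small URL normalizer. This is a minimal illustration, not code from the repository: `to_ga_base_url` is a hypothetical helper name, and it assumes that any existing path on the resource host is a legacy one that can be replaced with `/openai/v1`.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

GA_PATH = "/openai/v1"  # path the GA endpoint requires, per the rules above


def to_ga_base_url(endpoint: str) -> str:
    """Return the endpoint with the GA path and without an api-version parameter."""
    parts = urlsplit(endpoint)
    # Keep every query parameter except the preview-era api-version.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() != "api-version"
    ]
    # Assumption: whatever path was there before is legacy and gets replaced.
    return urlunsplit((parts.scheme, parts.netloc, GA_PATH, urlencode(query), ""))


if __name__ == "__main__":
    legacy = "https://my-resource.openai.azure.com/?api-version=2024-10-01-preview"
    print(to_ga_base_url(legacy))
```

Running a check like this at startup is one way to catch a stale preview URL in configuration before the migration deadline rather than at connection time.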

Architecture: How the Python Script Works

Building a fluid voice agent in Python means you have to capture microphone input, stream it to the model, and play the incoming response at the same time, which is true full-duplex communication. My repository handles this using the following stack:

Audio Streaming (pyaudio): Captures your raw microphone input and drives the speaker output.
WebSockets: Maintains a persistent connection to Azure OpenAI. (Architecture Tip: WebSockets are perfect for backend or server-to-server orchestration. If you are building a frontend directly in the browser, consider WebRTC instead to reduce latency and utilize built-in echo cancellation.)
Concurrency (asyncio): Asynchronous programming is not optional here. Running the send and receive tasks concurrently is what allows the system to handle user interruptions (“barge-ins”) gracefully and instantly halt playback when you speak over the agent.
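To make the concurrency point concrete, here is a hardware-free sketch of the send/receive pattern with barge-in handling. It is not the repository's code: the queues stand in for the microphone and the WebSocket, and the event names `audio_delta` and `user_speech_started` are simplified placeholders, not the Realtime API's actual event types.

```python
import asyncio


async def send_mic_audio(mic_chunks, uplink):
    """Forward microphone chunks to the (simulated) connection."""
    for chunk in mic_chunks:
        await uplink.put(chunk)
    await uplink.put(None)  # end-of-stream marker


async def receive_events(downlink, playback):
    """Buffer model audio for playback and flush the buffer on barge-in."""
    while (event := await downlink.get()) is not None:
        if event["type"] == "audio_delta":
            playback.append(event["data"])
        elif event["type"] == "user_speech_started":
            # The user spoke over the agent: discard audio not yet played.
            playback.clear()


async def run_demo(mic_chunks, server_events):
    """Run both directions concurrently and return the audio left to play."""
    uplink, downlink = asyncio.Queue(), asyncio.Queue()
    playback = []
    for event in list(server_events) + [None]:
        downlink.put_nowait(event)
    await asyncio.gather(
        send_mic_audio(mic_chunks, uplink),
        receive_events(downlink, playback),
    )
    return playback


if __name__ == "__main__":
    events = [
        {"type": "audio_delta", "data": "old-1"},
        {"type": "audio_delta", "data": "old-2"},
        {"type": "user_speech_started"},
        {"type": "audio_delta", "data": "new-1"},
    ]
    print(asyncio.run(run_demo([b"\x00\x00"], events)))
```

Because both coroutines run under `asyncio.gather`, the receiver can react to an interruption while the sender is still streaming. A blocking, sequential loop would only notice the interruption after the current send or playback call returned.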

The FinOps Angle: Managing Token Costs

Since we focus on enterprise Azure architecture, we have to talk about the bill. Native audio tokens are premium: the standard rate for gpt-realtime-1.5 is $32.00 per 1 million input tokens and $64.00 per 1 million output tokens.

The key lever here is prompt caching. If you cache your static system instructions and retrieval contexts, your input cost drops to just $0.40 per 1 million cached input tokens, an 80x reduction from the standard input rate. Architecturally, you must design your sessions to leverage this caching if you want to run high-frequency interactions economically.
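A quick back-of-the-envelope calculator shows how much caching changes the bill. The rates are the ones quoted above; the function name and the token counts in the example are purely illustrative, not measurements.

```python
# USD per 1 million tokens, taken from the rates quoted in this section.
INPUT_RATE = 32.00
CACHED_INPUT_RATE = 0.40
OUTPUT_RATE = 64.00


def estimate_cost(uncached_input: int, cached_input: int, output: int) -> float:
    """Estimate the cost in USD for the given token counts."""
    total = (
        uncached_input * INPUT_RATE
        + cached_input * CACHED_INPUT_RATE
        + output * OUTPUT_RATE
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 1M input tokens and 50k output tokens.
    without_cache = estimate_cost(1_000_000, 0, 50_000)
    # Same workload, assuming 90% of the input is served from the cache.
    with_cache = estimate_cost(100_000, 900_000, 50_000)
    print(f"without caching: ${without_cache:.2f}")
    print(f"with caching:    ${with_cache:.2f}")
```

The share of input that actually hits the cache depends on how stable your system instructions and retrieval context are across turns, so treat the 90% figure as an assumption to replace with your own telemetry.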

Quick Start: Try It Yourself!

Clone the repository and spin up your own voice agent in minutes:

# Clone the repo
git clone https://github.com/Frezz146/talking-with-realtime-audio-llm.git
cd talking-with-realtime-audio-llm

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows use: .venv\Scripts\activate

# Install dependencies
pip install pyaudio "openai[realtime]" azure-identity  # quotes keep zsh from expanding the brackets

Set your Azure environment variables:

export AZURE_OPENAI_ENDPOINT="https://<YOUR_RESOURCE>.openai.azure.com"
export AZURE_OPENAI_DEPLOYMENT_NAME="gpt-realtime-1.5"
export AZURE_OPENAI_API_KEY="your-api-key"

Start the script with:

python talking-with-realtime-audio-llm.py
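If the script fails right after launch, a missing or empty environment variable is a common cause. The following sketch shows one way to validate the three variables up front. The `Settings` class and `load_settings` function are hypothetical names for illustration and may not match how the repository reads its configuration.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_DEPLOYMENT_NAME",
    "AZURE_OPENAI_API_KEY",
)


@dataclass(frozen=True)
class Settings:
    endpoint: str
    deployment: str
    api_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the three variables and fail early with a readable message."""
    missing = [name for name in REQUIRED if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        endpoint=env["AZURE_OPENAI_ENDPOINT"].rstrip("/"),  # avoid double slashes later
        deployment=env["AZURE_OPENAI_DEPLOYMENT_NAME"],
        api_key=env["AZURE_OPENAI_API_KEY"],
    )


if __name__ == "__main__":
    settings = load_settings()
    print(f"Using deployment {settings.deployment} at {settings.endpoint}")
```

Passing the environment in as a mapping keeps the function easy to test without touching the real process environment.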

Live-Demo Ideas to stress-test the model:

The Alphanumeric Test: Dictate a complex Wi-Fi password or a license plate number to see the reported ~10% gain in alphanumeric accuracy in action.
The Interruption Test: Ask a long question and interrupt the agent mid-sentence. Watch how fast it stops the audio playback and pivots to your new intent.

Publishing the LLM: Moving to Production

Running PyAudio locally is great for testing, but it is not a deployable architecture. To publish your voice agent for actual users, you must decouple the client interface (microphone/speaker) from the LLM logic.

A modern 3D architectural cloud diagram illustrating a decoupled environment for native real-time audio processing. It shows a user web browser frontend connected via a full-duplex WebSocket to a FastAPI middleware hosted on Azure Container Apps. The middleware bridges the connection to a backend Azure OpenAI service running the gpt-realtime model via a separate secure WebSocket. — Azure production architecture for low-latency native voice AI

To move this into production on Azure, follow these architectural steps:

Wrap in a FastAPI Server: Refactor your Python script to use FastAPI’s WebSocket implementation instead of local hardware audio capture. Your server will act as the middleman between the user’s browser and the Azure OpenAI endpoint.
Containerize: Write a Dockerfile to package your FastAPI application.
Deploy to Azure Container Apps (ACA): Push your container to an Azure Container Registry and deploy it via ACA. Container Apps natively supports WebSocket connections, which the Realtime API needs to maintain the persistent duplex audio stream.
Frontend Client: Build a simple web frontend using the native browser Web Audio API to capture the user’s microphone, convert the samples to PCM16, and send them (Base64-encoded if you wrap them in JSON messages) over a WebSocket to your FastAPI backend.
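The audio encoding step is where the frontend and middleware most often disagree, so here is a small sketch of the conversion in Python. In the browser the same logic would be written in JavaScript. The helper names are hypothetical, and the event shape (`input_audio_buffer.append` with a Base64 `audio` field) reflects my understanding of the Realtime API's client events, so verify it against the current API reference before relying on it.

```python
import base64
import json
import struct
from typing import Sequence


def float_to_pcm16(samples: Sequence[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian 16-bit PCM."""
    ints = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]  # clip, then scale
    return struct.pack(f"<{len(ints)}h", *ints)


def to_append_event(pcm: bytes) -> str:
    """Wrap raw PCM16 bytes in a JSON message with Base64-encoded audio."""
    return json.dumps(
        {
            "type": "input_audio_buffer.append",  # assumed event name, see lead-in
            "audio": base64.b64encode(pcm).decode("ascii"),
        }
    )


if __name__ == "__main__":
    pcm = float_to_pcm16([0.0, 0.5, -0.5, 1.0])
    print(to_append_event(pcm))
```

Note that PCM16 and Base64 are not alternatives: PCM16 is the sample format, while Base64 is only the text-safe wrapping needed when the bytes travel inside a JSON message.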

Conclusion

With the GA release of gpt-realtime-1.5, Microsoft Foundry delivers a highly reliable foundation for the next generation of voice applications. The era of stitching together laggy STT and TTS pipelines is coming to an end.

Check out the full code in my repository: 👉 Frezz146/talking-with-realtime-audio-llm

If you build your own implementations, run into issues, or want to contribute a PR, let me know in the comments!

Tags:

frezz
