microsoft foundry

75 Topics

Introducing Grok 4.3 on Microsoft Foundry: Latest Generation Agentic Capabilities
Customers building advanced AI systems increasingly need models that can reason deeply, act autonomously, and integrate reliably into real‑world workflows—all without compromising on governance or cost efficiency. Grok 4.3, xAI’s latest flagship model, is now available in Microsoft Foundry, giving developers and enterprises access to latest agentic intelligence within a production‑ready environment designed for scale. With Grok 4.3 on Microsoft Foundry, customers can more easily experiment with, evaluate, and deploy a powerful new option for agent‑based and domain‑specific applications—while benefiting from the safety controls, monitoring, and operational tooling needed to move from prototype to production with confidence. About Grok 4.3 Grok 4.3 is xAI’s latest flagship model, designed to support agent-based and productivity-focused workflows across a wide range of professional scenarios. Based on information provided by xAI and independent research conducted by Artificial Analysis, Grok 4.3 demonstrates strong performance across multiple benchmarks, reflecting a favorable balance between model capability and reported benchmark cost. *Benchmark data and cost metrics are provided by xAI and independently analyzed by Artificial Analysis. Source: https://artificialanalysis.ai Improved agentic capabilities Grok 4.3 is purpose‑built for agentic systems, improving in tool calling, instruction following, and lower hallucination, as reported by xAI. Grok 4.3 also enables policy‑aware support agents with reliable tool use and consistent behavior across extended conversations. On Microsoft Foundry, Grok 4.3 supports up to a 200k token context window, enabling extended multi‑turn reasoning and agent workflows. Multi-modal and domain‑specific strengths Grok 4.3 delivers strong performance across a range of professional and technical domains: Multimodal analysis: Native understanding across text, images, diagrams, and mixed data sources, enabling synthesis of visual and textual information for complex reasoning tasks. Web development: Excels in full‑stack web development, producing clean, production‑ready code with minimal guidance. Legal reasoning: supports interpretation of contracts, case law, and regulatory documents. Finance agents: supports financial analysis, modeling, and human decisions Built‑In Native Capabilities Grok 4.3 includes powerful native capabilities that simplify real‑world application development: Web search and X search for real‑time context Python code execution for analysis and automation File search (RAG) for enterprise knowledge grounding Excel, PDF, and PowerPoint generation for end‑to‑end workflows Together, these capabilities allow Grok 4.3 to function as a powerful agentic productivity engine, not just a language mode. Why Grok 4.3 on Microsoft Foundry Bringing Grok 4.3 to Microsoft Foundry delivers value beyond raw model performance. When deployed through Foundry, Azure AI Content Safety is enabled by default, adding an additional layer of protection for enterprise use. Customers can review the Microsoft Foundry model card for detailed safety and usage considerations. Microsoft Foundry also provides tools to support our customers with their responsible AI efforts, including model cards during selection, configurable guardrails such as jailbreak detection and content filtering, pre‑deployment evaluations and red teaming, and post‑deployment monitoring and governance. These capabilities help customers maintain output quality and deploy Grok 4.3 responsibly at scale. Pricing  Model   Deployment   Input/1M Tokens   Output/1M Tokens   Availability   Grok 4.3 Global Standard $1.25 $2.50 Public Preview Getting started Grok 4.3 is now available in Microsoft Foundry. Explore the model details in the Foundry model catalog, evaluate it using your own datasets, and start building and deployment in minutes.
Naomi Moneypenny
May 18, 2026 Place Microsoft Foundry Blog
512Views
0likes
0Comments
Now in Foundry: Tongyi-MAI Z-Image-Turbo, with FLUX.1-schnell and SDXL base 1.0
This week's Model Mondays edition pairs three models available through the Hugging Face collection in Microsoft Foundry: Tongyi-MAI's Z-Image-Turbo, a new designed for lower latency on a single GPU and native bilingual text rendering; Black Forest Labs' FLUX.1-schnell, a 12B rectified flow transformer distilled to 1–4 step inference and one of the most adopted open-weight image models since its 2024 release; and Stability AI's stable-diffusion-xl-base-1.0 (SDXL), a latent diffusion research model that can be used to generate and modify images based on text prompts. Models of the week Tongyi-MAI: Z-Image-Turbo Model Specs Parameters / size: 6B (BF16) Resolution: Up to 1024×1024 native Primary task: Text-to-image generation (English and Chinese) Why it's interesting (Spotlight) Scalable Single-Stream Diffusion Transformer (S3-DiT) architecture: Z-Image concatenates text tokens, visual semantic tokens, and image VAE tokens into a single unified input stream rather than running text and image through separate branches. This single-stream design can improve parameter efficiency relative to dual-stream DiT architectures at the same capacity. See the Z-Image technical report for details. 8-step inference at sub-second latency, fits in 16GB VRAM: Z-Image-Turbo is distilled with Decoupled Distribution Matching Distillation (Decoupled-DMD) and further refined with DMDR, a method that fuses DMD with reinforcement learning during post-training. The result is a model that runs 8 Number-of-Function-Evaluations (NFE) per image with no Classifier-Free Guidance (CFG)—which roughly halves the per-step compute compared to CFG-based inference. See the Decoupled-DMD and DMDR papers. Native bilingual text rendering and strong instruction adherence: Unlike most open-weight image models, which struggle with legible in-image text, Z-Image-Turbo renders complex English and Chinese text accurately which is useful for posters, signage, packaging mockups, and marketing creative. Try it Imagine you're a community programs coordinator at your city's parks department, planning a new summer event series — a "Cake Picnic in the Park" — designed to bring neighbors together over food in shared green space. The event is a few weeks out. You haven't booked bakery partners yet, so no actual cake exists, and you need marketing assets this week to start driving sign-ups: a hero image for the registration page, a flyer for community centers and libraries, social tiles for the city's channels. Use the prompt below and a photorealistic image, that can now be scaled to become additional assets like printed flyers or social images in minutes using image editing tools (or another model). Prompt: A round layered cake displayed on a white ceramic cake stand, topped with glossy fresh red cherries and smooth pastel pink buttercream frosting piped in delicate rosettes around the edge. One generous slice has been cleanly cut and removed from the front, revealing a perfect cross-section: four distinct horizontal layers alternating between soft pink sponge cake and fluffy white vanilla cream frosting. Professional bakery photography, soft natural window light from the left, shallow depth of field, marble countertop, warm and inviting atmosphere, photorealistic detail on the cake texture, cherry highlights, and frosting swirls. Black Forest Labs: FLUX.1-schnell Model Specs Parameters / size: 12B (rectified flow transformer) Resolution: Flexible up to 2 megapixels Primary task: Text-to-image generation Why it's interesting (Spotlight) Rectified flow transformer with adversarial distillation for 1–4 step inference: FLUX.1-schnell is the distilled, Apache 2.0 sibling of the FLUX.1 family. It uses a rectified flow formulation (a diffusion variant that learns straight-line probability paths between noise and data, reducing the number of solver steps needed) and is further compressed with latent adversarial diffusion distillation. The model generates high quality images in for latency-sensitive workloads. Permissive licensing for commercial use: Released under Apache 2.0, FLUX.1-schnell can be used for personal, scientific, and commercial purposes. This has driven broad adoption across product features that need an open, redistributable image backbone. Strong prompt adherence at its parameter range: At 12B parameters, FLUX.1-schnell sits between the SDXL family and frontier proprietary image models, and it remains a common reference point for evaluating open image generation prompt following—particularly for complex compositional prompts and longer captions—roughly two years after its initial release. Try it Hugging Face Spaces give developers the ability to experiment and try new models before deploying them. Test out a few prompts here: https://black-forest-labs-flux-1-schnell.hf.space then when you are ready, deploy the model in Microsoft Foundry. Stability AI: stable-diffusion-xl-base-1.0 stabilityai/stable-diffusion-xl-base-1.0 · Hugging Face Model Specs Parameters / size: 2.6B UNet (≈3.5B total with text encoders) Resolution: 1024×1024 native Primary task: Text-to-image generation Why it's interesting (Spotlight) Dual text encoder design and an ensemble-of-experts pipeline: SDXL uses two pretrained text encoders—OpenCLIP-ViT/G and CLIP-ViT/L—concatenated to capture both broad semantic alignment and finer-grained token-level cues. It can be run standalone or paired with the SDXL refiner in an ensemble-of-experts pipeline where the base model handles early denoising and the refiner specializes in the final steps. See the SDXL report for the original training and architecture details. CreativeML Open RAIL++-M licensing for managed deployments: SDXL is distributed under the CreativeML Open RAIL++-M license, which permits commercial use and downstream fine-tuning with documented use restrictions. Try it To go deeper on SDXL, take a look at Stability AI's generative-models GitHub repository, which implements the most popular diffusion frameworks for both training and inference and continues to expand with new capabilities like distillation. Getting started You can deploy open-source Hugging Face models directly in Microsoft Foundry in two ways. The first by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. The second way is direct through the Hugging Face Hub, select any supported model and then choose "Deploy on Microsoft Foundry", which brings you straight into Azure. Learn how to discover models and deploy them using Microsoft Foundry documentation: Follow along the Model Mondays series and access the GitHub to stay up to date on the latest Read Hugging Face on Azure docs Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry Explore models in Microsoft Foundry
Osi
May 18, 2026 Place Microsoft Foundry Blog
35Views
0likes
0Comments
"Not Available in Your Region" Isn't a Dead End: A Security Assessment of Global Deployments
You want to build with the latest Microsoft Foundry model. You checked the regional availability, and it isn't there yet — only Global Standard. Now you're weighing the capability you actually need against your instinct to keep everything in a regional SKU. This post is for that moment. This is a more common situation than people realise. Microsoft typically releases new and preview models on Global first, then expands into specific regions over time as capacity is built out. It isn't an oversight. It's how Microsoft makes new capabilities available to the broadest set of customers as quickly as possible. If you want those capabilities, Global is the path. The good news is that the path is well-paved. Microsoft Foundry Global Standard is a secure, enterprise-grade deployment type backed by the same Azure controls you already rely on, with explicit contractual commitments on how your data is used. The data protection guarantees don't change because the model is newer or because regional capacity hasn't caught up — they're the same on day one of a new model on Global as they are on a model that's been deployed regionally for a year. The rest of this post walks through what Microsoft commits to, what you get out of the box, what you add on top, and the small number of cases where Global is genuinely the wrong choice. It's written for three audiences: Developers who want to know if they're allowed to ship on Global. Solution architects weighing the model choice against latency, quota, and resilience. Security architects who need to map Foundry's behaviour to enterprise controls before they sign off. Where does my data actually go? This is the question that drives most of the concern, and the answer has two parts. Mixing them up is what causes the confusion. Data at rest stays in the Azure geography of your Foundry resource. That includes your configuration, uploaded files, stored artifacts, and logs. This is true for Global deployments, exactly the same as it is for regional ones. Microsoft commits to this in the Azure data residency page. Data in processing is different. When you send a prompt, the model processes it in memory for a few hundred milliseconds and returns a response. For Global deployments, that processing can happen in any Azure region where the model is hosted. This is how Microsoft gives you the highest available capacity and the broadest model access. The prompt and response are not persisted as part of inference processing in the region that processed them. Once you separate "where my data lives" from "where the request runs," the residency picture becomes much clearer. Your customer data lives where you put it. The model that processes that data runs on Microsoft's global fleet. You can read the official description on the Microsoft Foundry deployment types page. What Microsoft commits These commitments are contractual, not marketing language — they sit inside Microsoft's Product Terms and Data Protection Addendum. According to the data privacy page for Azure Direct Models, your prompts and completions are not used to train Microsoft or OpenAI models, and your fine-tuned models are exclusively yours. Microsoft is also explicit that your data does not touch consumer OpenAI services: "Microsoft hosts the Azure Direct Models in Microsoft's Azure environment and Azure Direct Models do NOT interact with any services operated by Azure Direct Model providers, for example, OpenAI (e.g. ChatGPT, or the OpenAI API)." For partner and community models served through serverless APIs, the model catalog data privacy page confirms that those models are stateless and that Microsoft does not use prompts or outputs to train any model. What Global does NOT do A Global deployment does not replicate your stored data into other regions, does not expose your prompts to consumer OpenAI services, and does not use your inputs or outputs for training. The only cross‑region behavior is the transient execution of model inference, which is stateless and not customer‑addressable. What Global gives you on day one Before you configure anything yourself, a Global Standard deployment already includes the following: Encryption at rest using FIPS 140-2 compliant 256-bit AES with Microsoft-managed keys, applied transparently. See the Microsoft Foundry architecture page. Encryption in transit using TLS 1.2 or higher, enforced by the platform. Microsoft Entra ID authentication with Azure RBAC. Foundry separates control-plane actions (like creating deployments) from data-plane actions (like invoking models), so you can grant least privilege without writing custom roles. Tenant isolation. Your Foundry resource lives in your subscription, your data lives in your tenant, and any fine-tuned models you create are exclusively yours. Compliance inheritance. Foundry runs on Azure and inherits Azure's compliance controls, including ISO 27001, SOC 1/2/3, HIPAA, PCI DSS, FedRAMP, and many others. The current authoritative list is in the Azure compliance offerings catalogue and the Microsoft Trust Center. This baseline, with no extra configuration, already meets the security posture most enterprise teams target for new workloads. The controls you already know Securing Microsoft Foundry uses the same building blocks as securing any other Azure PaaS service. If your team already knows how to lock down Azure Storage or Azure SQL, you already know how to lock down Foundry. Developers see familiar patterns. Architects get a clean fit into the landing zone. Security architects review the same control surfaces they review elsewhere. The controls you'd apply are exactly what you'd expect: Private networking: Map the Foundry resource to a private IP using Private Link, back it with Private DNS, disable public network access, and route egress through Azure Firewall or an NVA. For agent workloads, Microsoft publishes a private networking template for Foundry Agent Service you can deploy with Bicep or Terraform. Note that Private Link secures the path to the endpoint, not the routing of requests inside the model fleet — you get a private network path without giving up Global's capacity benefits. Azure APIM GenAI gateway: Put Azure API Management's GenAI gateway in front of your Foundry Global deployments to control who can call models, how much they can use, and under what policies, independent of where inference runs. It enforces central auth, per‑consumer token limits, logging, and policy controls, turning Global deployments from “globally available” into centrally governed and auditable services. Identity and secrets: Use Managed Identity for application-to-model calls and avoid embedding API keys in code. Apply Conditional Access to admin sign-in and use Privileged Identity Management for just-in-time elevation on admin roles. Customer-managed keys: If your compliance regime requires key ownership, enable CMK on the Foundry resource via Azure Key Vault for rotation, revocation, and separation of duties. Logging and monitoring: Send diagnostics to a customer-owned Log Analytics workspace, enable the Azure Activity Log, and alert on token-usage spikes, unusual source IPs, and repeated authentication failures. Governance at scale: Use Azure Policy to enforce baselines (allowed locations, mandatory diagnostics, required private access) across your tenant, and pair it with Microsoft Defender for Cloud for continuous posture management. The risk that deserves attention: Data Exfiltration The most common security risk in any LLM deployment, on any SKU, is not Microsoft's infrastructure. It's the application layer. Examples include over-broad RAG retrieval pulling data the user shouldn't see, a tool-calling agent reaching an unintended destination, or a prompt that quietly echoes PII into a downstream log. These risks exist on Global, Data Zone, and Regional deployments equally. Choosing a more restrictive SKU does not mitigate them. The good news is that the mitigations are well understood and entirely under your control: Use Private Endpoints for Storage, AI Search, Cosmos DB, and any other backing services your application uses for RAG, so retrieval traffic stays off the public internet. For tool-calling and agent scenarios, route outbound traffic through Azure Firewall with FQDN filtering, and keep an explicit allowlist of destinations the agent is permitted to reach. Apply DLP and redaction at the application layer for high-risk data classes, before that data ever becomes part of a prompt. Treat prompts and completions as transient. Don't persist them unless you have a specific, auditable reason to do so. Doing this work on a Global deployment gives you exactly the same protection as doing it on a regional one. Is Global Deployment right for you? For most teams building on Microsoft Foundry, the answer is yes. Global Standard gives you: The highest default quotas and the broadest model availability in the catalogue. First access to new models and features, often weeks or months ahead of regional rollouts. Elastic absorption of demand spikes through Microsoft's global capacity pool. A simpler architecture, with no regional duplication or custom failover logic. The full Azure security stack: Entra ID, RBAC, Private Link, CMK, Azure Policy, Defender for Cloud, and Monitor. Contractual guarantees that your data isn't used for training and isn't shared with consumer OpenAI services. Global is not the right choice when a specific regulation explicitly requires inference processing to occur within a named country or zone. Note the word "processing" there: not data at rest, but the transient processing of the prompt itself. These cases do exist, particularly in some government, healthcare, and financial sector contexts, and Microsoft Foundry offers Data Zone (US or EU) and Regional SKUs for exactly those situations. But unless someone has pointed you at a specific clause in a specific regulation that names processing locality, you most likely don't need to step down from Global. Summary Microsoft Foundry Global deployments are secure, compliant, and enterprise‑ready. Data at rest remains in your chosen Azure geography. Prompts and completions are not used for training and do not interact with consumer AI services. Encryption, identity, networking, logging, governance, and monitoring are all first‑class Azure controls. Modified Abuse Monitoring is available for qualifying enterprise customers where required. A short summary for each audience: Developers: you can build on Global with confidence, using the Azure patterns you already know. Solution architects: Global is a sensible default unless a regulatory requirement specifically rules it out. Data Zone and Regional remain available for the cases that need them. Security architects: the control surfaces are familiar, the contractual commitments are explicit, and Global can be approved on the same basis as any other Azure PaaS service handling equivalent data classifications. If you've been defaulting to a regional SKU "just to be safe," it's worth taking a fresh look at whether Global actually fits your workload. In most cases, it will.
shikhasinha
May 18, 2026 Place Microsoft Foundry Blog
91Views
0likes
0Comments
A New Chapter for Realtime AI: Reasoning, Translation, and Real-Time Transcription
Voice can be one of the most direct and productive interfaces for AI — enabling customer support agents that may resolve issues without a single keystroke, live multilingual communication that can take on language barriers as conversations happen, and voice assistants capable of reasoning through complex requests in real time. Developers building these experiences need models that can keep pace with increasingly demanding latency, accuracy, and language coverage requirements. Today, OpenAI’s GPT-realtime-translate, GPT‑realtime‑2 and, GPT-realtime-whisper are rolling out into Microsoft Foundry starting today — together representing a significant step forward for the realtime model lineup available to developers on the platform. GPT-realtime-translate and GPT-realtime-whisper GPT-realtime-translate and GPT-realtime-whisper together extend the realtime stack for live multilingual audio workflows. GPT-realtime-translate is built for continuous, real-time translation, producing translated output as speech unfolds without relying on segmented pipeline processing, while GPT-realtime-whisper provides low-latency streaming transcription of the original audio in parallel. Used together, they help developers support scenarios such as live events, cross-language customer experiences, captions, monitoring, and archival workflows that require both translated output and visibility into the source speech. Continuous stream processing: This new model translates live audio without segmenting or buffering allowing for more natural interactions. New translation and transcription capabilities: Translate between languages in real time and observe faster text to speech. Available via the Realtime API GPT-realtime-2 GPT‑realtime‑2 is a generational upgrade to OpenAI's speech-to-speech model, bringing internal reasoning and an expanded context window to real-time voice applications. Where previous speech to speech models responded immediately, GPT‑realtime‑2 can work through a problem before speaking — making it well suited for voice applications that need to handle complex, multi-step queries entirely in the audio layer without routing to a separate text pipeline. Native reasoning capability: The newest realtime model introduces stronger reasoning capabilities. Now the model thinks internally before responding. Adjustable reasoning effort via {reasoning.effort}: Explicitly request the level of reasoning the model uses -- minimal, low, medium, high – to save on cost and latency. Audio in, audio out: No need for an intermediary text step, conversation stays fluid and natural. Available via the Realtime API This models is coming soon to Microsoft Foundry. Since, May 6, the models have been rolling out into the model catalog. We are excited for you to explore and build with our evolving collection of frontier models. Use cases These models work independently, but they're designed to complement each other in real-world pipelines: Live multilingual events. GPT-realtime-translate enables real-time translation of live audio, producing translated speech along with a transcript in the target language. GPT‑realtime‑whisper can be used in parallel to capture a transcription of the original speech for captions, monitoring, or archival purposes. Together, they enable multilingual live streaming with both translated experiences and visibility into the source language. Global customer support. Route inbound calls through GPT-realtime-translate to translate conversations in real time and provide a translated transcript for agents. Use GPT‑realtime‑whisper alongside it to capture the original conversation as text for compliance, quality review, or analytics. Then pass the interaction to an agent built with GPT‑realtime‑2 using {reasoning.effort}: high for complex issue resolution, all within a continuous audio pipeline. International voice assistants. Build once and deploy across languages. GPT-realtime-translate enables multilingual interaction and provides translated output with a target-language transcript, while GPT‑realtime‑whisper can optionally capture the original user input as text. GPT‑realtime‑2 manages reasoning and conversational context, supporting more complex voice interactions. Pricing Model Deployment Modality Pricing per 1M tokens Input Cached Input Output GPT-realtime-2 Global Standard Audio $32.00 $0.40 $64.00 Text $4.00 $0.40 $24.00 Image $5.00 $0.50 -- GPT-realtime-translate Global Standard Audio -- -- $2.04/hour GPT-realtime-whisper Global Standard Audio -- -- $1.02/hour *Pricing for GPT-realtime-translate and GPT-realtime-whisper will be done by the hour Getting Started Looking for ways to dive in? GPT-realtime-translate, GPT-realtime-whisper, and GPT‑realtime‑2 are rolling out into Microsoft Foundry today. Explore the model catalog and start building: https://ai.azure.com
Naomi Moneypenny
May 13, 2026 Place Microsoft Foundry Blog
4.5KViews
1like
5Comments
Building the Solution Teams Need to Secure AI Against Prompt Injection
As artificial intelligence continues to evolve, teams are prioritising rapid advancements and deployment of applications while often overlooking security considerations. Emerging threats such as prompt injection remain poorly understood, and this is putting systems, users, and infrastructure at serious risk. Much of the expertise required to mitigate these risks is currently fragmented and inaccessible, concentrated among a small group of cybersecurity specialists. Meanwhile, developers, under pressure to ship quickly, often lack both the tools and frameworks needed to systematically test their AI systems for vulnerabilities. This disconnect is creating a significant gap between the development and security assurance of AI applications. To address this gap, we developed a unified Prompt Injection Testing Platform and knowledge base, powered by Microsoft Foundry, designed to make LLM security testing accessible, structured, and understandable for developers. Project Overview Developers are rapidly integrating LLMs and agents into applications, but: Security testing is not standardised Prompt injection risks are increasingly understood in research, but poorly mitigated in practice by developers There is a lack of accessible, actionable tooling This creates a dangerous gap: applications are being deployed faster than they are being secured. As part of our UCL Industry Exchange Network (IXN) project in collaboration with Avanade, we built a Prompt Injection Testing Platform designed to solve this exact issue by: Providing a knowledge base of vulnerabilities and mitigations Helping teams identify vulnerabilities within their AI systems Enabling custom and automated testing pipelines Integrating tools like Garak for adversarial testing With this, we aim to make prompt injection testing accessible, standard, and understandable. Project Journey We divided our project into several phases: Phase 1: Understanding our Users’ Needs. We began by identifying the core users of our platform: AI developers and broader stakeholders across development, security, and safety disciplines integrating LLMs into their applications. By meeting with them, we uncovered a few key challenges: Developers have limited awareness of prompt injection risks There is a generalised lack of accessible tools for testing This first exploration set a core principle: We must build a developer-first solution which does not depend on extensive technical knowledge to be used. We concluded that to be as useful as possible, our solution should not require prior prompt injection knowledge. In order to solve the two challenges presented by our users, we concluded a platform would be the best approach, as it enables us to centralise fragmented knowledge while providing a structured, scalable environment for testing LLM vulnerabilities in practice. Phase 2: Understanding the Threat Landscape Building on our user research, we focused on developing a deep understanding of the prompt injection threat landscape to inform the design of our platform. This phase involved researching: Different types of prompt injection vulnerabilities Common attack scenarios and override techniques Existing mitigation strategies used in practice Tools and methodologies for prompt injection security testing The most widely used models to ensure our platform would be compatible with real-world systems. We consolidated these findings into a structured technical report, designed to be shared with developers, security testers, and semi-technical stakeholders. The goal was not only to guide our own implementation, but also to contribute to making prompt injection more standard and understandable. From our research, we realised prompt injection is not a single vulnerability, but a rapidly evolving attack surface that requires continuous, scalable testing rather than one-time validation. Phase 3: Building the Platform Guided by both our user insights and the threat landscape analysis, we moved to designing and developing a unified prompt injection testing platform and knowledge base. To do this, we defined three core principles: Developer first: no deep security knowledge would be required Unified: combines education (knowledge base) and execution (testing tools) Scalable: Expert users could extend the platform by bringing their own models, tests, and mitigations. During this stage, we built a platform which allows teams to: Connect their own LLM endpoints Run custom prompt injection tests Execute automated adversarial testing through Garak Access a centralised knowledge base of vulnerabilities and mitigation strategies. Export knowledge base information and test results as PDFs. By the end, we had developed a unified platform that enables developers to systematically test, understand, and mitigate prompt injection vulnerabilities in their AI applications. To understand how our platform works in practice, you can view our demo video. Platform home interface presenting an overview of prompt injection concepts and a structured vulnerability catalogue for exploring attack types and mitigation strategies. Key Features Model Integration and Configuration Users can use models included in the platform or connect their own LLM endpoints, allowing the platform to work across different providers: Supports multiple model providers through Microsoft Foundry Supports custom model integration via HTTP endpoints Enables model configurations such as custom system prompts and mitigation layers. Ensures flexibility as new models and mitigations emerge Testing Suite The platform allows users to create and run custom prompt injection tests tailored to their applications. This involves: Creating and executing targeted prompts Simulating real-world attack scenarios Running predefined adversarial testing suites (integrating NVIDIA Garak) Testing interface showing configuration of prompt injection tests and execution of automated scans, with results and risk evaluation displayed. Knowledge Base A core component of our platform is a structured knowledge base, which is designed to make prompt injection concepts accessible and understandable. This is divided into two key areas: Vulnerabilities: Provides information on different types of prompt injection attacks, including explanations of how each vulnerability works, with real-world examples and scenarios, as well as references to reputable external sources Mitigations: Focuses on how to defend against these vulnerabilities, and it includes clear implementation strategies and code examples demonstrating how to integrate each mitigation. To support exploration, we also included a chatbot interface, which answers questions using knowledge base data and trusted sources. This helps users quickly navigate vulnerabilities and mitigation strategies by providing contextual, reliable information and redirecting users to the appropriate page of our platform. Figure 3: Direct prompt injection analysis view, where users can explore attack techniques, observe unsafe model responses, and review corresponding mitigation approaches. Prompt Enhancer In addition to testing and learning, our platform integrates a prompt enhancer, designed to help users actively improve the security of their system prompts. It works in the following way: Takes an existing prompt as input Draws on the knowledge base insights and best practices Restructures the prompt to improve clarity and robustness Incorporates selected prompt-layer mitigations to reduce prompt injection risk Prompt Enhancer interface showing the application of prompt-layer mitigations (e.g. delimiter tokens, instruction hierarchy enforcement) to restructure and secure a system prompt against prompt injection attacks. Technical Details To support a flexible and scalable testing system, we designed our platform with a modular, layered architecture. This allows different components to operate independently while remaining integrated through clearly defined interfaces, ensuring both extensibility and maintainability. System Architecture We divided our platform into four main layers: Frontend Layer An interactive user interface that allows developers to: Explore the prompt injection knowledge base Configure and run tests View results and vulnerability analysis API Layer The API layer acts as the orchestration and communication layer between the frontend and the core system. Handles requests from the frontend to create and run tests. Provides frontend with available models, mitigations, and configurations. Ensures any newly added models and mitigations can be automatically reflected in the frontend without requiring manual updates. Domain Layer The layer which defines the core structure and logic of the system: Defines interfaces for key components such as mitigations, models, and test runners Establishes the test structure and data models Encapsulates logic to ensure consistency Integration Layer The layer which implements the abstractions defined in the domain layer and connects the platform to external services Implements model providers such as OpenAI, Anthropic, and other external HTTP-based endpoints Implements test runners, including custom prompt runners and external tools such as Garak. Implements database connections and repository classes. Results and Outcomes Through the research and development of our platform, we were able to gain several key insights into the behaviour and security of LLM-based applications: Prompt injection vulnerabilities are more prevalent than expected. Even simple prompts with carefully crafted inputs can unsafely manipulate a model’s behaviour. Lack of structured testing leads to hidden risks. Without a systematic approach, many vulnerabilities remain undetected. It is sometimes time consuming to manually craft unsafe prompts. Combining custom testing with framework-based testing improves coverage. Using both custom prompts (targeted and application-specific scenarios) and framework-driven testing (e.g. Garak) enables a more comprehensive evaluation of model safety, as both expected and unexpected vulnerabilities can be captured Structured prompts can significantly improve robustness. We observed that prompts with a clear structure and embedded mitigations are less susceptible to injection attacks. By the end of our project, we successfully developed a platform that: Bridges the gap between prompt injection knowledge and practical testing. Enables repeatable and structured testing of prompt injection vulnerabilities Provides a unified workflow for learning, testing, and improving prompt security. Supports multiple models and testing approaches, to cover the entire vulnerability safety. We demonstrated that prompt injection risks can be systematically identified, tested, and mitigated through a structured and repeatable approach. Lessons Learned Throughout the project, we identified several key insights that shaped both our technical approach and our understanding of AI security. AI is rapidly evolving, and systems must be designed accordingly. AI models and attack techniques are advancing extremely fast. As a result, static solutions are quickly becoming obsolete. We learned that it is essential to design a platform that is modular, extensible and adaptable. Through well-defined interfaces and generic services, we ensured our platform can evolve alongside attacks and mitigations. Security must be built into development, not considered at testing. Many developers are focusing on functionality first and security often takes a backseat. In the context of LLMs, vulnerabilities can fundamentally affect the security of the system and its users. As such, security should be treated as a core part of the development cycle. Models and external tools should only be connected if their safety is guaranteed. Bridging the gap between developers and security testers is necessary. We identified a major disconnect between developers building AI applications and the security testers evaluating them. These groups often operate with different priorities and levels of knowledge. We are bridging this gap by making prompt injection knowledge more accessible and creating workflows that are usable by developers while still grounded in robust security practices. Further Development While our platform provides a strong foundation for prompt injection testing and knowledge, there are several areas for future exploration: Expanding our testing framework integrations, by adding a broader coverage of attack techniques Integration with MCP servers and external systems, supporting interactions with tools, APIs and external data sources. Addressing additional indirect prompt injection vulnerabilities, including file uploads, website scraping, and multi-step workflows. Looking ahead, we also aim to integrate our platform more deeply into development workflows by introducing CI/CD integrations for continuous security testing and versioned tracking of model robustness over time. Our goal is to evolve the platform into a comprehensive security layer, capable of testing entire AI-driven systems in dynamic, real-world contexts. Conclusion As AI becomes increasingly integrated into real-world applications, ensuring their security is essential. As our research highlights, current practices have not kept pace with the rapid evolution of AI systems and attack techniques. Through our work, we demonstrated that prompt injection risks can be systematically identified, tested, and mitigated using a structured approach. By combining a unified knowledge base with a flexible testing platform powered by Microsoft Foundry, we are taking a step towards making AI systems safer and more reliable. More importantly, our project reinforces a broader idea: a developer-first approach to security, supported by collaboration across development, security, and safety disciplines, is essential for building AI at scale. Security should not remain confined to specialist teams but should be embedded directly into the development process, alongside practices such as red-teaming and continuous testing. Our project empowers teams with the knowledge and tools they need to build safer and more reliable AI systems. If you’re interested in building more secure AI systems or exploring prompt injection in practice, we invite you to join us through the Foundry Community on the 3rd of June at 2pm BST, when we will be showcasing our platform live, walking through real-world examples, and discussing how teams can integrate prompt injection testing into their development workflows. Team Teo Montero Bonet, UCL Computer Science Mario Mojarro Ruiz, UCL Computer Science David Thomas Garcia, UCL Computer Science Nathaniel Gibbon, UCL Computer Science With support from Josh McDonald, Avanade
teo-montero
May 13, 2026 Place Educator Developer Blog
185Views
0likes
0Comments
Now in Foundry: IBM Granite 4.1, NVIDIA Nemotron Nano Omni, and Qwen3.6-35B-A3B
This week Microsoft Foundry adds two major model families alongside a reasoning powerhouse that spans the full spectrum from specialized speech and vision to general-purpose coding and long-context analysis. IBM's Granite 4.1 is a famiily of 10: six LLMs across 3B, 8B, and 30B sizes in both full-precision and FP8 variants, plus a safety model, a vision-language model for document extraction, and a multilingual speech recognition model. NVIDIA's Nemotron-3-Nano-Omni-30B-A3B-Reasoning brings multimodal capability—video, audio, image, and text—to a 31B Mamba2-Transformer Hybrid Mixture-of-Experts (MoE) architecture that activates only 3B parameters per forward pass; three variants are available in Foundry (BF16, FP8, and NVFP4), with the FP8 variant featured here. Qwen3.6-35B-A3B is designed for agentic coding among open models, with thinking preservation across conversation turns and a context window extensible to 1 million tokens. Models of the week IBM: granite-4.1-30b Model Specs Parameters / size: 30B (flagship of the Granite 4.1 family) Context length: 131,072 tokens Primary task: Text generation (multilingual instruction following, RAG, tool calling, code, summarization) Why it's interesting The Granite 4.1 release brings 10 models to Microsoft Foundry. The LLM lineup covers granite-4.1-3b-instruct, granite-4.1-8b-instruct, and granite-4.1-30b-instruct with FP8 variants for each, plus granite-guardian-4.1-8b for safety, granite-vision-4.1-4b for document and chart understanding, and granite-speech-4.1-2b for multilingual speech recognition. This is a deployment-ready stack where teams can mix and match model sizes and modalities from a single provider. Strong instruction following and reasoning at the 30B scale: granite-4.1-30b-instruct scores 80.16 on MMLU, 64.09 on MMLU-Pro, 83.74 on Big-Bench Hard (BBH), 77.80 on AGI Eval, 45.76 on GPQA (Graduate-Level Google-Proof Q&A, a graduate-level science reasoning benchmark), and 89.65 average on IFEval (instruction following). These results reflect SFT and reinforcement learning post-training focused specifically on instruction compliance, tool calling accuracy, and long-context retrieval. (View benchmarks here) Enhanced tool calling and 12-language support: Granite 4.1 models are trained for structured function calling and support 12 languages—Arabic, Chinese, Czech, Dutch, English, French, German, Italian, Japanese, Korean, Portuguese, and Spanish—with dialog, extraction, and summarization capabilities. Safety and multimodal coverage within the same family: The inclusion of granite-guardian-4.1-8b (a safety classifier for detecting harmful content and prompt injections), granite-vision-4.1-4b (a Vision Language Model optimized for document extraction from PDFs, charts, and tables), and granite-speech-4.1-2b (a 2B multilingual Automatic Speech Recognition model) means teams can address safety, document parsing, and audio ingestion within the same model family—reducing integration complexity across a full pipeline. Try it Use Case Prompt Pattern Multilingual RAG Submit retrieved document passages in any of 12 supported languages; ask model to synthesize and cite sources Agentic tool calling Provide function definitions + user goal; model plans and executes tool calls in structured format Document extraction (granite-vision-4.1-4b) Submit PDF page image; extract tables, key figures, or form fields as structured JSON Safety classification (granite-guardian-4.1-8b) Pass user input or model output; receive structured risk assessment before serving response Sample prompt for an enterprise document processing deployment: You are building a multilingual document intelligence pipeline for a global financial institution. Using the granite-4.1-30b-instruct endpoint deployed in Microsoft Foundry, submit each incoming policy or regulatory document with the following system instruction: "You are a compliance analysis assistant. Review the document and extract: (1) all regulatory requirements described, (2) the entities to which each requirement applies, (3) any compliance deadlines mentioned, and (4) any penalties or consequences for non-compliance. Return the output as a structured JSON array with one entry per requirement." For documents that include scanned pages, first route them through granite-vision-4.1-4b to extract text and table content before passing to the 30B model for compliance analysis. Pass all user-facing outputs through granite-guardian-4.1-8b to screen for sensitive information before returning results. NVIDIA: Nemotron-3-Nano-Omni-30B-A3B-Reasoning-FP8 Model Specs Parameters / size: 31B total, ~3B activated per forward pass (Mamba2-Transformer Hybrid Mixture-of-Experts) Context length: 256,000 tokens Primary task: Video-audio-image-text-to-text (Multimodal understanding, reasoning, tool calling) Why it's interesting Multimodal input from a single efficient endpoint: Nemotron-3-Nano-Omni-30B-A3B-Reasoning supports video (up to 2 minutes), audio (up to 1 hour), images (RGB), and text—all from a single model deployed in Microsoft Foundry. Three variants are available in Foundry: full-precision BF16, FP8, and NVFP4. Paper: Nemotron Nano Omni technical report. Strong results across vision, document, video, and audio benchmarks: With reasoning mode enabled, the model scores 82.8 on MathVista-MINI (visual math reasoning), 67.04 on OCRBenchV2-EN (document OCR), 63.6 on Charxiv Reasoning (chart understanding), 72.2 on Video MME (video Q&A), 74.52 on Daily Omni (video+audio omnimodal understanding), and 89.39 on VoiceBench (speech instruction following). On OSWorld (GUI agent benchmark measuring autonomous computer use), it scores 47.4—a notable result for a model at the 3B active parameter scale. (Please see above model cards for further benchmark data) Mamba2-Transformer Hybrid MoE for efficient long-context inference: The model's layers alternate between Mamba2 state-space blocks (which process sequences with linear rather than quadratic cost) and standard Transformer attention blocks, combined with Mixture-of-Experts feedforward layers. Only ~3B parameters are activated per token despite the 31B total, making the 256K context window practically usable at lower compute cost than a comparably sized dense model. Word-level timestamps, JSON output, and tool calling for structured media workflows: The model produces word-level timestamps from audio, enabling precise transcript-to-timecode alignment for review and indexing workflows. Combined with JSON-structured output, chain-of-thought reasoning, and native tool calling, it can serve as an agentic step that ingests raw media (meeting recordings, M&E assets, training videos) and produces structured outputs for downstream systems without requiring separate transcription or OCR preprocessing stages. Try it Use Case Prompt Pattern Meeting intelligence Submit audio recording (up to 1 hr); extract transcript with word-level timestamps, action items, and decisions as structured JSON Video content analysis Submit video clip (up to 2 min) + query; retrieve timestamped summary of key events or spoken content Document + audio joint analysis Submit scanned document image alongside narrated walkthrough audio; extract and reconcile information from both modalities Multimodal tool calling Provide tool definitions + combined image/audio input; model reasons over content and executes structured tool calls Sample prompt for a media and compliance deployment: You are building a broadcast compliance review system for a media company. Using the Nemotron-3-Nano-Omni-30B-A3B-Reasoning-FP8 endpoint deployed in Microsoft Foundry, submit each recorded segment as video input with the following instruction: "Review this video segment and produce a compliance report as a JSON object with the following fields: transcript (full text with word-level timestamps), flagged_segments (array of objects with start_time, end_time, content, and reason for flagging), speaker_count (estimated number of distinct speakers), and compliance_summary (overall assessment). Flag any content that includes unverified factual claims, restricted product categories, or regulatory disclosures that may be incomplete." Use the word-level timestamps from the compliance report to route flagged segments directly to the editorial review queue with precise timecode references. Qwen: Qwen3.6-35B-A3B Model Specs Parameters / size: 35B total, 3B activated (Mixture-of-Experts) Context length: 262,144 tokens natively, extensible to 1,010,000 tokens Primary task: Image-text-to-text (agentic coding, reasoning, vision) Why it's interesting Agentic coding improvements over Qwen3.5-35B-A3B: Qwen3.6-35B-A3B scores 73.4 on SWE-bench Verified (vs. 70.0 for Qwen3.5-35B-A3B and 52.0 for Gemma 4 31B), 67.2 on SWE-bench Multilingual (vs. 60.3 and 51.7), and 49.5 on SWE-bench Pro (vs. 44.6 and 35.7). Terminal-Bench 2.0 reaches 51.5 (vs. 40.5 and 42.9). The update targets frontend workflows and repository-level reasoning specifically, areas where earlier Qwen3.5 iterations showed gaps. Blog post: Qwen3.6-35B-A3B. Hybrid architecture: Gated DeltaNet and Mixture-of-Experts: The model's 40 layers alternate between Gated DeltaNet blocks (a form of linear attention that avoids the quadratic cost of standard self-attention), Gated Attention blocks (using Grouped Query Attention with 16 query heads and 2 key-value heads), and Mixture-of-Experts (MoE) feedforward layers with 256 experts (8 routed + 1 shared active per token). Only 3B parameters are activated per forward pass, keeping inference cost comparable to a 3B dense model while retaining the capacity of a 35B model for knowledge and specialization. Thinking preservation across conversation turns: Qwen3.6 introduces an option to retain reasoning context from previous messages in multi-turn conversations. In prior models, chain-of-thought traces were stripped between turns, requiring the model to re-derive context it had already reasoned through. With thinking preservation enabled, iterative coding workflows—such as debugging across multiple exchanges—benefit from accumulated reasoning without repeating earlier analysis. Natively extensible to 1 million token context: The 262K native context is already among the largest in open models at this size, and the architecture supports extension to 1,010,000 tokens. On GPQA Diamond (science reasoning), Qwen3.6-35B-A3B scores 86.0—above both Gemma 4 31B (84.3) and Qwen3.5-27B (85.5)—while matching Gemma 4 31B on MMLU Pro (85.2) and LiveCodeBench v6 (80.4 vs. 80.0). Try it Use Case Prompt Pattern Repository-level code change Provide repository structure + task description; model plans file edits and outputs unified diff Multi-turn iterative debugging Enable thinking preservation; submit failing test + code across multiple turns; accumulate reasoning context Frontend code generation Provide design spec or screenshot + existing codebase context; generate component implementation Long-document reasoning Submit technical specification (up to 262K tokens); ask model to identify ambiguities or implementation gaps Sample prompt for a software engineering deployment: You are building an automated code review and implementation assistant for a platform engineering team. Using the Qwen3.6-35B-A3B endpoint deployed in Microsoft Foundry, enable thinking preservation for multi-turn sessions. In the first turn, submit the repository file tree and a GitHub issue describing a required API endpoint change. Prompt the model: "Review the repository structure and describe your implementation plan, including which files need to change and why." In the second turn, submit the relevant source files and prompt: "Based on your earlier plan, implement the changes and produce a unified diff." In the third turn, submit the test suite and prompt: "Write additional unit tests for the new endpoint, covering edge cases identified in your reasoning." The thinking preservation feature ensures the model carries forward its understanding of the codebase across all three turns without re-explaining context. Getting started You can deploy open-source Hugging Face models directly in Microsoft Foundry by browsing the Hugging Face collection in the Foundry model catalog and deploying to managed endpoints in just a few clicks. You can also start from the Hugging Face Hub. First, select any supported model and then choose "Deploy on Microsoft Foundry", which brings you straight into Azure with secure, scalable inference already configured. Learn how to discover models and deploy them using Microsoft Foundry documentation. Follow along the Model Mondays series and access the GitHub to stay up to date on the latest Read Hugging Face on Azure docs Learn about one-click deployments from the Hugging Face Hub on Microsoft Foundry Explore models in Microsoft Foundry
Osi
May 06, 2026 Place Microsoft Foundry Blog
413Views
0likes
0Comments
Introducing OpenAI's newest chat model in Microsoft Foundry
OpenAI's GPT-5.5 Instant (or Chat-latest in the API) begins rolling out in Microsoft Foundry today as GPT-chat-latest. Built on GPT-5.4 and GPT-5.3-chat, the new model delivers measurable gains in factual accuracy, tool calling, and response efficiency. These improvements translate directly into more reliable production deployments. GPT-chat-latest is designed for the workflows builders are actually shipping: multi-turn assistants, agentic systems that orchestrate tools, and retrieval-grounded applications where precision and grounding matter as much as conversational quality. Why the name is changing In Microsoft Foundry, we are introducing GPT-chat-latest as the product name for this release, while the model continues to follow the existing Preview lifecycle and standard notice periods. We are also evaluating ways to simplify how customers access continuously updated models over time, but current behavior remains unchanged as that work continue Smarter, more factually reliable GPT-chat-latest closes the factuality gap from prior iterations with significant reductions in hallucinations, especially in domains where accuracy matters most. According to OpenAI, the new model produces 52.5% fewer hallucinations and reduces hallucinated claims by 37.3% on conversations previously flagged for factual errors when compared to GPT-5.3-chat. These gains extend beyond text. GPT-chat-latest shows improvements in visual reasoning, expert multimodal understanding, and STEM tasks, with measurable lifts across standard benchmarks: Benchmark GPT-5.3-chat GPT-chat-latest CharXiv-reasoning Scientific Chart Reasoning 75.0 81.6 MMMU-Pro Expert multimodal reasoning 69.2 76.0 GPQA PhD-level science questions 78.5 85.6 AIME 2025 Competition math 65.4 81.2 *Data shown comes from OpenAI’s testing” For builders shipping into regulated workloads such as clinical decision support, legal research, financial advisory, and technical analysis, these improvements raise the bar on the kinds of applications GPT-chat-latest can assist with. More efficient outputs GPT-chat-latest produces responses that may be more to-the-point without losing substance. The model may reduce verbosity and over formatting, ask fewer follow-up questions, and avoid cluttered output patterns that often require post-processing in production UIs. For builders, this can translate to two concrete benefits: lower output token costs at scale, and cleaner responses that drop into product surfaces with less downstream cleanup. In comparative testing from OpenAI, GPT-chat-latest produced roughly 25–30% fewer words than GPT-5.3-chat across a range of common prompts while preserving response quality, and in many cases improving it. Improving intelligence and tool calling GPT-chat-latest introduces measurable improvements in how the model interacts with tools, including better judgment about when and how to invoke them. The model produces more structured and context-aware tool invocation outputs, which is particularly relevant for workflows that rely on function calling, retrieval-augmented generation, and multi-step reasoning. Equally important, the model is better at deciding whether a tool is needed in the first place, reducing unnecessary tool calls in scenarios where it already has the information to answer directly. Improved search and context handling GPT-chat-latest includes targeted improvements to how the model retrieves, interprets, and synthesizes information when search is involved, with enhancements to query formulation, result ranking, and filtering, plus more grounded synthesis of retrieved content into final responses. These changes improve handling of ambiguous or underspecified queries and reduce noise in answers that depend on retrieved content. The model also makes better use of the context developers pass in, including system prompts, conversation history, retrieved documents, and structured data. Applications that maintain long-running state or stitch together multiple retrieval steps produce more coherent, context-aware outputs without developers having to over-engineer prompt scaffolding. Use Cases: When to choose the chat model Developers typically choose a chat-optimized model like GPT-5.5-chat when the application needs to sustain multi-turn conversations while reliably following instructions and coordinating external tools. This is a fit for assistants and agentic workflows where the model must interpret user intent over time, decide when to retrieve additional context, and produce structured outputs for downstream systems rather than just generate free-form text. Customer support and contact centers: virtual agents that maintain conversational context across a case, retrieve policy or product documentation via search, and hand off to a ticketing or CRM system through tool calls when escalation is needed. Retail and e-commerce: shopping and service assistants that clarify preferences over multiple turns, reference catalogs and policies via retrieval, and generate structured actions such as returns, exchanges, and order lookups through integrated tools. Manufacturing and field service: technician-facing assistants that combine conversational guidance with retrieval of manuals and work instructions, plus structured task creation in maintenance systems. Use GPT-chat-latest Use GPT-5.5 Reasoning Multi-turn assistants and customer-facing chat experiences Harder problems that benefit from more deliberate, step-by-step thinking Agentic workflows that coordinate tools (search, retrieval, ticketing, CRM) and benefit from structured tool outputs Complex analysis, planning, or decision support where correctness matters more than conversational flow Interactive experiences where you want quick back-and-forth clarification and task completion Tasks involving multi-constraint reasoning (policy interpretation, detailed requirements, long-horizon plans) RAG-based apps where the model must decide when to retrieve and then synthesize grounded answers Offline or low-tool scenarios where the main value is deeper reasoning over provided context Pricing Model Input ($/1M tokens) Cached input ($/1M tokens) Output ($/1M tokens) GPT-chat-latest $5 $0.50 $30 Responsible AI in Microsoft Foundry At Microsoft, our mission to empower people and organizations remains constant. In the age of AI, trust is foundational to adoption, and earning that trust requires a commitment to transparency, safety, and accountability. Microsoft Foundry provides governance controls, monitoring, and evaluation capabilities to help organizations deploy models responsibly in production environments, aligned with Microsoft's Responsible AI principles. Getting started GPT-chat-latest is rolling out in Microsoft Foundry today.
Naomi Moneypenny
May 05, 2026 Place Microsoft Foundry Blog
4.2KViews
1like
0Comments
Introducing OpenAI's GPT-image-2 in Microsoft Foundry
Take a small design team running a global social campaign. They have the creative vision to produce localized imagery for every market, but not the resources to reshoot, reformat, or outsource that scale. Every asset needs to fit a different platform, a different dimension, a different cultural context, and they all need to ship at the same time. This is where flexible image generation comes in handy. OpenAI's GPT-image-2 is now generally available and rolling out today to Microsoft Foundry, introducing a step change in image generation. Developers and designers now get more control over image output, so a small team can execute with the reach and flexibility of a much larger one. What is new in GPT-image-2? GPT-image-2 brings real world intelligence, multilingual understanding, improved instruction following, increased resolution support, and an intelligent routing layer giving developers the tools to scale image generation for production workflows. Real world intelligence GPT-image-2 has a knowledge cut off of December 2025, meaning that it is able to give you more contextually relevant and accurate outputs. The model also comes with enhanced thinking capabilities that allow it to search the web, check its own outputs, and create multiple images from just one prompt. These enhancements shift image generation models away from being simple tools and runs them into creative sidekicks. Multilingual understanding GPT-image-2 includes increased language support across Japanese, Korean, Chinese, Hindi, and Bengali, as well as new thinking capabilities. This means the model can create images and render text that feels localized. Increased resolution support GPT-image-2 introduces 4K resolution support, giving developers the ability to generate rich, detailed, and photorealistic images at custom dimensions. Resolution guidelines to keep in mind: Constraint Detail Total pixel budget Maximum pixels in final image cannot exceed 8,294,400 Minimum pixels in final image cannot be less than 655,360 Requests exceeding this are automatically resized to fit. Resolutions 4K, 1024x1024, 1536x1024, and 1024x1536 Dimension alignment Each dimension must be a multiple of 16 Note: If your requested resolution exceeds the pixel budget, the service will automatically resize it down. Intelligent routing layer GPT-image-2 also includes an expanded routing layer with two distinct modes, allowing the service to intelligently select the right generation configuration for a request without requiring an explicitly set size value. Mode 1 — Legacy size selection In Mode 1, the routing layer selects one of the three legacy size tiers to use for generation: Size tier Description smimage Small image output image Standard image output xlimage Large image output This mode is useful for teams already familiar with the legacy size tiers who want to benefit from automatic selection without making any manual changes. Mode 2 — Token size bucket selection In Mode 2, the routing layer selects from six token size buckets — 16, 24, 36, 48, 64, 96 — which map roughly to the legacy size tiers: Token bucket Approximate legacy size 16, 24 smimage 36, 48 image 64, 96 xlimage This approach can allow for more flexibility in the number of tokens generated, which in turn helps to better optimize output quality and efficiency for a given prompt. See it in action GPT-image-2 shows improved image fidelity across visual styles, generating more detailed and refined images. But, don’t just take our word for it, let's see the model in action with a few prompts and edits. Here is the example we used: Prompt: Interior of an empty subway car (no people). Wide-angle view looking down the aisle. Clean, modern subway car with seats, poles, route map strip, and ad frames above the windows. Realistic lighting with a slight cool fluorescent tone, realistic materials (metal poles, vinyl seats, textured floor). As you can see, when using the same base prompt, the image quality and realism improved with each model. Now let’s take a look at adding incremental changes to the same image: Prompt: Populate the ad frames with a cohesive ad campaign for “Zava Flower Delivery” and use an array of flower types. And our subway is now full of ads for the new ZAVA flower delivery service. Let's ask for another small change: Prompt: In all Zava Flower Delivery advertisements, change the flowers shown to roses (red and pink roses). And in three simple prompts, we've created a mockup of a flower delivery ad. From marketing material to website creation to UX design, GPT-image-2 now allows developers to deliver production-grade assets for real business use cases. Image generation across industries These new capabilities open the door to richer, more production-ready image generation workflows across a range of enterprise scenarios: Retail & e-commerce: Generate product imagery at exact platform-required dimensions, from square thumbnails to wide banners, without post-processing. Marketing: Produce crisp, rich in color campaign visuals and social assets localized to different markets. Media & entertainment: Generate storyboard panels and scene at resolutions suited to production pipelines. Education & training: Create visual learning aids and course materials formatted to exact display requirements across devices. UI/UX design: Accelerate mockup and prototype workflows by generating interface assets at the precise dimensions your design system requires. Trust and safety At Microsoft, our mission to empower people and organizations remains constant. As part of this commitment, models made available through Foundry undergo internal reviews and are deployed with safeguards designed to support responsible use at scale. Learn more about responsible AI at Microsoft. For GPT-image-2, Microsoft applied an in-depth safety approach that addresses disallowed content and misuse while maintaining human oversight. The deployment combines OpenAI’s image generation safety mitigations with Azure AI Content Safety, including filters and classifiers for sensitive content. Pricing Model Offer type Pricing - Image Pricing - Text GPT-image-2 Standard Global Input Tokens: $8 Cached Input Tokens: $2 Output Tokens: $30 Input Tokens: $5 Cached Input Tokens: $1.25 Note: All prices are per 1M token. There is no billing for output tokens for the GPT-image-2 model. Getting started Whether you’re building a personalized retail experience, automating visual content pipelines or accelerating design workflows. GPT-image-2 gives your team the resolution control and intelligent routing to generate images that fit your exact needs. Try the GPT-image-2 in Microsoft Foundry today! Deploy the model in Microsoft Foundry Experiment with the model in the Image playground Read the documentation to learn more
Naomi Moneypenny
May 04, 2026 Place Microsoft Foundry Blog
14KViews
3likes
3Comments
Gemma 4 now available in Microsoft Foundry
Experimenting with open-source models has become a core part of how innovative AI teams stay competitive: experimenting with the latest architectures and often fine-tuning on proprietary data to achieve lower latencies and cost. Today, we’re happy to announce that the Gemma 4 family, Google DeepMind’s newest model family, is now available in Microsoft Foundry via the Hugging Face collection. Azure customers can now discover, evaluate, and deploy Gemma 4 inside their Azure environment with the same policies they rely on for every other workload. Foundry is the only hyperscaler platform where developers can access OpenAI, Anthropic, Gemma, and over 11,000+ models under a single control plane. Through our close collaboration with Hugging Face, Gemma 4 joining that collection continues Microsoft’s push to bring customers the widest selection of models from any cloud – and fits in line with our enhanced investments in open-source development. Frontier Intelligence, open-source weights Released by Google DeepMind on April 2, 2026, Gemma 4 is built from the same research foundation as Gemini 3 and packaged as open weights under an Apache 2.0 license. Key capabilities across the Gemma 4 family: Native multimodal: Text + image + video inputs across all sizes; analyze video by processing sequences of frames; audio input on edge models (E2B, E4B) Enhanced reasoning & coding capabilities: Multi-step planning, deep logic, and improvements in math and instruction-following enabling autonomous agents Trained for global deployment: Pretrained on 140+ languages with support for 35+ languages out of the box Long context: Context windows of up to 128K tokens (E2B/E4B) and 256K tokens (26B A4B/31B) allow developers to reason across extensive codebases, lengthy documents, or multi-session histories Why choose Foundry? Foundry is built to give developers breadth -- access to models from major model providers, open and proprietary, under one roof. Stay within Azure to work leading models. When you deploy through Foundry, models run inside your Azure environment and are subject to the same network policies, identity controls, and audit processes your organization already has in place. Managed online endpoints handle serving, scaling, and monitoring without manually setting up and managing the underlying infrastructure. Serverless deployment with Azure Container Apps allows developers to deploy and run containerized applications while reducing infrastructure management and saving costs. Gated model access integrates directly with Hugging Face user tokens, so models that require license acceptance stay compliant can be accessed without manual approvals. Foundry Local lets you run optimized Hugging Face models directly on your own hardware using the same model catalog and SDK patterns as your cloud deployments. Read the documentation here: https://aka.ms/foundrylocal and https://aka.ms/HF/foundrylocal Microsoft’s approach to Responsible AI is grounded in our AI principles of fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability. Microsoft Foundry provides governance controls, monitoring, and evaluation capabilities to help organizations deploy new models responsibly in production environments. What are teams building with Gemma 4 in Foundry Gemma 4’s combination of multimodal input, agentic function calling, and long context offers a wide range of production use cases: Document intelligence: Processing PDFs, charts, invoices, and complex tables using native vision capabilities Multilingual enterprise apps: 140+ natively trained languages — ideal for multinational customer support, content platforms as well as language learning tools for grammar correction and writing practice Long-context analytics: Reasoning across entire codebases, legal documents, or multi-session conversation histories Getting started Try Gemma 4 in Microsoft Foundry today. New models from Hugging Face continue to roll out to Foundry on a regular basis through our ongoing collaboration. If there's a model you want to see added, let us know here. Stay connected to our developer community on Discord and stay up to date on what is new in Foundry through the Model Mondays series.
vaidyas
Apr 14, 2026 Place Microsoft Foundry Blog
2.2KViews
1like
0Comments
Bringing GigaTIME to Microsoft Foundry: Unlocking Tumor Microenvironment Insights with Multimodal AI
Expanding Microsoft Foundry for Scientific AI Workloads AI is increasingly being applied to model complex real-world systems, from climate and industrial processes to human biology. In healthcare, one of the biggest challenges is translating routinely available data into deeper biological insight at scale. GigaTIME is now available in Microsoft Foundry, bringing advanced multimodal capabilities to healthcare and life sciences. This brings advanced multimodal capabilities into Foundry, with Foundry Labs enabling early exploration and the Foundry platform supporting scalable deployment aligned to the model’s intended use. Read on to understand how GigaTIME works and how you can start exploring it in Foundry. From Routine Slides to Deep Biological Insight Understanding how tumors interact with the immune system is central to modern cancer research. While techniques like multiplex immunofluorescence provide these insights, they are expensive and difficult to scale across large patient populations. GigaTIME addresses this challenge by translating widely available hematoxylin and eosin pathology slides into spatially resolved protein activation maps. This allows researchers to infer biological signals such as immune activity, tumor growth, and cellular interactions at a much deeper level. Developed in collaboration with Providence and the University of Washington, GigaTIME enables analysis of tumor microenvironments across diverse cancer types, helping accelerate discovery and improve understanding of disease biology. Use cases for GigaTIME GigaTIME is designed to support research and evaluation workflows across a range of real-world scientific scenarios: Population-Scale Tumor Microenvironment Analysis: Enable large-scale analysis of tumor–immune interactions by generating virtual multiplex immunofluorescence outputs from routine pathology slides. Biomarker Association Discovery: Identify relationships between protein activation patterns and clinical attributes such as mutations, biomarkers, and disease characteristics. Patient Stratification and Cohort Analysis: Segment patient populations across cancer types and subtypes using spatial and combinatorial signals for research and hypothesis generation. Clinical Trial Retrospective Analysis: Apply GigaTIME to H&E archives from completed clinical trials to retrospectively characterize tumor microenvironment features associated with treatment outcomes, enabling new insights from existing trial data without additional tissue processing. Tumor–Immune Interaction Analysis: Assess whether immune cells are infiltrating tumor regions or being excluded by analyzing spatial relationships between tumor and immune signals. Immune System Structure Characterization: Understand how immune cell populations are organized within tissue to evaluate coordination or fragmentation of immune response. Immune Checkpoint Context Interpretation: Examine how immune activity may be locally regulated by analyzing overlap between immune markers and checkpoint signals. Tumor Proliferation Analysis: Identify actively growing tumor regions by combining proliferation signals with tumor localization. Stromal and Vascular Context Understanding: Analyze how tissue architecture, such as vascular density and desmoplastic stroma, shapes immune cell access to tumor regions, helping characterize mechanisms of immune exclusion or infiltration. Start with Exploration, Then Go Deeper Explore in Foundry Labs Foundry Labs provides a lightweight environment for early exploration of emerging AI capabilities. It allows developers and researchers to quickly understand how models like GigaTIME behave before integrating them into production workflows. With GigaTIME in Foundry Labs, you can engage with real-world healthcare scenarios and explore how multimodal models translate pathology data into meaningful biological insight. Through curated experiences, you can: Run inference on pathology images Visualize spatial protein activation patterns Explore tumor and immune interactions in context This helps you build intuition and evaluate how the model can be applied to your specific use cases. Go Deeper with GitHub Examples For advanced scenarios, you can access the underlying notebooks and workflows via GitHub. These examples provide flexibility to customize pipelines, extend workflows, and integrate GigaTIME into broader research and application environments. Together, Foundry Labs and GitHub provide a path from guided exploration to deeper customization. Discover and Deploy GigaTIME in Microsoft Foundry Discover in the Foundry Catalog GigaTIME is available in the Foundry model catalog alongside a growing set of domain-specific models across healthcare, geospatial intelligence, physical systems and more. Deploy for Research Workflows For advanced usage, GigaTIME can be deployed as an endpoint within Foundry to support research and evaluation workflows such as: Biomarker discovery Patient stratification Clinical research pipelines You can start with early exploration in Foundry Labs and transition to scalable deployment on the Foundry platform using the tools and workflows designed for each stage, in line with the intended use of the model. A New Class of AI for Scientific Discovery GigaTIME reflects a broader shift toward AI systems designed to model real-world phenomena. These systems are multimodal, deeply tied to domain-specific data, and designed to produce spatial and contextual outputs. They rely on workflows that combine data processing, model inference, and interpretation, which requires platforms that support the full lifecycle from exploration to production. Microsoft Foundry is built to support this evolution. Learn More and Get Started To explore GigaTIME in more detail: Read the Microsoft Research blog on the underlying research, and population-scale findings. Try hands-on scenarios in Foundry Labs Access GitHub examples for advanced workflows Explore and deploy the model through the Foundry catalog Looking Ahead As AI continues to expand into domains like healthcare, climate science, and industrial systems, the ability to connect models, data, and workflows becomes increasingly important. GigaTIME highlights what is possible when these elements come together, transforming routinely available data into actionable scientific insight. We are excited to see what you build next.
Saumil-Shrivastava
Apr 14, 2026 Place Microsoft Foundry Blog
264Views
0likes
0Comments

HTTPS · techcommunity.microsoft.com

← Home