
.NET Core

Powerful Headless Architectures & API-First Development with .NET

UnknownX · January 13, 2026 · Leave a Comment

Building Production-Ready Headless Architectures with API-First .NET

Executive Summary

Modern applications demand flexibility across web, mobile, IoT, and partner integrations, but traditional monoliths couple your business logic to specific frontends. Headless architectures solve this by creating a single, authoritative API-first backend that decouples your core domain from presentation layers. We’re building a scalable e-commerce catalog API using ASP.NET Core Minimal APIs, Entity Framework Core, and modern C#—ready for React, Next.js, Blazor, or native mobile apps. This approach delivers consistent data, independent scaling, and team velocity in production environments.

Prerequisites

  • .NET 9 SDK (or the .NET 8 LTS release)
  • SQL Server (LocalDB for dev, or Docker container)
  • Visual Studio 2022 or VS Code with C# Dev Kit
  • Postman or Swagger for API testing
  • NuGet packages (installed via CLI below; System.Text.Json ships with the framework, so it needs no package reference):
    dotnet new web -n HeadlessCatalogApi
    cd HeadlessCatalogApi
    dotnet add package Microsoft.EntityFrameworkCore.SqlServer
    dotnet add package Microsoft.EntityFrameworkCore.Design
    dotnet add package Microsoft.AspNetCore.OpenApi
    dotnet add package Swashbuckle.AspNetCore
    dotnet add package Microsoft.AspNetCore.Authentication.JwtBearer

Step-by-Step Implementation

Step 1: Define Your Domain Models with API-First Contracts

Start with immutable records using primary constructors—the foundation of our headless backend. These represent your authoritative data contracts.

public record Product(
    Guid Id,
    string Name,
    string Description,
    decimal Price,
    int StockQuantity,
    Guid CategoryId,
    DateTime CreatedAt)
{
    // Navigation kept outside the positional parameters:
    // EF Core cannot bind navigation properties through constructors.
    public ProductCategory? Category { get; set; }
}

public record ProductCategory(Guid Id, string Name);

public record CreateProductRequest(
    string Name, 
    string Description, 
    decimal Price, 
    int StockQuantity,
    Guid CategoryId);

public record UpdateProductRequest(
    string? Name = null,
    string? Description = null,
    decimal? Price = null,
    int? StockQuantity = null);

Step 2: Set Up Data Layer with EF Core

Create a DbContext tuned for this read-heavy headless API. The records above map directly to tables; the category relationship is configured with an explicit foreign key.

public class CatalogDbContext : DbContext
{
    public DbSet<Product> Products { get; set; }
    public DbSet<ProductCategory> Categories { get; set; }

    public CatalogDbContext(DbContextOptions<CatalogDbContext> options) : base(options) { }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<Product>(entity =>
        {
            entity.HasKey(p => p.Id);
            entity.Property(p => p.Name).HasMaxLength(200).IsRequired();
            entity.HasIndex(p => p.Name).IsUnique();
            entity.HasOne(p => p.Category).WithMany().HasForeignKey(p => p.CategoryId);
        });

        modelBuilder.Entity<ProductCategory>(entity =>
        {
            entity.HasKey(c => c.Id);
            entity.Property(c => c.Name).HasMaxLength(100).IsRequired();
        });

        // Seed data: use fixed GUIDs. Guid.NewGuid() here would produce a new
        // value on every migration and corrupt the model snapshot.
        modelBuilder.Entity<ProductCategory>().HasData(
            new ProductCategory(Guid.Parse("11111111-1111-1111-1111-111111111111"), "Electronics"),
            new ProductCategory(Guid.Parse("22222222-2222-2222-2222-222222222222"), "Books")
        );
    }
}

Step 3: Build Minimal API Endpoints

Replace Program.cs with our API-first program. Use route groups and typed results for clean, production-ready APIs.

using Microsoft.EntityFrameworkCore;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddDbContext<CatalogDbContext>(options =>
    options.UseSqlServer("Server=(localdb)\\mssqllocaldb;Database=HeadlessCatalog;Trusted_Connection=True;"));

// Swagger for API documentation (Swashbuckle)
builder.Services.AddEndpointsApiExplorer();
builder.Services.AddSwaggerGen();

var app = builder.Build();

app.UseSwagger();
app.UseSwaggerUI();

var apiGroup = app.MapGroup("/api/v1").WithTags("Products");

// GET /api/v1/products?categoryId={guid}&minPrice=10&maxPrice=100&page=1&pageSize=20
apiGroup.MapGet("/products", async (CatalogDbContext db, 
    Guid? categoryId, decimal? minPrice, decimal? maxPrice, 
    int page = 1, int pageSize = 20) =>
{
    var query = db.Products.AsQueryable();

    if (categoryId.HasValue) query = query.Where(p => p.CategoryId == categoryId.Value);
    if (minPrice.HasValue) query = query.Where(p => p.Price >= minPrice.Value);
    if (maxPrice.HasValue) query = query.Where(p => p.Price <= maxPrice.Value);

    var total = await query.CountAsync();
    var products = await query
        .OrderBy(p => p.Name)
        .Skip((page - 1) * pageSize)
        .Take(pageSize)
        .ToListAsync();

    return Results.Ok(new { Items = products, Total = total, Page = page, PageSize = pageSize });
});

// POST /api/v1/products
apiGroup.MapPost("/products", async (CatalogDbContext db, CreateProductRequest request) =>
{
    var category = await db.Categories.FindAsync(request.CategoryId);
    if (category == null) return Results.BadRequest("Invalid category");

    var product = new Product(Guid.NewGuid(), request.Name, request.Description, 
        request.Price, request.StockQuantity, category.Id, DateTime.UtcNow);
    
    db.Products.Add(product);
    await db.SaveChangesAsync();

    return Results.Created($"/api/v1/products/{product.Id}", product);
});

// PUT /api/v1/products/{id}
apiGroup.MapPut("/products/{id}", async (CatalogDbContext db, Guid id, UpdateProductRequest request) =>
{
    var existing = await db.Products.FindAsync(id);
    if (existing == null) return Results.NotFound();

    var updated = existing with
    {
        Name = request.Name ?? existing.Name,
        Description = request.Description ?? existing.Description,
        Price = request.Price ?? existing.Price,
        StockQuantity = request.StockQuantity ?? existing.StockQuantity
    };

    // `with` produced a new, untracked instance. Copy its values onto the
    // tracked entity instead of calling Update() on the copy, which would
    // throw because the original instance is still tracked.
    db.Entry(existing).CurrentValues.SetValues(updated);
    await db.SaveChangesAsync();

    return Results.NoContent();
});

app.Run();

Step 4: Add Authentication and Authorization

Secure your headless API with JWT. Register the services before builder.Build(), then enable the middleware after it:

// Requires: using Microsoft.AspNetCore.Authentication.JwtBearer;
//           using Microsoft.IdentityModel.Tokens;
//           using System.Text;

builder.Services.AddAuthentication(JwtBearerDefaults.AuthenticationScheme)
    .AddJwtBearer(options =>
    {
        options.TokenValidationParameters = new()
        {
            ValidateIssuer = true,
            ValidateAudience = true,
            ValidateLifetime = true,
            ValidateIssuerSigningKey = true,
            ValidIssuer = "headless-api",
            ValidAudience = "headless-client",
            // Load the key from configuration or Key Vault in production; never hardcode it.
            IssuerSigningKey = new SymmetricSecurityKey(
                Encoding.UTF8.GetBytes("your-super-secret-key-min-256-bits"))
        };
    });

builder.Services.AddAuthorization(options =>
    options.AddPolicy("ApiScope", policy => policy.RequireClaim("scope", "catalog.api")));

// After builder.Build(), enable the middleware:
app.UseAuthentication();
app.UseAuthorization();

// Protect endpoints
apiGroup.RequireAuthorization("ApiScope");
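For local testing you also need to mint a token that the validation parameters above will accept. A minimal sketch (the scope claim matches the ApiScope policy; in production the key comes from configuration and token issuance usually belongs to an identity provider):

using System.IdentityModel.Tokens.Jwt;
using System.Security.Claims;
using System.Text;
using Microsoft.IdentityModel.Tokens;

static string CreateTestToken()
{
    // Must mirror the TokenValidationParameters configured above.
    var key = new SymmetricSecurityKey(
        Encoding.UTF8.GetBytes("your-super-secret-key-min-256-bits"));

    var token = new JwtSecurityToken(
        issuer: "headless-api",
        audience: "headless-client",
        claims: new[] { new Claim("scope", "catalog.api") },
        expires: DateTime.UtcNow.AddHours(1),
        signingCredentials: new SigningCredentials(key, SecurityAlgorithms.HmacSha256));

    return new JwtSecurityTokenHandler().WriteToken(token);
}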

Step 5: Run and Test

dotnet tool install --global dotnet-ef
dotnet ef migrations add InitialCreate
dotnet ef database update
dotnet run

Test in Swagger at https://localhost:5001/swagger or Postman. Your frontend now consumes /api/v1/products consistently.
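From the command line, the same round trip looks like this (the port and token value are placeholders):

curl "https://localhost:5001/api/v1/products?page=1&pageSize=10" \
  -H "Authorization: Bearer <your-token>"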

Production-Ready C# Examples

Here’s an optimized query handler with a caching hook (add Microsoft.Extensions.Caching.Memory). Note that ref structs such as ReadOnlySpan<T> cannot be used in async methods, so the category ids are passed as a regular collection:

// [Cacheable] is a hypothetical marker attribute: back it with an interceptor
// or an IMemoryCache wrapper; it is not part of the framework.
[Cacheable(60)]
public static async Task<List<Product>> GetFeaturedProductsAsync(
    CatalogDbContext db, IReadOnlyCollection<Guid> categoryIds)
{
    return await db.Products
        .Where(p => categoryIds.Contains(p.CategoryId))
        .Where(p => p.StockQuantity > 0)
        .Take(10)
        .ToListAsync();
}

Common Pitfalls & Troubleshooting

  • N+1 Queries: Always use Include() or projection (see the sketch after this list): db.Products.Select(p => new { p.Name, Category = p.Category!.Name })
  • Idempotency: Use ETag headers or client-generated IDs for PUT/POST.
  • CORS Issues: app.UseCors(policy => policy.AllowAnyOrigin().AllowAnyMethod().AllowAnyHeader()); (restrict in prod).
  • JSON Serialization: Configure builder.Services.ConfigureHttpJsonOptions(opt => opt.SerializerOptions.PropertyNamingPolicy = JsonNamingPolicy.CamelCase);
  • DbContext Lifetime: Use AddDbContextFactory for background services.
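To make the projection advice concrete, here is a minimal sketch; ProductSummary is an illustrative DTO, not part of the API above:

public record ProductSummary(Guid Id, string Name, decimal Price, string CategoryName);

// One SQL query fetching only the selected columns: no per-row category lookups.
var summaries = await db.Products
    .Select(p => new ProductSummary(p.Id, p.Name, p.Price, p.Category!.Name))
    .ToListAsync();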

Performance & Scalability Considerations

  • Pagination: Always implement cursor-based or offset pagination with total counts.
  • Caching: Output caching on GET endpoints: .CacheOutput(p => p.Expire(TimeSpan.FromMinutes(5))) (wiring sketched after this list).
  • Async Everything: Use IAsyncEnumerable for streaming large result sets.
  • Rate Limiting: builder.Services.AddRateLimiter(options => options.GlobalLimiter = PartitionedRateLimiter.Create<HttpContext, string>(...)).
  • Horizontal Scaling: Deploy to Kubernetes with Dapr for service mesh, or Azure App Service with autoscaling.
  • Database: Read replicas for queries, sharding by tenant ID for multi-tenant.
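A minimal sketch of the output-caching wiring (ASP.NET Core's built-in output caching middleware; the /products/featured route is illustrative):

// Before builder.Build():
builder.Services.AddOutputCache();

// After builder.Build():
app.UseOutputCache();

// Cache this read-only endpoint's responses for five minutes.
apiGroup.MapGet("/products/featured", async (CatalogDbContext db) =>
        await db.Products.Where(p => p.StockQuantity > 0).Take(10).ToListAsync())
    .CacheOutput(policy => policy.Expire(TimeSpan.FromMinutes(5)));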

Practical Best Practices

  • API Versioning: Use route prefixes /api/v1/, /api/v2/ with OpenAPI docs per version.
  • Validation: FluentValidation pipelines: apiGroup.AddEndpointFilter(ValidationFilter.Default);
  • Testing: Integration tests with Testcontainers and WebApplicationFactory's in-memory TestServer.
  • Monitoring: OpenTelemetry for traces/metrics, Serilog for structured logging.
  • GraphQL Option: Add HotChocolate for flexible queries alongside REST.
  • Event-Driven: Use MassTransit for domain events (ProductStockLow → NotifyWarehouse).

Conclusion

You now have a battle-tested headless API backend serving consistent data to any frontend. Next steps: integrate GraphQL, add real-time subscriptions with SignalR, deploy to Kubernetes, or build a Blazor frontend consuming your API. Commit this to Git and iterate—your architecture scales from startup to enterprise.

FAQs

1. Should I use REST or GraphQL for headless APIs?

REST for simple CRUD with fixed payloads; GraphQL when clients need flexible, over/under-fetching control. Start REST, add GraphQL later via HotChocolate.

2. How do I handle file uploads in headless APIs?

Use IFormFile or raw multipart/form-data, store in Azure Blob/CDN, return signed URLs. Never store binaries in your DB.

3. What’s the best auth for public headless APIs?

JWT with refresh tokens for users, API keys with rate limits for public endpoints, mTLS for B2B partners.

4. How to implement search in my catalog API?

Integrate Elasticsearch or Azure Cognitive Search. Expose /api/v1/products/search?q=iphone&filters=category:electronics.

5. Can I mix Minimal APIs with Controllers?

Yes—use Minimal for public/query APIs (fast), Controllers for complex POST/PUT with model binding.

6. How to version my API without breaking clients?

SemVer in routes (/v1/), additive changes only, mark old versions deprecated (e.g., the Deprecated flag in Asp.Versioning) and give 12 months' notice.

7. What’s the migration path from MVC monolith?

Extract domain to shared library, build API layer first, proxy MVC to API during transition, then retire MVC.

8. How do I secure preview/draft content?

Signed JWT tokens with preview: true claim, validate on API with role checks.

9. Performance: When to use compiled queries?

Use them for hot-path queries that run frequently with the same shape. EF's EF.CompileAsyncQuery gives 2-5x speedup.

10. Multi-tenancy in headless APIs?

Tenant ID in JWT claims or header, partition DB by TenantId, use policies: .RequireAssertion(ctx => ctx.User.HasClaim("tenant", tenantId)).





AI-Driven Development in ASP.NET Core

UnknownX · January 12, 2026 · Leave a Comment

Building AI-Driven ASP.NET Core APIs: Hands-On Guide for .NET Developers

Executive Summary

In modern enterprise applications, AI transforms static APIs into intelligent systems that analyze user feedback, generate personalized content, and automate decision-making. This guide builds a production-ready Feedback Analysis API that uses OpenAI’s GPT-4o-mini to categorize customer feedback, extract sentiment, and suggest actionable insights—solving real-world problems like manual review bottlenecks while ensuring scalability and security for enterprise deployments.

Prerequisites

  • .NET 10 SDK (latest stable)
  • Visual Studio 2022 or VS Code with C# Dev Kit
  • OpenAI API key (get from platform.openai.com)
  • NuGet packages: OpenAI, Microsoft.Extensions.Http, Microsoft.EntityFrameworkCore.Sqlite

Run these commands to scaffold the project:

dotnet new webapi -o AiFeedbackApi --use-program-main
cd AiFeedbackApi
dotnet add package OpenAI --prerelease
dotnet add package Microsoft.EntityFrameworkCore.Sqlite
dotnet add package Microsoft.EntityFrameworkCore.Design
code .

Step-by-Step Implementation

Step 1: Configure AI Settings Securely

For development, store your OpenAI key with User Secrets instead of committing it to appsettings.json; the JSON below only documents the expected configuration shape:

// appsettings.json
{
  "AI": {
    "OpenAI": {
      "ApiKey": "your-api-key-here",
      "Model": "gpt-4o-mini"
    }
  },
  "ConnectionStrings": {
    "Default": "Data Source=feedback.db"
  }
}
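A minimal User Secrets setup keeps the real key out of the repository (run from the project directory; the key name matches the configuration path above):

dotnet user-secrets init
dotnet user-secrets set "AI:OpenAI:ApiKey" "sk-your-real-key"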

Step 2: Create the Domain Model with Primary Constructors

Define our feedback entity using C# 12 primary constructors:

// Models/FeedbackItem.cs
// Models/FeedbackItem.cs
public class FeedbackItem(int id, string text, string category, double sentimentScore)
{
    public int Id { get; set; } = id;
    public string Text { get; set; } = text;
    public string Category { get; set; } = category;
    public double SentimentScore { get; set; } = sentimentScore;

    // Parameterless constructor so EF Core can materialize instances.
    public FeedbackItem() : this(0, string.Empty, string.Empty, 0) { }
}

Step 3: Build the AI Analysis Service

Create a robust, typed AI service using the official OpenAI client and HttpClientFactory fallback:

// Services/IAiFeedbackAnalyzer.cs
public interface IAiFeedbackAnalyzer
{
    Task<(string Category, double SentimentScore)> AnalyzeAsync(string feedbackText);
}

// Services/AiFeedbackAnalyzer.cs
using System.Text.Json;
using OpenAI;
using OpenAI.Chat;

public class AiFeedbackAnalyzer(OpenAIClient client, IConfiguration config) : IAiFeedbackAnalyzer
{
    private readonly ChatClient _chatClient = client.GetChatClient(config["AI:OpenAI:Model"] ?? "gpt-4o-mini");
    
    public async Task<(string Category, double SentimentScore)> AnalyzeAsync(string feedbackText)
    {
        var messages = new List<ChatMessage>
        {
            new SystemChatMessage("""
                Analyze customer feedback and respond ONLY with JSON:
                {"category": "positive|negative|neutral|suggestion|bug", "sentiment": 0.0-1.0}
                Categories: positive, negative, neutral, suggestion, bug.
                Sentiment: 1.0 = very positive, 0.0 = very negative.
                """),
            new UserChatMessage(feedbackText)
        };
        
        var response = await _chatClient.CompleteChatAsync(messages);
        var jsonResponse = response.Value.Content[0].Text;
        
        // Parse structured JSON response safely
        using var doc = JsonDocument.Parse(jsonResponse);
        var category = doc.RootElement.GetProperty("category").GetString() ?? "neutral";
        var sentiment = doc.RootElement.GetProperty("sentiment").GetDouble();
        
        return (category, sentiment);
    }
}

Step 4: Set Up Dependency Injection and DbContext

Register services in Program.cs with minimal APIs:

// Program.cs
using Microsoft.EntityFrameworkCore;
using OpenAI;

var builder = WebApplication.CreateBuilder(args);

var apiKey = builder.Configuration["AI:OpenAI:ApiKey"] 
    ?? throw new InvalidOperationException("OpenAI ApiKey is required");

builder.Services.AddSingleton(new OpenAIClient(apiKey));
builder.Services.AddScoped<IAiFeedbackAnalyzer, AiFeedbackAnalyzer>();
builder.Services.AddDbContext<AppDbContext>(options =>
    options.UseSqlite(builder.Configuration.GetConnectionString("Default")));

builder.Services.AddEndpointsApiExplorer();
builder.Services.AddSwaggerGen();

var app = builder.Build();

if (app.Environment.IsDevelopment())
{
    app.UseSwagger();
    app.UseSwaggerUI();
}

app.UseHttpsRedirection();
app.MapFallback(() => Results.NotFound());

app.Run();

// AppDbContext.cs
public class AppDbContext(DbContextOptions<AppDbContext> options) : DbContext(options)
{
    public DbSet<FeedbackItem> FeedbackItems { get; set; } = null!;
    
    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<FeedbackItem>(entity =>
        {
            entity.HasKey(e => e.Id);
            entity.Property(e => e.Category).HasMaxLength(50);
        });
    }
}
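Before the first request, make sure the SQLite schema exists. For a quick dev loop you can create it at startup (a sketch; EF migrations are the better path once the schema evolves):

// Add to Program.cs after builder.Build():
using (var scope = app.Services.CreateScope())
{
    var db = scope.ServiceProvider.GetRequiredService<AppDbContext>();
    db.Database.EnsureCreated(); // creates feedback.db and tables if missing
}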

Step 5: Implement Minimal API Endpoints

Add intelligent endpoints that process feedback in real-time:

// Add to Program.cs after builder.Build() and before app.Run()
// (requires using Microsoft.AspNetCore.Mvc; for [FromBody])

app.MapPost("/api/feedback/analyze", async (IAiFeedbackAnalyzer analyzer, [FromBody] string text) =>
{
    var (category, sentiment) = await analyzer.AnalyzeAsync(text);
    return Results.Ok(new { Category = category, SentimentScore = sentiment });
});

app.MapPost("/api/feedback", async (AppDbContext db, IAiFeedbackAnalyzer analyzer, [FromBody] string text) =>
{
    var (category, sentiment) = await analyzer.AnalyzeAsync(text);
    var feedback = new FeedbackItem(0, text, category, sentiment);
    
    db.FeedbackItems.Add(feedback);
    await db.SaveChangesAsync();
    
    return Results.Created($"/api/feedback/{feedback.Id}", feedback);
});

app.MapGet("/api/feedback/stats", async (AppDbContext db) =>
    Results.Ok(await db.FeedbackItems
        .GroupBy(f => f.Category)
        .Select(g => new { Category = g.Key, Count = g.Count(), AvgSentiment = g.Average(f => f.SentimentScore) })
        .ToListAsync()));

Step 6: Test Your AI API

Run dotnet run and test with Swagger or curl:

curl -X POST "https://localhost:5001/api/feedback/analyze" \
  -H "Content-Type: application/json" \
  -d '"The UI is intuitive and fast!"'
  
# Response: {"category":"positive","sentimentScore":0.92}

Production-Ready C# Examples

Here’s our complete, optimized controller alternative using primary constructors:

[ApiController]
[Route("api/v1/[controller]")]
public class FeedbackController(AppDbContext db, IAiFeedbackAnalyzer analyzer) : ControllerBase
{
    [HttpPost]
    public async Task<IActionResult> AnalyzeAndStore([FromBody] AnalyzeRequest request)
    {
        ArgumentNullException.ThrowIfNull(request.Text);
        
        var (category, sentiment) = await analyzer.AnalyzeAsync(request.Text);
        
        var item = new FeedbackItem(0, request.Text, category, sentiment);
        db.FeedbackItems.Add(item);
        await db.SaveChangesAsync();
        
        return CreatedAtAction(nameof(GetById), new { id = item.Id }, item);
    }
    
    [HttpGet("{id:int}")]
    public async Task<IActionResult> GetById(int id) =>
        await db.FeedbackItems.FindAsync(id) is { } item 
            ? Ok(item) 
            : NotFound();
}

public record AnalyzeRequest(string Text);

Common Pitfalls & Troubleshooting

  • API Key Leaks: Never commit keys—use dotnet user-secrets and Azure Key Vault in prod.
  • Rate Limits: Implement Polly retry policies: AddHttpClient().AddPolicyHandler(...).
  • JSON Parsing Failures: Always validate AI responses with JsonDocument and provide fallbacks (sketch after this list).
  • Cold Starts: Pre-warm AI clients in IHostedService.
  • Token Limits: Truncate long inputs: text[..Math.Min(4000, text.Length)].
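A defensive parsing helper in that spirit keeps a single malformed model response from failing the whole request (a sketch; the neutral defaults are an assumption, not part of the original service):

using System.Text.Json;

static (string Category, double Sentiment) ParseOrFallback(string json)
{
    try
    {
        using var doc = JsonDocument.Parse(json);
        var root = doc.RootElement;

        var category = root.TryGetProperty("category", out var c)
            ? c.GetString() ?? "neutral"
            : "neutral";

        var sentiment = root.TryGetProperty("sentiment", out var s) && s.TryGetDouble(out var value)
            ? Math.Clamp(value, 0.0, 1.0)
            : 0.5;

        return (category, sentiment);
    }
    catch (JsonException)
    {
        // The model returned non-JSON text; degrade gracefully instead of throwing.
        return ("neutral", 0.5);
    }
}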

Performance & Scalability Considerations

  • Caching: Cache frequent analysis patterns with IMemoryCache (TTL: 5min).
  • Background Processing: Use BackgroundService (registered as IHostedService) + Channels for batch analysis.
  • Distributed Tracing: Integrate OpenTelemetry for AI call monitoring.
  • Model Routing: Abstract IAiProvider to switch between OpenAI, Azure OpenAI, or local models.
  • Horizontal Scaling: Stateless services + Redis for shared cache/state.

Practical Best Practices

  • Always implement structured prompting with system messages for consistent JSON output.
  • Use record types for requests/responses to leverage source generators.
  • Implement circuit breakers for AI dependencies using Polly.
  • Add unit tests mocking OpenAIClient with Moq.
  • Log AI requests/responses (anonymized) with Serilog for model improvement.
  • Enable streaming responses for long completions: chatClient.CompleteChatStreamingAsync().

Conclusion

You’ve built a production-grade AI-powered Feedback API that scales from MVP to enterprise. Next steps: integrate with Blazor frontend, add RAG with vector databases, or deploy to Azure Container Apps with auto-scaling. Keep iterating—AI development is about continuous improvement.

FAQs

1. How do I handle OpenAI rate limits in production?

Implement exponential backoff with Polly:

services.AddHttpClient<IAiFeedbackAnalyzer>().AddPolicyHandler(
    Policy.HandleResult<HttpResponseMessage>(r => !r.IsSuccessStatusCode)
          .WaitAndRetryAsync(3, retry => TimeSpan.FromSeconds(Math.Pow(2, retry)))); 

2. Can I use local ML.NET models instead of OpenAI?

Yes! Create IAiFeedbackAnalyzer implementations for both and use DI feature flags to switch.

3. How do I secure the AI endpoints?

Add [Authorize] with JWT, rate limiting via AspNetCoreRateLimit, and validate inputs with FluentValidation.

4. What’s the cost of GPT-4o-mini for 1M feedbacks?

~400 input tokens per analysis at $0.15 per 1M input tokens is roughly $0.06 per 1,000 analyses; budget closer to $0.10 once output tokens are included. Cache aggressively.

5. How do I add streaming AI responses?

Use CompleteChatStreamingAsync() and Response.StartAsync() for real-time UI updates.

6. Can I deploy this to Azure?

Perfect for Azure Container Apps + Azure OpenAI (same SDK). Use Managed Identity for keys.

7. How do I test AI responses deterministically?

Mock OpenAIClient or use fixed prompt responses in integration tests.

8. What’s the latency impact of AI calls?

200-800ms per call. Use async/await everywhere and consider client-side caching.




🔗 Suggested External Links

Build an AI chat app with .NET (Microsoft Learn) — Quickstart showing how to use OpenAI or Azure OpenAI models with .NET.
🔗 https://learn.microsoft.com/en-us/dotnet/ai/quickstarts/build-chat-app

Develop .NET apps with AI features (Microsoft Learn) — Overview of AI integration in .NET apps (APIs, services, tooling).
🔗 https://learn.microsoft.com/en-us/dotnet/ai/overview

AI-Powered Group Chat sample with SignalR + OpenAI (Microsoft Learn) — Demonstrates real-time chat with AI in an ASP.NET Core app.
🔗 https://learn.microsoft.com/en-us/aspnet/core/tutorials/ai-powered-group-chat/ai-powered-group-chat?view=aspnetcore-9.0

Powerful AI-First .NET Backend Engineering for High-Throughput APIs (ONNX, Vector Search, Semantic Features)

UnknownX · January 11, 2026 · Leave a Comment

AI-First .NET 8 Backend for High-Throughput Semantic APIs (ONNX, Vector Search, Embeddings)

Executive Summary

We’ll build a high-throughput, AI-first .NET 8 backend that:

  • Uses an ONNX embedding model to convert text into vectors.
  • Stores those vectors in a vector database (e.g., Qdrant/pgvector or in-memory for demo).
  • Exposes production-ready HTTP APIs for semantic search, recommendations, and similarity matching.
  • Is implemented in modern C# (records, minimal APIs, DI, async, efficient memory usage).

This solves a real production problem: how to serve semantic capabilities (search, RAG, personalization, anomaly detection) from your existing .NET services without routing every request through a cloud LLM provider. You get:

  • Low latency: ONNX Runtime is highly optimized and runs in-process.
  • Cost control: Once the model is deployed, inference cost is predictable.
  • Data control: Vectors and documents stay inside your infrastructure.
  • Composable APIs: You can layer semantic features into any bounded context.

Prerequisites

Tools & Runtime

  • .NET 8 SDK installed.
  • Visual Studio 2022 / Rider / VS Code with C# extension.
  • ONNX Runtime available as a NuGet package.
  • Optionally: a running Qdrant or PostgreSQL + pgvector instance.

NuGet Packages

In your Web API project, add:

  • Microsoft.ML.OnnxRuntime – core ONNX inference; bundles the native CPU runtime (add a provider-specific package such as Microsoft.ML.OnnxRuntime.Gpu if you want GPU).
  • System.Text.Json – built-in, but we’ll tweak options.
  • Dapper (if using pgvector + PostgreSQL for storage).
  • Qdrant.Client (if using Qdrant; or you can call its REST API directly with HttpClient).

Model & Data

  • A sentence embedding ONNX model (e.g., a BGE, MiniLM, or similar model exported to ONNX).
  • Text documents (product descriptions, knowledge base articles, etc.) to index.

Step-by-Step Implementation

Step 1: Project Setup

Create a new .NET 8 Web API (minimal APIs) project:

dotnet new webapi -n SemanticBackend
cd SemanticBackend

Edit SemanticBackend.csproj to target .NET 8 and add packages:

<Project Sdk="Microsoft.NET.Sdk.Web">
  <PropertyGroup>
    <TargetFramework>net8.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.ML.OnnxRuntime" Version="1.20.0" />
    <PackageReference Include="Dapper" Version="2.1.35" />
    <PackageReference Include="Qdrant.Client" Version="3.5.0" />
  </ItemGroup>
</Project>

Place your ONNX model file under ./Models/embeddings.onnx and mark it as Copy if newer in the .csproj:

<ItemGroup>
  <None Include="Models\embeddings.onnx" CopyToOutputDirectory="PreserveNewest" />
</ItemGroup>

Step 2: Define Core Domain Types

We’ll focus on a simple domain: documents with semantic search.

namespace SemanticBackend.Documents;

public sealed record Document(
    Guid Id,
    string ExternalId,
    string Title,
    string Content,
    float[] Embedding,
    DateTimeOffset CreatedAt);

For API DTOs:

namespace SemanticBackend.Api;

public sealed record IndexDocumentRequest(
    string ExternalId,
    string Title,
    string Content);

public sealed record SearchRequest(
    string Query,
    int TopK = 5);

public sealed record SearchResult(
    Guid Id,
    string ExternalId,
    string Title,
    string Content,
    double Score);

Step 3: Implement an ONNX Embedding Service

This service will:

  • Load the ONNX model once at startup.
  • Preprocess text (tokenization can be done outside ONNX or inside, depending on the model).
  • Run inference and return a normalized embedding vector.

Basic abstraction:

namespace SemanticBackend.Embeddings;

public interface IEmbeddingGenerator
{
    ValueTask<float[]> GenerateAsync(string text, CancellationToken ct = default);
}

ONNX-based implementation (simplified – assumes the model takes a single input tensor already preprocessed; you can extend this to include tokenization or use a model exported with pre/post processing baked in):

using Microsoft.ML.OnnxRuntime;
using Microsoft.ML.OnnxRuntime.Tensors;

namespace SemanticBackend.Embeddings;

public sealed class OnnxEmbeddingGenerator : IEmbeddingGenerator, IAsyncDisposable
{
    private readonly InferenceSession _session;
    private readonly string _inputName;
    private readonly string _outputName;

    public OnnxEmbeddingGenerator(string modelPath)
    {
        // Configure session options (CPU, threads, graph optimizations)
        var options = new SessionOptions
        {
            GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_ALL
        };
        options.EnableMemoryPattern = true;

        _session = new InferenceSession(modelPath, options);

        // Inspect model metadata for input/output names if needed.
        _inputName = _session.InputMetadata.Keys.First();
        _outputName = _session.OutputMetadata.Keys.First();
    }

    public ValueTask<float[]> GenerateAsync(string text, CancellationToken ct = default)
    {
        // You would normally do proper tokenization here or call a model
        // that encapsulates tokenization in the ONNX graph.
        // For demo, we assume an external process provides us with a fixed-size input vector.
        // Replace this with real tokenization for a production system.

        // Example: fake tokenization into a fixed-length float vector
        const int inputLength = 128;
        var inputTensor = new DenseTensor<float>(new[] { 1, inputLength });

        var span = inputTensor.Buffer.Span;
        span.Clear();

        // SUPER simplified: map chars to floats
        var length = Math.Min(text.Length, inputLength);
        for (var i = 0; i < length; i++)
        {
            span[i] = text[i] % 128; // safe demo mapping
        }

        var inputs = new List<NamedOnnxValue>
        {
            NamedOnnxValue.CreateFromTensor(_inputName, inputTensor)
        };

        using var results = _session.Run(inputs);
        var outputTensor = results.First(v => v.Name == _outputName).AsTensor<float>();

        var embedding = outputTensor.ToArray();
        NormalizeInPlace(embedding);

        return ValueTask.FromResult(embedding);
    }

    private static void NormalizeInPlace(Span<float> vector)
    {
        var length = vector.Length;
        if (length == 0) return;

        // Use double accumulator to minimize rounding
        double sumSquares = 0;
        for (var i = 0; i < length; i++)
        {
            var v = vector[i];
            sumSquares += (double)v * v;
        }

        var norm = Math.Sqrt(sumSquares);
        if (norm < 1e-12) return;

        var inv = (float)(1.0 / norm);
        for (var i = 0; i < length; i++)
        {
            vector[i] *= inv;
        }
    }

    public ValueTask DisposeAsync()
    {
        _session.Dispose();
        return ValueTask.CompletedTask;
    }
}

Note: In production, you should plug in a real tokenizer and model-specific pre/post-processing. The overall pattern remains the same.

Step 4: Implement a Vector Store Abstraction

We want the rest of the code to be independent of the specific database implementation.

namespace SemanticBackend.VectorStore;

using SemanticBackend.Documents;

public interface IVectorStore
{
    Task IndexAsync(Document document, CancellationToken ct = default);

    Task<IReadOnlyList<(Document Document, double Score)>> SearchAsync(
        float[] queryEmbedding,
        int topK,
        CancellationToken ct = default);
}

Step 5: In-Memory Vector Store (for Fast Iteration)

We’ll start with an in-memory store implementing cosine similarity. This is great for local development and testing.

using System.Collections.Concurrent;
using SemanticBackend.Documents;

namespace SemanticBackend.VectorStore;

public sealed class InMemoryVectorStore : IVectorStore
{
    private readonly ConcurrentDictionary<Guid, Document> _documents = new();

    public Task IndexAsync(Document document, CancellationToken ct = default)
    {
        _documents[document.Id] = document;
        return Task.CompletedTask;
    }

    public Task<IReadOnlyList<(Document Document, double Score)>> SearchAsync(
        float[] queryEmbedding,
        int topK,
        CancellationToken ct = default)
    {
        if (_documents.Count == 0)
        {
            return Task.FromResult<IReadOnlyList<(Document, double)>>
                (Array.Empty<(Document, double)>());
        }

        // Cosine similarity: dot(a, b) / (|a| * |b|), but since vectors
        // are normalized, this is just dot(a, b).
        var results = new List<(Document, double)>(_documents.Count);

        foreach (var doc in _documents.Values)
        {
            var score = Dot(queryEmbedding, doc.Embedding);
            results.Add((doc, score));
        }

        var top = results
            .OrderByDescending(r => r.Item2)
            .Take(topK)
            .ToArray();

        return Task.FromResult<IReadOnlyList<(Document, double)>>(top);
    }

    private static double Dot(ReadOnlySpan<float> a, ReadOnlySpan<float> b)
    {
        if (a.Length != b.Length)
        {
            throw new InvalidOperationException(
                $"Vector dimension mismatch: {a.Length} vs {b.Length}.");
        }

        var sum = 0.0;
        for (var i = 0; i < a.Length; i++)
        {
            sum += a[i] * b[i];
        }

        return sum;
    }
}

Step 6: Qdrant Vector Store (Production-Style Example)

Let’s add a Qdrant-backed store to illustrate real vector DB usage. We assume a collection with vector_size equal to your embedding dimension and appropriate distance metric (cosine).

using Qdrant.Client;
using Qdrant.Client.Grpc;
using SemanticBackend.Documents;

namespace SemanticBackend.VectorStore;

public sealed class QdrantVectorStore : IVectorStore
{
    private readonly QdrantClient _client;
    private readonly string _collectionName;
    private readonly int _dimension;

    public QdrantVectorStore(QdrantClient client, string collectionName, int dimension)
    {
        _client = client;
        _collectionName = collectionName;
        _dimension = dimension;
    }

    public async Task IndexAsync(Document document, CancellationToken ct = default)
    {
        if (document.Embedding.Length != _dimension)
        {
            throw new InvalidOperationException(
                $"Vector dimension mismatch: expected {_dimension}, got {document.Embedding.Length}.");
        }

        // Qdrant.Client provides implicit conversions: Guid -> PointId and
        // float[] -> Vectors; payload values convert implicitly from strings.
        var point = new PointStruct
        {
            Id = document.Id,
            Vectors = document.Embedding
        };

        point.Payload.Add("externalId", document.ExternalId);
        point.Payload.Add("title", document.Title);
        point.Payload.Add("content", document.Content);
        // Store the timestamp as an ISO-8601 string for easy round-tripping.
        point.Payload.Add("createdAt", document.CreatedAt.ToString("O"));

        await _client.UpsertAsync(
            _collectionName,
            new[] { point },
            cancellationToken: ct);
    }

    public async Task<IReadOnlyList<(Document Document, double Score)>> SearchAsync(
        float[] queryEmbedding,
        int topK,
        CancellationToken ct = default)
    {
        var searchPoints = await _client.SearchAsync(
            _collectionName,
            queryEmbedding,
            limit: (ulong)topK,
            cancellationToken: ct);

        var results = new List<(Document, double)>(searchPoints.Count);

        foreach (var point in searchPoints)
        {
            var payload = point.Payload;

            var externalId = payload.TryGetValue("externalId", out var extVal)
                ? extVal.StringValue
                : string.Empty;

            var title = payload.TryGetValue("title", out var titleVal)
                ? titleVal.StringValue
                : string.Empty;

            var content = payload.TryGetValue("content", out var contentVal)
                ? contentVal.StringValue
                : string.Empty;

            var createdAt = payload.TryGetValue("createdAt", out var createdVal)
                ? DateTimeOffset.Parse(createdVal.StringValue)
                : DateTimeOffset.UtcNow;

            // For many APIs, the original vector is not returned; you might not need it
            // for read scenarios. For simplicity, we reuse the query embedding.
            var doc = new Document(
                Guid.Parse(point.Id.Uuid),
                externalId,
                title,
                content,
                queryEmbedding,
                createdAt);

            results.Add((doc, point.Score));
        }

        return results;
    }
}

Note: The exact Qdrant.Client surface varies between versions; the implicit Guid→PointId and float[]→Vectors conversions and payload helpers used above follow the official .NET client's documented examples, so verify them against the version you install.

Step 7: Application Service Layer

Now we compose the embedding generator with the vector store into a use-case–centric service.

using SemanticBackend.Api;
using SemanticBackend.Documents;
using SemanticBackend.Embeddings;
using SemanticBackend.VectorStore;

namespace SemanticBackend.Application;

public interface IDocumentService
{
    Task<Guid> IndexAsync(IndexDocumentRequest request, CancellationToken ct = default);

    Task<IReadOnlyList<SearchResult>> SearchAsync(SearchRequest request, CancellationToken ct = default);
}

public sealed class DocumentService(IEmbeddingGenerator embeddings, IVectorStore store)
    : IDocumentService
{
    public async Task<Guid> IndexAsync(IndexDocumentRequest request, CancellationToken ct = default)
    {
        var embedding = await embeddings.GenerateAsync(request.Content, ct);

        var document = new Document(
            Id: Guid.NewGuid(),
            ExternalId: request.ExternalId,
            Title: request.Title,
            Content: request.Content,
            Embedding: embedding,
            CreatedAt: DateTimeOffset.UtcNow);

        await store.IndexAsync(document, ct);

        return document.Id;
    }

    public async Task<IReadOnlyList<SearchResult>> SearchAsync(SearchRequest request, CancellationToken ct = default)
    {
        var queryEmbedding = await embeddings.GenerateAsync(request.Query, ct);
        var matches = await store.SearchAsync(queryEmbedding, request.TopK, ct);

        return matches
            .Select(m => new SearchResult(
                m.Document.Id,
                m.Document.ExternalId,
                m.Document.Title,
                m.Document.Content,
                m.Score))
            .ToArray();
    }
}

Step 8: Wire Everything in Program.cs (Minimal API)

Now we expose REST endpoints using minimal APIs.

using Microsoft.AspNetCore.Http.HttpResults;
using SemanticBackend.Api;
using SemanticBackend.Application;
using SemanticBackend.Embeddings;
using SemanticBackend.VectorStore;
using Qdrant.Client;

var builder = WebApplication.CreateBuilder(args);

// Configuration
var configuration = builder.Configuration;

var modelPath = Path.Combine(AppContext.BaseDirectory, "Models", "embeddings.onnx");
const int embeddingDimension = 384; // Adjust to your model

// DI registrations
builder.Services.AddSingleton<IEmbeddingGenerator>(_ => new OnnxEmbeddingGenerator(modelPath));

// Choose one vector store implementation.
// For local/dev:
builder.Services.AddSingleton<IVectorStore, InMemoryVectorStore>();

// For Qdrant (comment the above and uncomment these):
// var qdrantUri = configuration.GetValue<string>("Qdrant:Url") ?? "http://localhost:6334";
// var qdrantCollection = configuration.GetValue<string>("Qdrant:Collection") ?? "documents";
// builder.Services.AddSingleton(new QdrantClient(qdrantUri));
// builder.Services.AddSingleton<IVectorStore>(sp =>
// {
//     var client = sp.GetRequiredService<QdrantClient>();
//     return new QdrantVectorStore(client, qdrantCollection, embeddingDimension);
// });

builder.Services.AddScoped<IDocumentService, DocumentService>();

builder.Services.ConfigureHttpJsonOptions(options =>
{
    options.SerializerOptions.PropertyNamingPolicy = null;
    options.SerializerOptions.WriteIndented = false;
});

var app = builder.Build();

app.MapPost("/documents/index", async Task<Results<Ok<Guid>, BadRequest<string>>> (
    IndexDocumentRequest request,
    IDocumentService service,
    CancellationToken ct) =>
{
    if (string.IsNullOrWhiteSpace(request.Content))
    {
        return TypedResults.BadRequest("Content must not be empty.");
    }

    var id = await service.IndexAsync(request, ct);
    return TypedResults.Ok(id);
});

app.MapPost("/documents/search", async Task<Ok<IReadOnlyList<SearchResult>>> (
    SearchRequest request,
    IDocumentService service,
    CancellationToken ct) =>
{
    if (string.IsNullOrWhiteSpace(request.Query))
    {
        return TypedResults.Ok(Array.Empty<SearchResult>());
    }

    var results = await service.SearchAsync(request, ct);
    return TypedResults.Ok(results);
});

app.Run();

You now have:

  • POST /documents/index – index a document (compute embedding + store in vector DB).
  • POST /documents/search – semantic search over indexed documents.

Step 9: Semantic Features: RAG-style Answering (Optional but Powerful)

Once you have semantic search, layering retrieval-augmented generation (RAG) becomes straightforward. Instead of returning the documents, you can compose them into a prompt for an LLM (local ONNX LLM or remote provider).

Example service method (pseudo-LLM call):

public sealed class RagService(IDocumentService documents, IChatModel chatModel)
{
    public async Task<string> AskAsync(string question, CancellationToken ct = default)
    {
        var searchResults = await documents.SearchAsync(
            new SearchRequest(question, TopK: 5), ct);

        var context = string.Join("\n\n", searchResults.Select(r =>
            $"Title: {r.Title}\nContent: {r.Content}"));

        var prompt = $"""
        You are a helpful assistant. Answer the question based only on the context.

        Context:
        {context}

        Question: {question}
        """;

        var answer = await chatModel.CompleteAsync(prompt, ct);
        return answer;
    }
}

Where IChatModel could be implemented using another ONNX model (e.g., Phi-3) or a cloud provider.

Production-Ready C# Patterns & Examples

Pattern: Batching Embedding Requests

For high throughput, you want to batch embeddings whenever possible.

public interface IBatchEmbeddingGenerator
{
    ValueTask<float[][]> GenerateBatchAsync(
        IReadOnlyList<string> texts,
        CancellationToken ct = default);
}

Inside your ONNX implementation, you can create a tensor of shape [batchSize, sequenceLength] and run a single _session.Run() call, then split the output tensor into separate vectors per item. This significantly improves throughput when handling many small requests (e.g., indexing jobs).
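A sketch of that batched core, slotted into OnnxEmbeddingGenerator and reusing its placeholder preprocessing (a real implementation would feed proper token IDs and read the model's actual output shape):

public ValueTask<float[][]> GenerateBatchAsync(
    IReadOnlyList<string> texts, CancellationToken ct = default)
{
    const int inputLength = 128;

    // One [batchSize, sequenceLength] tensor, one Run() call for all inputs.
    var batch = new DenseTensor<float>(new[] { texts.Count, inputLength });
    for (var row = 0; row < texts.Count; row++)
    {
        var text = texts[row];
        for (var i = 0; i < Math.Min(text.Length, inputLength); i++)
        {
            batch[row, i] = text[i] % 128; // same demo mapping as GenerateAsync
        }
    }

    using var results = _session.Run(new List<NamedOnnxValue>
    {
        NamedOnnxValue.CreateFromTensor(_inputName, batch)
    });

    // Split the [batchSize, dim] output into one normalized vector per input.
    var output = results.First(v => v.Name == _outputName).AsTensor<float>();
    var dim = output.Dimensions[1];
    var vectors = new float[texts.Count][];
    for (var row = 0; row < texts.Count; row++)
    {
        var v = new float[dim];
        for (var i = 0; i < dim; i++) v[i] = output[row, i];
        NormalizeInPlace(v);
        vectors[row] = v;
    }

    return ValueTask.FromResult(vectors);
}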

Pattern: Background Indexing

Use a background queue for indexing to reduce latency on the write path:

public sealed class IndexingBackgroundService(
    Channel<IndexDocumentRequest> channel,
    IDocumentService documentService,
    ILogger<IndexingBackgroundService> logger) : BackgroundService
{
    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        await foreach (var request in channel.Reader.ReadAllAsync(stoppingToken))
        {
            try
            {
                await documentService.IndexAsync(request, stoppingToken);
            }
            catch (Exception ex)
            {
                logger.LogError(ex, "Error indexing document {ExternalId}", request.ExternalId);
            }
        }
    }
}
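Wiring for the queue, sketched in Program.cs terms: a single unbounded channel shared between the endpoint (producer) and the hosted service (consumer). Note that hosted services cannot take scoped dependencies directly:

using System.Threading.Channels;

var channel = Channel.CreateUnbounded<IndexDocumentRequest>();
builder.Services.AddSingleton(channel);
// Register IDocumentService as a singleton here (its dependencies are singletons),
// or resolve it via IServiceScopeFactory inside the background service.
builder.Services.AddHostedService<IndexingBackgroundService>();

// Fast write path: enqueue and return immediately instead of indexing inline.
app.MapPost("/documents/index-async", async (
    IndexDocumentRequest request,
    Channel<IndexDocumentRequest> queue,
    CancellationToken ct) =>
{
    await queue.Writer.WriteAsync(request, ct);
    return Results.Accepted();
});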

Common Pitfalls & Troubleshooting

1. Vector Dimension Mismatch

Symptom: Errors like “vector dimension mismatch” or “expected dim X, got Y”.

Cause: Your model outputs a vector of dimension N, but your vector DB or code assumes a different size.

Fix:

  • Determine the embedding dimension once (inspect the ONNX output tensor shape).
  • Store that dimension in configuration and enforce it in the vector store (as shown in QdrantVectorStore).

2. ONNX Runtime Native Dependencies

Symptom: App fails to start with missing DLL or shared library errors.

Cause: Native ONNX Runtime binaries missing for your platform.

Fix:

  • Use the base Microsoft.ML.OnnxRuntime package for CPU-only deployments; it bundles the native CPU binaries (the .Managed package alone contains no native runtime and must be paired with one).
  • If using GPU or specific providers, ensure the correct runtime package is added and native libraries are present in your container or host.

3. Latency Spikes on First Request

Symptom: First inference is slow (model load, JIT, etc.).

Fix:

  • Warm up ONNX at startup by running a single dummy inference in the OnnxEmbeddingGenerator constructor or via IHostedService.

4. High Memory Usage

Symptom: Memory grows with concurrent requests.

Causes: Large allocations per request, no reuse of buffers, unbounded caching.

Fix:

  • Reuse tensors and buffers via pooling where possible.
  • Return only needed data to clients (avoid sending embeddings over the wire).
  • Use structs or readonly records for value types and avoid unnecessary copies.

5. Inaccurate or Poor-Quality Results

Symptom: Semantic search results look random or irrelevant.

Causes: Wrong model type, missing normalization, bad pre-processing.

Fix:

  • Use a model trained for sentence embeddings, not classification.
  • Normalize embeddings to unit length before storing.
  • Ensure the same pre-processing is used for indexing and querying.

Performance & Scalability Considerations

1. Horizontal Scaling

  • Deploy multiple instances of the API behind a load balancer.
  • Keep the vector store external (Qdrant/pgvector) so any instance can serve queries.
  • Ensure ONNX model loading is instance-local, but the model file is part of your container image.

2. Concurrency & Threading

  • ONNX InferenceSession is safe for concurrent use in many scenarios; use a singleton per model.
  • Limit max degree of parallelism via configuration if CPU saturates; you can wrap embeddings calls in a semaphore to protect CPU.

3. Caching

  • Cache embeddings for frequently queried texts (e.g., by hashing the text and storing the vector in a cache layer).
  • Cache search results for popular queries with a short TTL.

4. Indexing Strategy

  • Bulk index documents offline before flipping traffic for new datasets.
  • Use batch APIs for your vector DB to reduce network overhead.

5. Observability

  • Emit metrics: inference latency, search latency, QPS, error rates, queue depth for background indexing.
  • Log only necessary data (avoid raw embeddings in logs).

Practical Best Practices

1. Separate Concerns Clearly

  • Embedding generation is an infrastructure concern (ONNX).
  • Vector storage/search is another infrastructure boundary.
  • Application services orchestrate both to implement business use-cases.

2. Strong Typing Around Semantic Operations

Use domain-specific abstractions like SemanticSearchResult, Embedding value objects, and dedicated services. This makes it easier to evolve the underlying implementation without leaking details.

3. Testing Strategy

  • Unit tests: mock IEmbeddingGenerator and IVectorStore to test application logic.
  • Integration tests: spin up an in-memory vector store and run end-to-end index + search flows.
  • Load tests: use tools like k6 or NBomber to stress-test concurrent search/index semantics.

4. Configuration Management

  • Make model path, embedding dimension, vector DB connection details configurable via appsettings or environment variables.
  • Expose a health endpoint that checks ONNX session initialization and vector DB connectivity (see the sketch below).
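A minimal health endpoint in that spirit (a sketch; it assumes a cheap dummy inference is acceptable at probe time, and a real version would also ping the vector DB):

app.MapGet("/health", async (IEmbeddingGenerator embeddings, CancellationToken ct) =>
{
    try
    {
        // A dummy inference proves the ONNX session loaded and is responsive.
        var probe = await embeddings.GenerateAsync("health-check", ct);
        return probe.Length > 0
            ? Results.Ok(new { status = "healthy", embeddingDimension = probe.Length })
            : Results.Problem("Embedding generator returned an empty vector.");
    }
    catch (Exception ex)
    {
        return Results.Problem($"Embedding generator failed: {ex.Message}");
    }
});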

5. Backward Compatibility

  • If you upgrade models (changing embedding dimension), keep old and new collections in the vector DB and version them.
  • Provide a migration path or dual-read strategy until reindexing is done.

Conclusion

We’ve built a modern, AI-first .NET 8 backend that:

  • Uses ONNX Runtime for fast, local embedding generation.
  • Stores and searches embeddings via a pluggable vector store abstraction.
  • Exposes clean HTTP APIs for indexing and semantic search.

From here, you can:

  • Swap the in-memory vector store for Qdrant/pgvector in production.
  • Integrate a local or remote LLM and implement full RAG flows.
  • Extend the model to support multi-modal embeddings (e.g., images + text) using different ONNX models.

FAQs

1. How do I choose the right ONNX embedding model?

Pick a model that is explicitly designed for sentence embeddings or similarity search and has a good balance between embedding size and speed. Smaller dimensions (e.g., 384–768) are usually enough for many enterprise scenarios while being faster and more memory-efficient than very large embeddings.

2. Can I run this on .NET 6 or 7 instead of .NET 8?

Yes, the concepts are the same. Minimal APIs exist in .NET 6+, and ONNX Runtime works across these versions. You might need minor adjustments to the project file and language features depending on the C# version.

3. How do I implement real tokenization instead of the dummy char mapping?

You have two main options:

  • Export the model to ONNX with tokenizer and pre-processing embedded in the graph, so the input is raw text.
  • Implement the tokenizer in .NET to match the original model (e.g., BPE or WordPiece). This usually means porting the tokenizer logic or using a compatible library and then feeding token IDs into the ONNX model.

4. Should I normalize vectors in the ONNX graph or in C#?

Either works. Normalizing in C# (as shown) is flexible and easy to reason about; normalizing in the ONNX graph simplifies your C# code and guarantees consistent behavior across languages. The key is to normalize consistently for both indexing and querying.

5. How do I secure these APIs?

Treat them like any other internal microservice:

  • Use authentication/authorization (JWT, OAuth2, API keys) at the gateway or directly in the API.
  • Apply rate limiting on search and indexing endpoints.
  • Audit access if sensitive documents are indexed.

6. Can I store embeddings directly in PostgreSQL without pgvector?

You can store embeddings as arrays or JSON and compute similarity in your application or via custom functions, but performance will be limited. pgvector gives you efficient vector types and index structures (IVFFlat, HNSW) suitable for high-throughput APIs.

7. How large can my corpus be before I need a “real” vector DB?

In-memory or naive approaches work for thousands to tens of thousands of vectors. Once you reach hundreds of thousands or millions of vectors, specialized vector DBs (Qdrant, Milvus, pgvector) become important for both latency and resource usage.

8. How do I test semantic accuracy in an automated way?

Create a small labeled dataset of query-document pairs with ground-truth relevance labels. Run your semantic search pipeline against it and compute metrics such as MRR, nDCG, or precision@K. Integrate those tests into your CI/CD pipeline to catch regressions when changing models or preprocessing logic.

9. Can I add semantic capabilities to existing REST endpoints without breaking clients?

Yes. You can:

  • Add new query parameters (e.g., ?query= for semantic search).
  • Introduce new endpoints under a /semantic route.
  • Keep existing keyword-based search endpoints intact while gradually adopting semantic search behind a feature flag.

10. How do I handle multi-tenant data in the vector store?

Include a tenant identifier in your payload (Qdrant) or as a column (pgvector) and add it as a hard filter to all queries. You may also decide to use separate collections/tables per tenant if isolation requirements are strict or if you need different models per tenant.

🔗 Suggested External Links

  • ONNX Runtime Official Docs
    https://onnxruntime.ai/
  • Qdrant Vector Database
    https://qdrant.tech/
  • pgvector for PostgreSQL
    https://github.com/pgvector/pgvector

AI-Native .NET: Building Intelligent Applications with Azure OpenAI, Semantic Kernel, and ML.NET

UnknownX · January 10, 2026 · Leave a Comment

Building AI-Native .NET Applications with Azure OpenAI, Semantic Kernel, and ML.NET

Executive Summary

Modern organizations are rapidly adopting AI-Native .NET approaches to remain competitive in an AI-accelerated landscape. Traditional .NET applications are no longer enough—teams now need systems that can reason over data, automate decision-making, and learn from patterns. Whether you’re building intelligent chatbots, document analysis pipelines, customer-support copilots, or predictive forecasting features, AI-Native .NET development using Azure OpenAI, Semantic Kernel, and ML.NET provides the optimal foundation.

This guide addresses a critical challenge: how to architect and implement artificial intelligence in .NET without reinventing the wheel, breaking clean architecture, or introducing untestable components. The real-world problem is clear—developers need a unified approach to orchestrate LLM calls, manage long-term memory and context, handle function calling, and integrate traditional machine-learning models inside scalable systems.

By combining Azure OpenAI for reasoning, Semantic Kernel for orchestration, and ML.NET for structured predictions, you can build production-ready, AI-Native .NET applications with clean architecture, testability, and maintainability. This tutorial synthesizes industry best practices into a step-by-step roadmap you can use immediately.


🛠️ Prerequisites for Building AI-Native .NET Apps

Before getting started, ensure you have the following setup.

Development Environment

  • .NET 8.0 SDK or higher
  • Visual Studio 2022 or VS Code with C# Dev Kit installed
  • Git for version control
  • Optional: Docker Desktop for local container testing

Azure Requirements

  • Active Azure subscription (free tier works to get started)
  • Azure OpenAI resource deployed with a GPT-4, GPT-4o, or newer model
  • Optional: Azure AI Search (for RAG/document intelligence scenarios)
  • Securely stored credentials:
    • API keys
    • Endpoint URLs
    • Managed Identity if working keyless

Required NuGet Packages:
– `Microsoft.SemanticKernel` (latest stable version)
– `Microsoft.Extensions.DependencyInjection`
– `Microsoft.Extensions.Logging`
– `Microsoft.Extensions.Configuration.UserSecrets`
– `Microsoft.ML` (ML.NET, for traditional ML integration)
– `Azure.AI.OpenAI` (for direct Azure OpenAI calls)

Knowledge Prerequisites:
– Solid understanding of C# and async/await patterns
– Familiarity with dependency injection and configuration management
– Basic knowledge of REST APIs and authentication
– Understanding of LLM concepts (tokens, temperature, context windows)

Step-by-Step Implementation

Step 1: Project Setup and Configuration

Create a new .NET console application and configure your project structure:


dotnet new console -n AINativeDotNet
cd AINativeDotNet
dotnet add package Microsoft.SemanticKernel
dotnet add package Microsoft.Extensions.DependencyInjection
dotnet add package Microsoft.Extensions.Logging.Console
dotnet add package Microsoft.Extensions.Configuration.UserSecrets
dotnet add package Azure.AI.OpenAI
dotnet user-secrets init

Store your Azure OpenAI credentials securely using user secrets:


dotnet user-secrets set "AzureOpenAI:Endpoint" "https://your-resource.openai.azure.com/"
dotnet user-secrets set "AzureOpenAI:ApiKey" "your-api-key"
dotnet user-secrets set "AzureOpenAI:DeploymentName" "gpt-4o"

Step 2: Configure the Semantic Kernel

The Kernel is your central orchestrator. Set it up with proper dependency injection:


using Microsoft.SemanticKernel;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;

public static class KernelConfiguration
{
    public static IServiceCollection AddAIServices(
        this IServiceCollection services,
        IConfiguration configuration)
    {
        var endpoint = configuration["AzureOpenAI:Endpoint"]
            ?? throw new InvalidOperationException("Missing Azure OpenAI endpoint");
        var apiKey = configuration["AzureOpenAI:ApiKey"]
            ?? throw new InvalidOperationException("Missing Azure OpenAI API key");
        var deploymentName = configuration["AzureOpenAI:DeploymentName"]
            ?? throw new InvalidOperationException("Missing deployment name");

        var builder = Kernel.CreateBuilder();
        builder.AddAzureOpenAIChatCompletion(deploymentName, endpoint, apiKey);
        // Logging is registered on the kernel's internal service collection.
        builder.Services.AddLogging(logging => logging.AddConsole());

        services.AddSingleton(builder.Build());
        return services;
    }
}
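Typical composition-root usage of this extension (a sketch of a console host; the prompt is illustrative and configuration comes from the user secrets set in Step 1):

using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.SemanticKernel;

var configuration = new ConfigurationBuilder()
    .AddUserSecrets<Program>()
    .Build();

await using var services = new ServiceCollection()
    .AddAIServices(configuration)
    .BuildServiceProvider();

var kernel = services.GetRequiredService<Kernel>();

// One-shot prompt to verify the Azure OpenAI connection works end to end.
var result = await kernel.InvokePromptAsync("Say hello in one sentence.");
Console.WriteLine(result);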

Step 3: Create a Chat Service with Context Management

Build a reusable chat service that manages conversation history and execution settings:


using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Microsoft.SemanticKernel.Connectors.OpenAI;

public class ChatService(Kernel kernel)
{
    private readonly ChatHistory _chatHistory = new();
    private const string SystemPrompt =
        "You are a helpful AI assistant. Provide clear, concise answers.";

    public async Task<string> SendMessageAsync(string userMessage)
    {
        var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();

        // Initialize chat history with system prompt on first message
        if (_chatHistory.Count == 0)
        {
            _chatHistory.AddSystemMessage(SystemPrompt);
        }

        _chatHistory.AddUserMessage(userMessage);

        var executionSettings = new OpenAIPromptExecutionSettings
        {
            Temperature = 0.7,
            TopP = 0.9,
            MaxTokens = 2000
        };

        var response = await chatCompletionService.GetChatMessageContentAsync(
            _chatHistory,
            executionSettings,
            kernel);

        _chatHistory.AddAssistantMessage(response.Content ?? string.Empty);

        return response.Content ?? string.Empty;
    }

    public void ClearHistory()
    {
        _chatHistory.Clear();
    }

    public IReadOnlyList<ChatMessageContent> GetHistory() => _chatHistory;
}

Step 4: Implement Function Calling (Plugins)

Create native functions that the AI can invoke automatically:


using Microsoft.SemanticKernel;
using System.ComponentModel;

public class CalculatorPlugin
{
    [KernelFunction("add")]
    [Description("Adds two numbers together")]
    public static int Add(
        [Description("The first number")] int a,
        [Description("The second number")] int b)
    {
        return a + b;
    }

    [KernelFunction("multiply")]
    [Description("Multiplies two numbers")]
    public static int Multiply(
        [Description("The first number")] int a,
        [Description("The second number")] int b)
    {
        return a * b;
    }
}

public class WeatherPlugin
{
    [KernelFunction("get_weather")]
    [Description("Gets the current weather for a city")]
    public async Task<string> GetWeather(
        [Description("The city name")] string city)
    {
        // In production, call a real weather API
        await Task.Delay(100);
        return $"The weather in {city} is sunny, 72°F";
    }
}

Register plugins with the kernel:


var kernel = builder.Build();
kernel.Plugins.AddFromType<CalculatorPlugin>();
kernel.Plugins.AddFromType<WeatherPlugin>();
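
With the plugins registered, you still have to opt in to automatic invocation. A minimal sketch using `OpenAIPromptExecutionSettings` on recent Semantic Kernel releases (older versions expose `ToolCallBehavior.AutoInvokeKernelFunctions` instead):


using Microsoft.SemanticKernel.Connectors.OpenAI;

var settings = new OpenAIPromptExecutionSettings
{
    // Let the model decide when to call the registered kernel functions
    FunctionChoiceBehavior = FunctionChoiceBehavior.Auto()
};

// The model can now chain add/multiply/get_weather calls on its own
var result = await kernel.InvokePromptAsync(
    "What is (3 + 4) * 2, and what's the weather in Oslo?",
    new KernelArguments(settings));

Console.WriteLine(result);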

Step 5: Build a RAG (Retrieval-Augmented Generation) System

For document-aware responses, integrate Azure AI Search:


using Azure.Search.Documents;
using Azure.Search.Documents.Models;
using Azure;

public class DocumentRetrievalService(SearchClient searchClient)
{
    public async Task<List<string>> RetrieveRelevantDocumentsAsync(
        string query,
        int topResults = 3)
    {
        var searchOptions = new SearchOptions
        {
            Size = topResults,
            Select = { "content", "source" }
        };

        var results = await searchClient.SearchAsync<SearchDocument>(
            query,
            searchOptions);

        var documents = new List<string>();
        await foreach (var result in results.Value.GetResultsAsync())
        {
            if (result.Document.TryGetValue("content", out var content))
            {
                documents.Add(content?.ToString() ?? string.Empty);
            }
        }

        return documents;
    }
}

public class RAGChatService(
    Kernel kernel,
    DocumentRetrievalService documentService)
{
    private readonly ChatHistory _chatHistory = new();

    public async Task<string> SendMessageWithContextAsync(string userMessage)
    {
        // Retrieve relevant documents
        var documents = await documentService.RetrieveRelevantDocumentsAsync(userMessage);
        
        // Build context from documents
        var context = string.Join("\n\n", documents);
        var enrichedPrompt = $"""
            Based on the following documents:
            {context}
            
            Answer this question: {userMessage}
            """;

        _chatHistory.AddUserMessage(enrichedPrompt);

        var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();
        var response = await chatCompletionService.GetChatMessageContentAsync(
            _chatHistory,
            kernel: kernel);

        _chatHistory.AddAssistantMessage(response.Content ?? string.Empty);

        return response.Content ?? string.Empty;
    }
}
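
The `SearchClient` itself must be registered before `DocumentRetrievalService` can resolve it. A minimal wiring sketch, assuming the `AzureSearch:*` configuration keys and index name are your own placeholders:


using Azure;
using Azure.Search.Documents;

services.AddSingleton(sp => new SearchClient(
    new Uri(configuration["AzureSearch:Endpoint"]!),   // e.g. https://your-search.search.windows.net
    configuration["AzureSearch:IndexName"]!,           // the index holding your documents
    new AzureKeyCredential(configuration["AzureSearch:ApiKey"]!)));

services.AddSingleton<DocumentRetrievalService>();
services.AddSingleton<RAGChatService>();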

Step 6: Integrate ML.NET for Hybrid Intelligence

Combine LLMs with traditional ML for scenarios requiring fast, local inference:


using Microsoft.ML;
using Microsoft.ML.Data;

public class SentimentData
{
    [LoadColumn(0)]
    public string Text { get; set; } = string.Empty;

    [LoadColumn(1)]
    [ColumnName("Label")]
    public bool Sentiment { get; set; }
}

public class SentimentPrediction
{
    [ColumnName("PredictedLabel")]
    public bool Prediction { get; set; }

    public float Probability { get; set; }
    public float Score { get; set; }
}

public class HybridAnalysisService(Kernel kernel)
{
    private readonly MLContext _mlContext = new();
    private ITransformer? _model;

    public async Task<AnalysisResult> AnalyzeTextAsync(string text)
    {
        // Step 1: Quick sentiment classification with ML.NET
        var sentimentScore = PredictSentiment(text);

        // Step 2: If sentiment is neutral or mixed, use LLM for deeper analysis
        if (sentimentScore.Probability < 0.7)
        {
            var chatService = new ChatService(kernel);
            var deepAnalysis = await chatService.SendMessageAsync(
                $"Provide a detailed sentiment analysis of: {text}");
            
            return new AnalysisResult
            {
                QuickSentiment = sentimentScore.Prediction,
                Confidence = sentimentScore.Probability,
                DetailedAnalysis = deepAnalysis
            };
        }

        return new AnalysisResult
        {
            QuickSentiment = sentimentScore.Prediction,
            Confidence = sentimentScore.Probability,
            DetailedAnalysis = null
        };
    }

    private SentimentPrediction PredictSentiment(string text)
    {
        // In production, load a pre-trained model
        var predictionEngine = _mlContext.Model.CreatePredictionEngine<SentimentData, SentimentPrediction>(_model!);
        return predictionEngine.Predict(new SentimentData { Text = text });
    }
}

public class AnalysisResult
{
    public bool QuickSentiment { get; set; }
    public float Confidence { get; set; }
    public string? DetailedAnalysis { get; set; }
}
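
Note that `_model` above is never populated; the service assumes a model trained and saved earlier. A minimal loading sketch you could add to `HybridAnalysisService` (the `sentiment_model.zip` path is a placeholder for wherever you persisted the model):


public void LoadModel(string modelPath = "sentiment_model.zip")
{
    // Deserializes a model previously persisted with _mlContext.Model.Save(...)
    _model = _mlContext.Model.Load(modelPath, out DataViewSchema inputSchema);
}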

Step 7: Complete Program.cs with Dependency Injection

Wire everything together:


using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Logging;

var configuration = new ConfigurationBuilder()
    .AddUserSecrets<Program>()
    .Build();

var services = new ServiceCollection();

services
    .AddAIServices(configuration)
    .AddSingleton<ChatService>()
    .AddLogging(logging => logging.AddConsole());

var serviceProvider = services.BuildServiceProvider();
var chatService = serviceProvider.GetRequiredService<ChatService>();

Console.WriteLine("AI-Native .NET Chat Application");
Console.WriteLine("Type 'exit' to quit\n");

while (true)
{
    Console.Write("You: ");
    var userInput = Console.ReadLine();

    if (userInput?.Equals("exit", StringComparison.OrdinalIgnoreCase) ?? false)
        break;

    if (string.IsNullOrWhiteSpace(userInput))
        continue;

    try
    {
        var response = await chatService.SendMessageAsync(userInput);
        Console.WriteLine($"Assistant: {response}\n");
    }
    catch (Exception ex)
    {
        Console.WriteLine($"Error: {ex.Message}\n");
    }
}

Production-Ready C# Examples

Advanced: Streaming Responses

For better UX, stream responses token-by-token:


using System.Runtime.CompilerServices;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;

public class StreamingChatService(Kernel kernel)
{
    public async IAsyncEnumerable<string> SendMessageStreamAsync(
        string userMessage,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();
        var chatHistory = new ChatHistory { new(AuthorRole.User, userMessage) };

        await foreach (var chunk in chatCompletionService.GetStreamingChatMessageContentsAsync(
            chatHistory,
            kernel: kernel,
            cancellationToken: cancellationToken))
        {
            if (!string.IsNullOrEmpty(chunk.Content))
            {
                yield return chunk.Content;
            }
        }
    }
}

// Usage
var streamingService = serviceProvider.GetRequiredService<StreamingChatService>(); // register it like ChatService
await foreach (var token in streamingService.SendMessageStreamAsync("Hello"))
{
    Console.Write(token);
}

Advanced: Error Handling and Retry Logic

Implement resilient patterns for production:


using Microsoft.Extensions.Logging;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Polly;

public class ResilientChatService(Kernel kernel, ILogger<ResilientChatService> logger)
{
    private readonly IAsyncPolicy<string> _retryPolicy = Policy
        .Handle<HttpRequestException>()
        .Or<TaskCanceledException>()
        .OrResult<string>(r => string.IsNullOrEmpty(r))
        .WaitAndRetryAsync(
            retryCount: 3,
            sleepDurationProvider: attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)),
            onRetry: (outcome, timespan, retryCount, context) =>
            {
                logger.LogWarning(
                    "Retry {RetryCount} after {Delay}ms",
                    retryCount,
                    timespan.TotalMilliseconds);
            });

    public async Task<string> SendMessageAsync(string userMessage)
    {
        return await _retryPolicy.ExecuteAsync(async () =>
        {
            var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();
            var chatHistory = new ChatHistory { new(AuthorRole.User, userMessage) };

            var response = await chatCompletionService.GetChatMessageContentAsync(
                chatHistory,
                kernel: kernel);

            return response.Content ?? throw new InvalidOperationException("Empty response");
        });
    }
}

Common Pitfalls & Troubleshooting

**Pitfall 1: Token Limit Exceeded**
– **Problem:** Long conversations cause “context window exceeded” errors
– **Solution:** Implement conversation summarization or sliding window approach


public async Task<string> SummarizeConversationAsync(ChatHistory history)
{
    var summaryPrompt = $"""
        Summarize this conversation in 2-3 sentences:
        {string.Join("\n", history.Select(m => $"{m.Role}: {m.Content}"))}
        """;

    var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();
    var result = await chatCompletionService.GetChatMessageContentAsync(
        new ChatHistory { new(AuthorRole.User, summaryPrompt) },
        kernel: kernel);

    return result.Content ?? string.Empty;
}
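
A sliding-window alternative, as a minimal sketch, simply evicts the oldest turns once the history grows too large:


public static void TrimHistory(ChatHistory history, int maxMessages = 10)
{
    // Index 0 holds the system prompt; evict the oldest user/assistant turns
    while (history.Count > maxMessages)
    {
        history.RemoveAt(1);
    }
}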

**Pitfall 2: Credentials Exposed in Code**
– **Problem:** Hardcoding API keys in source code
– **Solution:** Always use Azure Key Vault or user secrets in development


// ❌ WRONG
var apiKey = "sk-abc123...";

// ✅ CORRECT
var apiKey = configuration["AzureOpenAI:ApiKey"]
    ?? throw new InvalidOperationException("API key not configured");

**Pitfall 3: Unhandled Async Deadlocks**
– **Problem:** Blocking on async calls with `.Result` or `.Wait()`
– **Solution:** Always use `await` in async contexts


// ❌ WRONG
var response = chatService.SendMessageAsync(message).Result;

// ✅ CORRECT
var response = await chatService.SendMessageAsync(message);

**Pitfall 4: Memory Leaks with Kernel Instances**
– **Problem:** Creating new Kernel instances repeatedly
– **Solution:** Register as singleton in DI container


// ✅ CORRECT
services.AddSingleton(kernel);

Performance & Scalability Considerations

Caching Responses

Implement caching for frequently asked questions:


using Microsoft.Extensions.Caching.Memory;

public class CachedChatService(
    Kernel kernel,
    IMemoryCache cache)
{
    private const string CacheKeyPrefix = "chat_response_";
    private const int CacheDurationMinutes = 60;

    public async Task<string> SendMessageAsync(string userMessage)
    {
        // Note: string.GetHashCode is not stable across processes;
        // hash the message (e.g. SHA-256) if the cache is ever shared
        var cacheKey = $"{CacheKeyPrefix}{userMessage.GetHashCode()}";

        if (cache.TryGetValue(cacheKey, out string? cachedResponse))
        {
            return cachedResponse!;
        }

        var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();
        var chatHistory = new ChatHistory { new(AuthorRole.User, userMessage) };

        var response = await chatCompletionService.GetChatMessageContentAsync(
            chatHistory,
            kernel: kernel);

        var content = response.Content ?? string.Empty;
        cache.Set(cacheKey, content, TimeSpan.FromMinutes(CacheDurationMinutes));

        return content;
    }
}
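
Note that `IMemoryCache` is not registered by default in a plain console host; wire it up alongside the service (a minimal sketch):


services.AddMemoryCache();                  // provides IMemoryCache
services.AddSingleton<CachedChatService>(); // resolves Kernel and IMemoryCache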

Batch Processing for High Volume

Process multiple requests efficiently:


public class BatchChatService(Kernel kernel)
{
    public async Task<List<string>> ProcessBatchAsync(
        List<string> messages,
        int maxConcurrency = 5)
    {
        var semaphore = new SemaphoreSlim(maxConcurrency);
        var tasks = messages.Select(async message =>
        {
            await semaphore.WaitAsync();
            try
            {
                var chatService = new ChatService(kernel);
                return await chatService.SendMessageAsync(message);
            }
            finally
            {
                semaphore.Release();
            }
        });

        return (await Task.WhenAll(tasks)).ToList();
    }
}

Monitoring and Observability

Add structured logging for production diagnostics:


public class ObservableChatService(
    Kernel kernel,
    ILogger<ObservableChatService> logger)
{
    public async Task<string> SendMessageAsync(string userMessage)
    {
        var stopwatch = System.Diagnostics.Stopwatch.StartNew();
        
        try
        {
            logger.LogInformation(
                "Processing message: {MessageLength} characters",
                userMessage.Length);

            var chatCompletionService = kernel.GetRequiredService<IChatCompletionService>();
            var chatHistory = new ChatHistory { new(AuthorRole.User, userMessage) };

            var response = await chatCompletionService.GetChatMessageContentAsync(
                chatHistory,
                kernel: kernel);

            stopwatch.Stop();
            logger.LogInformation(
                "Message processed in {ElapsedMilliseconds}ms",
                stopwatch.ElapsedMilliseconds);

            return response.Content ?? string.Empty;
        }
        catch (Exception ex)
        {
            stopwatch.Stop();
            logger.LogError(
                ex,
                "Error processing message after {ElapsedMilliseconds}ms",
                stopwatch.ElapsedMilliseconds);
            throw;
        }
    }
}

Practical Best Practices

**1. Separate Concerns with Interfaces**


public interface IChatService
{
    Task<string> SendMessageAsync(string message);
    void ClearHistory();
}

public class ChatService : IChatService
{
    // Implementation
}

// Register
services.AddScoped<IChatService, ChatService>();

**2. Use Configuration Objects for Settings**


public class AzureOpenAIOptions
{
    public string Endpoint { get; set; } = string.Empty;
    public string ApiKey { get; set; } = string.Empty;
    public string DeploymentName { get; set; } = string.Empty;
    public double Temperature { get; set; } = 0.7;
    public int MaxTokens { get; set; } = 2000;
}

// In appsettings.json
{
  "AzureOpenAI": {
    "Endpoint": "https://...",
    "ApiKey": "...",
    "DeploymentName": "gpt-4o",
    "Temperature": 0.7,
    "MaxTokens": 2000
  }
}

// Register with options pattern
services.Configure<AzureOpenAIOptions>(configuration.GetSection("AzureOpenAI"));
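
To consume these settings, inject `IOptions<AzureOpenAIOptions>` where needed. A minimal sketch (the `ConfiguredChatService` name is illustrative):


using Microsoft.Extensions.Options;

public class ConfiguredChatService(Kernel kernel, IOptions<AzureOpenAIOptions> options)
{
    private readonly AzureOpenAIOptions _settings = options.Value;

    // Apply the configured defaults to every request
    private OpenAIPromptExecutionSettings BuildSettings() => new()
    {
        Temperature = _settings.Temperature,
        MaxTokens = _settings.MaxTokens
    };
}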

**3. Implement Unit Testing**


using Microsoft.Extensions.DependencyInjection;
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.ChatCompletion;
using Moq;
using Xunit;

public class ChatServiceTests
{
    [Fact]
    public async Task SendMessageAsync_WithValidInput_ReturnsNonEmptyResponse()
    {
        // Arrange: Kernel is sealed, so register a mocked
        // IChatCompletionService inside a real kernel instead of mocking Kernel
        var mockChatCompletion = new Mock<IChatCompletionService>();
        mockChatCompletion
            .Setup(x => x.GetChatMessageContentsAsync(
                It.IsAny<ChatHistory>(),
                It.IsAny<PromptExecutionSettings>(),
                It.IsAny<Kernel>(),
                It.IsAny<CancellationToken>()))
            .ReturnsAsync(new List<ChatMessageContent>
            {
                new(AuthorRole.Assistant, "Test response")
            });

        var builder = Kernel.CreateBuilder();
        builder.Services.AddSingleton(mockChatCompletion.Object);
        var kernel = builder.Build();

        var service = new ChatService(kernel);

        // Act
        var result = await service.SendMessageAsync("Hello");

        // Assert
        Assert.NotEmpty(result);
        Assert.Equal("Test response", result);
    }
}

**4. Document Your Plugins**


/// <summary>
/// Provides mathematical operations for AI function calling.
/// </summary>
public class CalculatorPlugin
{
    /// <summary>
    /// Adds two numbers and returns the sum.
    /// </summary>
    /// <param name="a">The first operand</param>
    /// <param name="b">The second operand</param>
    /// <returns>The sum of a and b</returns>
    [KernelFunction("add")]
    [Description("Adds two numbers together")]
    public static int Add(
        [Description("The first number")] int a,
        [Description("The second number")] int b)
    {
        return a + b;
    }
}

Conclusion

You now have a comprehensive foundation for building AI-native .NET applications. The architecture you’ve learned—combining Semantic Kernel for orchestration, Azure OpenAI for intelligence, and ML.NET for specialized tasks—provides flexibility, maintainability, and production-readiness.

**Next Steps:**

1. **Deploy to Azure:** Use Azure Container Instances or App Service to host your application
2. **Add Monitoring:** Integrate Application Insights for production observability
3. **Implement Advanced Patterns:** Explore agent frameworks and multi-turn planning
4. **Optimize Costs:** Monitor token usage and implement caching strategies
5. **Scale Horizontally:** Design for distributed processing with Azure Service Bus or Azure Queue Storage

The AI landscape evolves rapidly. Stay current by following Microsoft’s Semantic Kernel repository, monitoring Azure OpenAI updates, and experimenting with new model capabilities as they become available.



🔗 External Resources

Azure OpenAI Service
https://learn.microsoft.com/azure/ai-services/openai/

Semantic Kernel GitHub
https://github.com/microsoft/semantic-kernel

ML.NET Official Docs
https://learn.microsoft.com/dotnet/machine-learning/

AI-Augmented .NET Backends: Building Intelligent, Agentic APIs with ASP.NET Core and Azure OpenAI

UnknownX · January 9, 2026 · Leave a Comment

 

Transform Your Backend into a Smart Autonomous Decision Layer

Executive Summary

Modern applications need far more than static JSON—they require intelligence, reasoning, and autonomous action. By integrating Azure OpenAI into ASP.NET Core, you can build agentic APIs capable of understanding natural language, analyzing content, and orchestrating workflows with minimal human intervention.

This guide shows how to go beyond basic chatbot calls and create production-ready AI APIs, unlocking:

  • Natural language decision-making

  • Content analysis pipelines

  • Real-time streaming responses

  • Tool calling for agent workflows

  • Resilient patterns suited for enterprise delivery

Whether you’re automating business operations or creating smart assistants, this blueprint gives you everything you need.


Prerequisites

Before writing a single line of code, make sure you have:

  • .NET 8 or later (the samples use C# 12 primary constructors; `AddOpenApi` requires .NET 9)

  • Azure subscription

  • Azure OpenAI model deployment (gpt-4o-mini recommended)

  • IDE (Visual Studio or VS Code)

  • API key + endpoint

  • Familiarity with async patterns and dependency injection

Required NuGet packages

Install these packages in your ASP.NET Core project:

dotnet add package Azure.AI.OpenAI
dotnet add package Azure.Identity
dotnet add package Microsoft.Extensions.AI
dotnet add package Microsoft.Extensions.Configuration.UserSecrets

Step 1 — Securely Configure Azure OpenAI

Options class

Start by setting up secure credential management. Create a configuration class to encapsulate Azure OpenAI settings:


namespace YourApp.AI.Configuration;

public class AzureOpenAIOptions
{
    public string Endpoint { get; set; } = string.Empty;
    public string DeploymentName { get; set; } = string.Empty;
    public string ApiKey { get; set; } = string.Empty;
}

Add your credentials to `appsettings.json`:


{
  "AzureOpenAI": {
    "Endpoint": "https://your-resource.openai.azure.com/",
    "DeploymentName": "gpt-4o-mini",
    "ApiKey": "your-api-key-here"
  }
}

For local development, use .NET user secrets to avoid committing credentials:


dotnet user-secrets init
dotnet user-secrets set "AzureOpenAI:Endpoint" "https://your-resource.openai.azure.com/"
dotnet user-secrets set "AzureOpenAI:DeploymentName" "gpt-4o-mini"
dotnet user-secrets set "AzureOpenAI:ApiKey" "your-api-key-here"

Step 2 — Create an AI Abstraction Service

Build a clean abstraction layer that isolates Azure OpenAI details from your business logic:


namespace YourApp.AI.Services;

using Azure;
using Azure.AI.OpenAI;
using Microsoft.Extensions.Options;

public interface IAIService
{
    Task<string> GenerateResponseAsync(string userMessage, CancellationToken cancellationToken = default);
    Task<string> AnalyzeContentAsync(string content, string analysisPrompt, CancellationToken cancellationToken = default);
    IAsyncEnumerable<string> StreamResponseAsync(string userMessage, CancellationToken cancellationToken = default);
}

// Targets the Azure.AI.OpenAI 1.0.0-beta.8+ chat API surface
public class AzureOpenAIService(IOptions<AzureOpenAIOptions> options) : IAIService
{
    private readonly AzureOpenAIOptions _options = options.Value;
    private OpenAIClient? _client;

    private OpenAIClient Client => _client ??= new OpenAIClient(
        new Uri(_options.Endpoint),
        new AzureKeyCredential(_options.ApiKey));

    public async Task<string> GenerateResponseAsync(string userMessage, CancellationToken cancellationToken = default)
    {
        var chatCompletionsOptions = new ChatCompletionsOptions
        {
            DeploymentName = _options.DeploymentName,
            Temperature = 0.7f,
            MaxTokens = 2000,
            Messages =
            {
                new ChatRequestSystemMessage("You are a helpful assistant that provides accurate, concise responses."),
                new ChatRequestUserMessage(userMessage)
            }
        };

        var response = await Client.GetChatCompletionsAsync(chatCompletionsOptions, cancellationToken);

        return response.Value.Choices[0].Message.Content;
    }

    public async Task<string> AnalyzeContentAsync(string content, string analysisPrompt, CancellationToken cancellationToken = default)
    {
        var systemPrompt = $"You are an expert analyst. {analysisPrompt}";

        var chatCompletionsOptions = new ChatCompletionsOptions
        {
            DeploymentName = _options.DeploymentName,
            Messages =
            {
                new ChatRequestSystemMessage(systemPrompt),
                new ChatRequestUserMessage(content)
            }
        };

        var response = await Client.GetChatCompletionsAsync(chatCompletionsOptions, cancellationToken);

        return response.Value.Choices[0].Message.Content;
    }

    public async IAsyncEnumerable<string> StreamResponseAsync(
        string userMessage,
        [System.Runtime.CompilerServices.EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        var chatCompletionsOptions = new ChatCompletionsOptions
        {
            DeploymentName = _options.DeploymentName,
            Messages =
            {
                new ChatRequestSystemMessage("You are a helpful assistant."),
                new ChatRequestUserMessage(userMessage)
            }
        };

        using var streamingResponse = await Client.GetChatCompletionsStreamingAsync(
            chatCompletionsOptions,
            cancellationToken);

        await foreach (var update in streamingResponse.WithCancellation(cancellationToken))
        {
            if (!string.IsNullOrEmpty(update.ContentUpdate))
            {
                yield return update.ContentUpdate;
            }
        }
    }
}

Step 3 — Register Services in Dependency Injection

Configure your services in `Program.cs`:


var builder = WebApplication.CreateBuilder(args);

// Add configuration
builder.Services.Configure<AzureOpenAIOptions>(
    builder.Configuration.GetSection("AzureOpenAI"));

// Register AI service
builder.Services.AddScoped<IAIService, AzureOpenAIService>();

// Add HTTP client for downstream integrations
builder.Services.AddHttpClient();

builder.Services.AddControllers();
builder.Services.AddOpenApi();

var app = builder.Build();

if (app.Environment.IsDevelopment())
{
    app.MapOpenApi();
}

app.UseHttpsRedirection();
app.MapControllers();

app.Run();

Step 4 — Build REST Intelligence Endpoints

Create a controller that exposes AI capabilities as REST endpoints:


namespace YourApp.Controllers;

using Microsoft.AspNetCore.Mvc;
using YourApp.AI.Services;

[ApiController]
[Route("api/[controller]")]
public class IntelligenceController(IAIService aiService) : ControllerBase
{
    [HttpPost("analyze")]
    public async Task<IActionResult> AnalyzeContent(
        [FromBody] AnalysisRequest request,
        CancellationToken cancellationToken)
    {
        if (string.IsNullOrWhiteSpace(request.Content))
            return BadRequest("Content is required.");

        var analysis = await aiService.AnalyzeContentAsync(
            request.Content,
            request.AnalysisPrompt ?? "Provide a detailed analysis.",
            cancellationToken);

        return Ok(new { analysis });
    }

    [HttpPost("chat")]
    public async Task<IActionResult> Chat(
        [FromBody] ChatRequest request,
        CancellationToken cancellationToken)
    {
        if (string.IsNullOrWhiteSpace(request.Message))
            return BadRequest("Message is required.");

        var response = await aiService.GenerateResponseAsync(
            request.Message,
            cancellationToken);

        return Ok(new { response });
    }

    [HttpPost("stream")]
    public async IAsyncEnumerable<string> StreamChat(
        [FromBody] ChatRequest request,
        [System.Runtime.CompilerServices.EnumeratorCancellation] CancellationToken cancellationToken)
    {
        if (string.IsNullOrWhiteSpace(request.Message))
            yield break;

        await foreach (var chunk in aiService.StreamResponseAsync(request.Message, cancellationToken))
        {
            yield return chunk;
        }
    }
}

public record AnalysisRequest(string Content, string? AnalysisPrompt = null);
public record ChatRequest(string Message);

Step 5 — Enable Agentic Behavior (Tool Calling)

Create an advanced service that enables the AI to call functions autonomously:


namespace YourApp.AI.Services;

using Azure.AI.OpenAI;

public interface IAgentService
{
    Task<AgentResponse> ExecuteAgentAsync(string userRequest, CancellationToken cancellationToken = default);
}

public class AgentService(IAIService aiService, IHttpClientFactory httpClientFactory) : IAgentService
{
    public async Task<AgentResponse> ExecuteAgentAsync(string userRequest, CancellationToken cancellationToken = default)
    {
        var conversationHistory = new List<ChatRequestMessage>
        {
            new ChatRequestSystemMessage(
                "You are an intelligent agent. When asked to perform tasks, use available tools. " +
                "Available tools: GetWeather, FetchUserData, SendNotification."),
            new ChatRequestUserMessage(userRequest)
        };

        var response = await aiService.GenerateResponseAsync(userRequest, cancellationToken);

        // In production, implement actual tool calling logic here
        // (see the sketch below): parse the AI response for tool calls
        // and execute them before producing the final result

        return new AgentResponse
        {
            InitialResponse = response,
            ExecutedActions = new List<string>(),
            FinalResult = response
        };
    }
}

public class AgentResponse
{
    public string InitialResponse { get; set; } = string.Empty;
    public List<string> ExecutedActions { get; set; } = new();
    public string FinalResult { get; set; } = string.Empty;
}
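
To make the agent genuinely act, use the SDK's tool-calling API. The sketch below assumes the Azure.AI.OpenAI 1.0.0-beta.8+ surface, direct access to an `OpenAIClient` (`client`) and deployment name, and a hypothetical `DispatchTool` helper that maps a tool name plus JSON arguments onto your own code:


var getWeatherTool = new ChatCompletionsFunctionToolDefinition
{
    Name = "GetWeather",
    Description = "Gets the current weather for a city",
    Parameters = BinaryData.FromString("""
        {
            "type": "object",
            "properties": {
                "city": { "type": "string", "description": "The city name" }
            },
            "required": ["city"]
        }
        """)
};

var options = new ChatCompletionsOptions
{
    DeploymentName = deploymentName,
    Messages = { new ChatRequestUserMessage(userRequest) },
    Tools = { getWeatherTool }
};

var response = await client.GetChatCompletionsAsync(options, cancellationToken);
var choice = response.Value.Choices[0];

if (choice.FinishReason == CompletionsFinishReason.ToolCalls)
{
    // Echo the assistant turn that requested the tools
    var assistantMessage = new ChatRequestAssistantMessage(choice.Message.Content ?? string.Empty);
    foreach (var toolCall in choice.Message.ToolCalls)
    {
        assistantMessage.ToolCalls.Add(toolCall);
    }
    options.Messages.Add(assistantMessage);

    // Execute each requested tool and feed its result back to the model
    foreach (var toolCall in choice.Message.ToolCalls.OfType<ChatCompletionsFunctionToolCall>())
    {
        var result = DispatchTool(toolCall.Name, toolCall.Arguments); // hypothetical dispatcher
        options.Messages.Add(new ChatRequestToolMessage(result, toolCall.Id));
    }

    // Second round trip: the model now answers using the tool output
    response = await client.GetChatCompletionsAsync(options, cancellationToken);
}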

Production-Ready C# Examples

Retry + resilience using Polly


namespace YourApp.AI.Services;

using Azure;
using Azure.AI.OpenAI;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;
using Polly;

public class ResilientAzureOpenAIService(
    IOptions<AzureOpenAIOptions> options,
    ILogger<ResilientAzureOpenAIService> logger) : IAIService
{
    private readonly AzureOpenAIOptions _options = options.Value;
    private OpenAIClient? _client;
    private IAsyncPolicy<Response<ChatCompletions>>? _retryPolicy;

    private OpenAIClient Client => _client ??= new OpenAIClient(
        new Uri(_options.Endpoint),
        new AzureKeyCredential(_options.ApiKey));

    private IAsyncPolicy<Response<ChatCompletions>> RetryPolicy =>
        _retryPolicy ??= Policy
            .Handle<RequestFailedException>(ex => ex.Status >= 500 || ex.Status == 429)
            .Or<HttpRequestException>()
            .OrResult<Response<ChatCompletions>>(r => r.GetRawResponse().IsError)
            .WaitAndRetryAsync(
                retryCount: 3,
                sleepDurationProvider: attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)),
                onRetry: (outcome, timespan, retryCount, context) =>
                {
                    logger.LogWarning(
                        "Retry {RetryCount} after {DelayMs}ms due to {Reason}",
                        retryCount,
                        timespan.TotalMilliseconds,
                        outcome.Exception?.Message ?? "error response");
                });

    public async Task<string> GenerateResponseAsync(
        string userMessage,
        CancellationToken cancellationToken = default)
    {
        var chatCompletionsOptions = new ChatCompletionsOptions
        {
            DeploymentName = _options.DeploymentName,
            MaxTokens = 2000,
            Messages =
            {
                new ChatRequestSystemMessage("You are a helpful assistant."),
                new ChatRequestUserMessage(userMessage)
            }
        };

        try
        {
            var response = await RetryPolicy.ExecuteAsync(
                async () => await Client.GetChatCompletionsAsync(
                    chatCompletionsOptions,
                    cancellationToken));

            return response.Value.Choices[0].Message.Content;
        }
        catch (RequestFailedException ex) when (ex.Status == 429)
        {
            logger.LogError("Rate limit exceeded after retries. Consider request-level throttling.");
            throw;
        }
    }

    public Task<string> AnalyzeContentAsync(
        string content,
        string analysisPrompt,
        CancellationToken cancellationToken = default)
    {
        // Implementation similar to GenerateResponseAsync
        throw new NotImplementedException();
    }

    public IAsyncEnumerable<string> StreamResponseAsync(
        string userMessage,
        CancellationToken cancellationToken = default)
    {
        throw new NotImplementedException();
    }
}

Content Analysis Pipelines

namespace YourApp.Features.ContentAnalysis;

using Microsoft.Extensions.Logging;
using YourApp.AI.Services;

public interface IContentAnalyzer
{
    Task<ContentAnalysisResult> AnalyzeAsync(string content, CancellationToken cancellationToken = default);
}

public class ContentAnalyzer(IAIService aiService, ILogger<ContentAnalyzer> logger) : IContentAnalyzer
{
    public async Task<ContentAnalysisResult> AnalyzeAsync(
        string content,
        CancellationToken cancellationToken = default)
    {
        logger.LogInformation("Starting content analysis for {ContentLength} characters", content.Length);

        var sentimentTask = aiService.AnalyzeContentAsync(
            content,
            "Analyze the sentiment. Respond with: positive, negative, or neutral.",
            cancellationToken);

        var summaryTask = aiService.AnalyzeContentAsync(
            content,
            "Provide a concise summary in 2-3 sentences.",
            cancellationToken);

        var keywordsTask = aiService.AnalyzeContentAsync(
            content,
            "Extract 5 key topics or keywords as a comma-separated list.",
            cancellationToken);

        await Task.WhenAll(sentimentTask, summaryTask, keywordsTask);

        return new ContentAnalysisResult
        {
            Sentiment = await sentimentTask,
            Summary = await summaryTask,
            Keywords = (await keywordsTask).Split(',').Select(k => k.Trim()).ToList(),
            AnalyzedAt = DateTime.UtcNow
        };
    }
}

public class ContentAnalysisResult
{
    public string Sentiment { get; set; } = string.Empty;
    public string Summary { get; set; } = string.Empty;
    public List<string> Keywords { get; set; } = new();
    public DateTime AnalyzedAt { get; set; }
}

Common Pitfalls & Troubleshooting

Pitfall 1: Hardcoded Credentials

Problem: Storing API keys directly in code or configuration files committed to version control.

Solution: Always use Azure Key Vault or .NET user secrets:


// In production, load configuration from Azure App Configuration
// (secrets can be referenced there from Key Vault)
builder.Configuration.AddAzureAppConfiguration(options =>
    options.Connect(builder.Configuration["AppConfig:ConnectionString"])
        .Select(KeyFilter.Any, LabelFilter.Null)
        .Select(KeyFilter.Any, builder.Environment.EnvironmentName));
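
If you prefer pulling secrets straight from Key Vault, a minimal sketch using the Azure.Extensions.AspNetCore.Configuration.Secrets package (the vault URI is a placeholder):


using Azure.Identity;

// Loads every secret in the vault into IConfiguration;
// Managed Identity or developer credentials are picked up automatically
builder.Configuration.AddAzureKeyVault(
    new Uri("https://your-vault.vault.azure.net/"),
    new DefaultAzureCredential());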

Pitfall 2: Unhandled Rate Limiting

Problem: Azure OpenAI enforces rate limits; exceeding them causes request failures.

Solution: Implement exponential backoff and circuit breaker patterns (shown in the resilient example above).

Pitfall 3: Streaming Without Proper Cancellation

Problem: Long-running streaming operations don’t respect cancellation tokens, consuming resources.

Solution: Always pass `CancellationToken` through the entire call chain and use `EnumeratorCancellation` attribute.

Pitfall 4: Memory Leaks from Unclosed Clients

Problem: Creating new `OpenAIClient` instances repeatedly without disposal.

Solution: Use lazy initialization or dependency injection to maintain a single client instance:


private OpenAIClient Client => _client ??= new OpenAIClient(
    new Uri(_options.Endpoint),
    new AzureKeyCredential(_options.ApiKey));

Pitfall 5: Ignoring Token Limits

Problem: Sending prompts that exceed the model’s token limit, causing failures.

Solution: Implement token counting and truncation:


private const int MaxTokens = 2000;
private const int SafetyMargin = 100;

private string TruncateIfNeeded(string content)
{
    // Rough estimate: 1 token ≈ 4 characters
    var estimatedTokens = content.Length / 4;
    if (estimatedTokens > MaxTokens - SafetyMargin)
    {
        var maxChars = (MaxTokens - SafetyMargin) * 4;
        return content[..maxChars];
    }
    return content;
}

Performance & Scalability Considerations

1. Connection Pooling

Reuse HTTP connections by maintaining a single `OpenAIClient` instance per application:


// ✓ Good: Single instance
private OpenAIClient Client => _client ??= new OpenAIClient(...);

// ✗ Bad: New instance per request
var client = new OpenAIClient(...);

2. Async All the Way

Never block on async operations:


// ✓ Good
var result = await aiService.GenerateResponseAsync(message);

// ✗ Bad
var result = aiService.GenerateResponseAsync(message).Result;

3. Implement Caching for Repeated Queries


using Microsoft.Extensions.Caching.Memory;

public class CachedAIService(IAIService innerService, IMemoryCache cache) : IAIService
{
    private const string CacheKeyPrefix = "ai_response_";
    private const int CacheDurationSeconds = 3600;

    public async Task<string> GenerateResponseAsync(
        string userMessage,
        CancellationToken cancellationToken = default)
    {
        var cacheKey = $"{CacheKeyPrefix}{userMessage.GetHashCode()}";

        if (cache.TryGetValue(cacheKey, out string? cachedResponse))
            return cachedResponse!;

        var response = await innerService.GenerateResponseAsync(userMessage, cancellationToken);

        cache.Set(cacheKey, response, TimeSpan.FromSeconds(CacheDurationSeconds));

        return response;
    }

    // Other methods...
}
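
Register the decorator so callers keep depending on `IAIService` (a minimal sketch):


builder.Services.AddMemoryCache();
builder.Services.AddScoped<AzureOpenAIService>();
builder.Services.AddScoped<IAIService>(sp => new CachedAIService(
    sp.GetRequiredService<AzureOpenAIService>(),
    sp.GetRequiredService<IMemoryCache>()));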

4. Batch Processing for High Volume


public class BatchAnalysisService(IAIService aiService)
{
    public async Task<List<string>> AnalyzeBatchAsync(
        IEnumerable<string> items,
        string analysisPrompt,
        int maxConcurrency = 5,
        CancellationToken cancellationToken = default)
    {
        var semaphore = new SemaphoreSlim(maxConcurrency);
        var tasks = new List<Task<string>>();

        foreach (var item in items)
        {
            await semaphore.WaitAsync(cancellationToken);

            tasks.Add(Task.Run(async () =>
            {
                try
                {
                    return await aiService.AnalyzeContentAsync(item, analysisPrompt, cancellationToken);
                }
                finally
                {
                    semaphore.Release();
                }
            }, cancellationToken));
        }

        var results = await Task.WhenAll(tasks);
        return results.ToList();
    }
}

5. Regional Deployment for Low Latency

Deploy your ASP.NET Core application in the same Azure region as your OpenAI resource to minimize network latency.

Practical Best Practices

1. Structured Logging


logger.LogInformation(
    "AI request completed. Model: {Model}, Tokens: {Tokens}, Duration: {Duration}ms",
    _options.DeploymentName,
    response.Value.Usage.TotalTokens,
    stopwatch.ElapsedMilliseconds);

2. Input Validation and Sanitization


private void ValidateInput(string userMessage)
{
    if (string.IsNullOrWhiteSpace(userMessage))
        throw new ArgumentException("Message cannot be empty.");

    if (userMessage.Length > 10000)
        throw new ArgumentException("Message exceeds maximum length.");

    // Prevent prompt injection
    if (userMessage.Contains("ignore previous instructions", StringComparison.OrdinalIgnoreCase))
        throw new ArgumentException("Invalid message content.");
}

3. Testing with Mocks


public class MockAIService : IAIService
{
    public Task<string> GenerateResponseAsync(string userMessage, CancellationToken cancellationToken = default)
    {
        return Task.FromResult("Mock response for testing");
    }

    public Task<string> AnalyzeContentAsync(string content, string analysisPrompt, CancellationToken cancellationToken = default)
    {
        return Task.FromResult("Mock analysis");
    }

    public async IAsyncEnumerable<string> StreamResponseAsync(string userMessage, [System.Runtime.CompilerServices.EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        await Task.Yield(); // satisfy the async iterator contract
        yield return "Mock ";
        yield return "streaming ";
        yield return "response";
    }
}

4. Monitoring and Observability


builder.Services.AddApplicationInsightsTelemetry();

// In your service
using var activity = new Activity("AIRequest").Start();
activity?.SetTag("model", _options.DeploymentName);
activity?.SetTag("message_length", userMessage.Length);

try
{
    var response = await Client.GetChatCompletionsAsync(...);
    activity?.SetTag("success", true);
}
catch (Exception ex)
{
    activity?.SetTag("error", ex.Message);
    throw;
}

Conclusion

You’ve now built a production-grade AI-augmented backend with Azure OpenAI and ASP.NET Core. The architecture you’ve implemented provides:

– **Abstraction layers** that isolate AI logic from business logic
– **Resilience patterns** that handle failures gracefully
– **Scalability mechanisms** for high-volume scenarios
– **Security practices** that protect sensitive credentials
– **Observability** for monitoring and debugging

**Next steps:**

1. Deploy your application to Azure App Service or Azure Container Instances
2. Implement Azure Key Vault for credential management
3. Set up Application Insights for production monitoring
4. Experiment with different models (gpt-4, gpt-4o) to optimize cost vs. capability
5. Build domain-specific agents that leverage your business data
6. Implement fine-tuning for specialized use cases

The foundation is solid. Now extend it with your domain expertise.


Frequently Asked Questions

Q1: How do I choose between gpt-35-turbo, gpt-4o-mini, and gpt-4?

**A:** This is a cost-vs-capability tradeoff:

– **gpt-35-turbo**: Fastest and cheapest. Use for simple tasks like classification or summarization.
– **gpt-4o-mini**: Balanced option. Recommended for most production applications.
– **gpt-4**: Most capable but expensive. Use for complex reasoning, code generation, or specialized analysis.

Start with gpt-4o-mini and benchmark against your requirements.

Q2: What’s the difference between streaming and non-streaming responses?

**A:** Streaming returns tokens progressively, enabling real-time UI updates and perceived faster responses. Non-streaming waits for the complete response. Use streaming for user-facing chat applications; use non-streaming for backend analysis where you need the full result before proceeding.

Q3: How do I prevent prompt injection attacks?

**A:** Implement strict input validation, use system prompts that define boundaries, and never concatenate user input directly into prompts. Instead, use structured formats:


// ✗ Vulnerable
var prompt = $"Analyze this: {userInput}";

// ✓ Safe
var messages = new ChatRequestMessage[]
{
    new ChatRequestSystemMessage("You are an analyzer. Only respond with analysis."),
    new ChatRequestUserMessage(userInput)
};

Q4: How do I handle Azure OpenAI quota limits?

**A:** Monitor your usage in the Azure Portal, implement request throttling with `SemaphoreSlim`, and use exponential backoff for retries. Consider requesting quota increases for production workloads.

Q5: Can I use Azure OpenAI with other .NET frameworks like Blazor or MAUI?

**A:** Yes. The Azure.AI.OpenAI SDK works with any .NET application. For Blazor, call your ASP.NET Core backend API instead of directly accessing Azure OpenAI from the browser (for security). For MAUI, use the same patterns shown here.

Q6: How do I optimize costs for high-volume AI requests?

**A:** Implement caching for repeated queries, batch similar requests together, use gpt-4o-mini instead of gpt-4 when possible, and monitor token usage. Consider implementing a request queue with off-peak processing.

Q7: What’s the best way to handle long conversations with context?

**A:** Maintain conversation history in memory or a database, but truncate old messages to stay within token limits. Implement a sliding window approach:


private const int MaxHistoryMessages = 10;

private List<ChatRequestMessage> TrimHistory(List<ChatRequestMessage> history)
{
    if (history.Count > MaxHistoryMessages)
        return history.Skip(history.Count - MaxHistoryMessages).ToList();
    return history;
}

Q8: How do I test AI functionality without hitting Azure OpenAI every time?

**A:** Use the `MockAIService` pattern shown earlier. Inject `IAIService` as a dependency, allowing you to swap implementations in tests. Use xUnit or NUnit with Moq for unit testing.
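
For example, a minimal xUnit test against the mock (the test class name is illustrative):


using Xunit;

public class AIServiceTests
{
    [Fact]
    public async Task GenerateResponseAsync_WithMock_ReturnsCannedText()
    {
        IAIService aiService = new MockAIService();

        var result = await aiService.GenerateResponseAsync("hello");

        Assert.Equal("Mock response for testing", result);
    }
}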

Q9: What should I do if the AI response is inappropriate or harmful?

**A:** Implement content filtering using Azure Content Safety API or similar services. Add a validation layer after receiving the response:


private async Task<bool> IsContentSafeAsync(string content)
{
    // Call the Azure Content Safety API here and
    // return true if safe, false otherwise
    await Task.CompletedTask; // placeholder for the real API call
    return true;
}

Q10: How do I monitor token usage and costs?

**A:** Log token counts from the response object and aggregate them:


var response = await Client.GetChatCompletionsAsync(...);
var totalTokens = response.Value.Usage.TotalTokens;
var promptTokens = response.Value.Usage.PromptTokens;
var completionTokens = response.Value.Usage.CompletionTokens;

logger.LogInformation(
    "Tokens used - Prompt: {PromptTokens}, Completion: {CompletionTokens}, Total: {TotalTokens}",
    promptTokens,
    completionTokens,
    totalTokens);

Send this data to Application Insights for cost tracking and optimization.


External Resources

1️⃣ Microsoft Learn – ASP.NET Core Documentation
https://learn.microsoft.com/aspnet/core

2️⃣ Azure OpenAI Service Overview
https://learn.microsoft.com/azure/ai-services/openai/overview

3️⃣ Azure OpenAI Chat Completions API Reference
https://learn.microsoft.com/azure/ai-services/openai/reference
