SASE Architecture Diagram Explained: Traffic Flow and Inspection
SASE architecture routes all traffic through globally distributed PoPs that run a single-pass inspection pipeline, applying TLS decryption, SWG, CASB, DLP, and ZTNA in one processing cycle. SD-WAN handles path selection at the branch. The key architectural decisions are PoP proximity (latency), inspection depth (throughput), and management plane convergence (operational cost).
SASE architecture is a cloud-delivered network and security framework where all user and site traffic is routed through globally distributed Points of Presence (PoPs) that apply a unified inspection pipeline of DNS security, SWG, CASB, ZTNA, FWaaS, and DLP before forwarding traffic to its destination. The architecture follows a sources-to-PoP-to-destinations model: traffic originates from users (endpoint agents), branch offices (SD-WAN tunnels), or cloud workloads (cloud connectors), enters the nearest PoP for inspection and policy enforcement, and exits toward internet destinations, SaaS applications, or private applications via ZTNA connectors. Understanding this traffic flow is essential for troubleshooting, capacity planning, and policy design.
Traffic sources: where connections originate
Remote Users (Endpoint Agent)
The SASE endpoint agent installed on laptops, desktops, and mobile devices is the primary traffic source for remote and office-based users. The agent operates at the network stack level, intercepting DNS queries, HTTP/HTTPS traffic, and non-web TCP/UDP flows. When the user's device boots up and the agent initializes, it authenticates the user against the configured identity provider (Okta, Microsoft Entra ID, Ping Identity) using SAML or OIDC, then establishes an encrypted tunnel to the nearest SASE PoP based on anycast DNS resolution or geographic PoP selection. The agent steers traffic based on routing rules: web traffic goes to the SWG, private application traffic goes to ZTNA, and all other traffic goes to FWaaS. Some traffic may be excluded via split-tunnel rules, typically for latency-sensitive applications like video conferencing where the performance penalty of proxy inspection is unacceptable.
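The steering decision described above can be sketched as a simple flow classifier. The rule categories, domain lists, and the `steer` function are illustrative assumptions for this article, not any vendor's actual agent logic:

```python
# Hypothetical agent-side traffic steering. Domain lists are invented examples.
SPLIT_TUNNEL_EXCLUSIONS = {"zoom.us", "teams.microsoft.com"}  # latency-sensitive apps
PRIVATE_APP_DOMAINS = {"git.corp.example.com", "erp.corp.example.com"}

def steer(dest_host: str, dest_port: int) -> str:
    """Return which service a new flow should be steered to."""
    if any(dest_host == d or dest_host.endswith("." + d) for d in SPLIT_TUNNEL_EXCLUSIONS):
        return "direct"   # split-tunnel exclusion: bypass the PoP entirely
    if dest_host in PRIVATE_APP_DOMAINS:
        return "ztna"     # private application -> ZTNA broker
    if dest_port in (80, 443):
        return "swg"      # web traffic -> secure web gateway
    return "fwaas"        # all other TCP/UDP -> firewall-as-a-service

print(steer("git.corp.example.com", 443))  # -> ztna
print(steer("example.com", 443))           # -> swg
print(steer("db.example.net", 5432))       # -> fwaas
```

Note the ordering: split-tunnel exclusions and private-app matches are evaluated before the generic web-port rule, so a private app served over 443 still reaches the ZTNA broker rather than the SWG.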
Branch Offices (SD-WAN Tunnels)
Branch offices connect to the SASE PoP through IPsec or GRE tunnels originating from SD-WAN edge appliances (Cisco Catalyst 8000, Fortinet FortiGate, Palo Alto ION) or standard routers. These tunnels carry all site traffic, covering every device on the branch LAN without requiring per-device agent installation. The SD-WAN appliance performs application identification at the branch edge, applies local QoS policies, and selects the best WAN transport (MPLS, broadband, LTE/5G) for each application based on real-time path quality metrics. Traffic destined for the internet or SaaS applications is forwarded through the tunnel to the PoP for security inspection. Direct internet breakout from the branch without PoP inspection is configurable but generally discouraged for security-critical traffic.
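Path selection over real-time quality metrics can be sketched as a weighted score per transport. The metric names, weights, and sample values below are assumptions for illustration; each SD-WAN vendor uses its own proprietary algorithm:

```python
# Illustrative SD-WAN transport selection via weighted path-quality scoring.
def score(path: dict) -> float:
    # Lower is better: penalize latency (ms), jitter (ms), and loss (%).
    return path["latency_ms"] + 2 * path["jitter_ms"] + 50 * path["loss_pct"]

def best_transport(paths: list) -> str:
    return min(paths, key=score)["name"]

paths = [
    {"name": "mpls",      "latency_ms": 30, "jitter_ms": 2, "loss_pct": 0.0},
    {"name": "broadband", "latency_ms": 18, "jitter_ms": 5, "loss_pct": 0.1},
    {"name": "lte",       "latency_ms": 55, "jitter_ms": 9, "loss_pct": 0.5},
]
print(best_transport(paths))  # -> broadband
```

In practice the appliance recomputes this continuously per application class, so a voice flow might weight jitter far more heavily than a bulk file transfer.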
Cloud Workloads (Cloud Connectors)
Applications running in IaaS environments (AWS, Azure, GCP) connect to the SASE fabric through cloud connectors, which are lightweight virtual appliances or containers deployed in your cloud VPCs. Cloud connectors register with the SASE control plane and establish outbound tunnels to the nearest PoP. This enables two key functions: first, the PoP can apply security inspection to traffic between users and cloud-hosted private applications; second, workload-to-workload traffic between different cloud environments or between cloud and on-prem can be routed through the PoP for east-west inspection and policy enforcement.
The PoP: where inspection happens
The Point of Presence is the core of SASE architecture. Each PoP is a fully self-contained instance of the entire SASE inspection pipeline, capable of processing traffic from source to destination without depending on any other PoP. PoPs are deployed in colocation facilities (Equinix, Digital Realty, NTT) or on hyperscaler infrastructure (GCP, AWS) with direct peering to major ISPs and cloud providers. The number and distribution of PoPs directly determines user-experienced latency: more PoPs in more locations means shorter geographic distance between users and inspection points.
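The relationship between PoP distribution and latency can be made concrete with a nearest-PoP lookup. Production services typically use anycast routing rather than explicit geographic lookup, and the PoP coordinates below are invented; this is only a sketch of the "more PoPs, shorter distance" point:

```python
# Simplified geographic PoP selection via great-circle (haversine) distance.
import math

POPS = {"frankfurt": (50.11, 8.68), "ashburn": (39.04, -77.49), "singapore": (1.35, 103.82)}

def haversine_km(a, b):
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * math.asin(math.sqrt(h))  # Earth radius ~6371 km

def nearest_pop(user_coords):
    return min(POPS, key=lambda name: haversine_km(user_coords, POPS[name]))

print(nearest_pop((48.85, 2.35)))  # a user in Paris -> frankfurt
```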
Layer 1: DNS Security
DNS resolution is the first inspection point. Before a TCP connection is established, the DNS query is evaluated against threat intelligence feeds. Known command-and-control domains, newly registered domains (less than 30 days old, statistically correlated with malicious activity), DGA (domain generation algorithm) patterns, and DNS tunneling signatures are blocked at this layer. DNS security is the fastest and lowest-overhead inspection because it operates on a single small UDP packet before any connection is established. Blocking at the DNS layer prevents the TCP handshake from ever occurring, which means no bandwidth is consumed by the blocked connection and no further inspection resources are needed.
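The DNS-layer checks above can be sketched as a small verdict function. The blocklist, domain-age data, 30-day threshold, and the entropy-based DGA heuristic are simplified stand-ins for commercial threat intelligence feeds:

```python
# Toy DNS-layer policy check: blocklist, newly registered domains, DGA heuristic.
import math
from datetime import date

C2_BLOCKLIST = {"evil-c2.example"}  # stand-in for a threat intel feed
DOMAIN_AGE = {"example.com": date(1995, 8, 14), "xk7qz-newshop.example": date(2026, 1, 20)}

def entropy(s: str) -> float:
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

def dns_verdict(domain: str, today: date = date(2026, 2, 1)) -> str:
    if domain in C2_BLOCKLIST:
        return "block:c2"
    registered = DOMAIN_AGE.get(domain)
    if registered and (today - registered).days < 30:
        return "block:newly-registered"
    label = domain.split(".")[0]
    if len(label) >= 12 and entropy(label) > 3.5:  # long, high-entropy label
        return "block:dga-suspect"
    return "allow"

print(dns_verdict("example.com"))            # allow
print(dns_verdict("evil-c2.example"))        # block:c2
print(dns_verdict("xk7qz-newshop.example"))  # block:newly-registered
print(dns_verdict("qwxzkvjhplmt.example"))   # block:dga-suspect
```

Because this runs on a single query before any TCP handshake, a block here is the cheapest possible enforcement point, exactly as the layer description states.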
Layer 2: SWG (HTTP/HTTPS Inspection)
For HTTP and HTTPS traffic, the SWG performs TLS interception: it generates a certificate for the requested destination, signs it with the organization's deployed root CA (which managed devices are configured to trust), presents that certificate to the client, decrypts the traffic, and inspects the cleartext payload. The inspection pipeline within the SWG layer includes URL categorization (checking the destination against a database of classified URLs), reputation scoring (evaluating the domain's age, registration data, hosting infrastructure, and historical behavior), content filtering (blocking or allowing based on content category, file type, and MIME type), antivirus scanning (signature-based detection against known malware), and behavioral analysis (heuristic detection of suspicious payload characteristics that do not match known signatures).
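The ordering of those SWG checks matters: cheap lookups run before payload scanning. A minimal sketch of that decision sequence, with invented category data and the EICAR test string standing in for real AV signatures:

```python
# Sequential SWG decision pipeline over a decrypted request (illustrative data).
URL_CATEGORIES = {"gambling.example": "gambling", "news.example": "news"}
BLOCKED_CATEGORIES = {"gambling", "malware"}
BLOCKED_MIME = {"application/x-msdownload"}  # e.g. block Windows executables

def swg_inspect(host: str, mime_type: str, payload: bytes) -> str:
    category = URL_CATEGORIES.get(host, "uncategorized")
    if category in BLOCKED_CATEGORIES:          # URL categorization
        return f"block:category:{category}"
    if mime_type in BLOCKED_MIME:               # content/file-type filtering
        return "block:filetype"
    if b"EICAR-STANDARD-ANTIVIRUS-TEST" in payload:  # stand-in for AV scanning
        return "block:malware"
    return "allow"

print(swg_inspect("news.example", "text/html", b"<html>...</html>"))  # allow
print(swg_inspect("gambling.example", "text/html", b""))  # block:category:gambling
```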
Layer 3: CASB (SaaS Application Control)
Traffic identified as destined for a SaaS application passes through the inline CASB engine. CASB applies application-aware policies that understand the semantics of each SaaS application's API and URL structure. For Microsoft 365, the CASB distinguishes between viewing a document, downloading it, sharing it externally, and exporting it. For Salesforce, it distinguishes between viewing a record, running a report, and performing a bulk data export. This granularity enables activity-level controls: allow viewing but block downloading, allow internal sharing but block external sharing, allow uploading to the corporate tenant but block uploads to personal accounts. The CASB also enforces tenant restrictions, ensuring users access only the organization's corporate SaaS tenant rather than personal accounts.
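Activity-level control amounts to keying policy on (application, activity) pairs plus a tenant check, rather than on hostnames alone. The application names, activity labels, and `contoso` tenant below are illustrative, not a vendor schema:

```python
# Activity-level CASB policy sketch: decisions keyed on (app, activity).
POLICY = {
    ("m365", "view"): "allow",
    ("m365", "download"): "allow",
    ("m365", "share_external"): "block",
    ("salesforce", "bulk_export"): "block",
}

def casb_decision(app: str, activity: str, tenant: str, corporate_tenant: str = "contoso") -> str:
    if tenant != corporate_tenant:
        return "block:personal-tenant"   # tenant restriction enforcement
    return POLICY.get((app, activity), "allow")

print(casb_decision("m365", "view", "contoso"))            # allow
print(casb_decision("m365", "share_external", "contoso"))  # block
print(casb_decision("m365", "view", "personal-gmail"))     # block:personal-tenant
```

The interesting part is what makes this possible at all: the CASB must parse each SaaS application's URL and API structure deeply enough to label a decrypted request as "view" versus "bulk_export" before this table lookup can run.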
Layer 4: ZTNA (Private Application Access)
When a user requests access to a private application (one hosted in the corporate data center or IaaS, not on the public internet), the ZTNA broker at the PoP evaluates the request against zero trust policy. It verifies the user's identity (authenticated via the IdP), checks the device's posture (OS version, patch level, disk encryption, EDR status, firewall state), evaluates contextual signals (geographic location, time of day, network type), and then either grants or denies access. If granted, the broker establishes an encrypted micro-tunnel between the user's device and the specific application, mediated through the ZTNA connector deployed inside the application's network. The user never receives network-level access to the application's environment and cannot discover or communicate with any other system.
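The broker's decision combines the three signal classes described above, and all must pass. A minimal sketch, with invented posture requirements and context rules:

```python
# Zero trust access evaluation sketch: identity AND posture AND context.
REQUIRED_POSTURE = {"disk_encrypted": True, "edr_running": True}
BLOCKED_COUNTRIES = {"sanctioned-example"}  # hypothetical geo restriction

def ztna_decision(user: dict, device: dict, context: dict, app: str) -> bool:
    # Identity: authenticated via IdP and entitled to this specific application.
    if not user.get("authenticated") or app not in user.get("entitlements", []):
        return False
    # Device posture: every required attribute must match.
    if any(device.get(k) != v for k, v in REQUIRED_POSTURE.items()):
        return False
    # Context: location (could also include time of day, network type).
    if context.get("country") in BLOCKED_COUNTRIES:
        return False
    return True  # grant -> broker builds the per-app micro-tunnel

user = {"authenticated": True, "entitlements": ["erp"]}
device = {"disk_encrypted": True, "edr_running": True}
print(ztna_decision(user, device, {"country": "DE"}, "erp"))  # True
print(ztna_decision(user, {"disk_encrypted": False, "edr_running": True},
                    {"country": "DE"}, "erp"))                # False
```

A `True` here grants a micro-tunnel to one application only; there is no "allow the subnet" outcome, which is the difference from VPN-style access.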
Layer 5: FWaaS (Non-Web Traffic Inspection)
Non-HTTP traffic (SSH, RDP, SMB, database protocols, DNS, SMTP, and custom applications) passes through the FWaaS engine. FWaaS performs stateful packet inspection, tracking connection state and validating protocol compliance. The application identification engine classifies traffic by its behavioral characteristics rather than relying on port numbers, correctly identifying applications even when they run on non-standard ports. The IPS engine applies vulnerability signatures and behavioral detections to identify exploits, command-and-control traffic, and lateral movement attempts. For DNS traffic specifically, FWaaS applies DNS security checks including DNS tunneling detection, DNS over HTTPS (DoH) handling, and malicious domain blocking.
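Port-independent application identification means classifying by what the payload looks like, not where it is headed. A heavily simplified sketch using first-bytes signatures (real engines use far richer behavioral models):

```python
# Port-independent app identification: classify by initial payload bytes.
def identify_app(first_bytes: bytes) -> str:
    if first_bytes.startswith(b"SSH-2.0"):
        return "ssh"                               # SSH protocol banner
    if first_bytes[:1] == b"\x16" and first_bytes[1:3] == b"\x03\x01":
        return "tls"                               # TLS handshake record header
    if first_bytes.startswith(b"GET ") or first_bytes.startswith(b"POST "):
        return "http"                              # cleartext HTTP request line
    return "unknown"

# SSH is identified even if the server listens on a non-standard port like 8443:
print(identify_app(b"SSH-2.0-OpenSSH_9.6"))  # ssh
print(identify_app(b"GET / HTTP/1.1"))       # http
```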
Layer 6: DLP (Content Inspection)
DLP operates as a cross-cutting layer that inspects content at every other inspection point. Within the SWG pipeline, DLP scans web uploads and form submissions. Within the CASB pipeline, DLP inspects SaaS uploads and data transfers. Within ZTNA tunnels, DLP examines data flowing to and from private applications. Within FWaaS, DLP inspects non-web file transfers. The DLP engine applies pattern matching, exact data matching, ML classification, and OCR across all of these channels using a single unified policy. This cross-cutting design is the fundamental advantage of SASE-integrated DLP over standalone DLP products that only see one channel.
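The pattern-matching component can be illustrated with a classic example: detecting card-number-like strings, where a checksum validation step cuts false positives. This is a toy of one detector only; real DLP engines layer many such detectors with exact data matching and ML classification:

```python
# Minimal DLP content check: regex candidate match plus Luhn checksum validation.
import re

def luhn_valid(number: str) -> bool:
    digits = [int(d) for d in number][::-1]
    total = sum(digits[0::2]) + sum(sum(divmod(2 * d, 10)) for d in digits[1::2])
    return total % 10 == 0

def contains_card_number(text: str) -> bool:
    for match in re.finditer(r"\b\d{16}\b", text):
        if luhn_valid(match.group()):   # reject 16-digit strings that fail Luhn
            return True
    return False

print(contains_card_number("order ref 4111111111111111"))    # True (test PAN)
print(contains_card_number("tracking id 1234567890123456"))  # False (fails Luhn)
```

The cross-cutting design in the text means this same detector runs once and applies to web uploads, SaaS transfers, ZTNA tunnels, and non-web file transfers alike, instead of being configured separately per channel.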
Traffic destinations: where connections go
Internet (Direct Egress)
Traffic destined for general internet websites exits the PoP through direct peering with ISPs and content delivery networks. The PoP re-encrypts the inspected traffic with a new TLS session to the destination server and forwards it. The response traffic returns through the PoP, is inspected again for threats (malware in downloads, exploit payloads in responses), and is forwarded to the user through the established tunnel.
SaaS Applications (Optimized Egress)
Traffic destined for major SaaS applications often exits through optimized peering arrangements between the SASE vendor and the SaaS provider. For example, many SASE PoPs have direct peering with Microsoft's network for M365 traffic, with Salesforce's infrastructure, and with Google's edge network. This peering reduces the number of network hops between the PoP and the SaaS application, improving performance compared to generic internet routing.
Private Applications (ZTNA Connectors)
Traffic destined for private applications exits the PoP through ZTNA connectors. The connector is a lightweight agent running inside the application's network (on-prem data center, cloud VPC) that maintains a persistent outbound tunnel to the PoP. Traffic from authorized users is forwarded through this tunnel to the specific application endpoint. Because the connector initiates the tunnel outbound, no inbound ports are opened on the application's network. This is the dark cloud model: the application has no internet-facing attack surface.
Single-pass vs. multi-pass inspection
A critical architectural distinction among SASE vendors is whether the inspection pipeline processes traffic in a single pass or multiple passes. In a single-pass architecture (the approach vendors such as Palo Alto Networks and Fortinet advertise), the traffic is decrypted once and simultaneously evaluated by all security engines in parallel: URL filtering, threat prevention, application identification, DLP, and CASB policies are all applied on the decrypted payload in a single processing cycle. In a multi-pass architecture, the traffic moves sequentially through separate inspection stages, potentially being decrypted, re-encrypted, and decrypted again at each stage. Single-pass is faster (lower latency per transaction) and more efficient (less compute overhead) but requires tight integration between all security engines. Multi-pass is easier to build (each engine operates independently) but adds cumulative latency.
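The latency difference compounds with each added engine, which a back-of-envelope model makes visible. The per-operation costs below are invented for illustration; only the structure (one decrypt versus one per stage) reflects the architectural point:

```python
# Timing sketch: single-pass (decrypt once, all engines on the same buffer)
# vs multi-pass (each stage decrypts and re-encrypts). Costs are invented.
DECRYPT_MS, ENCRYPT_MS, ENGINE_MS = 2.0, 2.0, 0.5
ENGINES = ["url_filter", "threat_prevention", "app_id", "dlp", "casb"]

def single_pass_latency() -> float:
    # One decrypt, all engines run on the same cleartext buffer, one re-encrypt.
    return DECRYPT_MS + len(ENGINES) * ENGINE_MS + ENCRYPT_MS

def multi_pass_latency() -> float:
    # Each stage independently decrypts, inspects, and re-encrypts.
    return sum(DECRYPT_MS + ENGINE_MS + ENCRYPT_MS for _ in ENGINES)

print(single_pass_latency())  # 6.5
print(multi_pass_latency())   # 22.5
```

Under this toy model the crypto overhead is paid once versus five times; adding a sixth engine costs 0.5 ms in single-pass but 4.5 ms in multi-pass, which is why the gap widens as pipelines grow.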
High availability and failover
SASE PoPs are designed for high availability with no single point of failure. Within a PoP, the inspection pipeline runs across multiple redundant compute nodes, and the loss of any single node is handled transparently by load balancing traffic to surviving nodes. Between PoPs, the endpoint agent maintains awareness of multiple PoPs and automatically fails over to the next-nearest PoP if the primary PoP becomes unreachable. SD-WAN tunnels from branch offices similarly maintain secondary tunnel endpoints to backup PoPs. The failover time from primary to secondary PoP is typically 10-30 seconds for endpoint agents and sub-second for SD-WAN tunnels using pre-established backup tunnels.
The ZTNA connectors inside your network also maintain connections to multiple PoPs for redundancy. If a connector loses connectivity to its primary PoP, it fails over to a secondary PoP, and user sessions are re-established through the new path. For branch offices, SD-WAN appliances maintain tunnels to two or more PoPs simultaneously, with traffic distribution based on performance metrics and automatic failover on link degradation or failure.
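The agent-side failover logic described above reduces to walking an ordered PoP list and promoting the first one that still answers health probes. The PoP names and the fail-closed default are illustrative assumptions:

```python
# Failover sketch: promote the next reachable PoP when the primary goes dark.
POP_PRIORITY = ["frankfurt", "paris", "amsterdam"]  # nearest first (invented)

def select_pop(reachable: set):
    for pop in POP_PRIORITY:
        if pop in reachable:
            return pop
    return None  # no PoP reachable: fail closed (or fail open, per policy)

print(select_pop({"frankfurt", "paris", "amsterdam"}))  # frankfurt
print(select_pop({"paris", "amsterdam"}))               # paris (primary down)
```

SD-WAN appliances achieve their sub-second failover by keeping the tunnel to the secondary PoP pre-established, so "promotion" is just a routing change rather than a fresh tunnel negotiation.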