SASE Day-2 Operations: Monitoring, Troubleshooting, and Tuning
Day-2 operations is where SASE deployments succeed or fail. Set up DEM baselines in week one. Review the TLS bypass list monthly; it only grows unless you prune it. Run quarterly policy audits against your application inventory. Track three metrics: P95 latency to your top 10 SaaS apps, policy exception count, and mean time to resolve access issues. If any metric trends in the wrong direction for two consecutive weeks, investigate immediately.
Every SASE vendor sells you the deployment. Nobody talks about what happens on day 31. The platform is live, users are flowing through SSE PoPs, SD-WAN tunnels are up, and the project team moves to the next initiative. Six months later, the TLS bypass list has grown from 12 entries to 47, half the DLP policies fire so many false positives that the SOC ignores them, and nobody can explain why latency to Salesforce increased by 40ms last Thursday. Day-2 operations is the discipline that prevents this entropy. It is not glamorous, but it is the difference between a SASE deployment that delivers ongoing value and one that slowly decays into an expensive pipe.
Establish baselines in week one
The single most important day-2 task is establishing performance baselines before the project team disbands. Use your DEM (Digital Experience Monitoring) tooling to capture baseline metrics for your top 10 SaaS applications by user count: P50 and P95 latency, DNS resolution time, TLS handshake time, and time-to-first-byte. Also capture baseline metrics for your top 5 private applications accessed through ZTNA. Store these baselines in a shared document or dashboard that the operations team can reference for the next 12 months.
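As a rough illustration of the baseline capture, the sketch below reduces a list of raw latency samples to P50 and P95. It assumes you can export per-application samples in milliseconds from your DEM tool; the function name `summarize_latency` and the output keys are hypothetical, not part of any vendor API.

```python
import statistics

def summarize_latency(samples_ms):
    """Reduce raw latency samples (milliseconds) to P50 and P95."""
    if len(samples_ms) < 2:
        raise ValueError("need at least two samples to compute percentiles")
    # quantiles(n=100) returns 99 cut points: index 49 is P50, index 94 is P95.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    return {"p50_ms": cuts[49], "p95_ms": cuts[94]}

# Hypothetical samples for one application, collected during week one.
baseline = summarize_latency([172, 175, 178, 180, 181, 183, 185, 190, 210, 240])
print(baseline)
```

The same reduction can be applied to DNS resolution time, TLS handshake time, and time-to-first-byte, so that every baseline is stored in the same shape.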
Why baselines matter: when a user reports that Salesforce is slow, you need to compare current P95 latency against the baseline. If the baseline was 180ms and current is 185ms, the problem is not SASE. If current is 340ms, something changed — a PoP routing issue, a policy change that added inspection overhead, or an ISP path change. Without baselines, every performance complaint becomes a finger-pointing exercise between the network team, the security team, and the SaaS vendor.
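The comparison described above is simple arithmetic, sketched here with the numbers from the Salesforce example. The 10% tolerance matches the threshold used in the troubleshooting sequence later in this article; the function name is hypothetical.

```python
def compare_to_baseline(baseline_ms, current_ms, tolerance=0.10):
    """Report how far current P95 latency sits above baseline."""
    deviation = (current_ms - baseline_ms) / baseline_ms
    return {
        "deviation_pct": round(deviation * 100, 1),
        # Only flag increases beyond the tolerance; small drift is normal.
        "investigate": deviation > tolerance,
    }

print(compare_to_baseline(180, 185))  # about 2.8% above baseline
print(compare_to_baseline(180, 340))  # about 88.9% above baseline
```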
Weekly operational cadence
| Task | Frequency | Owner | Tool |
|---|---|---|---|
| Review DEM latency dashboards for top 10 apps | Weekly | Network ops | DEM dashboard (ThousandEyes, Zscaler DEX, Netskope Proactive DEM) |
| Review DLP incident queue and triage true positives | Daily at first, then weekly | Security ops | SSE DLP dashboard |
| Review CASB shadow IT report — new unsanctioned apps | Weekly | Security ops | CASB discovery dashboard |
| Check tunnel health across all branch sites | Weekly | Network ops | SD-WAN orchestrator |
| Review posture non-compliance — devices failing policy | Weekly | Endpoint team | ZTNA posture dashboard |
| Audit TLS bypass list — remove entries no longer needed | Monthly | Security ops | SSE policy console |
| Full policy audit against application inventory | Quarterly | Security architect | SSE policy console + CMDB |
| Failover testing — controlled link kills at 2-3 sites | Quarterly | Network ops | SD-WAN + SSE dashboards |
TLS bypass list management
The TLS inspection bypass list is the single biggest source of security debt in SASE deployments. During deployment, every application that breaks under TLS inspection gets added to the bypass list as a quick fix. Six months later, you have 40-50 domains bypassing inspection, and nobody remembers why half of them were added. Some of those entries no longer need the bypass: the application vendor fixed the certificate pinning issue in a later update, or the application was replaced by a web-based alternative. Traffic to those domains should be inspected again.
Implement a monthly review process: export the bypass list, check each entry against a justification document (why was it added? what breaks with inspection enabled?), and test re-enabling inspection for entries older than 6 months. Many will pass without issues because the underlying application was patched or upgraded. Set a target: the bypass list should contain fewer than 20 entries for a typical enterprise. If yours is above 40, you have a hygiene problem.
Every bypass entry should have an owner (the application team responsible), a justification (the specific technical reason inspection breaks), an expiration date (when to re-test), and a remediation plan (what needs to change for inspection to work). Without this documentation, the bypass list becomes permanent — and every bypassed domain is a domain where your DLP, malware scanning, and URL filtering are blind.
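One way to make that documentation enforceable is to keep the bypass list as structured records and check them during the monthly review. The sketch below is a minimal example under stated assumptions: the `BypassEntry` fields mirror the four attributes above, the record format is hypothetical (no SSE console exports exactly this), and "older than 6 months" is approximated as 180 days.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class BypassEntry:
    domain: str
    added_on: date
    owner: Optional[str] = None             # application team responsible
    justification: Optional[str] = None     # why inspection breaks
    retest_on: Optional[date] = None        # expiration / re-test date
    remediation_plan: Optional[str] = None  # what must change to re-enable

def review_bypass_list(entries, today, max_age_days=180):
    """Flag entries with missing documentation and entries due for a re-test."""
    undocumented = [
        e.domain for e in entries
        if not (e.owner and e.justification and e.retest_on and e.remediation_plan)
    ]
    retest_due = [
        e.domain for e in entries
        if (e.retest_on is not None and e.retest_on <= today)
        or (today - e.added_on).days > max_age_days
    ]
    return {"undocumented": undocumented, "retest_due": retest_due}

entries = [
    BypassEntry("pinned.example.com", date(2024, 1, 10), "app-team",
                "client pins the server certificate", date(2024, 7, 10),
                "vendor update removes pinning"),
    BypassEntry("legacy.example.com", date(2023, 11, 1)),
]
print(review_bypass_list(entries, today=date(2024, 6, 1)))
```

An entry that appears in both lists, like the second one here, is the kind that tends to become permanent if nobody picks it up.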
Troubleshooting slow applications
When a user reports slow application access through SASE, follow this diagnostic sequence:
- Check DEM end-to-end path visualization. Identify which segment is slow: endpoint to PoP, PoP inspection latency, PoP to application, or application response time. This immediately narrows the investigation from 'SASE is slow' to a specific segment.
- Compare current latency against baseline. If within 10% of baseline, the problem is likely not SASE. Direct the user to the application team or their local ISP.
- Check for recent policy changes. A new TLS inspection rule, DLP policy, or URL category change can add latency. Correlate the user's complaint timeline with the policy change log.
- Check PoP health. If multiple users at the same site report slowness, check if the primary SSE PoP is degraded. Look at the vendor's status page and your tunnel health metrics.
- Check SD-WAN path selection. If the issue is site-specific, the SD-WAN may have failed over to a secondary path with higher latency. Check path selection logs and WAN link health.
- Packet capture as last resort. If the above steps do not identify the issue, capture at the endpoint and at the ZTNA connector to compare. Look for TCP retransmissions, TLS handshake failures, or DNS resolution delays.
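The first and third steps of this sequence can be sketched in a few lines. The example assumes you can read per-segment latency from your DEM tool and export the policy change log with timestamps; the segment names, the data shapes, and the 24-hour look-back window are all illustrative assumptions, not a vendor format.

```python
from datetime import datetime, timedelta

# The four path segments named in the first step of the sequence.
SEGMENTS = ("endpoint_to_pop", "pop_inspection", "pop_to_app", "app_response")

def slowest_segment(baseline_ms, current_ms):
    """Return the segment whose latency grew most over baseline, and by how much."""
    increase = {s: current_ms[s] - baseline_ms[s] for s in SEGMENTS}
    worst = max(increase, key=increase.get)
    return worst, increase[worst]

def changes_before(complaint_time, change_log, window_hours=24):
    """Return policy changes made in the window leading up to the complaint."""
    start = complaint_time - timedelta(hours=window_hours)
    return [desc for when, desc in change_log if start <= when <= complaint_time]

baseline = {"endpoint_to_pop": 30, "pop_inspection": 10, "pop_to_app": 40, "app_response": 100}
current = {"endpoint_to_pop": 32, "pop_inspection": 55, "pop_to_app": 41, "app_response": 102}
print(slowest_segment(baseline, current))

log = [
    (datetime(2024, 1, 10, 16, 0), "new TLS inspection rule"),
    (datetime(2024, 1, 2, 10, 0), "URL category change"),
]
print(changes_before(datetime(2024, 1, 11, 9, 0), log))
```

In this made-up case the increase sits in PoP inspection and a TLS inspection rule changed the previous afternoon, which is a far more specific starting point than "SASE is slow".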
Policy tuning after deployment
SASE policies are not set-and-forget. Application landscapes change, new SaaS tools are adopted, departments reorganize, and threat patterns evolve. Run a full policy audit quarterly. Compare your SSE policy set against your current application inventory: are all applications covered? Are there policies for applications that were decommissioned (orphaned rules)? Are DLP patterns still aligned with your data classification scheme?
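The coverage and orphaned-rule questions reduce to a set comparison between two exports. The sketch below assumes you can list the application names referenced by SSE policies and the application names in your inventory or CMDB; the function name and the name-matching approach (case-insensitive exact match) are simplifying assumptions, and real exports usually need more normalization.

```python
def audit_policy_coverage(policy_apps, inventory_apps):
    """Compare apps referenced by policies against the current inventory."""
    policies = {name.strip().lower() for name in policy_apps}
    inventory = {name.strip().lower() for name in inventory_apps}
    return {
        # In the inventory, but no policy references them.
        "uncovered": sorted(inventory - policies),
        # Referenced by a policy, but no longer in the inventory.
        "orphaned": sorted(policies - inventory),
    }

result = audit_policy_coverage(
    policy_apps=["Salesforce", "Box", "OldCRM"],
    inventory_apps=["salesforce", "box", "NewHRTool"],
)
print(result)  # {'uncovered': ['newhrtool'], 'orphaned': ['oldcrm']}
```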
The most common post-deployment policy issues:
- DLP false positive fatigue. Overly broad regex patterns flag legitimate business documents. Tune the patterns; do not disable DLP.
- CASB shadow IT noise. Hundreds of low-risk apps generate alerts. Create a sanctioned app list and alert only on truly risky categories such as file sharing and AI tools.
- ZTNA posture drift. Devices gradually fall out of compliance as OS updates lag. Work with the endpoint team on enforcement timelines, not just reporting.
Operational metrics dashboard
Build a single-pane dashboard with these metrics. Review it weekly as a team:
| Metric | Target | Red flag |
|---|---|---|
| P95 latency — top 10 SaaS apps | Within 15% of baseline | > 25% above baseline for 2+ days |
| SSE tunnel uptime | > 99.9% | Any tunnel below 99.5% in a week |
| Policy exception count (TLS bypasses) | < 20 | > 40 or growing month-over-month |
| DLP true positive rate | > 60% | < 30% (noise is drowning real incidents) |
| ZTNA posture compliance | > 95% of devices | < 85% (too many non-compliant devices accessing apps) |
| Mean time to resolve access issues | < 30 minutes | > 2 hours average |
| SD-WAN path failover events per week | < 5 per site | > 20 per site (unstable WAN links) |
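The simple threshold rows of this table can be encoded directly, which makes the weekly review a matter of reading statuses rather than re-deriving them. The sketch below is one possible encoding: the metric keys and the "watch" label for values between target and red flag are assumptions of this example. The P95 row and the "growing month-over-month" condition need history and are left out.

```python
# metric key -> (target met?, red flag?) using the thresholds from the table
RULES = {
    "tunnel_uptime_pct":       (lambda v: v > 99.9, lambda v: v < 99.5),
    "tls_bypass_count":        (lambda v: v < 20,   lambda v: v > 40),
    "dlp_true_positive_pct":   (lambda v: v > 60,   lambda v: v < 30),
    "posture_compliance_pct":  (lambda v: v > 95,   lambda v: v < 85),
    "mttr_minutes":            (lambda v: v < 30,   lambda v: v > 120),
    "failovers_per_site_week": (lambda v: v < 5,    lambda v: v > 20),
}

def classify(metrics):
    """Label each metric 'ok', 'watch', or 'red' against the table thresholds."""
    status = {}
    for name, value in metrics.items():
        on_target, red_flag = RULES[name]
        if red_flag(value):
            status[name] = "red"
        elif on_target(value):
            status[name] = "ok"
        else:
            status[name] = "watch"  # between target and red flag
    return status

print(classify({"tls_bypass_count": 47, "dlp_true_positive_pct": 65, "mttr_minutes": 45}))
```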