Security Architecture Review
Start with the Attacker, Not the Architecture
Here is a real interview question from a fintech company's Senior Staff loop: "You are reviewing a new service that accepts credit card payments and stores transaction history. Walk me through your security review."
A weak answer starts listing controls: "I would add TLS, encrypt the database, use OAuth2 for auth." That is a grocery list, not a security review.
A strong answer starts with the threat: "The most valuable asset in this system is the stored card numbers. An attacker who reaches the transaction database has everything they need for fraud. So my first question is: does this service actually need to store card numbers, or can we tokenize them through Stripe or Adyen and never hold PAN data ourselves? That decision alone determines how much of the system falls inside PCI-DSS scope, which changes the entire architecture."
That opening does three things. It shows you think about security in terms of what an attacker wants, not what controls you can add. It makes a concrete architectural recommendation. And it connects the security decision to a business outcome (PCI-DSS scope reduction).
How to Walk Through a Threat Model
When the interviewer gives you a system to review, draw the data flow first. Literally ask: "Can I sketch the data flows?" Then trace the path of the most sensitive data through the system. At each boundary crossing, ask yourself the STRIDE questions, but do not robotically list all six for every component. Focus on the threats that actually matter for this system.
For a payment service, the high-priority threats are:
Information disclosure at the data layer. Can the database be accessed from services that should not touch payment data? Is the data encrypted at rest with customer-managed KMS keys, or is the cloud provider's default encryption good enough? (For PCI, you need explicit key management.)
Spoofing at the service boundary. How does the API gateway verify that the calling service is who it claims to be? mTLS between services, or JWT with short-lived tokens? If JWT, where is the signing key stored and how is it rotated?
Elevation of privilege through the admin path. Who can query the transaction database directly? Is there a break-glass procedure, or does every SRE have standing access? At Coinbase, they implemented just-in-time access through Teleport, where production access requires approval and automatically expires. That is the caliber of detail interviewers want.
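One way to make "focus on the threats that actually matter" concrete is a small scored threat register. The sketch below is illustrative only: the 1-to-5 scales, the likelihood-times-impact score, and every number in it are assumptions, not part of STRIDE or of any standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threat:
    name: str
    stride: str      # STRIDE category the threat falls under
    likelihood: int  # assumed scale: 1 (rare) to 5 (expected)
    impact: int      # assumed scale: 1 (minor) to 5 (severe)

    @property
    def risk(self) -> int:
        # Simple product score; other weightings are equally defensible.
        return self.likelihood * self.impact


def top_threats(threats: list[Threat], n: int = 3) -> list[Threat]:
    """Return the n highest-risk threats, highest first."""
    return sorted(threats, key=lambda t: t.risk, reverse=True)[:n]


# Hypothetical scores for the payment service discussed above.
PAYMENT_SERVICE_THREATS = [
    Threat("Payment DB reachable from unrelated services", "Information disclosure", 4, 5),
    Threat("Caller identity not verified at the API gateway", "Spoofing", 3, 4),
    Threat("Standing SRE access to the transaction database", "Elevation of privilege", 3, 5),
    Threat("Audit log entries can be altered", "Tampering", 2, 3),
    Threat("Checkout endpoint flooded with requests", "Denial of service", 3, 2),
]
```

Calling `top_threats(PAYMENT_SERVICE_THREATS)` returns the three threats listed above. In an interview, what matters is that you can explain why each one scored where it did, not the arithmetic.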
Good Answer vs. Bad Answer: Auth Design
Question: "Design service-to-service auth for 50 microservices."
Bad answer: "I would use JWT tokens for service-to-service communication. Each service would validate the token and check permissions."
This is bad because it hand-waves the hard parts. Where do tokens come from? How are they scoped? What happens when a token is compromised?
Good answer: "I would use mTLS for transport-level authentication, because it gives us mutual identity verification without application code changes. For authorization, I would centralize policy in OPA (Open Policy Agent) with per-service policy bundles. Each service mesh sidecar (Envoy in Istio, or linkerd2-proxy in Linkerd) handles the mTLS handshake, and OPA evaluates whether service A is allowed to call service B's specific endpoint. The policy is version-controlled in Git and deployed through CI. For user-context propagation, I would pass a signed JWT from the edge gateway that interior services can read but not forge, with a 5-minute expiry and no refresh. If we need to revoke access fast, we push a policy update to OPA rather than trying to invalidate tokens."
The difference: the good answer names specific tools, explains the layering between transport auth and application auth, and addresses token lifecycle.
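The authorization half of that answer comes down to a default-deny decision keyed on the caller's verified identity. The sketch below shows that decision in plain Python rather than OPA's Rego; the service names, endpoints, and the `Rule` shape are all hypothetical, and the caller identity is assumed to have been extracted from the mTLS peer certificate before this function runs.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    caller: str       # identity taken from the peer's mTLS certificate
    target: str       # service being called
    method: str       # HTTP method
    path_prefix: str  # endpoint prefix the caller may reach


# Hypothetical policy bundle; in the design above it would live in Git.
POLICY = [
    Rule("checkout", "payments", "POST", "/v1/charges"),
    Rule("reporting", "payments", "GET", "/v1/transactions"),
]


def is_allowed(caller: str, target: str, method: str, path: str,
               policy: list[Rule] = POLICY) -> bool:
    """Default deny: a call is allowed only if some rule matches it."""
    return any(
        r.caller == caller
        and r.target == target
        and r.method == method
        # Match the prefix itself or a sub-path, not a look-alike sibling.
        and (path == r.path_prefix or path.startswith(r.path_prefix + "/"))
        for r in policy
    )
```

Revocation under this model is a policy change: remove the rule and the next call is denied, with no token to chase down.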
Zero-Trust in Practice
Zero-trust sounds clean in a slide deck. In practice, migrating existing infrastructure to zero-trust is a multi-quarter project full of surprises. The interviewers who ask this question have lived through it.
The phased approach that actually works:
- Inventory. You cannot secure what you cannot see. Map every service-to-service communication path. At Shopify, they discovered 40% more internal API calls than anyone expected during this phase.
- mTLS in permissive mode. Deploy service mesh sidecars that establish mTLS but do not enforce it. Log all non-TLS traffic. This gives you a baseline without breaking anything.
- Alert on violations. Start flagging services that communicate without mTLS. Give teams a deadline to fix their configurations. This is where organizational buy-in matters more than technology.
- Enforce per-service. Move services to strict mode one at a time, starting with the least-connected services. Save the high-fanout services (like your API gateway) for last because they have the most dependencies to validate.
The mistake most candidates make is describing zero-trust as a state you achieve, not a process you execute. Talk about the migration path, not just the end state.
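The phases above can be driven from the traffic inventory itself. A minimal sketch, assuming the permissive-mode logs have been reduced to hypothetical `(source, destination, used_mtls)` tuples; real mesh telemetry looks different, and connection count is only one possible proxy for "least-connected":

```python
from collections import Counter

# Hypothetical record shape: (source service, destination service, used mTLS?)
Flow = tuple[str, str, bool]


def plaintext_callers(flows: list[Flow]) -> list[str]:
    """Services seen sending traffic without mTLS (the alerting phase)."""
    return sorted({src for src, _dst, used_mtls in flows if not used_mtls})


def enforcement_order(flows: list[Flow]) -> list[str]:
    """Order services for strict mode, least-connected first."""
    degree: Counter = Counter()
    # Count each distinct caller/callee pair once, however often it appears.
    for src, dst in {(s, d) for s, d, _ in flows}:
        degree[src] += 1
        degree[dst] += 1
    # Ties are broken by name so the plan is stable between runs.
    return sorted(degree, key=lambda svc: (degree[svc], svc))


FLOWS: list[Flow] = [
    ("gateway", "checkout", True),
    ("gateway", "catalog", True),
    ("gateway", "reporting", False),
    ("checkout", "payments", True),
    ("reporting", "payments", False),
    ("checkout", "payments", True),
]
```

With this sample data the gateway lands last in the enforcement order, which matches the advice to leave high-fanout services until the end.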
Operational Security: The Differentiator
Technical controls get most of the airtime in interviews, but operational security is where Staff+ candidates separate themselves. Bring up these questions before the interviewer does:
Secrets management. Are secrets in environment variables (bad), in a secrets manager like HashiCorp Vault or AWS Secrets Manager (better), or injected at runtime with automatic rotation (best)? How often are database credentials rotated? At many companies the answer is "never," which is a finding you should call out.
Access governance. When an engineer leaves the company, how quickly is their access revoked? Is production access role-based with automatic expiration, or does it accumulate over time? Discuss the concept of access reviews and just-in-time elevation.
Audit trails. Can you reconstruct who did what and when? Are audit logs immutable and stored separately from the systems they monitor? If an attacker compromises a service, can they also delete the evidence?
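If the interviewer pushes on "can they delete the evidence?", one technique worth naming is a hash-chained log, where each entry commits to the one before it. This is a sketch of that idea, not a description of any particular product; the record fields and the all-zero genesis value are assumptions.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for an empty chain


def _digest(prev_hash: str, record: dict) -> str:
    # Canonical JSON so the same record always hashes the same way.
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(prev_hash.encode() + payload).hexdigest()


def append(log: list[dict], record: dict) -> None:
    """Append a record whose hash covers the previous entry's hash."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"record": record, "hash": _digest(prev, record)})


def verify(log: list[dict]) -> bool:
    """Recompute the chain; any edited or removed middle entry breaks it."""
    prev = GENESIS
    for entry in log:
        if entry["hash"] != _digest(prev, entry["record"]):
            return False
        prev = entry["hash"]
    return True
```

The chain detects edits and deletions in the middle, but an attacker who truncates the tail still leaves a log that verifies. That is why the latest hash has to be anchored somewhere the compromised service cannot write, which is the "stored separately" point in practice.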
These questions show you understand security as an ongoing practice, not a one-time design exercise. That is exactly the mindset interviewers are evaluating at Senior Staff level.
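Just-in-time elevation is easier to discuss once you can state its two rules: no grant without a second person, and no grant without an expiry. A minimal sketch of those rules follows; the `Grant` fields and the one-hour default are assumptions, and this is not the API of Teleport or any other access tool.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Grant:
    engineer: str
    resource: str
    approved_by: str
    expires_at: datetime


def request_access(engineer: str, resource: str, approved_by: str,
                   now: datetime, ttl: timedelta = timedelta(hours=1)) -> Grant:
    """Issue a time-boxed grant; self-approval is rejected."""
    if not approved_by or approved_by == engineer:
        raise PermissionError("access needs approval from someone else")
    return Grant(engineer, resource, approved_by, now + ttl)


def is_active(grant: Grant, now: datetime) -> bool:
    """A grant simply stops working once its expiry passes."""
    return now < grant.expires_at
```

Because expiry is part of the grant, nobody has to remember to revoke access, which is the property that stops access from accumulating over time.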
Sample Questions
Walk me through how you would conduct a security architecture review for a new service that handles payment data. What would you look for?
This tests your ability to think about security systematically. Interviewers want to see a structured threat modeling approach, not a checklist of security features.
Design an authentication and authorization system for a microservices architecture with 50 services. How do you handle service-to-service auth?
Auth architecture is a common deep-dive topic. Strong answers discuss the trade-offs between centralized and distributed authorization, token propagation, and the challenges of service mesh auth.
How would you implement zero-trust networking in an existing cloud infrastructure? What are the biggest challenges?
Zero-trust is a buzzword, but implementing it is genuinely hard. Interviewers want to see practical knowledge of mutual TLS, identity-based access, and the migration challenges from perimeter-based security.
Evaluation Criteria
- Applies a structured threat modeling methodology (STRIDE or equivalent) rather than ad-hoc security thinking
- Discusses security at multiple layers: network, application, data, identity
- Shows understanding of the trade-offs between security controls and developer velocity
- Demonstrates practical knowledge of authentication and authorization patterns (OAuth2, OIDC, JWT, mTLS)
- Addresses the operational side of security: key rotation, certificate management, audit logging
Key Points
- The strongest security interview answers start with 'What is the most valuable thing an attacker could reach from here?' not with a checklist of controls to apply.
- Threat modeling is about prioritization. STRIDE gives you categories, but your job is to rank the threats by likelihood and impact, then focus your design on the top three.
- Most real-world breaches exploit misconfigurations, not sophisticated attacks. Discussing automated config validation (Open Policy Agent, AWS Config Rules) signals practical experience.
- Zero-trust migrations fail when teams try to enforce mTLS globally on day one. The winning pattern is permissive mode first, alerting second, enforcement last, service by service.
- If you discuss encryption but skip key management, you have designed half a system. Who rotates keys? What happens when a KMS region goes down? That is where the hard problems live.
Common Mistakes
- ✗ Listing security features (WAF, encryption, MFA) without connecting them to specific threats. Controls without threat context are just checkboxes.
- ✗ Designing security in isolation from developer experience. A security model that engineers routinely bypass because it slows them down is worse than no model.
- ✗ Ignoring the blast radius question. If one service is compromised, what else can the attacker reach? Lateral movement is the real risk in microservices.
- ✗ Forgetting compliance as a design constraint. PCI-DSS scope, SOC2 audit trails, and GDPR data residency rules shape architecture in ways you cannot retrofit.