Episode 31 — Detect identity anomalies by understanding normal authentication behaviors
Spotting anomalies by knowing what normal identity looks like is one of those skills that feels subtle until you realize it drives the quality of almost every identity investigation. If you do not know what normal looks like for your users, administrators, and service identities, every unusual event becomes either a panic or a shrug. Neither reaction is helpful, because panic creates noise and mistrust, and shrugging creates blind spots that attackers love. In this episode, we focus on the discipline of building mental and operational baselines, then using those baselines to investigate deviations calmly and effectively. The goal is not to memorize every login your organization ever performs, but to understand the typical patterns that make up healthy authentication behavior. Once those patterns are clear, anomalies become easier to spot, easier to prioritize, and easier to explain to stakeholders without turning every investigation into a personal accusation. When you get this right, identity detection becomes more precise and less exhausting.
Before we continue, a quick note: this audio course is a companion to our course companion books. The first book is about the exam and provides detailed information on how best to pass it. The second book is a Kindle-only eBook that contains 1,000 flashcards that can be used on your mobile device or Kindle. Check them both out at Cyber Author dot me, in the Bare Metal Study Guides Series.
Normal authentication patterns differ for users, admins, and services because their work rhythms, privilege levels, and interaction styles are fundamentally different. End users typically authenticate through interactive sessions, often from known devices, with predictable productivity peaks during business hours, and with activity concentrated in a limited set of applications. Administrators authenticate less frequently but perform higher impact actions, often involving privileged consoles, sensitive configuration changes, and elevated permissions that are rarely needed for routine work. Service identities authenticate in ways that reflect automation, such as consistent frequency, predictable endpoints, and stable call patterns tied to scheduled tasks or deployment workflows. These differences matter because an event that is normal for a service identity, like dozens of token requests in a short interval, could be a red flag for a human user. Conversely, a pattern that is normal for a human user, like signing in from multiple networks throughout a day, could be suspicious for an automation identity that should run from a fixed environment. Establishing normal patterns means you start by classifying identity types and then describing what typical looks like for each. Without this separation, detection rules become either too noisy or too permissive, and you end up missing the signals that actually matter.
Baseline signals such as location, device, time, and action types provide a practical way to describe normal authentication without drowning in details. Location can mean geography, region, or network origin, and for many users it is relatively stable with expected variation around travel or remote work. Device signals capture whether the login originates from a known managed device, a new device, or a device profile that does not match the user’s typical footprint. Time includes not only the hour of day but also the rhythm of activity across weekdays and weekends, because many roles have consistent schedules even in global organizations. Action types matter because authentication events are often tied to what the identity is trying to do, such as interactive sign-in, token refresh, privileged elevation, or access to sensitive administrative functions. When you combine these signals, you create a baseline envelope that captures normal variation without needing perfect precision. The baseline is not a cage; it is a boundary of expectations that helps you recognize when behavior falls outside what is typical. These signals are also easy to explain, which matters during triage when you need to justify why an alert is worth attention.
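For listeners following along in text, the idea of a baseline envelope can be sketched as a minimal Python example. Everything here is illustrative: the field names, the event shape, and the signal categories are assumptions for demonstration, not a prescribed schema.

```python
from dataclasses import dataclass

@dataclass
class BaselineEnvelope:
    # Hypothetical baseline for one identity; all fields are illustrative.
    known_locations: set   # e.g. {"Paris", "Lyon"}
    known_devices: set     # managed device identifiers
    active_hours: range    # typical local working hours, e.g. range(7, 20)
    typical_actions: set   # e.g. {"sign_in", "token_refresh"}

def deviations(event: dict, baseline: BaselineEnvelope) -> list:
    """Return which baseline signals the event falls outside of.

    An empty list means the event sits inside the baseline envelope;
    multiple entries mean several signals deviate at once, which is
    usually more interesting than any single deviation.
    """
    flags = []
    if event["location"] not in baseline.known_locations:
        flags.append("location")
    if event["device"] not in baseline.known_devices:
        flags.append("device")
    if event["hour"] not in baseline.active_hours:
        flags.append("time")
    if event["action"] not in baseline.typical_actions:
        flags.append("action")
    return flags
```

The point of returning a list rather than a boolean is that the number of signals deviating at once is itself useful triage context, as the episode discusses next.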
A scenario involving impossible travel and repeated token refreshes illustrates how baseline signals combine into a clearer picture. Impossible travel occurs when the same identity appears to authenticate from two distant locations in a timeframe that does not make physical sense, which can indicate credential reuse by someone else. Repeated token refreshes can show up when a session is unstable, when a device is misbehaving, or when an attacker is attempting to maintain access by continuously renewing tokens. If you see impossible travel plus repeated token refreshes, especially followed by access to unusual applications or administrative functions, the probability of compromise rises sharply. However, you still need to consider the baseline context, because certain network architectures, proxies, and travel patterns can produce confusing location signals. The value of the scenario is that it shows you how anomalies often arrive in clusters rather than as a single definitive indicator. An attacker’s activity tends to create a pattern that spans authentication events, session behavior, and subsequent actions, and that pattern is what you want to recognize quickly. When you view the events as part of a story rather than isolated alerts, you become far more effective at separating true incidents from noise.
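The impossible-travel check itself is simple enough to sketch: compute the great-circle distance between two sign-in locations and ask whether the implied speed is physically plausible. This is a minimal sketch, assuming events carry coordinates and a timestamp; the 900 km/h threshold is an illustrative choice roughly matching airliner speed, not a standard value, and real products must also handle proxies and VPN egress points that distort geolocation.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def impossible_travel(ev1, ev2, max_kmh=900.0):
    """Flag two sign-ins whose implied travel speed exceeds max_kmh.

    Each event is a hypothetical (timestamp_hours, lat, lon) tuple.
    """
    hours = abs(ev2[0] - ev1[0])
    dist = haversine_km(ev1[1], ev1[2], ev2[1], ev2[2])
    if hours == 0:
        return dist > 0  # two places at the same instant
    return dist / hours > max_kmh
```

Note that the same pair of cities can be perfectly plausible over eight hours and impossible over one, which is why the check is about implied speed rather than distance alone.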
One-off travel, automation bursts, and noisy alerts are pitfalls that can distort anomaly detection if you treat every deviation as equally suspicious. One-off travel is common in many organizations, and a legitimate user traveling can trigger location anomalies that look dramatic in isolation. Automation bursts are also common, especially during deployments or scaling events, and they can produce large volumes of token requests and authentication events that are normal for service identities. Noisy alerts happen when detection rules are too sensitive, too generic, or unaware of identity type, leading to frequent false positives that train teams to ignore warnings. These pitfalls matter because they push teams toward alert fatigue, which reduces the chance that a real anomaly receives timely attention. The solution is not to stop alerting, but to tune based on baselines and to treat anomalies as hypotheses that require context. Good detection logic acknowledges expected variability, such as travel windows and deployment cycles, and distinguishes those from patterns that indicate misuse. If you do not account for these pitfalls, you will either waste time chasing normal behavior or you will become numb to the alerts that should matter most.
Grouping events into sessions and narratives is a quick win because it transforms raw logs into understandable behavior. Authentication events are often noisy and repetitive, especially when modern applications use token-based access that generates frequent refresh events. When you group events into sessions, you treat a set of related events as one user story, such as a login from a device, token issuance, and subsequent application access within a bounded timeframe. Narratives then explain what happened in terms of sequence and intent, such as a user signing in, failing a challenge, retrying, and then accessing a set of services. This approach reduces alert volume because you are not reacting to every refresh event, but to the overall session shape, including where it originated and what followed. Session-based grouping also helps you spot anomalies, because unusual sequences and unusual transitions stand out more clearly when the noise is reduced. It also improves communication, because you can explain a narrative to a stakeholder far more easily than you can explain a thousand individual log lines. When identity detection is narrative-driven, investigations become faster and less stressful.
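Session grouping by inactivity gap can be sketched in a few lines. This is one simple strategy among many, assuming time-sorted events and an illustrative 30-minute gap; real tooling often also splits sessions on device or network changes.

```python
def group_sessions(events, gap_minutes=30):
    """Group (timestamp_minutes, action) events into sessions.

    Events within gap_minutes of the previous event join the current
    session; a longer silence starts a new one. The gap is a tunable,
    illustrative threshold.
    """
    sessions = []
    for ts, action in sorted(events):
        if sessions and ts - sessions[-1][-1][0] <= gap_minutes:
            sessions[-1].append((ts, action))
        else:
            sessions.append([(ts, action)])
    return sessions
```

A burst of login, token refresh, and application access collapses into one session to reason about, while a resumption hours later becomes a separate story.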
Distinguishing user error from credential theft indicators is a practical triage skill that improves both security and user trust. User error often looks like repeated failed logins, forgotten passwords, or authentication challenges being denied because the user is confused or experiencing device issues. Credential theft indicators often include successful authentication from unusual contexts, repeated token refreshes from unexpected networks, or access to services the user does not normally use, especially when followed by sensitive actions. The difference is often in the combination and the direction of behavior, because user error tends to produce failures and frustration, while credential theft tends to produce successful access and expansion. Another indicator is the speed of activity, because attackers often move quickly once they gain access, enumerating resources and attempting privilege changes. User error also tends to correlate with user reports, such as calls to support or complaints about login issues, while theft may correlate with silence because the legitimate user is unaware. The key is to avoid treating a user as the problem while still treating the event as potentially serious until proven otherwise. When you can separate these patterns, you reduce unnecessary disruption to legitimate work and focus containment actions on scenarios where they are truly needed.
Admin activity needs separate baselines and thresholds because administrative identities operate with higher privilege and lower frequency, which changes what normal looks like and what risk looks like. Administrators might authenticate only occasionally, but when they do, the actions they perform can change access policies, alter logging, modify network exposure, or create new privileged identities. Because the impact is higher, the tolerance for anomalies should be lower, and your detection thresholds should be tighter. Admin baselines also need to account for privileged workflows, such as elevation events, sensitive configuration changes, and access to management consoles that normal users should never touch. It is also common for admin identities to be used from more controlled devices and networks, which means deviations in device or location can be more meaningful. Another important point is that admin activities often happen during incidents or maintenance windows, which can create bursts of activity that are legitimate but still need careful oversight. Separate baselines ensure you do not underreact to admin anomalies by applying user-level expectations. When you treat admin activity distinctly, you align detection and response to the true risk of privileged access.
Correlation between authentication events and sensitive resource access is where identity anomaly detection becomes more than a login problem and starts becoming a security outcome problem. An authentication anomaly that is followed by low-impact activity might warrant monitoring and validation, but an anomaly followed by access to sensitive data stores, identity policy changes, or key management actions should trigger higher urgency. Correlation helps you prioritize because it ties the anomaly to potential consequence, which is a more reliable triage method than anomaly alone. For example, a suspicious login followed by a series of data access events is more concerning than a suspicious login that immediately logs out and does nothing. Similarly, repeated token refresh activity followed by role assumption or permission changes suggests intent to maintain access and expand capability. Correlation also helps investigations because it builds a timeline that can show whether the identity moved from authentication into exploration, escalation, and exploitation. This narrative is what allows responders to decide whether they need containment, how broad the scope might be, and what evidence they should preserve. When correlation is done well, it reduces false urgency while increasing the chance that real incidents are detected early.
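A minimal correlation sketch asks one question: did a sensitive action occur within some window after an anomalous authentication? The event shape, the set of sensitive action names, and the 60-minute window below are all assumptions chosen for illustration.

```python
# Hypothetical set of action names treated as sensitive in this sketch.
SENSITIVE = {"data_export", "policy_edit", "key_access", "role_assume"}

def correlate(timeline, window_min=60):
    """Return sensitive actions occurring soon after an anomalous auth.

    timeline is a time-sorted list of (ts_minutes, kind, anomalous)
    tuples; the anomalous flag is only meaningful for auth events.
    """
    hits = []
    anomaly_ts = None
    for ts, kind, anomalous in timeline:
        if kind == "auth" and anomalous:
            anomaly_ts = ts
        elif (anomaly_ts is not None
              and kind in SENSITIVE
              and ts - anomaly_ts <= window_min):
            hits.append((ts, kind))
    return hits
```

An empty result does not mean the anomaly is benign, only that it has not yet been tied to consequence; a non-empty result is what should raise triage urgency.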
Escalation criteria define when anomalies require rapid containment, and without clear criteria teams often hesitate or overreact. Rapid containment is disruptive, so it should be reserved for scenarios where the risk of continued activity outweighs the cost of interruption. Criteria often include high-confidence indicators like impossible travel with successful authentication, admin login anomalies, repeated authentication attempts coupled with access to sensitive resources, or suspicious activity involving new devices and new locations at the same time. Another criterion is evidence of privilege escalation, such as role changes, policy edits, or creation of new credentials, because those actions increase attacker capability and persistence. Escalation also depends on the identity type, because a service identity compromise can affect production systems quickly, while a low-privilege user compromise might be contained with less urgency. Clear criteria help responders act confidently because they are not improvising under pressure; they are following agreed principles. They also help organizations maintain consistency, which builds trust across teams and reduces the feeling that security actions are arbitrary. When escalation criteria are clear, response becomes faster and more effective.
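Escalation criteria can be written down as a small decision function so they are agreed in advance rather than improvised. This is a hypothetical policy sketch: the indicator names, identity types, and thresholds are illustrative, and any real policy should reflect your own risk tolerances.

```python
def escalation_decision(indicators: set, identity_type: str) -> str:
    """Map observed indicators to a response tier.

    Hypothetical policy: evidence of privilege escalation always
    contains; high-confidence indicators contain for high-impact
    identity types (admin, service) and urgently investigate otherwise.
    """
    high_confidence = {
        "impossible_travel_success",
        "new_device_and_location",
        "privilege_escalation",
    }
    if "privilege_escalation" in indicators:
        return "contain"
    if identity_type in {"admin", "service"} and indicators & high_confidence:
        return "contain"
    if indicators & high_confidence:
        return "investigate_urgently"
    return "monitor"
```

The value of encoding criteria this way is consistency: two responders looking at the same indicators reach the same tier, which is exactly the trust-building effect the episode describes.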
Baseline first, then investigate deviations calmly is a memory anchor that protects teams from both panic and complacency. Baseline first reminds you to start from what is typical for that identity type, so you do not misinterpret normal behavior as malicious. Investigate deviations calmly reminds you that anomalies are signals, not verdicts, and that you should gather context before taking irreversible actions. Calm investigation does not mean slow investigation; it means structured investigation that prioritizes evidence and impact. This anchor also helps maintain professional interactions with users and teams, because it keeps the tone factual rather than accusatory. It encourages the responder to ask what changed, what followed, and whether the behavior aligns with known operational events, while still being ready to contain if indicators point to misuse. Over time, this mindset improves detection quality because responders become better at recognizing patterns and less likely to chase noise. The anchor is effective because it is simple and aligns with how good incident response works, where clear thinking is a competitive advantage.
Anomaly patterns that deserve immediate attention tend to share the theme of high confidence misuse or high impact identity type. Impossible travel paired with successful authentication is high confidence enough to justify rapid validation and often rapid containment, especially when it involves privileged identities. New device and new location appearing together for an admin identity is also high priority, because the combination suggests a new access path that should be verified immediately. Repeated token refresh activity from unexpected networks can indicate stolen token replay, which can sustain attacker access even if passwords change. Sudden spikes in authentication activity for a service identity outside normal deployment windows can indicate misuse or misconfiguration that could quickly affect production. Authentication anomalies followed closely by access to sensitive resources, policy edits, or creation of new credentials should also trigger urgent investigation because they suggest progression toward impact. These patterns stand out because they combine unusual context with meaningful consequence, which is exactly what a responder should prioritize. The objective is not to create a long list of patterns, but to recognize the few that reliably indicate real risk. When these show up, speed and decisiveness matter.
A triage script that says confirm, contain, verify impact provides a spoken structure that helps responders act consistently. Confirm means quickly establishing what the anomaly is, what identity is involved, and whether the context is genuinely unusual based on baseline signals like device, location, and time. Contain means taking proportionate action to stop further risky activity when indicators cross escalation criteria, such as restricting the session, limiting access, or enforcing a stronger verification step. Verify impact means scoping what actions occurred after the anomaly, focusing on sensitive resource access, privilege changes, and any evidence of persistence. This sequence is practical because it prevents two common mistakes: containing without understanding what is happening, and investigating endlessly while an attacker continues operating. It also helps communication because you can describe your steps to stakeholders in a clear order, which builds confidence and reduces confusion. The script does not replace judgment, but it provides a default flow that reduces cognitive load during stressful situations. When teams rehearse this script, they respond faster and with less variation in quality.
Choosing one baseline metric and defining acceptable variance is a practical conclusion because detection improves most when you sharpen one signal at a time. Pick a metric that is meaningful for your environment, such as typical login locations for a privileged group, typical device profiles for administrators, or typical token refresh frequency for a critical service identity. Then define what normal variance looks like, including how you handle expected exceptions like travel, on-call work, or deployment bursts. Acceptable variance should be explicit enough that responders can use it to judge anomalies consistently, but flexible enough that normal business does not constantly trigger alerts. Once variance is defined, you can tune detection thresholds and escalation criteria to match the risk of the identity type involved. Over time, you repeat this process for other metrics, building a baseline library that makes anomaly detection both calmer and more effective. This is how you move from reactive alert chasing to confident identity monitoring grounded in real behavior. Select one baseline metric, define its variance, and you have taken a concrete step toward spotting anomalies by knowing what normal identity looks like.
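To make "define acceptable variance" concrete, here is a minimal statistical sketch for one metric, such as hourly token-refresh counts for a service identity. The three-standard-deviation band is an illustrative choice, not a recommendation; the right k depends on how much expected variation (travel, on-call, deployments) your environment has.

```python
import statistics

def variance_band(samples, k=3.0):
    """Acceptable variance as mean ± k population standard deviations.

    samples is a hypothetical history of the chosen metric, e.g.
    hourly token-refresh counts; k is an illustrative tuning knob.
    """
    mu = statistics.mean(samples)
    sigma = statistics.pstdev(samples)
    return mu - k * sigma, mu + k * sigma

def is_anomalous(value, samples, k=3.0):
    """True when a new observation falls outside the variance band."""
    lo, hi = variance_band(samples, k)
    return not (lo <= value <= hi)
```

Starting with one metric like this, then repeating for others, is the baseline-library approach the episode closes on: each band is explicit, explainable, and tunable per identity type.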