Episode 58 — Harden serverless functions to block persistence, reinfection, and silent reuse

In this episode, we harden serverless functions so attackers cannot persist after initial access, because the first compromise is often not the last. In serverless environments, persistence rarely looks like a dropped binary on a host or a hidden scheduled task. Persistence looks like configuration that keeps working for the attacker, code that stays modified, or triggers that let them re-enter whenever they want. The platform’s convenience can become the attacker’s convenience if update permissions are broad, deployment paths are uncontrolled, and changes are not monitored tightly. Your goal is to treat function updates and trigger changes as privileged actions, because they are the mechanisms that determine what runs and when. When you lock those down and monitor them well, you turn serverless from an easy persistence surface into a controlled execution environment. That is how you prevent reinfection, silent reuse, and repeated exploitation of the same weak pathway. The mindset is simple: it is not enough to stop the first abuse; you must remove the ability to come back quietly.

Before we continue, a quick note: this audio course pairs with two companion books. The first book covers the exam and explains in detail how best to pass it. The second book is a Kindle-only eBook that contains 1,000 flashcards you can use on your mobile device or Kindle. Check them both out at Cyber Author dot me, in the Bare Metal Study Guides Series.

Persistence, in this context, means maintaining access or repeated execution capability after an attacker has achieved an initial foothold. That can mean the attacker retains the ability to invoke a function, retains the ability to change it, or leaves behind a modification that continues to execute without their direct interaction. Persistence can also mean creating a new trigger that they control, so even if you lock down the original trigger, they still have a path. It can mean adding a dependency that calls out to an external service, creating a backdoor behavior that activates under certain input conditions, or changing environment variables so the function leaks data to attacker-controlled destinations. Persistence is dangerous because it turns a one-time event into an ongoing risk. It also increases uncertainty, because the organization may believe the incident is resolved while the attacker still has a reliable way to re-enter. In serverless, persistence is often a configuration problem rather than a runtime artifact problem. That is why persistence prevention is primarily about controlling who can change code, configuration, and triggers, and about being able to detect any changes quickly. If you can keep function definitions trustworthy, you can break the attacker’s ability to return.

Attackers can persist in serverless by altering code, configuration, triggers, or dependencies, and each path has a different detection profile. Code changes are the most direct: if an attacker can update the function code package or inline code, they can add backdoor behavior that runs under normal invocations. Configuration changes can be just as powerful, such as changing environment variables, changing secret references, modifying network settings, or adjusting concurrency and timeouts to create operational instability or to hide behavior. Trigger changes are especially important because they can create new entry points, such as adding an additional event source that the attacker can influence, or enabling a trigger that bypasses normal access controls. Dependency changes are a more subtle path, where the function relies on a library, layer, or external dependency, and the attacker modifies that dependency or points the function at a malicious version. Because serverless platforms often make updates easy, and because teams may allow many people to update configurations during incident response or troubleshooting, these change paths can be exploited quickly. A mature posture assumes attackers will try to secure a return path by changing something that outlives a session. Your goal is to make those changes difficult, visible, and reversible.
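To make the four paths concrete, here is a minimal sketch that maps control-plane change events to the persistence path they could enable and to the matching detection approach. The action names and event fields are hypothetical placeholders, not any real provider’s audit-log schema.

```python
# Hypothetical control-plane actions; names are illustrative only.
PERSISTENCE_PATHS = {
    "UpdateFunctionCode": "code",
    "UpdateFunctionConfiguration": "configuration",
    "CreateTrigger": "trigger",
    "DeleteTrigger": "trigger",
    "UpdateLayerReference": "dependency",
}

def classify_change(event):
    """Map a change event to the persistence path it could enable."""
    return PERSISTENCE_PATHS.get(event["action"], "unknown")

def detection_profile(path):
    """Each path has a different detection profile, as described above."""
    return {
        "code": "diff the deployed package against the built artifact",
        "configuration": "compare env vars and settings to the approved baseline",
        "trigger": "alert on any new or re-enabled event source",
        "dependency": "pin and verify layer/library versions",
    }.get(path, "manual review")
```

The point of the mapping is that one detector does not cover all four paths; each change type needs its own baseline to compare against.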

A scenario that illustrates silent reuse is an attacker adding a hidden trigger for later access. Imagine a function that processes internal events, and the attacker gains access to a privileged identity that can edit function configuration. Instead of immediately changing the function’s visible behavior, they add a second trigger connected to an event source they can influence, such as a queue, webhook, or storage location that is less monitored. They may also add a small conditional backdoor, such as a specific header value or input key that causes the function to run privileged behavior only when the attacker wants it. The function continues to operate normally for the organization, and the new trigger does not break anything, so it stays unnoticed. Weeks later, the attacker returns and uses the hidden trigger to execute actions, exfiltrate data, or re-establish access even if the original compromised credentials were rotated. This is a classic persistence pattern in serverless because triggers are the entry points and can be added without obvious runtime artifacts. The lesson is that if you are not monitoring for trigger additions and configuration changes, you can declare victory while the attacker still has a door. Persistence is often created by a small configuration change that seems harmless and is easy to miss.

Two pitfalls make this kind of persistence easier than it should be: wide update permissions and uncontrolled deployment paths. Wide update permissions exist when many identities can edit function code, configuration, or triggers, often because those permissions are bundled into broad developer or operator roles. When update permissions are wide, compromise of any one identity can lead to persistent function tampering. Uncontrolled deployment paths exist when function updates can be performed directly through consoles, ad hoc scripts, or multiple pipelines without strong review and without consistent provenance. In such environments, it becomes difficult to distinguish legitimate updates from attacker-driven changes, and it becomes easier for attackers to insert modifications without being detected quickly. Another pitfall is the lack of separation between who can deploy and who can execute, because if the runtime identity can also update code or configuration, then a compromised runtime context can become a control-plane compromise. Finally, teams often underestimate the persistence value of configuration, focusing only on code packages, even though triggers and environment variables can create long-lived control. These pitfalls are common because teams optimize for speed and flexibility, especially early in adoption. Your task is to add governance that preserves speed while removing silent update power.

Quick wins begin by restricting who can update function code, because code updates are the clearest path to persistent malicious behavior. Restricting updates means shrinking the set of identities that can publish new versions, change deployment packages, or modify the function definition. It also means preventing runtime identities from having update permissions, because the ability to execute code should not imply the ability to change code. Restriction is not only about limiting people; it is also about limiting paths. If your environment allows updates through many tools and consoles, you increase the chance of unreviewed changes and you complicate monitoring. Quick wins include requiring updates to go through a controlled deployment mechanism and limiting direct console edits. Another quick win is requiring that high-impact configuration changes, such as adding triggers or changing secret bindings, be restricted to the same small set of deploy roles. The benefit is immediate: fewer identities can create persistence, and changes become easier to track and audit. When you shrink update capability, you reduce the attacker’s ability to make compromise durable.
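A simple way to start this quick win is an audit pass over who holds update capability. The sketch below, with generic placeholder permission names rather than any specific cloud’s IAM actions, flags every identity that can change a function but is not in the approved deploy set.

```python
# Generic placeholder permission names; substitute your platform's actions.
UPDATE_PERMISSIONS = {
    "function:UpdateCode",
    "function:UpdateConfig",
    "function:ModifyTrigger",
}

def excess_updaters(identity_permissions, approved_deploy_roles):
    """Return identities holding update permissions without being approved
    deploy roles -- each one is a potential persistence pathway."""
    flagged = []
    for identity, perms in identity_permissions.items():
        if identity in approved_deploy_roles:
            continue
        held = UPDATE_PERMISSIONS & set(perms)
        if held:
            flagged.append((identity, sorted(held)))
    return sorted(flagged)
```

Running this against an export of role assignments gives you the shrink list: every flagged identity should either join the small deploy set or lose the permission.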

Separating deploy roles from runtime execution roles is a core design pattern because it creates a clean trust boundary. Deploy roles are used by build and release systems to publish new function versions, attach triggers, and set configuration. Runtime execution roles are the identities the function uses when it runs to access downstream resources like storage, databases, and secret systems. If these roles are not separated, then any compromise of the runtime context could be used to change the function itself, enabling persistence. Separation also improves auditing because you can clearly attribute changes to the deployment system rather than to random interactive sessions. It also supports least privilege because deploy roles need control-plane permissions but do not need broad data access, while runtime roles need data access but should not have control-plane update permissions. Separation also reduces accidental drift because engineers can’t casually update production code through a runtime identity. This pattern is also helpful for incident response, because you can lock down deploy roles and freeze deployments while you investigate, without breaking normal function execution. When deploy and runtime roles are distinct, a serverless environment is easier to govern and harder to persist in.
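The trust boundary described above can be checked mechanically: the runtime role must hold no control-plane actions and the deploy role must hold no data-plane access. This sketch uses invented action names loosely modeled on typical cloud IAM vocabularies; it is a policy-review aid, not a real policy document.

```python
# Illustrative action sets; real platforms have their own names.
CONTROL_PLANE = {"function:UpdateCode", "function:UpdateConfig", "function:ModifyTrigger"}
DATA_PLANE = {"storage:Read", "db:Query", "secrets:Get"}

def roles_are_separated(deploy_role, runtime_role):
    """The boundary holds only if the runtime role has no control-plane
    actions and the deploy role has no data-plane actions."""
    return (not (runtime_role["actions"] & CONTROL_PLANE)
            and not (deploy_role["actions"] & DATA_PLANE))

deployer = {"name": "ci-deployer",
            "actions": {"function:UpdateCode", "function:ModifyTrigger"}}
runtime = {"name": "fn-runtime",
           "actions": {"storage:Read", "secrets:Get"}}
```

A check like this can run in CI so that a policy change which gives the runtime identity update capability fails the build instead of silently creating a control-plane escalation path.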

Integrity checks through controlled pipelines and review requirements are how you ensure that changes that do occur are trustworthy. A controlled pipeline means function code and configuration updates come from a known process with consistent logging, review, and approval. Review requirements ensure that code changes and high-impact configuration changes are reviewed by another qualified person, which reduces both malicious changes and honest mistakes. Integrity checks can include verifying that artifacts come from approved sources, that the build process is reproducible, and that deployment includes a recorded link to the commit or change request that introduced the update. The point is to create provenance, so every running function version can be traced back to a known change and a known reviewer. Integrity controls also reduce reinfection risk because attackers often rely on being able to reintroduce changes after you roll back, either by using the same weak update path or by compromising the pipeline. If the pipeline is controlled and update permissions are tight, reinfection becomes harder. Integrity checks are also a mindset: you assume that if an attacker wants persistence, they will try to modify what runs, and you treat the process of changing what runs as a security boundary. When integrity is enforced, function code becomes more trustworthy and changes become more defensible.
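The provenance idea can be reduced to a small gate in the pipeline: the artifact must hash to the digest the reviewed build recorded, and the record must link a commit and a reviewer. The manifest format here is hypothetical.

```python
import hashlib

def artifact_digest(artifact_bytes):
    """SHA-256 digest of the deployment artifact."""
    return hashlib.sha256(artifact_bytes).hexdigest()

def verify_provenance(artifact_bytes, manifest):
    """Allow a deploy only if the artifact matches the digest the reviewed
    build recorded, and the manifest traces to a commit and a reviewer."""
    return (artifact_digest(artifact_bytes) == manifest.get("sha256")
            and bool(manifest.get("commit"))
            and bool(manifest.get("reviewer")))
```

If rollback later becomes necessary, the same digests tell you which stored artifact is the trusted version, which is why provenance and recovery reinforce each other.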

Monitoring for unexpected code changes and trigger additions is essential because even with tight permissions, mistakes and compromises can happen. Monitoring should capture and alert on changes to function code packages, version updates, environment variable changes, trigger additions or removals, and permission changes related to invocation and runtime access. Trigger additions are particularly high signal because they create new entry points, and they often have no business justification in stable production services without a corresponding change record. Monitoring should also capture changes to dependencies, such as changes to referenced layers or packages, because dependency tampering can be a stealthy persistence mechanism. Alerts should include who made the change, when it happened, and what resource was affected, and they should include enough detail to assess whether the change expands exposure or privilege. Monitoring should also include invocation behavior, because a backdoor trigger often results in invocations from unusual sources, at unusual times, or with unusual patterns. If you can correlate configuration changes with new invocation patterns, detection becomes more reliable. Monitoring is your early warning system that persistence might be forming, and it is also your evidence trail if it has already formed. Without monitoring, you may not know a function was altered until damage is visible downstream.
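As a rough illustration of the alerting logic, the sketch below scans audit-log records for high-signal changes that lack a linked change record and emits who, when, and what. The record fields and action names are assumptions for the example, not a real provider schema.

```python
# High-signal change types, per the discussion above; names are illustrative.
HIGH_SIGNAL = {"CreateTrigger", "UpdateFunctionCode",
               "UpdateEnvironment", "ModifyPermissions"}

def alerts_for(records):
    """Alert on high-signal changes with no linked change record,
    including who made the change, when, and on what resource."""
    alerts = []
    for r in records:
        if r["action"] in HIGH_SIGNAL and not r.get("change_ticket"):
            alerts.append({
                "who": r["actor"],
                "when": r["time"],
                "what": f'{r["action"]} on {r["resource"]}',
            })
    return alerts
```

In practice you would also join these alerts against invocation telemetry, since a freshly added trigger followed by invocations from an unusual source is the correlation this episode describes.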

Rollback plans are the recovery mechanism that restores trusted versions quickly, which is critical because persistence prevention is not perfect. A rollback plan should define what the trusted version is, how to redeploy it rapidly, and how to confirm that the environment is now running the trusted version. Rollback also includes reverting triggers and configuration to known-good states, because code rollback alone may not remove a malicious trigger or an altered environment variable. A good rollback plan also defines how to freeze changes during an incident so the attacker cannot race your remediation by reintroducing their modifications. That can include temporarily restricting update permissions, disabling nonessential deployment paths, and requiring higher assurance for changes. Rollback should be tested, because plans that are not tested often fail under pressure, leading to improvisation that can create new security holes. The goal is to be able to return to a trusted state quickly, because speed matters. The longer a compromised function version runs, the more opportunity exists for exfiltration, tampering, or persistence reinforcement. Rollback is also a deterrent because it reduces the attacker’s payoff; if you can restore quickly and reliably, persistence becomes less valuable.
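One concrete piece of a rollback plan is a drift check: compare the live function state against the known-good baseline and list everything a rollback must revert, not just the code. The state keys below are illustrative.

```python
def rollback_diff(baseline, live):
    """Return settings that drifted from the trusted baseline.
    Code rollback alone is not enough: triggers and environment
    variables must also be restored to baseline values."""
    drift = {}
    for key in set(baseline) | set(live):
        if baseline.get(key) != live.get(key):
            drift[key] = {"expected": baseline.get(key),
                          "found": live.get(key)}
    return drift
```

Confirming the rollback means running the same diff again and getting an empty result: an empty drift map is your evidence that the environment is back on the trusted version and configuration.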

The memory anchor for this episode is restrict updates, monitor changes, recover fast. Restrict updates means limiting who and what paths can change function code, triggers, configuration, and dependencies, so persistence cannot be created casually. Monitor changes means detecting any modifications quickly, especially those that add triggers, broaden invocation, or change runtime permissions. Recover fast means having a tested rollback plan that restores trusted versions and configurations without guesswork. This anchor works because it addresses the three phases of persistence: the ability to make changes, the ability to hide changes, and the ability to keep changes running. If updates are restricted, the attacker’s ability to establish persistence is reduced. If changes are monitored, hidden persistence becomes harder. If recovery is fast, persistence that does occur can be removed quickly before it causes prolonged harm. The anchor is also easy to teach because it maps to common-sense governance: fewer people can change production code, everyone notices changes, and you can undo bad changes quickly. When these practices are in place, serverless environments become significantly harder to persist in.

A mini-review of persistence prevention steps in a simple spoken order helps teams apply the pattern consistently. You start by listing which identities can update function code, configuration, and triggers, and you shrink that set to the minimum required. You ensure that runtime execution identities cannot update the function, and that deploy roles are distinct and tightly controlled. You enforce that changes go through a controlled pipeline with review requirements and traceability to a change record. You enable logging and alerting for any code update, trigger change, environment variable change, and permission change related to invocation or runtime access. You maintain a known-good version baseline and a tested rollback process that can be executed quickly during incidents. You also include periodic validation that test functions are not deployed to production and that any necessary diagnostic functions are locked down as tightly as production code. Finally, you ensure that ownership is clear, so there is no ambiguity about who is responsible for maintaining these controls. This order is intentionally practical because you want it to be applied during platform operations, not only during audits. It can be spoken during reviews and executed as a routine. The goal is repeatability, because persistence prevention is not a one-time effort.

When function tampering is suspected, containment steps should focus on stopping further changes, preserving evidence, and restoring trusted behavior. The first step is to freeze update capability by restricting deploy roles, revoking suspicious sessions, and blocking direct console edits if possible. The second step is to preserve evidence by capturing configuration change logs, deployment pipeline logs, and the current function version and trigger configuration state, including environment variables and dependency references. The third step is to assess scope by determining what changed, when it changed, and whether invocation patterns suggest abuse, such as unusual trigger use or spikes in execution. The fourth step is to revert to a trusted version and configuration using a tested rollback path, and then confirm that the function is running the expected code and that triggers match the approved baseline. The fifth step is to rotate any secrets that might have been exposed, because tampered functions may have logged or exfiltrated credentials. The final step is to restore normal operations carefully, ensuring monitoring is tuned to detect reintroduction of the tampering, and to perform a lessons learned review that tightens the control plane path that allowed the change. The goal is to stop the attacker’s ability to persist and to remove any backdoor they may have created, while keeping enough evidence to understand what happened. Speed and discipline matter because attackers can use tampered functions to escalate quickly.

To conclude, identify one update permission and tighten it today, because narrowing update capability is one of the highest leverage persistence controls. Choose a permission that allows function code updates, trigger changes, environment variable edits, or dependency reference updates, and identify who currently has that permission. Reduce it to the smallest set of deploy roles that are required for normal operations, and ensure those roles are protected with strong authentication and context constraints. Remove update permissions from runtime execution identities and from broad developer roles that do not need production change capability. Then enable or tune alerts so any future use of that permission is visible and reviewable, especially outside normal change windows. Finally, confirm that you have a rollback path that works if a bad change slips through, because tightening permissions reduces risk but does not eliminate it. The decision rule is simple: if an identity can change what a production function runs or how it is triggered without strong checks and visible logging, that identity represents a persistence pathway, so tighten the permission until updates are restricted, monitored, and recoverable.
