The CISO's guide to Threat-Led Penetration Testing - Blog 3: Where TLPT trajectories go wrong, and how to avoid it

📋 For the boardroom

A TLPT is a nine-month process with a significant budget attached. The quality of the Red Team matters, and the strict qualification requirements in TIBER-EU exist for good reason. But in our experience, and from what we hear across the industry, the most consequential pitfalls have to do with preparation, governance, and provider selection. This blog identifies the most common pitfalls across all three phases of the TIBER-EU process and explains how to get it right. The decisions made in the first few weeks of a TLPT trajectory often determine whether the findings are strategic or superficial.


Introduction

A TLPT does not fail because the Red Team was not skilled enough. Poor Red Team quality is a real risk, and the strict qualification requirements in TIBER-EU exist for exactly that reason. But in our experience, the more common causes of failure are organizational: the organization was not ready, the wrong provider was selected, the Control Team was overwhelmed, or the findings never made it past the closure meeting into actual change.

We have seen all of these patterns. Some are talked about openly in the industry. Others are not, because admitting them would reflect poorly on the parties involved. This blog covers both.

We have organized the pitfalls by phase. Not because they always occur in isolation, but because understanding where in the process things go wrong makes it easier to prevent them.


Preparation phase

Pitfall 1: Scope that excludes what matters most

The scope of a TLPT is defined in the Scope Specification Document (SSD), the formal document that the Control Team produces to describe the organization's Critical or Important Functions (CIFs), the underlying systems and services that support them, and the flags that the Red Team should attempt to achieve. This document forms the foundation of the entire test: the Threat Intelligence Provider (TIP) uses it to develop the Targeted Threat Intelligence Report, and the Red Team Testers (RTT) use it to build the attack scenarios.

One version of this pitfall is defining scope too conservatively. Organizations sometimes exclude systems they consider too sensitive to test, too complex to include, or too disruptive to target. The Test Manager actively challenges scope decisions and will push back where critical functions are excluded without good reason. In the end, however, the organization has the final say, and that is where the risk lies. The result of an overly narrow scope is a TLPT that does not test what actually matters.

The second, less visible version of this mistake is failing to include third-party suppliers in scope. Supply chain attacks are one of the most consequential attack vectors across sectors, and the financial sector is no exception. If a critical function is partially or fully outsourced to an ICT provider, that provider's systems, processes, and people are a legitimate part of the attack surface. TIBER-EU explicitly requires the inclusion of ICT third-party providers where they underpin critical functions. Including them does increase the complexity of the test. However, as we discussed in Blog 2, a well-executed pre-TLPT strategy builds exactly the relationships and familiarity with third-party environments that make this manageable. Leaving them out does not reduce risk. It just means the test does not find the risk that is actually there.

The fix: the SSD is thorough, honest about the organization's actual attack surface, and includes critical third-party dependencies from the start. This takes more effort and more internal alignment, but it is the only way to make the TLPT meaningful.


Pitfall 2: Underestimating the SSD preparation work

The SSD is not a form to fill in. It requires the Control Team to conduct a structured analysis of which functions are critical, which systems underpin those functions, what a realistic compromise of each system would look like, and what flags the Red Team should achieve as evidence of successful compromise.

Organizations consistently underestimate how much time and internal expertise this requires. A Control Team that lacks operational IT knowledge will struggle to produce an SSD that is specific enough to be useful. If the flags are vague, the threat intelligence will be vague. If the threat intelligence is vague, the scenarios will be generic. And a TLPT built on generic scenarios tells you very little.

Our advice: include someone with hands-on knowledge of your critical systems in the Control Team, not just senior leadership. The CISO and a board member are essential for governance and decision-making. But the person who actually knows how your payment infrastructure or core banking system works is the one who can make the SSD specific enough to matter. One important caveat: anyone added to the Control Team for their technical expertise must fully understand the confidentiality requirement. A subject-matter expert who inadvertently signals to colleagues that something unusual is coming can compromise the entire test.

The fix: keep the Control Team small, but make sure it contains the right expertise. Include someone with hands-on knowledge of your critical systems from the start. And make sure every member of the Control Team fully understands the confidentiality requirement before they are brought in.


Pitfall 3: Leg-ups not arranged in time

Leg-ups are pre-arranged conditions that support the Red Team during the test when needed: a non-traceable laptop, a standard user account, pre-established network access, or specific system privileges that enable the next phase of an attack. They can serve as a starting position for a scenario, but just as often provide a crucial stepping stone mid-test, allowing the Red Team to progress to a more advanced phase of the kill chain without losing momentum.

What makes leg-ups particularly challenging is the secrecy requirement. The people who need to arrange them (IT teams, procurement, facilities) often cannot be told why they are needed or what they will be used for. In well-secured organizations, existing controls and approval processes make it even harder to arrange leg-ups quietly, and a well-secured organization is precisely the kind of environment a TLPT is designed to test. That requires careful orchestration by the Control Team, who must find ways to arrange realistic access and devices without revealing that a test is taking place.

While some leg-ups are predictable enough to arrange in advance, the definitive list can only be confirmed after the Targeted Threat Intelligence Report is delivered. This is why sufficient time between the TI phase and the start of the Red Team phase is critical. In practice, several weeks are needed to translate the threat intelligence into an attack plan and arrange the required leg-ups. Organizations that set aggressive timelines and push to keep momentum often compress exactly this window, with predictable results.

When leg-ups cannot be arranged on time, organizations sometimes request to pause the test. The Test Manager will typically support this to protect the quality of the assessment. In practice, however, the delay creates real pressure on the Red Team's planning and timeline, and that pressure rarely disappears; it just shifts.

The fix: start early. Use the pre-TLPT period to identify which leg-ups are likely to be needed, map out who needs to arrange them, and build the internal processes to do so discreetly. Build in sufficient time between the TI phase and the start of the Red Team phase. Treat that window as fixed, not as slack to be compressed when timelines get tight.


Pitfall 4: Underestimating the legal groundwork

A TLPT involves professional attackers operating against live production systems. Before that happens, a significant amount of legal groundwork needs to be in place: scoping agreements, liability arrangements, rules of engagement, non-disclosure agreements, and the formal authorization documents that give the Red Team legal cover to operate. In TIBER-EU terms, this includes the "get out of jail" documentation that defines what is permitted and under what conditions.

In our experience, organizations routinely underestimate how long this takes. Legal teams are not familiar with this type of engagement. Procurement processes are not designed for it. And every day spent waiting for legal sign-off is a day that delays the test, compresses timelines, and increases pressure on the phases that follow.

The fix: legal preparation starts in parallel with provider selection, not after it. Bring your legal and compliance team into the process early. They will need time to understand what they are authorizing.


Test phase

Pitfall 5: The Blue Team starts to suspect something

It happens on tests. At some point during a nine-month engagement, someone notices something unusual. An anomaly in network traffic. A ticket that does not look right. A colleague who mentions something offhand.

The question is not whether this will happen. The question is whether the Control Team has a plan for it. Having the head of the Blue Team in the Control Team helps here: incidents naturally flow through them, making it far easier to provide a credible, natural response to any anomaly the Blue Team flags, rather than relying on someone the Blue Team rarely interacts with.

Without a prepared response, the Control Team faces an impossible choice in real time: acknowledge the anomaly and risk compromising the test, or dismiss it and potentially allow the Blue Team to miss a real detection opportunity. Neither is acceptable. The right response, a pre-agreed cover story and escalation path for exactly this situation, needs to be established before the test begins.

The fix: the Control Team has a documented response plan for Blue Team suspicion before the Red Team phase starts. That plan should be reviewed by the Test Manager.


Pitfall 6: Escalation protocols that exist on paper but fail in practice

The Red Team Test Plan (RTTP) is required to include a clear escalation protocol: what happens if the Red Team achieves access to a system that is genuinely critical to business continuity, or if something goes wrong during execution. The Test Manager reviews and approves this before the test begins. In our experience, Red Teams regularly achieve access to systems that are genuinely critical to business continuity during a TLPT, so this is not a hypothetical.

The pitfall is not the absence of a protocol. It is the gap between what is written and what happens in practice. Even when the escalation path is clearly documented and agreed upon, the Control Team Lead may not have sufficient authority to ensure it is followed when the moment arrives. This becomes particularly complex when third-party providers are involved: a decision to pause or stop the test may require coordination across organizational boundaries that were not fully thought through in advance.

The fix: escalation thresholds and decision protocols are documented before the test begins. The Control Team Lead (CTL) has a direct, secure channel to the Test Manager and to senior management where needed. The authority of the CTL to make real-time decisions, including across third-party boundaries, is explicitly agreed upon before the test starts. Daily check-ins between the Red Team and the Control Team are the operational mechanism that keeps these lines of communication working throughout the test.


Pitfall 7: Overly rigid adherence to the attack plan

The attack scenarios developed from the threat intelligence are not a script. They are a starting point. Real attackers adapt. They follow the path of least resistance. They change approach when a door is closed and find a window instead.

A Red Team that executes predefined scenarios mechanically, without adapting to what they actually find in the environment, produces findings that reflect the plan rather than the reality. That is not what a TLPT is for.

Adaptation is legitimate and often valuable, but it requires sign-off from the Test Manager. Scenarios that blend together because the Red Team consistently follows the path of least resistance can undermine the structured learning value that three distinct scenarios are designed to provide. The best Red Teams exercise judgment about when to adapt, communicate those adaptations clearly to the Control Team through the daily check-in process, and seek Test Manager approval where the deviation is significant.

The fix: the rules of engagement allow for adaptation within the defined scope. The Red Team documents every deviation clearly. The Control Team is informed in real time, and significant deviations are approved by the Test Manager.


Pitfall 8: Poor operational logging by the Red Team

The Red Team Test Report is one of the most important documents in the entire TLPT process. It forms the basis of the Purple Teaming sessions, the Remediation Plan, and the attestation. The Purple Teaming sessions depend on the quality of both the Red Team's operational logs and the Blue Team's own detection records. Neither alone tells the full story.

Every action the Red Team takes during the test needs to be timestamped, documented, and traceable. Not because the Test Manager will audit every line, but because the Purple Teaming sessions depend on it. The Blue Team's own detection records are equally important and can supplement gaps in the Red Team's logs, but an undocumented action on either side is a learning opportunity lost.

The fix: the Red Team maintains a detailed operational log throughout the test. Daily check-in reporting to the Control Team keeps this discipline consistent. The logs feed directly into the Red Team Test Report and the Purple Teaming preparation.
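To make "timestamped, documented, and traceable" concrete, here is a minimal Python sketch of an append-only operational log. The field names (operator, scenario, technique, target, outcome) and the JSON Lines file format are our own assumptions for illustration; TIBER-EU does not prescribe a log schema, and a real Red Team will have its own tooling.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class LogEntry:
    """One Red Team action. All field names are illustrative assumptions."""
    operator: str    # who performed the action
    scenario: str    # which attack scenario the action belongs to
    technique: str   # what was done, e.g. a MITRE ATT&CK technique ID
    target: str      # which host, account, or system was touched
    outcome: str     # what happened as a result
    # Recorded in UTC so Red and Blue Team timelines can be compared later.
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_entry(path: str, entry: LogEntry) -> None:
    """Append the entry as one JSON line; earlier lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(json.dumps(asdict(entry)) + "\n")


def read_entries(path: str) -> list[LogEntry]:
    """Load all entries back, for example to prepare a Purple Teaming timeline."""
    with open(path, encoding="utf-8") as handle:
        return [LogEntry(**json.loads(line)) for line in handle if line.strip()]
```

The point of the sketch is the discipline rather than the format: one entry per action, written at the moment the action happens, with enough context that someone who was not there can reconstruct it months later.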


Learning and Closure phase

Pitfall 9: Findings that do not land at board level

The Red Team Test Report is a technical document. It describes attack paths, exploitation techniques, detection failures, and flag achievements. It is written by security professionals for security professionals.

The board is not a security audience. They think in risk, continuity, and business impact. The gap between the report and the board conversation is one of the most consistently underestimated challenges in the closure phase. If the CISO cannot translate the technical findings into business language, the board meeting becomes a formality. Leadership nods, the report is filed, and nothing changes.

The fix: the closure phase includes a dedicated board-level presentation that translates findings into business risk and investment terms. A useful approach here is Gold Teaming: a structured session that brings senior leadership into the findings in a way that makes the impact of the Red Team's actions tangible and credible at the decision-making level. Where a Purple Team session shows the Blue Team what they missed, a Gold Team session shows the board what it means.


Pitfall 10: Purple Teaming treated as a checkbox

We have seen this happen: the Purple Teaming sessions are rushed, compressed into a single day, and treated as a formality rather than the highest-value sessions in the entire TLPT process.

The Purple Teaming sessions are where the Blue Team finally learns what happened during the test. The Red Team walks through their actions. The Blue Team maps what they saw, what they flagged, and what they missed. Together, they understand the detection gaps and build a shared picture of where the defenses held and where they failed.
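The mapping exercise described above can be prepared ahead of a session. The following Python sketch pairs Red Team actions with Blue Team detection records on the same target within a time window. The record types, the matching rule, and the 30-minute default window are hypothetical simplifications of ours, not a TIBER-EU method; real correlation usually needs analyst judgment on top.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class RedAction:
    timestamp: datetime
    target: str
    description: str


@dataclass(frozen=True)
class BlueDetection:
    timestamp: datetime
    target: str
    alert: str


def find_detection_gaps(actions, detections, window=timedelta(minutes=30)):
    """Split Red Team actions into (detected, missed).

    An action counts as detected if the Blue Team recorded an alert on the
    same target no earlier than the action and no later than `window` after
    it. The 30-minute default is an arbitrary placeholder, not a
    recommended value.
    """
    detected, missed = [], []
    for action in actions:
        seen = any(
            d.target == action.target
            and timedelta(0) <= d.timestamp - action.timestamp <= window
            for d in detections
        )
        (detected if seen else missed).append(action)
    return detected, missed
```

The "missed" list is a starting agenda for the session, not a verdict. Each entry still needs the conversation about why it was missed.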

For a TLPT, multiple sessions are typically needed, not just one. The Red Team needs complete logs. The Blue Team needs time to prepare their own observations. Each session needs a structure that allows both teams to go deep on the most consequential scenarios.

Done properly, Purple Teaming does not just close the loop on a single TLPT. It establishes a working relationship between Red and Blue Teams that continues beyond the test. The organizations that extract the most from their TLPT are the ones that use the closure phase as the beginning of an ongoing Purple Teaming program, testing detection and response capabilities quarterly, systematically improving over time.

The fix: Purple Teaming is planned from the start of the trajectory, not organized in the final weeks. The Red Team's operational logs are structured with Purple Teaming in mind. The sessions have dedicated time, a clear structure, and output that feeds directly into the Remediation Plan and a forward-looking testing calendar.


Pitfall 11: No ownership of remediation

The Remediation Plan is a formal TIBER-EU deliverable. The entity documents how it will address the findings from the Red Team Test Report. The Test Manager reviews it. The attestation depends on it.

We have seen this happen: the Remediation Plan is produced as a document but not treated as a commitment. Findings are listed. Owners are named. Timelines are written down. And then the urgency evaporates, because the test is over, the pressure is gone, and the findings are competing with everything else on the security backlog.

The TLPT will come around again in three years. The question is whether the organization that faces the next one is meaningfully more resilient than the one that faced this one. That depends entirely on whether the Remediation Plan becomes real work, not a filed document.

The fix: remediation ownership is established during the closure phase, not after it. Progress is tracked. The ongoing Purple Teaming program creates a natural accountability mechanism: if a finding from the TLPT is still exploitable in the next quarterly scenario, the gap between the plan and the reality becomes visible.
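As a rough illustration of what "progress is tracked" can look like, the Python sketch below flags remediation items that are overdue, or that were closed but proved still exploitable in a later retest. The structure and field names are hypothetical; most organizations will track this in their existing ticketing or GRC tooling rather than in code.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class RemediationItem:
    """One finding from the Remediation Plan. Field names are assumptions."""
    finding: str
    owner: str
    due: date
    done: bool = False
    # Result of the most recent retest, e.g. a quarterly Purple Teaming
    # scenario. None means the finding has not been retested yet.
    still_exploitable: Optional[bool] = None


def needs_attention(items, today):
    """Return items that are overdue, or closed but still exploitable."""
    return [
        item for item in items
        if (not item.done and item.due < today)
        or (item.done and item.still_exploitable is True)
    ]
```

The second condition is the one that matters most: it is where the gap between the plan and the reality becomes visible.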


In short

Not every TLPT trajectory faces all of these pitfalls. In our experience, and from what we hear across the industry, the most common ones are practical: leg-ups that cannot be arranged on time, and Purple Teaming sessions that do not receive the time and structure they deserve. The others appear less frequently, but when they do, they are consequential.

The organizations that navigate these challenges well are the ones that treat the TLPT as a genuine resilience exercise from the first day of preparation to the last day of remediation. That starts with choosing the right provider: a Red Team that documents what they do, adapts intelligently to what they find, and treats the closure phase as a beginning rather than an end.

In Blog 4, we will walk through what a well-executed TLPT scenario actually looks like from start to finish, using a realistic example from the financial sector.


Missed Blog 1 or Blog 2? Start with understanding what a TLPT is or how to build your pre-TLPT security testing strategy.
