Insight · 7 min read

The three failure modes of legal AI pilots

Most legal AI pilots fail in predictable ways. After running a few dozen across in-house legal teams and law firms, we have come to recognise three failure modes that account for the majority of pilots that quietly stall before production rollout. Each is identifiable in the first month. Each has a fix. None of the fixes are technical.

The first failure mode is the absence of a measurable outcome. A pilot is announced. A vendor is selected. Users are onboarded. Six months later, leadership asks “Is it working?” and the team realises it has no answer that survives scrutiny. There is no baseline against which to compare. There is no operational metric the pilot was supposed to move. There is only a vague sense that people seem to like it. Liking and using are different things. We no longer run a pilot without a written success criterion that names a number, a date, and a measurement method. That criterion is a single sentence, short and easy to write at the start. It is impossible to invent retroactively.

The second failure mode is the absence of a deployment champion with operational authority. The sponsor is usually a senior partner or a general counsel. They have authority over the budget but not over the daily workflow. The pilot lands on a team whose actual workflow lead is mid-level. That mid-level workflow lead has not been asked, has not been resourced, and has six other priorities. The pilot becomes the fourteenth thing on a list of fourteen. We now require a named champion who controls the workflow and a published time allocation (usually one day per week for the first quarter) before scoping a pilot. If the champion cannot be named or cannot be resourced, the pilot is postponed rather than started.

The third failure mode is the absence of a governance call before the first prompt is run. The pilot ships, lawyers use it, and three weeks later someone in compliance discovers that confidential client material has been routed through a third-party LLM with retention terms nobody read. The pilot is paused. The pause becomes a meeting. The meeting becomes a committee. The committee writes a policy. The policy is the new bottleneck. Six months later, no one is using the tool. The fix is to run the governance conversation first: data classification, residency requirements, retention terms in vendor contracts, an approved-use catalogue, and an audit cadence. None of this is hard. It just has to happen before the pilot, not after.

There is a fourth failure mode worth naming even though it is downstream of the first three: the absence of a deployment phase. A pilot ends. The vendor sends a renewal quote. Nobody has built the production architecture, the integration with existing systems, the training programme, or the operating cadence. The pilot was never on a path to production. It was always going to be an evaluation, then a slide deck about the evaluation, then a discussion about the slide deck. Real legal AI rollouts require a production phase that is scoped, dated, and budgeted before the pilot starts. Otherwise the pilot will die in handoff.

Our standing recommendation: if you cannot answer four questions in writing before kickoff (what outcome will be measured, who owns the workflow, what is the governance position, what does production look like), do not run the pilot. Run a four-week scoping engagement instead, deliver the answers, then decide whether the pilot is worth running. A four-week scoping engagement costs a fraction of what a six-month pilot that nobody can defend costs.