Blog – Future Processing

Why one-off cloud audits cannot fix a structural cost problem (and how FinDataOps can)

Cloud cost audits tell you where money went last quarter. They cannot tell you why a dataset costing £3,000 a month serves one occasional user. Here is what continuous governance looks like.


The cloud audit came back clean. Tagging coverage at 78%, three idle instances flagged for right-sizing, one unused reserved commitment worth reviewing. The engineers acted on most of it. Six months later, the cloud bill was 11% higher than before the engagement.

That outcome is not unusual. The audit did what it was built to do: it photographed the infrastructure at a point in time, measured it against known benchmarks, and proposed improvements within the existing frame.

The analysts were not wrong, but the frame was.

Key takeaways

  • A cloud audit confirms that spend exists and where it lands. It cannot confirm whether that spend is generating a return, because cost allocation stops at the infrastructure boundary and never reaches the data product level.
  • The most expensive problems in a data platform look normal from the outside. Misconfigured retention, inefficient pipeline algorithms, and quality-driven reruns all register as standard activity on an infrastructure report.
  • Tracking absolute cloud spend in a growing data environment is the wrong metric. Unit costs – per pipeline run, per GB stored – expose inefficiency that total figures hide.
  • Cloud cost governance applied four times a year to an environment changing daily is retrospective accounting, not governance.

What a cloud cost audit actually tells you (and where its visibility ends)

A cloud audit maps infrastructure spend: compute, storage, network transfer, services – broken down by team, project, and commitment tier.

It identifies whether resources are provisioned appropriately, whether reserved capacity is actually being used, and whether anything is running with no apparent owner. These are real problems worth addressing, and a well-executed audit finds them reliably.

The boundary is the infrastructure layer. An audit answers “what does this cost and who owns it” – but at the level of an engineering team or project budget, not at the level of a specific data product or business function.

That distinction carries more weight than most cloud finance conversations give it.

Consider a dataset costing £3,000 per month to maintain, process, and serve. From an infrastructure perspective, it is active – being queried, generating reports, passing every health check. The project it belongs to is within budget. The department that owns it shows reasonable cloud spend. Every FinOps dashboard shows green.

What the audit cannot see: that dataset has one regular consumer who also has three other sources of the same information, and queries this one occasionally for reassurance rather than decision-making.

The data product cost gap – why infrastructure billing is not enough

This is the core of what the Black Hole Syndrome describes: costs go in, but explanations do not come out. The billing data is accurate and the cost reporting is detailed.

And yet finance teams frequently cannot explain a significant portion of cloud spending in terms any business stakeholder would find meaningful – because cost allocation stops at the infrastructure boundary and never reaches the data layer.

Most organisations can tell you how much they spent on BigQuery or Databricks last month. Fewer can identify which data products are driving that bill, which products are recovering their cost through business value, and which are candidates for rationalisation.
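
As a rough illustration of what reaching the data layer means, the sketch below rolls billing lines up by data product instead of by service. The row layout, the product names, and the `UNATTRIBUTED` bucket are assumptions made for the example, not the format of any particular billing export.

```python
from collections import defaultdict

# Hypothetical billing rows: (service, data product label or None, monthly cost in GBP).
billing_rows = [
    ("bigquery", "customer_360", 1800.0),
    ("bigquery", "churn_scores", 950.0),
    ("databricks", "customer_360", 1200.0),
    ("databricks", None, 700.0),  # spend with no data product label
]


def cost_per_data_product(rows):
    """Sum cost by data product label, keeping unlabelled spend visible."""
    totals = defaultdict(float)
    for _service, product, cost in rows:
        totals[product or "UNATTRIBUTED"] += cost
    return dict(totals)


totals = cost_per_data_product(billing_rows)
for product, cost in sorted(totals.items()):
    print(f"{product}: £{cost:,.2f}")
```

Keeping the unlabelled spend as its own line is a deliberate choice: the size of that bucket is itself a measure of how far allocation still has to go.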

Research puts the figure bluntly: only 43% of large enterprises can calculate cost-per-data-product. That figure is before asking whether those calculations are connected to any measure of business value, or whether they inform any actual decision.

Cloud governance that stops at the infrastructure layer produces accurate-looking numbers that describe inputs rather than outcomes. A finance team reviewing that data is not equipped to make architectural decisions.

An engineering team reviewing infrastructure utilisation is not asking whether the product it maintains should exist. The question falls between the two, which is why it tends not to get answered.

A £36,000 problem that never appeared on a cost report

A client had a data lakehouse with time-travel and retention policies configured. On paper, the setup was correct: retention windows defined, historical versions of data marked for expiry after a set period.

In practice, physically removing expired data required a VACUUM process to run. Nobody had automated it, and nobody had noticed that it was not running.

From the infrastructure side, the picture looked entirely normal: a temporal dataset growing steadily. Incremental growth in a data platform is expected behaviour. Nothing in the billing reports flagged an anomaly.

The problem surfaced during a routine backup transfer. A job that tallies data volumes before copying reported a figure roughly five times larger than the logical table sizes would suggest.

Investigation found the mismatch immediately: the table metadata reported around 20 TB of active data; the underlying storage contained 100 TB of historical versions that should have been removed months earlier.

Adding an automated job to trigger VACUUM on the largest data products resolved the issue in approximately two days of engineering time. The saving came to roughly £100 per day, or about £36,000 per year.

A standard cloud cost audit would not have found this. It would have seen a growing dataset in active use and treated it as normal. Identifying the problem required looking at whether the retention mechanism was actually functioning, not merely configured.
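
A check of this kind can be very small. The sketch below compares the logical size a table reports with the physical storage beneath it and flags a suspicious gap. The function names and the `max_ratio` threshold are assumptions for illustration; a sensible threshold depends on how much history the retention window is meant to keep.

```python
def storage_bloat_ratio(logical_tb: float, physical_tb: float) -> float:
    """Physical storage divided by the logical (active) table size."""
    if logical_tb <= 0:
        raise ValueError("logical_tb must be positive")
    return physical_tb / logical_tb


def retention_suspect(logical_tb: float, physical_tb: float, max_ratio: float = 2.0) -> bool:
    """Flag a table whose storage exceeds its logical size by more than max_ratio.

    max_ratio is a hypothetical tolerance: some overhead is expected while
    historical versions are still inside their retention window.
    """
    return storage_bloat_ratio(logical_tb, physical_tb) > max_ratio


# Figures from the case above: about 20 TB of active data, 100 TB in storage.
ratio = storage_bloat_ratio(20, 100)
flagged = retention_suspect(20, 100)
print(f"ratio={ratio:.1f}, flagged={flagged}")
```

Run on a schedule against the largest data products, a comparison like this tests whether retention is functioning rather than merely configured.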

Three cost drivers that infrastructure audits consistently miss

The retention issue is one recognisable pattern. Others appear frequently enough to be worth naming directly.

Algorithmic inefficiency in pipeline execution. A pipeline processing at quadratic complexity where a linear approach is available runs longer per execution and therefore costs more at scale. The infrastructure records that the pipeline ran and compute was consumed. It does not evaluate whether that compute was used well.

For a daily workload, the cost difference between an efficient and an inefficient algorithm accumulates over months without any alert being triggered – because the usage looks normal.
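
To make the difference concrete, here is a minimal sketch of the same lookup written both ways. The record shapes are invented for the example, and it assumes customer ids are unique. Both functions return identical results; only the amount of work differs.

```python
def match_quadratic(orders, customers):
    """Nested scan: up to len(orders) * len(customers) comparisons."""
    matched = []
    for order in orders:
        for customer in customers:
            if order["customer_id"] == customer["id"]:
                matched.append((order["order_id"], customer["name"]))
                break
    return matched


def match_linear(orders, customers):
    """Build a lookup table once, then do one dictionary lookup per order."""
    name_by_id = {customer["id"]: customer["name"] for customer in customers}
    return [
        (order["order_id"], name_by_id[order["customer_id"]])
        for order in orders
        if order["customer_id"] in name_by_id
    ]


customers = [{"id": i, "name": f"customer-{i}"} for i in range(1000)]
orders = [{"order_id": n, "customer_id": n % 1200} for n in range(3000)]

assert match_quadratic(orders, customers) == match_linear(orders, customers)
```

On an infrastructure report, both versions appear as a pipeline that ran and consumed compute. Only a review of the code or a per-run unit cost reveals which one is deployed.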

Data quality failures triggering pipeline reruns. A pipeline costing £200 per run encounters a data quality issue upstream – missed at ingestion because there are no validation checks in place.

The problem surfaces a week later when a downstream user notices incorrect figures. The fix requires re-executing the pipeline across the affected period: roughly £1,400 in direct reprocessing costs, before any engineering time is counted.

The billing entry reads as normal pipeline activity. There is no category labelled “avoidable rework” on the cloud invoice.
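
A validation gate at ingestion does not need to be elaborate to catch this class of problem. The following is a minimal sketch under assumed rules: the field names and the "no negative amounts" check are hypothetical stand-ins for whatever the dataset's actual contract requires.

```python
def validate_batch(rows, required_fields=("id", "amount")):
    """Return a list of human-readable problems; an empty list means the batch passes."""
    errors = []
    for index, row in enumerate(rows):
        for field in required_fields:
            if row.get(field) is None:
                errors.append(f"row {index}: missing {field}")
        amount = row.get("amount")
        if isinstance(amount, (int, float)) and amount < 0:
            errors.append(f"row {index}: negative amount")
    return errors


batch = [
    {"id": 1, "amount": 120.0},
    {"id": 2, "amount": None},
    {"id": None, "amount": -5.0},
]

problems = validate_batch(batch)
if problems:
    # Stop here, before any paid pipeline run consumes the bad batch.
    print("batch rejected:", problems)
```

Rejecting the batch at this point costs one failed ingestion. Discovering the same problem downstream costs every run executed in the meantime.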

Active pipelines serving no active consumers. Data products built for a use case that has since changed continue to run because nobody explicitly retired them. Scheduled jobs execute, storage grows, compute is charged.

Without a data catalogue connected to actual usage metrics, the platform has no mechanism to flag that the downstream consumer has been gone for six months.

Periodic audits can surface obvious cases; quietly underused assets are harder to detect without observability at the data layer.
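
Where a catalogue does record last access, the check itself is simple. The sketch below assumes a hypothetical mapping from data product name to the date it was last read, with `None` meaning no recorded access, and a 90-day idle threshold chosen purely for illustration.

```python
from datetime import date, timedelta


def stale_products(last_access, today, max_idle_days=90):
    """Names of data products with no recorded read inside the idle window."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(
        name
        for name, last_read in last_access.items()
        if last_read is None or last_read < cutoff
    )


# Hypothetical catalogue extract.
last_access = {
    "customer_360": date(2025, 5, 28),
    "legacy_campaign_report": date(2024, 11, 2),
    "sandbox_export": None,
}

candidates = stale_products(last_access, today=date(2025, 6, 1))
print(candidates)
```

A flagged product is a candidate for a conversation with its owner, not an automatic deletion: some datasets are read rarely by design.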

All three are detectable – and preventable – with data-layer visibility and appropriate governance. None appear as anomalies on a FinOps dashboard.

KPIs that prove the operating model is working – quarter after quarter

The instinct is to measure success by whether the total cloud bill decreases. For organisations with growing data volumes, that metric produces misleading results: the bill will increase over time because data volumes increase.

The question is whether cost is growing in proportion to the value being generated, or faster than it.

Useful metrics operate at the unit level:

  • Cost per pipeline run. For workloads processing incremental data, this figure should remain broadly stable. A pipeline costing twice as much per execution in Q3 as it did in Q1 – with no corresponding change in data volume – is a signal worth investigating before it compounds.
  • Cost per GB stored. Absolute storage spend will grow with data. Unit cost per gigabyte should not. Where it does, the likely causes are retention failures, duplication, or storage in a tier not matched to the actual access pattern.
  • Forecast accuracy. The proportion of cloud spend that can be predicted and attributed with reasonable precision a month in advance. Low forecast accuracy is not a billing problem – it indicates that cost decisions are being made without visibility into their downstream financial consequences.
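
The three metrics above reduce to simple ratios once cost is attributable. The sketch below is one possible formulation, with invented figures. The forecast accuracy formula in particular (one minus the relative forecast error, floored at zero) is an assumption, since organisations define this KPI in different ways.

```python
def cost_per_run(total_cost: float, runs: int) -> float:
    """Average cost of one pipeline execution over a period."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    return total_cost / runs


def cost_per_gb(storage_cost: float, gb_stored: float) -> float:
    """Storage spend divided by the volume actually held."""
    if gb_stored <= 0:
        raise ValueError("gb_stored must be positive")
    return storage_cost / gb_stored


def forecast_accuracy(forecast: float, actual: float) -> float:
    """1.0 means the forecast matched actual spend; 0.0 means it was off by 100% or more."""
    if actual <= 0:
        raise ValueError("actual must be positive")
    return max(0.0, 1.0 - abs(actual - forecast) / actual)


# Invented quarterly figures for one pipeline: similar volume, double the unit cost.
q1 = cost_per_run(total_cost=900.0, runs=90)
q3 = cost_per_run(total_cost=1800.0, runs=90)
print(f"Q1 £{q1:.2f} per run, Q3 £{q3:.2f} per run")
print(f"forecast accuracy: {forecast_accuracy(forecast=100_000, actual=125_000):.0%}")
```

Tracked per data product and per quarter, these figures make a doubling in unit cost visible even when the total bill looks unremarkable.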

Only about a third of companies enforce cost tagging through automated policies, which makes accurate forecasting structurally difficult regardless of how sophisticated the FinOps tooling is.

These metrics shift the budget conversation from "the bill went up again" to "our unit economics are stable, and here is the evidence."

That framing holds up in a CFO review in a way that "cloud spend increased but we have a plan" does not.

What a continuous operating model actually requires

The structural change is treating cloud cost governance as an operating function rather than an intervention. Two things make that concrete.

First, cost ownership needs to be assigned at the data product level, not just the infrastructure level.

Every data product should have an owner who is accountable for whether the cost it generates is proportionate to the value it delivers. That accountability requires tooling that makes costs visible at that granularity – and a process for acting on what the data shows. Without it, cost awareness lives in the finance team and nowhere near the teams making the decisions that drive spend.

Second, governance needs to be embedded in how the platform operates, not reviewed against it periodically.

Tagging enforced at deployment, not checked in a quarterly audit. Retention automated, not dependent on someone remembering to run a job. Anomalies in unit costs triggering alerts, not discovered during a budget review three months later.
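
Two of these controls can be sketched in a few lines. The required tag set and the 25% tolerance below are hypothetical policy choices, and the functions stand in for checks that would normally live in a deployment pipeline and a monitoring job.

```python
REQUIRED_TAGS = {"owner", "data_product", "cost_centre"}  # hypothetical policy


def missing_tags(resource_tags, required=REQUIRED_TAGS):
    """Tags the policy requires that the resource does not carry."""
    return sorted(required - set(resource_tags))


def unit_cost_alert(baseline: float, current: float, tolerance: float = 0.25) -> bool:
    """True when unit cost has risen above baseline by more than the tolerance."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (current - baseline) / baseline > tolerance


# Deployment gate: refuse to create a resource that cannot be attributed.
gaps = missing_tags({"owner": "data-platform"})
if gaps:
    print("deployment blocked, missing tags:", gaps)

# Monitoring: a per-run cost that doubled should raise an alert straight away.
print("alert:", unit_cost_alert(baseline=10.0, current=20.0))
```

The point of both checks is timing: the tagging gap is caught before the resource exists, and the unit cost drift is caught within a reporting cycle rather than at the next budget review.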

Two engagement models deliver this:

  • In a partnership structure, an external team maintains ongoing involvement – active during the initial implementation phase, then in a reduced cadence during maintenance, providing the data-layer perspective and cost-monitoring continuity that internal teams often cannot sustain alongside delivery commitments.
  • In a capability transfer model, the focus is building internal competency so the organisation runs the operating model independently after the engagement concludes.

Neither model is universally superior. Both require the same underlying commitment: cloud cost governance treated as a permanent feature of how the platform operates, with defined ownership, continuous measurement, and a process for acting on what the data shows.

Why the audit calendar is the wrong governance structure

The cloud has not failed organisations struggling with unexplained bills. The expectation that periodic reviews can manage a continuous operating environment has.

Cloud spend responds to engineering decisions made daily by teams that often have no visibility into the financial consequences of those decisions.

A 2024 Forrester study commissioned by Boomi confirmed that 72% of organisations exceeded their cloud budgets – not because the technology is unpredictable by nature, but because the governance structures applied to it are too infrequent to intercept the decisions that drive overspend.

The risk a CFO carries is not a rising number. Rising costs in a growing business can be justified. The risk is being unable to explain them – unable to connect a cloud invoice to the teams generating revenue, the data products serving those teams, and the architectural decisions that determined the cost structure.

When that connection does not exist, every budget conversation becomes a negotiation based on assumptions rather than evidence.

Governance applied four times a year to an environment changing daily is not governance. It is retrospective accounting, carried out too late to change anything.
