One of the most common questions in early data conversations is also one of the hardest to answer honestly: how long will this take?
The honest answer is: it depends. But that's not particularly useful, so here's a more useful version: a well-scoped, properly run data integration project for a mid-sized company typically takes four to eight weeks for the initial build. Getting to a point where the business can make decisions it couldn't make before usually happens within the first month, if the scope is focused.
What takes much longer, and what most companies don't fully account for, is everything that happens before the build starts.
The part that takes longest isn't the build
The build phase (actually connecting systems, writing transformations, setting up pipelines, testing the output) is usually the fastest part of the project. Good engineers can move quickly once they have what they need.
What slows everything down is access, clarity, and scope.
Access means getting connected to the systems involved. This sounds trivial and isn't. ERPs, CRMs, and legacy systems often require IT tickets, vendor approval, or security reviews before an external party can connect to them. In large organisations, this alone can take weeks. In smaller ones, it's faster, but there's often no clear person to ask. Getting this sorted before the project starts, not during it, is one of the most effective things you can do to shorten the timeline.
Clarity means knowing what you actually want the integration to do. This is where vague briefs cause the most damage. "We want our CRM and ERP to be connected" is not a sufficient specification. Which data flows in which direction? What counts as a match when a CRM company name doesn't exactly match the ERP account name? What happens when a deal is partially refunded? What's the latency requirement: real-time sync or daily batch? Every question left unanswered during scoping becomes a conversation that stops the build.
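To make the matching question concrete, here is a minimal sketch of what one agreed rule might look like once it's written down. Everything in it is an assumption for illustration: the suffix list, the function names, and the rule itself (ignore case, punctuation, and trailing legal suffixes). The real rule is whatever the business agrees to during scoping.

```python
import re

# Hypothetical list of legal suffixes to ignore when comparing names.
# A real project would agree this list with the business during scoping.
LEGAL_SUFFIXES = {"ltd", "limited", "inc", "llc", "gmbh", "plc", "corp"}


def normalise_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, drop trailing legal suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    while words and words[-1] in LEGAL_SUFFIXES:
        words.pop()
    return " ".join(words)


def is_match(crm_name: str, erp_name: str) -> bool:
    """Treat two names as the same company if their normalised forms are equal."""
    crm_key = normalise_name(crm_name)
    erp_key = normalise_name(erp_name)
    # An empty key (for example a name that is only "Ltd") never matches anything.
    return bool(crm_key) and crm_key == erp_key
```

Under this rule, "Acme Ltd." and "ACME Limited" match, while "Acme Holdings" and "Acme" do not. Whether that second case should match is exactly the kind of decision that belongs in scoping rather than in the middle of the build.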
Scope means agreeing on what's in and what's out. Scope creep kills data projects more reliably than technical problems. The instinct to add "while we're in there, can we also..." is natural and understandable, and it's the single fastest way to double the timeline. A project with tight scope and clear deliverables almost always delivers on time. A project where the definition of done keeps expanding almost never does.
A rough breakdown by phase
These numbers aren't guarantees. But here's what a reasonable project looks like in phases.
- 01 Discovery and audit: one to two weeks. This is the data health audit phase, where you map the current state, identify the sources, assess quality, and define the problem precisely. If you've already done this work, you can skip it.
- 02 Design and alignment: three to five days. This is where the technical design is documented and agreed: what connects to what, what the transformation rules are, what the output looks like, what tests will confirm it's working.
- 03 Build and test: two to four weeks, depending on the number of sources and complexity of transformations. The simpler the scope, the faster this moves.
- 04 Deployment and handoff: three to five days. Getting the pipeline running in production, setting up monitoring, making sure whoever owns it internally understands how it works and what to do when something goes wrong.
For a focused two-system integration with a clean scope, you're looking at five to seven weeks start to finish. For a more complex multi-source project, eight to twelve weeks is realistic. For anything involving significant legacy systems, on-premise infrastructure, or heavily customised ERPs, add time for access and technical complexity.
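If you want to sanity-check the phase ranges against the overall estimate, the arithmetic is simple enough to sketch. This assumes a five-day working week and that the phases run back to back with no overlap; both are assumptions added here, not something the phase list states.

```python
# Assumption: one week equals five working days.
WORKING_DAYS_PER_WEEK = 5

# (phase, low estimate, high estimate) in working days, taken from the phase list.
PHASES = [
    ("Discovery and audit", 5, 10),      # one to two weeks
    ("Design and alignment", 3, 5),      # three to five days
    ("Build and test", 10, 20),          # two to four weeks
    ("Deployment and handoff", 3, 5),    # three to five days
]


def total_range(phases):
    """Sum the low and high estimates, assuming phases run one after another."""
    low = sum(phase[1] for phase in phases)
    high = sum(phase[2] for phase in phases)
    return low, high


low_days, high_days = total_range(PHASES)
print(
    f"{low_days} to {high_days} working days "
    f"({low_days / WORKING_DAYS_PER_WEEK:.1f} to "
    f"{high_days / WORKING_DAYS_PER_WEEK:.1f} weeks)"
)
# prints: 21 to 40 working days (4.2 to 8.0 weeks)
```

That lands on roughly four to eight weeks end to end. If the audit is already done, `total_range(PHASES[1:])` gives 16 to 30 working days, or about three to six weeks.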
What causes projects to run long
Based on how these projects actually go, the cause is almost never the technical work. It's almost always one of these four things.
- Delayed access to systems. The project is ready to move and IT hasn't approved the connection yet. The most effective mitigation is starting the access request process as early as possible, ideally before the contract is signed.
- Changing requirements mid-build. Someone realises midway through that they actually need a different output than what was specified. This isn't malicious; it's what happens when specifications are written abstractly rather than concretely. The fix is working through enough real examples during scoping that everyone can see what the output will look like before building it.
- Unclear ownership on the client side. Someone needs to make decisions about transformation rules, approve designs, or provide access to information, and it's not clear who that person is. Data projects need a clear owner on the client side with the authority to make calls.
- Data quality that's worse than expected. The audit should surface this, but sometimes there are surprises once you're inside the systems. This is an argument for doing the audit properly before committing to a build timeline, not an argument for skipping the audit because it adds time.
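On the data quality point, even a very small profiling pass during the audit can surface the worst surprises early. The sketch below is illustrative only: the function name, the record shape (a list of dictionaries), and the choice of checks (share of missing values per field, duplicate keys) are all assumptions, and a real audit would go further.

```python
from collections import Counter


def profile(records, key_field):
    """Report row count, share of missing values per field, and duplicate keys.

    Assumes `records` is a list of dicts and that key values are strings.
    """
    total = len(records)
    missing = Counter()
    for record in records:
        for field, value in record.items():
            # Treat None and empty strings as missing.
            if value is None or value == "":
                missing[field] += 1

    key_counts = Counter(record.get(key_field) for record in records)
    duplicate_keys = sorted(
        key for key, count in key_counts.items() if key is not None and count > 1
    )

    return {
        "rows": total,
        "missing_share": {field: count / total for field, count in missing.items()},
        "duplicate_keys": duplicate_keys,
    }
```

Running something like this against a sample export before committing to a build timeline gives you numbers to discuss instead of impressions.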
What you can do to make it faster
Start the access process before the project formally begins. Have a clear decision-maker on your side. Do the audit first. Define what "done" looks like in concrete terms before the build starts, ideally with a real example of the output you expect.
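One way to pin down "done" is to write the expected output for a few real cases and check the pipeline against them. The sketch below assumes the output is a list of rows keyed by a hypothetical `account_id` field; the field names and the helper are made up for illustration, and the example rows would come from your own scoping sessions.

```python
def find_mismatches(actual_rows, expected_rows, key="account_id"):
    """Compare pipeline output against agreed example rows.

    Returns a list of human-readable problems; an empty list means every
    agreed example is present and every listed field has the expected value.
    """
    actual_by_key = {row[key]: row for row in actual_rows}
    problems = []
    for expected in expected_rows:
        actual = actual_by_key.get(expected[key])
        if actual is None:
            problems.append(f"{expected[key]}: missing from output")
            continue
        for field, want in expected.items():
            got = actual.get(field)
            if got != want:
                problems.append(
                    f"{expected[key]}: {field} is {got!r}, expected {want!r}"
                )
    return problems
```

For example, an agreed row might say that a deal of 1000 with a partial refund of 100 should appear with a net value of 900. If the pipeline reports 1000, the check says so in plain language, and the conversation about refunds happens against a concrete case.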
These aren't complicated things. They're just discipline. And they make a bigger difference to timelines than any technology choice.