Most failed cloud migrations don't fail in the cloud. They fail in the planning, where teams skip discovery, lift-and-shift a tangle of undocumented dependencies, and discover three months in that their TCO went up, not down. This guide assumes you've decided to move to AWS and walks through the sequencing that actually de-risks the work: assess, build the landing zone, plan in waves, move data carefully, cut over with a rollback in hand, then optimise.

Discovery and assessment first

You cannot migrate what you cannot see. Before any workload moves, you need three artefacts: an application portfolio, a dependency map, and a TCO baseline.

Agentless discovery (AWS Application Discovery Service, or tools such as Cloudamize and AWS Migration Evaluator) collects CPU, memory, IOPS and, critically, network flow data. Network flows are how you find the undocumented chatter: the batch job that hits a database nobody owns, the licence server three racks over.

# Export the discovered inventory from Application Discovery Service as CSV
aws discovery start-export-task \
  --export-data-format CSV \
  --filters name=AgentId,values=o-1234567890,condition=EQUALS

# Check the export task (use the export ID returned above) and print the download URL
aws discovery describe-export-tasks --export-ids export-abc123 \
  --query 'exportsInfo[].configurationsDownloadUrl' --output text

Run discovery for at least two weeks, and ideally four. A single week misses month-end batch peaks and weekly reporting loads, which are exactly the workloads that break your right-sizing assumptions.

The TCO baseline must include the on-prem costs people forget: power, cooling, rack space, hardware refresh amortisation, hypervisor licensing, and the staff time spent patching firmware. Compare it against a right-sized cloud estate, not a one-to-one VM count. Most over-provisioned on-prem VMs map to instances one or two sizes smaller.
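The right-sizing step can be sketched in a few lines of Python. This is a minimal illustration rather than a sizing tool: the `suggest_vcpus` function, the 95th-percentile rule and the 60% target utilisation are assumptions made for the example, not AWS guidance, and the model only looks at CPU.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def suggest_vcpus(current_vcpus, cpu_percent_samples, target_util=60.0):
    """Halve the vCPU count while projected p95 utilisation stays under target.

    Assumption for illustration: halving the vCPUs doubles CPU utilisation.
    """
    p95 = percentile(cpu_percent_samples, 95)
    vcpus = current_vcpus
    while vcpus > 1 and p95 * 2 <= target_util:
        vcpus //= 2
        p95 *= 2
    return vcpus


# Hypothetical discovery samples: an 8-vCPU VM that peaks at 12% CPU.
samples = [5, 7, 6, 12, 9, 8, 11, 10, 6, 7]
print(suggest_vcpus(8, samples))  # two halvings: 8 -> 4 -> 2
```

Feeding it four weeks of samples rather than one is what stops a month-end peak from being sized away.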

The 7 Rs: choosing a strategy per application

There is no single migration strategy — there is one per application. The 7 Rs framework forces that decision explicitly.

| Strategy | Effort | Benefit | When to use |
|---|---|---|---|
| Retire | Minimal | Removes cost & attack surface | App is unused or duplicated; ~10–20% of portfolios are dead weight |
| Retain | Minimal | Avoids forced rework | Mainframe, hard compliance pin, or app being decommissioned soon |
| Relocate | Low | Fast exit, no refactor | VMware estates moving to VMware Cloud on AWS as-is |
| Rehost | Low | Quick datacentre exit | Lift-and-shift via MGN; large volume, tight deadline |
| Replatform | Medium | Managed-service wins | Move self-managed DB to RDS; Tomcat to ECS without code changes |
| Repurchase | Medium | Drops maintenance burden | Replace self-hosted CRM/email with SaaS |
| Refactor | High | Cloud-native scaling & cost | Strategic apps where serverless/containers pay back the rework |
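One way to force the per-application decision is to encode the table as a small function and run the whole portfolio through it. The sketch below is illustrative only: the `App` fields and the priority order of the checks are assumptions, and a real assessment would weigh far more than a handful of booleans.

```python
from dataclasses import dataclass


@dataclass
class App:
    # All field names are hypothetical attributes from the portfolio export.
    name: str
    in_use: bool = True
    must_stay_on_prem: bool = False  # mainframe, compliance pin, near end of life
    strategic: bool = False
    saas_alternative: bool = False
    self_managed_db: bool = False
    runs_on_vmware: bool = False


def choose_strategy(app: App) -> str:
    """Apply the checks in a fixed priority order; the first match wins."""
    if not app.in_use:
        return "Retire"
    if app.must_stay_on_prem:
        return "Retain"
    if app.strategic:
        return "Refactor"
    if app.saas_alternative:
        return "Repurchase"
    if app.self_managed_db:
        return "Replatform"
    if app.runs_on_vmware:
        return "Relocate"
    return "Rehost"  # default for the long tail


portfolio = [
    App("legacy-wiki", in_use=False),
    App("crm", saas_alternative=True),
    App("billing", strategic=True),
    App("intranet"),
]
for app in portfolio:
    print(app.name, "->", choose_strategy(app))
```

The value is less in the function than in the fact that every application ends up with an explicit, reviewable answer.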

The pragmatic pattern: rehost the long tail to hit the datacentre exit date, then replatform/refactor the strategic 20% afterwards. Trying to refactor everything up front is how migrations slip by a year. AWS MGN (Application Migration Service) is the default rehost engine — block-level replication into a staging area, then test-launch and cut over.

# Launch test instances from the replicated source server for validation
aws mgn start-test --source-server-ids s-0a1b2c3d4e5f
# After validation, launch the production cutover instances:
aws mgn start-cutover --source-server-ids s-0a1b2c3d4e5f

Landing zone before workloads

Do not migrate a single production server into a hand-built account. Build the landing zone first — a multi-account organisation with guardrails baked in — so that every workload lands in a governed environment.

AWS Control Tower provisions the baseline: an organisation with a Security OU holding dedicated log archive and audit accounts, plus Account Factory for vending workload accounts with SCPs (Service Control Policies) attached. A separate account per environment (prod/non-prod) and per business unit is the standard blast-radius boundary.

Network topology is the part people under-design. Use a Transit Gateway as the hub rather than a mesh of VPC peers — peering does not scale past a handful of VPCs and offers no transitive routing.

resource "aws_ec2_transit_gateway" "hub" {
  description                     = "central-hub"
  amazon_side_asn                 = 64513     # must differ from the Direct Connect gateway ASN (64512)
  auto_accept_shared_attachments  = "enable"
  default_route_table_association = "disable" # explicit routing per segment
  default_route_table_propagation = "disable"
}

resource "aws_dx_gateway" "main" {
  name            = "onprem-dxgw"
  amazon_side_asn = 64512
}

resource "aws_dx_gateway_association" "tgw" {
  dx_gateway_id         = aws_dx_gateway.main.id
  associated_gateway_id = aws_ec2_transit_gateway.hub.id
  allowed_prefixes      = ["10.20.0.0/16"]
}

Hybrid connectivity is the umbilical cord during the migration. Direct Connect gives consistent latency and bandwidth for bulk data movement and the dual-running period; a Site-to-Site VPN over the internet is the cheaper fallback and the resilient backup path. Order Direct Connect early — physical cross-connects can take weeks, and that lead time sits on your critical path.

Establish CIDR hygiene before you connect anything: on-prem and VPC address ranges must not overlap. On-prem 10.0.0.0/8 ranges colliding with VPC CIDRs are one of the most common reasons hybrid connectivity silently fails to route.
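An overlap check is cheap to automate before any attachment is created. The sketch below uses Python's standard `ipaddress` module; the network names and the 172.16.0.0/20 range are hypothetical, and in practice the input would come from your IPAM or the discovery export.

```python
from ipaddress import ip_network
from itertools import combinations


def find_overlaps(named_cidrs):
    """Return sorted pairs of names whose address ranges overlap."""
    nets = {name: ip_network(cidr) for name, cidr in named_cidrs.items()}
    return sorted(
        (a, b)
        for a, b in combinations(sorted(nets), 2)
        if nets[a].overlaps(nets[b])
    )


ranges = {
    "onprem-dc1": "10.0.0.0/8",     # hypothetical on-prem supernet
    "vpc-prod": "10.20.0.0/16",
    "vpc-shared": "172.16.0.0/20",  # hypothetical shared-services VPC
}
print(find_overlaps(ranges))  # [('onprem-dc1', 'vpc-prod')]
```

Running this in CI against every proposed VPC CIDR catches the collision before it turns into a routing problem.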

Wave planning and the migration factory

Group applications into waves by dependency cluster, and never split a tightly coupled app and its database across waves. A wave is everything that must move together because its components talk to each other over low-latency links.

The migration factory is the repeatable assembly line: a standard runbook, automated tooling (MGN/DMS), and a fixed cadence (e.g. one wave per fortnight). Early waves should be low-risk, low-dependency apps — they shake out the process. Save the crown-jewel database for when the factory is humming.
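Dependency clusters fall out of the discovery data as the connected components of the dependency graph. The sketch below is a minimal illustration, assuming the dependency map has already been reduced to pairs of application names; the app names are hypothetical, and ordering clusters smallest-first is one simple way to put low-dependency apps in the early waves.

```python
from collections import defaultdict


def dependency_clusters(apps, dependencies):
    """Group apps into connected components of the dependency graph."""
    graph = defaultdict(set)
    for a, b in dependencies:
        graph[a].add(b)
        graph[b].add(a)

    seen, clusters = set(), []
    for app in apps:
        if app in seen:
            continue
        stack, cluster = [app], []
        seen.add(app)
        while stack:  # depth-first walk of everything reachable from app
            node = stack.pop()
            cluster.append(node)
            for neighbour in sorted(graph[node]):
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        clusters.append(sorted(cluster))
    # Smallest clusters first, so early waves carry the least coupling.
    return sorted(clusters, key=lambda c: (len(c), c))


apps = ["wiki", "crm", "crm-db", "billing", "billing-db", "ledger"]
deps = [("crm", "crm-db"), ("billing", "billing-db"), ("billing", "ledger")]
for wave, cluster in enumerate(dependency_clusters(apps, deps), start=1):
    print(f"wave {wave}: {cluster}")
```

Real wave plans also weigh business calendars and team capacity, but no wave should ever cut across one of these clusters.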

Data migration mechanics

Match the tool to the data shape and the available window:

| Tool | Best for | Notes |
|---|---|---|
| DataSync | File/NFS/SMB bulk + ongoing sync | Online; throttle to protect the link |
| Snowball Edge | TBs–PBs over poor links | Physical shipping; days of latency |
| Storage Gateway | Hybrid file/volume during transition | Keeps on-prem cache, backs to S3 |
| DMS | Databases (homogeneous & hetero) | Full load + CDC for near-zero downtime |

# DataSync task: cap throughput at 100 MiB/s (about 840 Mbit/s) so the transfer does not saturate Direct Connect
aws datasync create-task \
  --source-location-arn arn:aws:datasync:eu-west-2:111122223333:location/loc-onprem \
  --destination-location-arn arn:aws:datasync:eu-west-2:111122223333:location/loc-s3 \
  --options BytesPerSecond=104857600,VerifyMode=POINT_IN_TIME_CONSISTENT

The rule of thumb: if moving the data over your link takes longer than shipping a Snowball physically, ship the Snowball. A 100 TB dataset over a 1 Gbps Direct Connect is roughly nine to ten days of fully saturated link, and longer at realistic utilisation, so Snowball usually wins.
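The arithmetic behind that rule of thumb is simple enough to script. This sketch assumes decimal units (1 TB = 10^12 bytes, 1 Gbps = 10^9 bits per second); the 80% default utilisation and the seven-day Snowball turnaround in the usage lines are illustrative assumptions, not AWS figures.

```python
def transfer_days(dataset_tb, link_gbps, utilisation=1.0):
    """Days needed to move dataset_tb terabytes over a link_gbps link."""
    bits = dataset_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * utilisation)
    return seconds / 86_400


def prefer_snowball(dataset_tb, link_gbps, snowball_days, utilisation=0.8):
    """True when the network transfer would take longer than shipping."""
    return transfer_days(dataset_tb, link_gbps, utilisation) > snowball_days


print(round(transfer_days(100, 1), 1))        # 9.3 days at full saturation
print(round(transfer_days(100, 1, 0.8), 1))   # 11.6 days at 80% utilisation
print(prefer_snowball(100, 1, snowball_days=7))  # True under these assumptions
```

Plug in your own link speed and the turnaround you are actually quoted; the crossover point moves quickly once the link is 10 Gbps.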

Cutover and rollback

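Every cutover needs a defined go/no-go, a freeze window, and a rollback that has been tested, not merely documented. Keep the source running and replicating until you have signed off post-cutover validation, typically after 24–72 hours of dual-running. The rollback is then simple: stop directing traffic to AWS and point DNS back, because the source is still authoritative. That only holds if writes made on the AWS side after cutover are either replicated back to the source or acceptable to lose, so decide which before you cut over.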

A rollback you have never executed is a hope, not a plan. Rehearse it in the test launch.

Post-migration optimisation

Lift-and-shift gets you out of the datacentre; it does not get you the cloud's economics. The optimisation phase is where the business case is actually realised:

  • Right-size against real CloudWatch data after two weeks of production load, not discovery estimates.
  • Commit to Savings Plans / Reserved Instances only once usage has stabilised — premature commitment locks in oversized instances.
  • Replatform the long tail opportunistically: self-managed databases to RDS/Aurora, static assets to S3+CloudFront, batch to Fargate.
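The "only once usage has stabilised" test in the second bullet can be made explicit. The sketch below is one possible heuristic, not AWS guidance: it treats usage as stable when the coefficient of variation of daily spend is small, then sizes an hourly commitment as a fraction of the lowest observed hourly spend. The 0.15 threshold, the 80% coverage factor and all the spend figures are assumptions for illustration.

```python
from statistics import mean, pstdev


def is_stable(daily_spend, max_cv=0.15):
    """Stable when standard deviation is a small fraction of the mean."""
    avg = mean(daily_spend)
    return avg > 0 and pstdev(daily_spend) / avg <= max_cv


def baseline_commitment(hourly_spend, coverage=0.8):
    """Commit to a fraction of the lowest observed hourly spend."""
    return round(min(hourly_spend) * coverage, 2)


steady = [100, 104, 98, 101, 99, 102, 100]    # hypothetical daily spend, USD
erratic = [100, 40, 160, 90, 30, 150, 110]
print(is_stable(steady), is_stable(erratic))  # True False

if is_stable(steady):
    print(baseline_commitment([4.2, 3.9, 5.1, 4.0]))  # 3.12 per hour
```

Committing below the observed floor leaves headroom for the right-sizing that is still to come.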

Security and compliance from day one

Guardrails belong in the landing zone, not as a retrofit. Enforce encryption at rest and in transit via SCPs, centralise findings in Security Hub, enable GuardDuty across the organisation, and ship all CloudTrail logs to a write-once log-archive account.

aws organizations create-policy --type SERVICE_CONTROL_POLICY \
  --name deny-unencrypted-ebs --content file://deny-unencrypted-ebs.json
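The command above expects a policy document on disk. As a minimal sketch of what `deny-unencrypted-ebs.json` might contain, the Python below builds a policy that denies `ec2:CreateVolume` when the `ec2:Encrypted` condition key is false and prints it as JSON. The `Sid` is a made-up label, and this statement only covers volumes created directly; volumes created as part of an instance launch would need their own statement.

```python
import json


def build_policy():
    """SCP body that denies creation of unencrypted EBS volumes."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyUnencryptedEbsVolumes",  # hypothetical label
                "Effect": "Deny",
                "Action": "ec2:CreateVolume",
                "Resource": "*",
                "Condition": {"Bool": {"ec2:Encrypted": "false"}},
            }
        ],
    }


# Redirect this output to deny-unencrypted-ebs.json before running create-policy.
print(json.dumps(build_policy(), indent=2))
```

Creating the policy does not enforce anything by itself; it still has to be attached to an OU or account with `aws organizations attach-policy`.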

Treating security as a wave-zero deliverable — before the first production workload lands — is far cheaper than auditing it back in once you have 200 servers running.

