Most enterprise cloud migrations that fail do not fail because the technology does not work. They fail because the migration was planned at too high a level of abstraction, the data migration was underestimated, or there was no mechanism to control costs after the migration completed. Understanding why migrations fail is the most useful starting point for planning one that does not.
Why Cloud Migrations Fail
The lift-and-shift trap is the most common failure mode. A lift-and-shift migration moves an application to the cloud without modifying it, running it on a virtual machine that mimics the on-premises server. The result is a cloud environment that costs more than the data center it replaced, does not benefit from cloud elasticity, and requires the same operational model as before. The migration captured none of the cloud's actual advantages.
Data migration underestimation is pervasive. Teams plan the application migration carefully and treat data migration as an afterthought. In practice, migrating a large relational database across cloud environments involves database compatibility assessment, schema migration, data volume transfer (terabytes over a network takes time), referential integrity validation, application cutover coordination, and rollback planning. Each of these steps takes longer than planned when you encounter the realities of production data.
Cloud cost governance failures become visible three to six months after migration, when the first post-migration cloud bills arrive. Development teams left free to provision compute and storage generate costs that were not budgeted. Without automated cost controls (budget alerts, idle resource detection, right-sizing recommendations), cloud bills grow without bound.
The 6 Rs Framework
The 6 Rs framework categorizes every application in your portfolio by the appropriate migration strategy. Applied rigorously, it prevents the mistake of treating all applications the same.
- Rehost (lift-and-shift). Move the application to cloud infrastructure without modification. Fastest and cheapest upfront; does not capture cloud economics. Appropriate for applications approaching end-of-life or applications where business agility requirements are low.
- Replatform (lift-and-optimize). Move to the cloud with targeted modifications that capture some cloud advantages, such as moving from a self-managed database to a managed cloud database service. Captures meaningful cost and operational benefits without full refactoring.
- Repurchase. Replace an on-premises application with a SaaS equivalent. Appropriate when a commercial SaaS product meets your requirements and the total cost of ownership (including migration) is favorable.
- Refactor (re-architect). Redesign the application to be cloud-native. The highest cost and longest timeline upfront; the best long-term economics and capabilities. Appropriate for strategic applications with high scalability requirements.
- Retire. Decommission the application because it is no longer needed or is being replaced by another system. Often underestimated: a typical enterprise portfolio review finds that 10 to 20 percent of applications can be retired.
- Retain. Keep the application on-premises for now. Appropriate for applications with regulatory constraints, applications in the middle of major upgrades, or applications where the migration cost exceeds the benefit.
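Applied to a portfolio, the 6 Rs decision can be sketched as a rule-based classifier. The attributes, rule order, and thresholds below are hypothetical illustrations, not a prescribed methodology; real assessments weigh many more factors.

```python
from dataclasses import dataclass

@dataclass
class App:
    name: str
    still_needed: bool      # does the business still use it?
    saas_equivalent: bool   # does a suitable SaaS product exist?
    regulatory_lock: bool   # must it stay on-premises for now?
    strategic: bool         # high scalability / agility requirements?
    near_end_of_life: bool  # planned for replacement soon?

def six_rs(app: App) -> str:
    """Hypothetical rule order: cheapest outcomes first, Refactor last."""
    if not app.still_needed:
        return "Retire"
    if app.regulatory_lock:
        return "Retain"
    if app.saas_equivalent:
        return "Repurchase"
    if app.near_end_of_life:
        return "Rehost"      # not worth optimizing a dying application
    if app.strategic:
        return "Refactor"    # invest where long-term economics matter
    return "Replatform"      # default: targeted cloud optimizations
```

The rule ordering encodes the framework's logic: eliminate or defer work (Retire, Retain, Repurchase) before committing to any engineering effort.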
Cloud-Native vs Cloud-Compatible
Cloud-compatible means the application runs in the cloud. Cloud-native means the application was designed to exploit the cloud's specific characteristics: horizontal scaling, managed services, infrastructure as code, immutable deployments, and the consumption-based pricing model.
A cloud-compatible application running on fixed-size VMs does not scale automatically during traffic peaks, does not benefit from managed service economics, and requires the same operational overhead as an on-premises deployment. A cloud-native application scales to match load, uses managed services for databases, messaging, and caching, is deployed through automated pipelines, and incurs cost in proportion to actual usage.
The long-term cost implication is significant. Our cloud and DevOps engineering practice consistently finds that cloud-native architectures run at 30 to 60 percent lower cost than equivalent cloud-compatible architectures when the application has variable load patterns.
Application Portfolio Assessment
Applying the 6 Rs systematically requires assessing every application in your estate before making migration decisions. The assessment captures: business criticality, technical complexity, data sensitivity, integration dependencies, compliance requirements, current operational cost, and estimated migration cost. Each application gets a recommendation from the 6 Rs framework based on this assessment.
The output is a prioritized migration roadmap that sequences applications based on their strategic value, migration complexity, and dependencies on other migrations. Applications with many dependencies migrate last. Applications that are quick wins (Retire or Repurchase) migrate first to reduce the portfolio size and build team confidence.
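The sequencing logic described above (quick wins first, dependency-heavy applications last) can be sketched with a topological sort over the dependency graph. The portfolio below is hypothetical; note that quick wins can only be pulled to the front when they have no migration dependencies of their own.

```python
from graphlib import TopologicalSorter

# Hypothetical portfolio: each app maps to the apps that must migrate first
deps = {
    "auth":      set(),
    "billing":   {"auth"},
    "reporting": {"billing", "auth"},
    "old-crm":   set(),        # earmarked Retire: a dependency-free quick win
}
quick_wins = {"old-crm"}       # Retire / Repurchase candidates

# Topological order guarantees dependencies migrate before dependents
order = list(TopologicalSorter(deps).static_order())

# Move dependency-free quick wins to the front of the roadmap
roadmap = ([a for a in order if a in quick_wins]
           + [a for a in order if a not in quick_wins])
```

Running this yields a roadmap where `old-crm` is retired first, and `auth`, `billing`, and `reporting` migrate in dependency order.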
Data Migration: The Hardest Part
Database migrations require understanding the source database engine, the target database engine, schema compatibility between them, and how to handle any incompatibilities. Migrating from an Oracle database to Aurora PostgreSQL is a common scenario; the data types, stored procedures, and SQL dialect differences between the two require careful testing.
Data volume determines the migration approach. Small databases (under a few hundred gigabytes) can be migrated with full dumps during a maintenance window. Large databases (multiple terabytes) require change data capture approaches that keep the target database synchronized with the source during the migration period, enabling a cutover that only requires catching up the final minutes of changes.
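The claim that terabytes over a network take time is easy to make concrete with back-of-envelope arithmetic. The efficiency factor below is an assumed discount for protocol overhead and link contention, not a measured value.

```python
def transfer_hours(terabytes: float, gbps: float, efficiency: float = 0.7) -> float:
    """Rough wall-clock hours to move `terabytes` over a `gbps` link.

    `efficiency` discounts protocol overhead and contention (assumption).
    """
    bits = terabytes * 8e12              # 1 TB = 8 * 10^12 bits (decimal units)
    effective_bps = gbps * 1e9 * efficiency
    return bits / effective_bps / 3600

# 10 TB over a 1 Gbps link at 70% efficiency takes roughly a day and a half,
# which is why large databases rarely fit in a maintenance window.
```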
The cutover itself is the highest-risk moment. A well-designed cutover plan includes: a detailed runbook with step-by-step instructions, clear go/no-go criteria at each checkpoint, a tested rollback procedure, and a communication plan for stakeholders. Teams that have rehearsed the cutover in staging have significantly better outcomes than those executing it for the first time in production.
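The runbook structure described above (step-by-step instructions, go/no-go checkpoints, tested rollback) can be sketched as a small executor. The step names are hypothetical; real cutovers involve human judgment at each checkpoint, not just automated checks.

```python
def run_cutover(steps, rollback):
    """Execute (name, action, check) steps in order.

    On a failed go/no-go check, roll back completed steps in reverse order.
    `rollback` maps step names to their tested rollback procedures.
    """
    done = []
    for name, action, check in steps:
        action()
        if not check():                  # go/no-go criterion for this checkpoint
            for prior in reversed(done):
                rollback[prior]()        # undo in reverse order of execution
            return f"rolled back at: {name}"
        done.append(name)
    return "cutover complete"
```

A dry run of this executor against stubbed actions is exactly the kind of staging rehearsal the paragraph above recommends.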
Security in the Cloud
The shared responsibility model defines what the cloud provider secures (physical infrastructure, hypervisor, managed service internals) and what the customer secures (data, identities, network configuration, application security). Many security incidents in cloud environments occur because organizations assumed the cloud provider was responsible for security that is actually the customer's responsibility.
IAM configuration is the highest-priority security concern in any cloud environment. The principle of least privilege requires that every identity (human or service) has only the permissions required for its specific function. Overpermissive IAM policies are the most common source of cloud security vulnerabilities.
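One practical way to enforce least privilege is to scan policies for wildcard grants before they are deployed. The sketch below assumes the common JSON policy shape (`Statement`, `Effect`, `Action`, `Resource`); the example policy is hypothetical.

```python
def overpermissive(policy: dict) -> list:
    """Flag Allow statements that grant wildcard actions or resources."""
    findings = []
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        actions = [actions] if isinstance(actions, str) else actions
        resources = stmt.get("Resource", [])
        resources = [resources] if isinstance(resources, str) else resources
        if any(a == "*" or a.endswith(":*") for a in actions):
            findings.append("wildcard action")
        if any(r == "*" for r in resources):
            findings.append("wildcard resource")
    return findings
```

Wired into a CI pipeline, a check like this turns least privilege from a guideline into a gate.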
Encryption at rest and in transit should be configured by default for all data stores and communication paths. Cloud providers make this straightforward; the operational discipline is ensuring it is enforced through policy controls rather than relying on individual engineers to configure it correctly.
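Enforcing encryption "through policy controls rather than individual engineers" usually means a policy-as-code check over the resource inventory. A minimal sketch, assuming a hypothetical inventory format:

```python
def unencrypted(inventory: list) -> list:
    """Return names of data stores missing encryption at rest."""
    return [r["name"] for r in inventory if not r.get("encrypted_at_rest")]

# Hypothetical inventory pulled from a cloud asset export
inventory = [
    {"name": "orders-db",   "encrypted_at_rest": True},
    {"name": "raw-uploads", "encrypted_at_rest": False},
]
violations = unencrypted(inventory)  # flagged for remediation
```

The same pattern extends to in-transit checks (TLS-only flags on load balancers and storage endpoints).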
Cloud Cost Management: FinOps Basics
FinOps is the discipline of managing cloud costs through visibility, accountability, and optimization. The basics: tag every resource with the team and application it belongs to (enabling cost attribution), set budget alerts that notify before costs exceed targets, review utilization reports monthly to identify oversized or idle resources, and evaluate reserved instance commitments annually.
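The first two basics, tag-based cost attribution and budget alerts, can be sketched together. The line items, tags, and budget figures below are hypothetical; the untagged bucket is the signal that tagging discipline is slipping.

```python
from collections import defaultdict

# Hypothetical billing export: cost line items with resource tags
line_items = [
    {"cost": 1200.0, "tags": {"team": "payments", "app": "billing"}},
    {"cost": 300.0,  "tags": {"team": "data",     "app": "etl"}},
    {"cost": 950.0,  "tags": {}},   # untagged spend: cannot be attributed
]
budgets = {"payments": 1000.0, "data": 500.0}

# Attribute each line item to a team via its tag
by_team = defaultdict(float)
for item in line_items:
    by_team[item["tags"].get("team", "UNTAGGED")] += item["cost"]

# Alert on any team whose spend exceeds its budget target
alerts = [t for t, spend in by_team.items()
          if spend > budgets.get(t, float("inf"))]
```

In practice the alert fires before the target is exceeded (e.g. at 80 percent of budget), so the team can act within the billing period.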
Compute right-sizing alone typically reduces cloud compute costs by 20 to 30 percent in the first year after migration, because development teams tend to provision generously and rarely downsize. Automated right-sizing recommendations from cloud provider cost tools require only human review and approval to capture most of this saving.
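The "review and approve" workflow over right-sizing recommendations can be sketched as a simple rule over sustained utilization. The instance ladder and the 30 percent threshold are hypothetical; provider tools use richer signals (memory, network, burst patterns).

```python
SIZES = ["small", "medium", "large", "xlarge"]  # ascending capacity (hypothetical)

def rightsize(current: str, p95_cpu_percent: float) -> str:
    """Recommend one size down when 95th-percentile CPU stays under 30%."""
    idx = SIZES.index(current)
    if p95_cpu_percent < 30 and idx > 0:
        return SIZES[idx - 1]
    return current   # utilization justifies the current size
```

Stepping down one size at a time, with a human approving each change, is what keeps right-sizing safe enough to automate.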
Multi-Cloud vs Single Cloud
Multi-cloud strategies (using two or more cloud providers for different workloads) add operational complexity: separate tooling, separate networking, separate IAM models, and teams that need expertise in multiple environments. The operational cost premium for multi-cloud is real.
Multi-cloud is justified when specific workloads have clear affinity with a specific provider (Google Cloud for BigQuery-based analytics, AWS for specific ML services), when regulatory requirements mandate geographic distribution across providers, or when vendor lock-in risk management is a board-level priority. For most organizations, a primary cloud with a secondary provider for specific services is a more pragmatic approach than full multi-cloud.
AI Workload Considerations
Teams migrating to the cloud to enable AI workloads have additional infrastructure considerations. GPU compute availability varies by region and by instance type; plan capacity requirements before committing to a cloud region. The data and AI platforms that support AI workloads require low-latency access to training data, which has implications for data storage tier and network architecture choices.
For enterprise and public sector organizations with sovereignty requirements, AI model training and inference must occur within specific geographic boundaries. Not all cloud regions support all AI services, and some advanced AI capabilities are not available in sovereign cloud configurations. Verify AI service availability in your required regions before designing AI workload architecture.
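Verifying AI service availability per region is a matrix check that belongs in the design phase, not after commitment. The region names and service identifiers below are hypothetical placeholders for a provider's actual availability data.

```python
# Hypothetical region -> available AI services matrix
availability = {
    "eu-sovereign-1": {"managed-inference"},
    "eu-west-1":      {"managed-inference", "gpu-training", "foundation-models"},
}

def viable_regions(required: set, matrix: dict) -> list:
    """Regions offering every required AI service."""
    return [region for region, services in matrix.items()
            if required <= services]
```

A sovereignty-constrained workload that needs GPU training would fail this check in the sovereign region above, which is exactly the finding to surface before architecture work begins.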
A realistic 12-month migration timeline for a 200-application enterprise estate includes three months of portfolio assessment and roadmap development, followed by nine months of phased migration execution beginning with the lowest-complexity applications. At month twelve, 40 to 60 percent of applications are migrated, the highest-complexity applications are in active migration, and the FinOps function is managing costs on the already-migrated estate.