Changing SAP hosting: how to migrate systems

Started by RoyJones, Sep 07, 2022, 07:30 AM


RoyJones (Topic starter)

Migrating SAP systems is a complex, painstaking process whose success depends on the coordinated work of everyone involved. When the migration has to be done on a short timeline, the task becomes much harder, and not everyone decides to go through with it. There are several reasons: the process itself is lengthy and organizationally complex, there is a risk of unplanned system downtime, and clients are not sure that, having survived such an operation, they will get benefits commensurate with the effort. However, there are exceptions.



Below, we will talk about the difficulties customers face when migrating and maintaining SAP systems, discuss why stereotypes do not always match reality, and share a case in which we migrated a customer's systems to a new infrastructure in just over three months.

SAP systems hosting

Only five years ago, it was hard to imagine that customers would start using hosting resources for SAP applications en masse. In most cases, SAP was deployed on-premise. However, as outsourcing models and the cloud services market developed, customers' outlook began to change. What arguments tip the choice in favor of the cloud for SAP?

    For newcomers who are only planning an SAP implementation, cloud infrastructure is almost the default choice: resources scale with the system's current needs, and the company does not have to divert resources into developing non-core competencies.
    In companies with a large system landscape, SAP hosting lets CIOs reach a qualitatively different level of risk management, since the partner is responsible for the SLA.
    The third most common argument is the high cost of building infrastructure for high availability and DR scenarios.
    The 2027 factor: the vendor has announced that support for legacy systems ends in 2027. This means moving the database to HANA, which entails the cost of upgrading and purchasing new computing power.


The SAP hosting market can now be considered quite mature, which gives customers who want to change hosting platforms plenty of options. However, such projects can rightly worry businesses because the migration procedure is complex. This pushes customers to place higher demands on service providers, who need not only strong hosting and maintenance competencies for SAP systems, but also a track record of successful migrations.

What are the difficulties of changing SAP hosting?

Hosting services differ. A service level that falls short of what was promised, lots of "buts" and asterisks with caveats in small print, limited provider resources and capabilities, inflexible communication with the client, bureaucracy, technical limitations, poorly qualified support staff, and many other nuances: these are only some of the pitfalls customers may run into when running their business systems on outsourced infrastructure.
Often all of this stays in the shadows for the client, buried deep in a multi-page contract, and only surfaces once the services are already in use.

At some point, it becomes obvious to the customer that the level of service they receive is far from what they expected. This acts as a catalyst for trying to fix the situation. If that fails, and problems pile up until it becomes truly painful, the customer moves on to actively working out alternatives, namely changing the service provider.

Why do they put it off until the last moment? The reason is simple: the process of moving systems is not always transparent or understandable to customers. It is difficult for the client to assess the real risks of migration. For customers, migration is a kind of black box: the price, the system downtime, the risks and how to mitigate them are all unclear, and in general it is dark and scary. After all, if it doesn't work out, heads will roll, both among top management and among those who did the work.

SAP is a corporate-level system, complex and, to put it mildly, not cheap. Substantial budgets are spent on implementing, customizing and maintaining these systems, and the enterprise's day-to-day operation depends on their availability and correct functioning. Now imagine the consequences of a large-scale production line stopping: financial losses measured in numbers with many zeros, plus reputational and other equally significant risks.

Let's analyze the difficulties that may arise at each stage, using the SAP systems migration of one of our customers as the example.

Preparation and design

Migration is a formula with many components, and one of the most important is designing and preparing the target (new) infrastructure.

We needed to dive into the existing implementation of the systems and their architecture. In the target infrastructure, we replicated existing solutions in some places, extended and improved them in others, and redesigned them elsewhere; we also thought through and selected solutions for fault tolerance and availability, and consolidated resources as much as possible.

During the design process we ran through many different exercises, which let us prepare for the migration as thoroughly as possible and take all sorts of nuances and pitfalls into account (more on them later).

What we got as a result is an individually designed private cloud infrastructure based on our data center:

    dedicated physical servers for SAP HANA;
    VMware virtualization platform for application servers and infrastructure services;
    redundant communication links between data centers for L2 VPN;
    two main storage systems to separate production from "everything else";
    a backup system (IBS) based on Veritas NetBackup with a separate server, disk shelf and tape library.


image

And here's how we implemented all that from a technical point of view.

SAP

    To use storage efficiently for production HANA, we used shared disks instead of database system replication by SAP tools. All of that was wrapped in an Active-Standby SUSE HAE cluster based on Pacemaker. Yes, the recovery time is a little longer than with replication, but we need half as much storage space and, as a result, save the customer's budget.
    In pre-production environments we dropped the HANA clusters but otherwise technically mirrored the production configuration.
    The test and development environments were spread across several more servers, without clusters, in an MCOS configuration.
    All application servers were virtualized and hosted in VMware.


Networks

    We physically separated the management network segments from the production networks using switch stacks, routing the production ones towards the customer's data center.
    We provisioned enough network interfaces to avoid mixing large traffic flows.
    For storage traffic, we built classic FC SAN fabrics.


Storage

    The SAP production and pre-production load was placed on the all-flash array.
    Test and development environments and infrastructure services were placed on a separate hybrid array.


Backup system (IBS)

    Built on Veritas NetBackup.
    We added some scripts to back up MCOS configurations.
    Operational copies go to a disk shelf for quick recovery, and for long-term storage we use tapes.

Monitoring

    All hardware, operating systems and SAP were brought under Zabbix monitoring.
    We built a lot of useful dashboards in Grafana.
    When an alert fires, Zabbix can open a ticket in the incident management system, which we run on Jira. The information is also duplicated in a Telegram channel.
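
As a rough illustration of that alert flow, the sketch below builds the request bodies such an integration might send. It is only a sketch under assumptions: the `Alert` fields, the project key, the "Incident" issue type and the chat ID are all hypothetical, and the article does not say how the Zabbix integration was actually implemented. The bodies follow the general shape of Jira's create-issue REST call (`POST /rest/api/2/issue`) and the Telegram Bot API `sendMessage` method; no network calls are made.

```python
import json
from dataclasses import dataclass


@dataclass
class Alert:
    """Minimal view of a monitoring alert; field names are illustrative."""
    host: str
    trigger: str
    severity: str


def jira_issue_payload(alert: Alert, project_key: str) -> dict:
    """Build a JSON body for Jira's create-issue endpoint."""
    return {
        "fields": {
            "project": {"key": project_key},
            "summary": f"[{alert.severity}] {alert.host}: {alert.trigger}",
            "description": f"Raised automatically by monitoring for host {alert.host}.",
            "issuetype": {"name": "Incident"},  # assumed issue type name
        }
    }


def telegram_message_payload(alert: Alert, chat_id: str) -> dict:
    """Build a JSON body for the Telegram Bot API sendMessage method."""
    return {
        "chat_id": chat_id,
        "text": f"{alert.severity.upper()}: {alert.trigger} on {alert.host}",
    }


if __name__ == "__main__":
    # Hypothetical alert, project key and chat ID, for demonstration only.
    alert = Alert(host="hana-prod-01", trigger="Disk space is low", severity="high")
    print(json.dumps(jira_issue_payload(alert, "SAPOPS"), indent=2))
    print(json.dumps(telegram_message_payload(alert, "-1000000000000"), indent=2))
```

Building both messages from the same alert object keeps the ticket and the chat notification consistent with each other.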

Infrastructure services

    To serve internal namespaces, we set up a cluster of DNS servers synchronized with the customer's servers.
    We set up a separate file server for data exchange.
    To store various configurations, we added GitLab.
    For sensitive information, we chose HashiCorp Vault.


Migration process

In general, the migration process consists of the following steps:

    preparation of all necessary project documentation;
    negotiations with the current provider — solving organizational issues;
    purchase, delivery and installation of new equipment for the project;
    test migration and debugging of the process;
    transfer of systems (the production migration).


At the end of October we signed the contract, then designed the architecture, and once the customer approved it, we ordered the necessary equipment.

The first thing to pay attention to is equipment delivery time. On average, delivery of SAP HANA certified hardware that meets the software vendor's requirements for hardware platforms takes 10-12 weeks. Given the season (the project fell right on the New Year), that period could grow by another month. So we had to speed things up as much as possible: we worked with the distributor and agreed on accelerated delivery by air instead of by land and sea.
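
The lead-time arithmetic behind that decision can be sketched roughly as follows. The order date is hypothetical (the article only says the contract was signed at the end of October, with design and approval before the order), and 30 days is an assumed stand-in for "another month".

```python
from datetime import date, timedelta


def delivery_window(order_date: date, seasonal_delay_days: int = 0) -> tuple[date, date]:
    """Return (earliest, latest) arrival dates for a 10-12 week lead time."""
    earliest = order_date + timedelta(weeks=10, days=seasonal_delay_days)
    latest = order_date + timedelta(weeks=12, days=seasonal_delay_days)
    return earliest, latest


if __name__ == "__main__":
    order = date(2021, 11, 1)  # hypothetical order date; the year is arbitrary
    print(delivery_window(order))                          # standard lead time
    print(delivery_window(order, seasonal_delay_days=30))  # with the seasonal delay
```

With these assumed inputs, even the standard window ends in the second half of January, and the delayed one runs well into February, which would not fit a final cutover in late January and early February.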

November and December were spent preparing for migration and receiving part of the equipment. We did the preparation on a test bench in our public cloud, where we rehearsed all the basic steps and caught potential difficulties and problems:

    we prepared a detailed plan for how project team members would interact, with minute-by-minute timings;
    we built a test bench for the database and application servers that closely matched the target infrastructure;
    we set up the necessary communication links and infrastructure services to check that integrations worked;
    we rehearsed cutover scenarios;
    we also used the cloud to create pre-configured virtual machine templates, which we later simply imported and deployed in the target landscape.


Shortly before the New Year holidays, the first batch of equipment arrived. This made it possible to deploy some of the systems on real hardware. Since not everything had arrived, we brought in replacement equipment, which we managed to negotiate with the vendor and distributors.
We received the rest of the target infrastructure only at the final stage.
To make the deadline, our engineers had to sacrifice their New Year holidays and start preparing the target infrastructure on January 2, in the middle of the holidays. Yes, that sometimes happens when deadlines are burning and there are simply no other options. At stake was the operability of systems that the enterprise depends on to function.

The general order of migration was: the least critical systems first (development landscape, testing landscape), then the production systems. The final stage of migration took place in late January and early February.

image

The migration process was scheduled to the minute in a cutover plan listing all tasks, their completion times and the people responsible. All the steps had already been rehearsed during the test migration, so in the production migration we just had to follow the plan and coordinate the process.
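
A cutover plan of this kind can be modeled as an ordered list of tasks with owners and durations, from which the minute-by-minute timings follow. The sketch below is illustrative only: the task names, owners, durations and start time are hypothetical and are not taken from the actual plan.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class CutoverTask:
    name: str
    owner: str
    duration_min: int


def schedule(tasks: list[CutoverTask], start: datetime) -> list[tuple[datetime, datetime, CutoverTask]]:
    """Lay tasks out back to back and return (start, end, task) rows."""
    rows = []
    cursor = start
    for task in tasks:
        end = cursor + timedelta(minutes=task.duration_min)
        rows.append((cursor, end, task))
        cursor = end  # the next task starts when this one ends
    return rows


if __name__ == "__main__":
    # Hypothetical tasks and timings, for demonstration only.
    plan = [
        CutoverTask("Stop application servers", "SAP Basis", 15),
        CutoverTask("Export database", "Source provider", 120),
        CutoverTask("Transfer and import database", "Target infrastructure team", 180),
        CutoverTask("Start system and run checks", "SAP Basis", 45),
    ]
    for begin, end, task in schedule(plan, datetime(2022, 1, 29, 20, 0)):
        print(f"{begin:%H:%M}-{end:%H:%M}  {task.name}  [{task.owner}]")
```

Because each task starts where the previous one ends, a delay in any step shifts everything after it, which is why rehearsing the durations on a test migration matters.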

image

Migration was carried out system by system in several stages, with two systems per stage.
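
The ordering rule (least critical systems first, two systems per stage) can be sketched as below. The system names and criticality scores are hypothetical; the article does not list the actual systems.

```python
def plan_waves(systems: dict[str, int], per_wave: int = 2) -> list[list[str]]:
    """Order systems from least to most critical and group them into stages.

    `systems` maps a system name to a criticality score (higher = more critical).
    """
    ordered = sorted(systems, key=lambda name: systems[name])
    return [ordered[i:i + per_wave] for i in range(0, len(ordered), per_wave)]


if __name__ == "__main__":
    # Hypothetical landscape: 1 = development, 2 = testing, 3 = production.
    landscape = {
        "ERP-PRD": 3, "ERP-DEV": 1, "BW-PRD": 3,
        "ERP-QAS": 2, "BW-DEV": 1, "BW-QAS": 2,
    }
    for number, wave in enumerate(plan_waves(landscape), start=1):
        print(f"Stage {number}: {', '.join(wave)}")
```

Moving the least critical systems first means the procedure has already been exercised on real systems by the time production is migrated.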

The result of the three-month sprint was a set of systems fully functioning in the CROC data center. This positive outcome came from joint work: everyone involved contributed with maximum dedication.

The role of the customer in the project

It was not easy to communicate with the provider our client was leaving. That is understandable: they were the last people interested in the project succeeding. The customer took on the job of escalating and pushing through all communication issues and handled it brilliantly. Special thanks to them for this. Without that active involvement, the project could have turned out completely differently.

Because processes at the "former" provider were heavily formalized, infrastructure maintenance was handled by specialists who were far removed from the problems of what was, at that point, still their customer. For instance, exporting the same database could take anywhere from one hour to five.
At the time it looked like some kind of magic, a secret that was never revealed to us. Perhaps the support engineers were meditating between tickets while our engineers went without their New Year's salads and the customer wept and suffered...

Project results

The final chord of the migration was handing the systems over to maintenance.

We now provide a single point of contact for customer requests and cover the entire scope of maintenance tasks for infrastructure components and SAP Basis together with our partner, itelligence. The client has been living in a private cloud for six months. Here are the statistics on service cases during that time:

    90 incidents (20% resolved without involving the customer)
    Resolved within SLA – 100%
    Unscheduled system shutdowns – 0

Charlesth

With HANA replication, the recovery time is essentially the time it takes to process the takeover command on the secondary DB, which is usually minutes.
In your setup, if the production storage system ever goes down and does not come back up, you will have to restore from data backups and log backups (and that is if you have somewhere to restore to), and that is definitely not minutes for large databases (what volumes are we talking about, by the way?). What RTO is specified in the SLA with that customer? Are they aware of this "feature" of how the production landscape is implemented?

shane

I didn't see any unique difficulties worthy of a topic. You have described a typical migration process. Even worse, you apparently migrated through full backups (judging by the screenshot of the cutover plan) rather than via a standby (there is no mention of one), which would have let you migrate in minutes instead of hours. Greetings from a colleague who took part in migrating one of the largest ERP systems from Europe, 130 TB in size. The downtime was about 15 minutes, everything included, over a 10 Gbit link with a latency of 70-100 ms.