Building Products Supplier CSR Reduces Downtime During Database Migration
When migrating business data to a new database platform, companies expect a few technical hiccups, but persevere by keeping their sights on the potential long-term benefits. Business executives bank on cost savings, and IT organizations look forward to simpler, more streamlined back-end environments. Australian building products supplier CSR Limited had high expectations for its data migration and wanted to put a strong plan in place to combat any technical glitches that could affect its SAP environment during the migration effort.
Any sporadic outages in CSR’s SAP environment could seriously impact the business in a number of ways: Sales might suffer if CSR retail outlets are unable to print receipts; daily revenue figures would be negatively affected if overnight billing runs couldn’t complete; and because dispatches come out of the SAP system, product deliveries would get backlogged as shipping at CSR factories would halt.
With 2,000 SAP users across the organization relying on SAP solutions every day, CSR’s IT department wanted to ensure the operational excellence of its SAP system on the new database platform. It did so by implementing a robust and flexible automated monitoring solution that not only tells the IT department when its SAP system is likely to go (or has already gone) down, but detects incidents with non-SAP IT systems as well. It has also helped improve average transaction times across the SAP environment.
Outing the Outages
An SAP customer since 1995, CSR has an experienced SAP team and broad user base. Like most companies, the business is always looking for ways to increase the value of its IT investments and reduce its maintenance costs. To that end, in 2007, the company decided to migrate its SAP data from an Oracle database to a Microsoft SQL Server database. During the migration, CSR began to experience IT stability issues, including some SAP system outages. The longest outage lasted eight hours and affected a number of business functions.
“Our SAP ERP system is mission-critical to us,” explains Adam Bunn, SAP Basis Team Lead at CSR. “It literally opens our cash register drawers in our retail outlets. And some of our factories dispatch products 24 hours a day. So if the system is down, we can’t process the dispatch paperwork and trucks back up at the docks while we’re forced to manually write out the proper documentation. Those sorts of issues have a direct impact on our ability to do business.”
An outage will be extended if the right people are not alerted early enough. For example, if an outage occurs overnight, it might not be detected until the morning. This delay creates a time crunch to get the right people involved to fix the problem.
CSR needed a monitoring and automation solution that would quickly correlate alerts and incidents for a wider array of technologies beyond SAP applications — namely Microsoft’s System Center Operations Manager (MS SCOM). “We run a lot of Microsoft applications on Windows servers and some Cisco Network technology as well,” Bunn says. “So we were looking for a single solution to integrate with all of our IT systems and provide alerts and incident reports.”
Finding the Proper Alert Balance
CSR went to its SAP account manager to seek guidance on recommended operational management and monitoring solutions. Based on the business’s unique requirements, SAP suggested a trial of SAP IT Process Automation by Cisco, which provided several key advantages. For starters, one important feature was that it would allow CSR to activate only the alerts it wanted to use and customize the thresholds at which alerts are triggered and escalated.
As Bunn explains, the biggest concern was that a monitoring and alerting system would “generate a lot of noise,” providing alerts on issues that didn’t need them while making it difficult to detect important alerts.
“Once we turned our monitoring system on, we didn’t want to get inundated with unnecessary alerts at all hours,” says Bunn. “That would overwhelm our team and could potentially make things worse for us instead of better.” With the SAP IT Process Automation application, CSR could set up alerts to the desired levels — only issues or incidents requiring action would be brought to their attention.
Another major advantage was how easily the application integrated with non-SAP systems, including MS SCOM and various business applications. According to Bunn, a common mistake companies make in implementing a monitoring solution is focusing the monitoring solely on IT infrastructure and not on the applications side. CSR not only uses SAP IT Process Automation to automate the monitoring of its back-end systems, but also sets up best-practice process flows that detect alerts for issues with applications, which can cause serious problems if they go down.
Putting the Processes in Place
Finding the right automated monitoring solution was a big relief for CSR. But receiving an alert about a serious issue is only useful if there is a process in place to address the issue quickly. And that’s where CSR focused its efforts next — defining what best-practice process or reaction each alert should trigger and who should be involved.
“We run a fairly low headcount in our IT organization, so we can’t afford to have people sitting there 24 hours a day looking at alerts,” Bunn says. “Issues need to be managed through an escalation process where the right person is notified quickly, and alerts don’t get hung up in a single person’s inbox.”
To clearly define those processes, CSR developed a matrix that lists each alert, assigns it a priority, indicates where the alert should be sent and whether it should be escalated, and provides details on how to handle the incident. For example, today, if transaction response time in the SAP system exceeds a certain threshold, a “2 level” priority alert will be sent to the service desk and then escalated to the SAP Basis team as an incident if necessary. The matrix will also indicate that this particular alert is most pressing if it occurs between 6 a.m. and 6 p.m. because this is when users are making the most transactions.
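To make the idea of such a matrix concrete, here is a minimal sketch in Python. It is not CSR's actual matrix and does not use any SAP IT Process Automation interface. The names `AlertRule`, `ALERT_MATRIX`, and `route_alert`, the threshold value, and the handling note are all hypothetical. Only the priority, the routing to the service desk and the SAP Basis team, and the 6 a.m. to 6 p.m. window come from the example above.

```python
from dataclasses import dataclass
from datetime import time
from typing import Optional


@dataclass(frozen=True)
class AlertRule:
    """One row of a hypothetical alert matrix."""
    priority: int               # lower number = more urgent (assumption)
    send_to: str                # first recipient of the alert
    escalate_to: Optional[str]  # who receives it as an incident, if anyone
    peak_start: time            # start of the window when the alert is most pressing
    peak_end: time              # end of that window (exclusive)
    handling_notes: str         # placeholder for the handling instructions


# A single illustrative row; a real matrix would list every alert.
ALERT_MATRIX = {
    "sap_response_time": AlertRule(
        priority=2,
        send_to="service desk",
        escalate_to="SAP Basis team",
        peak_start=time(6, 0),
        peak_end=time(18, 0),
        handling_notes="(handling steps would be documented here)",
    ),
}


def route_alert(alert_name: str, measured_ms: float,
                threshold_ms: float, at: time) -> Optional[dict]:
    """Return routing details if the measurement breaches the threshold.

    Returns None when the measured value is within the threshold,
    meaning no alert is raised.
    """
    rule = ALERT_MATRIX[alert_name]
    if measured_ms <= threshold_ms:
        return None
    return {
        "priority": rule.priority,
        "send_to": rule.send_to,
        "escalate_to": rule.escalate_to,
        "most_pressing": rule.peak_start <= at < rule.peak_end,
    }
```

Under these assumptions, calling `route_alert("sap_response_time", 1500, 1000, time(9, 30))` would produce a priority 2 alert for the service desk, flagged as most pressing because 9:30 a.m. falls inside the peak window. Keeping the rules in one table, separate from the routing logic, mirrors the point of the matrix: the team can adjust priorities and recipients without touching the process that acts on them.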
In fact, by using SAP IT Process Automation, CSR learned that transaction response times were often an early indicator of system stability issues and could even be a precursor to an outage. The IT team began monitoring the average transaction times in the SAP environment to determine what caused them to vary. Some of the issues that investigation revealed include the following:
- CPU-hogging work processes
- High input/output workload on the storage subsystem
- Poorly coded custom programs
“We realized that long-running transaction response times can be a good indicator of more serious problems with the infrastructure, so we pushed that alert up in terms of importance,” Bunn says.
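As an illustration of treating average response time as an early-warning signal, the following Python sketch keeps a rolling average over the most recent samples and flags when it exceeds a threshold. It is a simplified stand-in, not the method used by CSR or by SAP IT Process Automation. The class name `ResponseTimeMonitor`, the window size, and the threshold are hypothetical.

```python
from collections import deque


class ResponseTimeMonitor:
    """Rolling average of recent transaction response times (hypothetical)."""

    def __init__(self, window_size: int, threshold_ms: float) -> None:
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        # deque with maxlen drops the oldest sample automatically
        self._samples = deque(maxlen=window_size)
        self.threshold_ms = threshold_ms

    def add_sample(self, response_ms: float) -> float:
        """Record one response time and return the current rolling average."""
        self._samples.append(response_ms)
        return self.average()

    def average(self) -> float:
        """Mean of the samples in the window, or 0.0 if none were recorded."""
        if not self._samples:
            return 0.0
        return sum(self._samples) / len(self._samples)

    def is_degraded(self) -> bool:
        """True once the window is full and its average exceeds the threshold."""
        window_full = len(self._samples) == self._samples.maxlen
        return window_full and self.average() > self.threshold_ms
```

Waiting for a full window before flagging degradation is one way to keep a single slow transaction from raising an alert, which fits the article's concern about noise. A production setup would also need to decide how samples are collected and how often the check runs, which this sketch leaves open.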
The Right Combination
According to Bunn, the combination of IT automation and human decision-making is what drives business value. “While software tools might help you monitor your systems, it can’t all be automated,” he says. “The human decisions are the most important part. That includes having an initial assessment to determine what to monitor and what not to, what is viewed as critical and what isn’t, and what process kicks off when those critical issues or alerts come up. If you can’t get the people side right, then no software in the world can help you.”
CSR’s IT team continually refines its alerts so that each alert’s level matches the importance of the incident, ensuring significant issues are identified early and not lost in a sea of less-than-important notifications. Bunn says the SAP IT Process Automation application provides this flexibility by letting the team customize which alerts generate incidents.
The immediate effect of the project is that CSR has a more stable IT landscape because it can identify and act on incidents sooner, which increases the value of the business’s IT investments. And the indirect effect has been a substantial improvement in how CSR’s business users rate “system stability” in its annual survey. That rating is closely tied to the business users’ perception of the IT organization and overall operational excellence.
It’s proven to Bunn that the more reliable a company’s IT infrastructure is, the more reliable — and confident — the business is as a whole.
April 01, 2012