Learn more about redprairie discrete WMS and SQL servers work together for your warehouse.

Our clients routinely ask us to help resolve performance issues with their RedPrairie WMS and SQL Server instances. We have worked with every major version of RedPrairie WMS Discrete since Discrete 6.0.2 and every version of SQL Server since SQL Server 2000, and we have seen many of the same common configuration and coding errors cause significant performance degradation. In some cases, these issues have caused complete system lockups.

While we have seen many issues that require advanced SQL Server knowledge and analysis to resolve, sometimes resolutions are as simple as performing common maintenance tasks. In this post, we decided to highlight a few of the most common maintenance and configuration-related issues we have resolved for our clients.

Common RedPrairie WMS and SQL Server Issues

1.) Ineffective/no archiving: You have too much data in your database. RedPrairie WMS is not designed in such a way as to maintain efficiency as data volumes increase. The system may perform well at lower data volumes, but if archiving is neglected and data volumes rise, then system performance will gradually decrease until a trigger point where the server cannot support the system and production outages occur. This point depends on the size of your operation and the size of the server, but every server has a limit and, without archiving, you will likely find it eventually. Neglected archiving is a common reason for system responsiveness to gradually decrease after a go-live period as well. If your system is slow, check to see if your archive jobs are setup. Also, check to see if there is any one table or group of tables that seem to have a significantly higher data volume than the rest of your database.

2.) SQL Server transaction log is not being archived or purged: In its most simplistic storage configuration, SQL Server uses only two files to manage all data: the data file (.MDF) and the log file (.LDF). By default the Log file is configured to grow without bounds by a certain percentage, which it will continue to do as the database logs transactions. Reducing and managing the size of this file requires the configuration of a Transaction Log backup and subsequent purge as a Maintenance Plan in SQL Server. If the Log file grows unchecked, it will eventually reach a size at which the server cannot effectively manage the file. This point varies depending on hardware, but we have seen it happen at several clients who experienced production down time as a result. The largest log file we have ever seen was 787 GB (yes, that’s a G). The client’s server was completely locked up and simply logging into Windows took several minutes.

3.) Missing Indexes: Indexes make searching dramatically faster. Standard RedPrairie includes numerous necessary indexes to keep the system running smoothly. Often, however, a client’s unique operational processes require custom columns or code that uses standard columns differently. Unfortunately, we find that many of our clients experience performance problems because no new indexes were added to the columns being used by the new code. The best way to tell if you have missing indexes is to trace the slow process in MOCA and analyze the trace file for long running SQL. If any single SQL statement exceeds a few hundred milliseconds, it should be considered for analysis unless you already know this query is performing operations on a very large data set. Next, run an explain plan on the slow query (RedPrairie has a really handy command called “sl_explain query” that makes explain plans easy) and search for full table scans in the explain plan. If you see tables with full table scans, that almost always means you need a new index on the join column. (NOTE: It is wise to be careful with indexes because creating an index that should not be created can cause a decrease in performance. Test everything!)

4.) Maximum Degree of Parallelism too High: SQL Server supports parallel processing. In many applications that would be excellent, but in RedPrairie WMS it doesn’t help much and can actually hurt performance. While this seems counterintuitive at first, MOCA is already multithreaded (multiple MOCA Server Processes) and, since users can issue commands at any time, system load is not necessarily consistent or predictable. When SQL Server chooses to parallelize a query, it effectively makes a conscious decision to burn up more CPU time in return for a faster overall query execution. If the system load was static, this would be a good idea, but since RedPrairie WMS supports multiple users with simultaneous execution capability, you could suddenly see a spike in user load. When the spike occurs, the database engine is already using valuable CPU and session resources on previously executed queries, so it cannot respond immediately. The result is that all transactions slow down. Reducing the Maximum Degree of Parallelism to 2 or 1 will leave longer running SQL to execute serially and leave open sessions and CPU resources to handle additional load. The result is that almost all transactions in the system speed up (RF Gun activity included), leaving only the intentionally slow transactions (Pick Release, Allocation, etc.) to run slightly longer. Typically users will perceive the system to be faster for them and the difference is sometimes dramatic.

5.) Disk Contention: Disk contention is an old problem, but still very relevant. A few months ago we had a client experiencing production freezes during the AM hours. The only activity running at that point was Archiving and a few users doing receiving and picking. After looking at everything on the application server, we finally decided to check the database server. They had Prod WMS, Parcel, EMS, Archive, and several other non-RedPrairie databases all running on one drive array. All in all, the SQL Server instance was managing five production databases totaling 150 GB on one RAID 5 array. It was too much and we recommended removing certain databases to other servers, which solved the problem immediately. Obviously a SAN also helps with this situation, but we’ve also seen contention issues with a SAN where one LUN was overloaded. In general, it is a good idea to keep the production WMS database on its own disk array/LUN and, in certain high volume situations, it may even be necessary to put the Data file on a different disk than the Log file. If you are experiencing unexplained production lockups, disk contention is definitely something that should be analyzed and ruled out.

6.) Bad SQL code: The most common issue we see with poor performance and SQL Server is bad code. Bad code usually comes in two forms: long running queries and long running transactions. Long running SQL queries are almost always caused by bad joins that either join on non-indexed columns or do not join two tables completely, resulting in less efficient join processing or a massive Cartesian product. These queries need to be analyzed using the method mentioned above in point 3. The more complicated, and usually more damaging, code problem is a long running transaction that causes table locking and potentially freezes the system. Typically the code responsible for this type of issue is a background job or highly complex custom function that updates multiple tables without a COMMIT. If you are experiencing persistent locking problems, we’ve found one of the best ways to track down the culprit is to analyze the MOCA Console Servers tab by refreshing repeatedly while the issue is occurring. MOCA shows all currently running commands, as well as their Last SQL Execution time. Usually we can find the offending process by looking at what commands are still executing SQL statements and which ones have not executed SQL in quite some time. Even after you have found the offending transaction, it is not always easy to resolve the locking, but this analysis will give you a place to start. This type of issue frequently requires an experienced developer for resolution.

For More Information

Microsoft has written a highly technical, but extremely detailed guide to best practices on Online Transaction Processing (OLTP) (real time user input systems like RedPrairie WMS database design and maintenance. I highly recommend the read if you want to understand more about how to improve SQL Server performance for RedPrairie. NOTE: This article references SQL Server 2005 specifically, but it generally applies to 2008 and 2012 as well: http://technet.microsoft.com/en-us/library/cc966401.aspx

Learn more about Accelogix Cloud here!