Posts Tagged ‘vplex’

Solaris 11 on ESX – Serialized Disk IO bug causes extreme performance degradation #vexpert

Wednesday, March 29th, 2017

In this post, I discuss a newly found performance bug in Solaris 11 that has, ever since Solaris 11 came out in 2011, severely hampered disk I/O performance of ESX VMs using the LSI Logic SAS controller. I show how we identified the issue, which tools we used, and what the bug actually is.

In Short:

A bug in 'mpt_sas', the Solaris 11 driver for the disk controller that VMware's 'LSI Logic SAS' virtual controller emulates, caused disk I/O to be handled only up to 3 operations at a time.

This causes severe disk I/O performance degradation on all versions of Solaris 11 up to the patched release. We observed it on Solaris 11 VMs on vSphere 5.5u2; other vSphere versions were not tested.

The issue was identified by me and Valentin Bondzio of VMware GSS, together with our customer, and eventually Oracle. Tools used: iostat, esxtop, vscsiStats.
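The telltale sign is visible with nothing more than iostat inside the guest. Below is a minimal sketch (my illustration, not part of the original investigation) that samples `iostat -xn` and flags devices that are saturated yet never have more than a handful of commands in flight. The column positions, the 80% busy threshold, and the ceiling of 3 are assumptions you may need to adjust for your Solaris build.

```python
#!/usr/bin/env python3
"""Flag devices whose I/O looks serialized on a Solaris guest.

Hypothetical helper: with tagged command queuing disabled (the mpt_sas
bug), the 'actv' column of `iostat -xn` stays pinned at a handful of
outstanding commands even while the device is saturated (%b high).
"""
import subprocess

SAMPLES = 5          # number of 2-second intervals to inspect
ACTV_CEILING = 3.0   # the ~3 outstanding I/Os observed with the bug
BUSY_FLOOR = 80.0    # only judge devices that are actually saturated

# -x extended stats, -n descriptive device names; the first report is
# averages since boot, so we ask for one extra and skip it below.
out = subprocess.run(
    ["iostat", "-xn", "2", str(SAMPLES + 1)],
    capture_output=True, text=True, check=True,
).stdout

report = -1
suspects = {}  # device -> list of actv readings taken while busy
for line in out.splitlines():
    if "extended device statistics" in line:
        report += 1      # a new report block starts here
        continue
    if report < 1:
        continue         # skip the since-boot report
    cols = line.split()
    # Data rows: r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
    if len(cols) != 11 or not cols[0][0].isdigit():
        continue         # skip the column-header row
    actv, busy, device = float(cols[5]), float(cols[9]), cols[10]
    if busy >= BUSY_FLOOR:
        suspects.setdefault(device, []).append(actv)

for device, actvs in suspects.items():
    if max(actvs) <= ACTV_CEILING:
        print(f"{device}: saturated but actv peaked at {max(actvs):.1f} "
              f"-- I/O looks serialized (possible TCQ/mpt_sas issue)")
```

A healthy device under a parallel workload shows actv climbing well past 1; a VM hit by this bug stays pinned at 1-3 no matter how much load you throw at it.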

The issue was fixed in Solaris patch #25485763, and in Solaris 12.

Bug Report (Bug 24764515: Tagged command queuing disabled for SCSI-2 and SPC targets):

Link to Oracle Internal

KB Article (Solaris 11 guest on VMware ESXi submits only one disk I/O at a time, Doc ID 2238101.1):

Link to Oracle Internal




EMC VPLEX VS2 to VS6 seamless, non-disruptive hardware upgrade

Tuesday, February 28th, 2017

This post describes our experience with upgrading from EMC VPLEX VS2 to VS6 hardware, in a seamless non-disruptive fashion.

EMC VPLEX is a powerful storage virtualization product and I have had several years of experience with it in an active-active metro-storage-cluster deployment. I am a big fan. It's rock-solid, very intuitive to use, and very reliable if set up correctly. Check out these 2 videos to learn what it does.

Around August 2016, EMC released VPLEX VS6, the next generation of hardware for the VPLEX platform. In many respects it is roughly twice as fast, utilizing the latest Intel chipset and 16Gb FC, with an InfiniBand interconnect between the directors and a boatload of extra cache.

One of our customers recently wanted their VS2 hardware either scaled-out or replaced by VS6 for performance reasons. Going for a hardware replacement was more cost-effective than scaling out by adding more VS2 engines.

Impressively, the in-place upgrade of the hardware could be done non-disruptively. This is achievable through the clever way the GeoSynchrony firmware is 'loosely coupled' from the hardware. The VS6 hardware is a significant upgrade over the VS2, yet both are able to run the same firmware version of GeoSynchrony without the different components of VPLEX being aware of the fact. This is especially useful if you have VPLEX deployed in a metro-cluster.
So to prepare for a seamless upgrade from VS2 to VS6, your VS2 hardware needs to be on the exact same GeoSynchrony release as the VS6 hardware you will be transitioning to.

VPLEX consists of 'engines' that each house 2 'directors'. You can think of these as broadly analogous to the service processors in an array, with the main difference being that they are active-active: they share a cache and are able to handle I/O for the same LUNs simultaneously. If you add another engine with 2 extra directors, you have 4 directors all servicing the same workload and load-balancing the work.

Essentially the directors form a cluster together, directly over their InfiniBand interconnect, or, in a metro-cluster, also partially over Fibre Channel across the WAN. Because they are decoupled from the management plane, they can continue operating even when the management plane is temporarily unavailable. It also means that, as long as their firmware is the same, they can still form a cluster together even though the underlying hardware is a generation apart, without any of them noticing. This is what makes the non-disruptive upgrade possible, even in a metro-cluster configuration. It also means that you can upgrade one side of the VPLEX metro-cluster separately, a day or even a week apart from the other side, which makes planning an upgrade more flexible. There is a caveat however: a possible slight performance hit on your wan-com replication between the VS2 and VS6 sides, so you don't want to stay in that state for too long.


VPLEX VS2 hardware. 1 engine consisting of 2 directors.

VS6 hardware. Directors are now stacked on top of each other. 

Because all directors running the same firmware are essentially equivalent, even though they might be of different hardware generations, you can almost predict what the non-disruptive hardware upgrade looks like. It's more or less the same procedure as if you were to replace a defective director. The only difference is that the old VS2 hardware is now short-circuited to the new VS6 hardware, which enables the new VS6 directors to take over I/O and replication from the old directors one at a time.

The only thing the frontend hosts and the backend storage ever notice is temporarily losing half their storage paths. So naturally, you need to have the multipathing software on your hosts in order. This will most likely be EMC PowerPath, which handles this scenario flawlessly.
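Before the engineers start pulling cables, it is worth verifying that every host really does see all of its paths alive. Here is a hypothetical sketch of such a check; the `powermt display dev=all` output layout varies per PowerPath version, so the parsing below is an assumption to verify against your own output.

```python
#!/usr/bin/env python3
"""Count alive vs. dead PowerPath paths per pseudo device.

Hypothetical pre-upgrade sanity check: if any device already has dead
paths, losing half the remaining ones during the director swap would
hurt. Treat the text parsing as an assumption, not a stable API.
"""
import subprocess

out = subprocess.run(
    ["powermt", "display", "dev=all"],
    capture_output=True, text=True, check=True,
).stdout

device = None
counts = {}  # pseudo device -> [alive, dead]
for line in out.splitlines():
    if line.startswith("Pseudo name="):
        device = line.split("=", 1)[1].strip()
        counts[device] = [0, 0]
    elif device and " alive " in f" {line} ":
        counts[device][0] += 1   # a path row reporting state 'alive'
    elif device and " dead " in f" {line} ":
        counts[device][1] += 1   # a path row reporting state 'dead'

for device, (alive, dead) in counts.items():
    status = "OK" if dead == 0 and alive >= 2 else "CHECK ME"
    print(f"{device}: {alive} alive, {dead} dead -> {status}")
```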

The most impressive trick of this transfer, however, is that the new directors seamlessly take over the entire 'identity' of the old directors. This includes everything unique about a director, crucially including the WWNs. This is important because transferring the WWNs is the very thing that makes the transition seamless. It does of course require you to have 'soft zoning' in place in the case of FC, because a director port WWN will suddenly, in the space of about a minute, vanish from one switch port and pop up on another. But if you have your zoning set up correctly, you do not have to touch your switches at all.

And yes, that does mean you need double cabling, at least temporarily. The old VS2 is of course connected to your I/O switches, and the new VS6 needs to be connected simultaneously on all its ports during the upgrade process.

So have fun cabling those 😉

That might be a bit of a hassle, but it's a small price to pay for such a smooth and seamless transition.

To enable the old VS2 directors (which use FC to talk to their partner director over local-com) to talk to the new VS6 directors (which use InfiniBand) during the migration, it is necessary to temporarily insert an extra FC module into the VS6 directors. During a specific step in the upgrade process, the VS2 is connected to the VS6, and for a brief period your I/O is being served by a combination of a VS2 and a VS6 director that are sharing volumes and cache with each other. A neat trick.

Inserting the temp IO modules:

As a final step, the old VS2 management server settings are imported into the new redundant VS6 management modules. In VS6, these management modules are integrated into the director chassis and act in an active-passive failover mode. This is a great improvement over the single, non-redundant VS2 management server, with its single power supply (!)


Old Management Server:

New management modules:

The new management server hardware completely takes over the identity and settings of the old management server. This even includes the IP address, the customer cluster names, and the cluster serial numbers: the VS6 will adopt the serial numbers of your VS2 hardware. This is important to know from an EMC support point of view, and may confuse people.

The great advantage is that all local settings and accounts, and all monitoring tools and alerting mechanisms, work flawlessly with the new hardware. For example, we have a PowerShell script that uses the API to check the health status. This script worked immediately with the VS6 without having to change anything. VIPR SRM only needed a restart of the VPLEX collector, after which it continued collecting without any changes. The only thing I found that did not get transferred were the SNMP trap targets.
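For illustration, here is a rough Python equivalent of that kind of health check, a sketch rather than our actual script. The `/vplex/clusters` path, the Username/Password header authentication, and the JSON layout are assumptions based on the classic VPLEX Element Manager REST API; verify them against the API documentation for your GeoSynchrony release.

```python
#!/usr/bin/env python3
"""Poll VPLEX cluster health through the management server's REST API.

Hypothetical sketch: endpoint path, auth headers, and response layout
are assumptions to check against your own VPLEX API documentation.
"""
import requests

MGMT = ""        # hypothetical management server address
AUTH = {"Username": "service", "Password": "secret"}  # placeholder creds

# The management server typically presents a self-signed certificate,
# hence verify=False here; in production, point 'verify' at the CA cert.
resp = requests.get(f"https://{MGMT}/vplex/clusters",
                    headers=AUTH, verify=False, timeout=30)
resp.raise_for_status()

# Assumed response shape: {"response": {"context": [{"attributes": [...]}]}}
for ctx in resp.json()["response"]["context"]:
    attrs = {a["name"]: a["value"] for a in ctx["attributes"]}
    print(f"{attrs.get('name')}: "
          f"operational-status={attrs.get('operational-status')}, "
          f"health-state={attrs.get('health-state')}")
```

Because the VS6 adopts the old management server's IP address and accounts, a check like this keeps working across the hardware swap without a single change.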
After the upgrade, the benefit of the new VS6 hardware was immediately noticeable. Here is a graph of average aggregate director CPU use, from EMC VIPR SRM:

As this kind of product is fundamental to your storage layer, its stability and reliability, especially during maintenance work like firmware and hardware upgrades, are paramount, and EMC takes this seriously. Unlike with other EMC products such as VNX, you are not expected, or indeed allowed, to update this hardware yourself, unless you are a certified partner. Changes that need to be made to your VPLEX platform go through a part of EMC called the 'Remote Pro-active' team.

There is a process to follow, which involves getting them involved early, a round of pre-validation health checks, and the hands-on execution of the maintenance job, either remotely via WebEx or on site by local EMC engineers if that is required. A hardware upgrade will always require onsite personnel, so make sure they deliver pizza to the datacenter! If an upgrade goes smoothly, expect it to take 4-5 hours. That includes all the final pre-checks, hardware work, cabling, transfer of the management identity to the VS6, and decommissioning of the VS2 hardware.

In the end the upgrade was a great success, and our customer experienced zero impact. Pretty impressive for a complete hardware replacement of such a vital part of your storage infra.

Finally, here is the text of the September 2016 VPLEX Uptime bulletin, with some additional information about the upgrade requirements. Be aware that this may be deprecated; please consult with EMC support for the latest info.
,-September-2016.pdf?language=en_US

There is an EMC community thread where people have been leaving their experiences with the upgrade, have a look here:


EMC VPLEX Performance Monitor v1.0

Friday, November 13th, 2015

EMC have released an OVF appliance that lets you store and browse 30 days' worth of VPLEX performance statistics. Version 1 is limited to just a few metrics, but it is a very welcome addition to the VPLEX monitoring tools that are available! Requires GeoSynchrony 5.5.


Today I was looking up some information on VPLEX on the EMC support site, and my eye was quickly drawn to the following entries:

I have seen no mention of this at all on either Twitter or on the VPLEX community space at EMC:

This is typical of EMC in my experience: they are terribad at disseminating support information and making new stuff 'discoverable'.

So what is this thing?

Up till now, you had several ways to monitor, save, and analyze VPLEX statistics.

  • The GUI. But that only shows live data, no history, and it only shows very few metrics, and only at a high level.
  • VPLEXCLI: monitor create, monitor collect, etc. Powerful CLI commands; any statistic can be saved, and they can create exportable CSV files. But they are hard to use and understand, and for live monitoring the implementation is truly horrible, scrolling across your screen in a disruptive way, with no 'top'-like function or anything (see the sketch after this list for one way to make those CSV files digestible).
  • EMC VIPR SRM. EMC's statistics and analytics suite. Good for all kinds of EMC products; uses a 'perpetual' version of the above-mentioned monitor construct. But definitely not a free tool.
  • If you have VMware vROPS: EMC Storage Analytics. An adapter for vROPS, but again not free. v3 of this adapter supports vROPS 6.x.
  • SNMP. VPLEX comes with a MIB, but my experience with it so far is that it's got some serious compliance (and syntax) issues that prevent it from working in, for example, the vROPS SNMP adapter (this was my attempt at a 'cheapo' EMC Storage Analytics 😉).
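As promised in the CLI bullet above, here is a small, hypothetical sketch of how I would rather consume those monitor file sinks: chart the CSV offline instead of watching it scroll by. The file name and the 'Time' column are assumptions; check the header of your own sink file first.

```python
#!/usr/bin/env python3
"""Chart a VPLEX 'monitor' CSV file sink.

Hypothetical sketch: assumes a sink file with a 'Time' column plus one
column per collected statistic, which may differ per monitor setup.
"""
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("director-1-1-A_myMonitor.csv")   # hypothetical sink file
df["Time"] = pd.to_datetime(df["Time"])            # parse the timestamps

# Plot every stat column against time on one chart.
ax = df.plot(x="Time",
             y=[c for c in df.columns if c != "Time"],
             figsize=(12, 6), title="VPLEX monitor sink")
ax.set_ylabel("per-stat value (see column header for units)")
plt.tight_layout()
plt.savefig("vplex_monitor.png")
```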

So, nothing we had so far 'just worked' as a fast and free GUI-based way of seeing some deep statistics, and there was something to be said for that not being available in the product itself. It looks like with the "EMC VPLEX Performance Monitor", a free OVF download, they are attempting to address this concern.

Let's check the release notes.

Product description

VPLEX Performance Monitor is a stand-alone, customer installable tool that allows you to collect virtual volume metrics from a VPLEX Local or VPLEX Metro system. It allows Storage Administrators to see up to 30 days of historical virtual volume performance data to troubleshoot performance issues and analyze performance trends.

The VPLEX Performance Monitor tool is delivered as an OVA (Open Virtualization Format Archive) file that you deploy as a VMware virtual appliance. The virtual appliance connects to one VPLEX system and collects performance metrics for all virtual volumes that are in storage views. Historical virtual volume metrics are stored in a database within the virtual appliance for 30 days. The virtual appliance has a web application which allows you to view the data in charts that show all 30 days of data at once, or allows you to zoom in on data down to the minute.

The VPLEX Performance Monitor charts the following key virtual volume metrics:

Throughput (total read and write IOPS)
Read Bandwidth (KB/s)
Write Bandwidth (KB/s)
Read Latency (usec)
Write Latency (usec)

Note: The VPLEX Performance Monitor can connect to one VPLEX Local or Metro system at a time. To monitor additional VPLEX systems, deploy a new instance of the tool for each VPLEX Local or Metro system you want to monitor.

Ok, so admittedly, for a version 1, there is not all that much here; no port statistics or backend storage metrics, for example. But in most cases you are gonna be interested in your virtual volumes most of all anyway, so it's a good start.

Only 1 VPLEX system at a time? We have 2 Metro-Cluster setups in our environment, which translates to 4 engines in total. Going by the note above, a 'system' is a whole Local or Metro cluster, not an engine, so that would mean 2 of these appliances, one per Metro system. Oh well.

30 days is a nice sweet spot for metric retention as far as I am concerned. This appliance uses an embedded database, so don't expect options to save your data for years. Get VIPR SRM if you want that.

IMPORTANT Version 1.0 cannot be upgraded. When the next release is available, you must delete the current VPLEX Monitor virtual appliance and deploy the new one. All performance data and user information will be lost.

  • The VPLEX Performance Monitor requires a VPLEX Local or Metro system running GeoSynchrony release 5.5 (VS2 hardware only).
  • The VPLEX Performance Monitor is not supported for use with VS1 hardware.
  • This version supports connection to a VPLEX system with a maximum of 4,000 virtual volumes.
  • This release of the VPLEX Performance Monitor is not FIPS compliant. Contact EMC Customer Support if you encounter any issues installing or using the VPLEX Performance Monitor tool.


Take note of the GeoSynchrony 5.5 requirement. This version only came out recently, so I don't expect many people to be running it yet.
We don't, in any case, so I can't provide you with an install demo yet :p

If you have GeoSynchrony 5.5, go give this a try:

(EMC Support account required)

Update 03:03pm: I was googling for "EMC VPLEX Performance Monitor" to see if anyone else had mentioned it yet, and came across this video (with 20 views so far, wow!) that showcases the new tool.