Archive for the ‘Hardware’ Category

EMC VPLEX VS2 to VS6 seamless, non-disruptive hardware upgrade

Tuesday, February 28th, 2017

This post describes our experience with upgrading from EMC VPLEX VS2 to VS6 hardware, in a seamless non-disruptive fashion.

EMC VPLEX is a powerful storage virtualization product, and I have had several years of experience with it in an active-active metro-storage-cluster deployment. I am a big fan. It's rock-solid, very intuitive to use, and very reliable if set up correctly. Check out these 2 videos to learn what it does.

Around August 2016, EMC released VPLEX VS6, the next generation of hardware for the VPLEX platform. In many respects it is roughly twice as fast, utilizing the latest Intel chipset and 16Gb FC, with an InfiniBand interconnect between the directors and a boatload of extra cache.

One of our customers recently wanted their VS2 hardware either scaled-out or replaced by VS6 for performance reasons. Going for a hardware replacement was more cost-effective than scaling out by adding more VS2 engines.

Impressively, the in-place upgrade of the hardware could be done non-disruptively. This is achievable through the clever way the GeoSynchrony firmware is ‘loosely coupled’ from the hardware. The VS6 hardware is a significant upgrade over the VS2, yet they are able to run the same firmware version of GeoSynchrony without the different components of VPLEX being aware of the fact. This is especially useful if you have VPLEX deployed in a metro-cluster.
So to prepare for a seamless upgrade from VS2 to VS6, your VS2 hardware needs to be running the exact same GeoSynchrony release as the VS6 hardware you will be transitioning to.
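You can sanity-check this up front from the VPlexcli on both clusters. The version command lists the GeoSynchrony release of the management server and the directors (output omitted here; consult the CLI guide for your release if the syntax differs):

VPlexcli:/> version

Both sides have to report the exact same release before the hardware swap can start.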

VPLEX consists of ‘engines’ that each house 2 ‘directors’. You can think of these as broadly analogous to the service processors in an array, with the main difference being that they are active-active: they share a cache and are able to handle I/O for the same LUNs simultaneously. If you add another engine with 2 extra directors, you now have 4 directors all servicing the same workload and load-balancing the work.

Essentially, the directors form a cluster together, directly over their InfiniBand interconnect, or, in a metro-cluster, also partially over Fibre Channel across the WAN. Because they are decoupled from the management plane, they can continue operating even when the management plane is temporarily unavailable. It also means that, as long as their firmware is the same, they can still form a cluster together even though the underlying hardware is a generation apart, without any of them noticing. This is what makes the non-disruptive upgrade possible, even in a metro-cluster configuration. It also means that you can upgrade one side of the VPLEX metro-cluster separately, a day or even a week apart from the other side, which makes planning an upgrade more flexible. There is a caveat however: a possible slight performance hit on your wan-com replication between the VS2 and VS6 sides, so you don’t want to stay in that state for too long.
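Before and after doing each side, it is worth getting a quick overview of cluster and WAN link health from the VPlexcli. A minimal sketch, with command names per the CLI guide for your GeoSynchrony release, output omitted:

VPlexcli:/> health-check
VPlexcli:/> cluster status

If wan-com shows degraded performance for longer than the planned migration window, that is your cue to finish the other side sooner rather than later.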

 

VPLEX VS2 hardware. 1 engine consisting of 2 directors.


VS6 hardware. Directors are now stacked on top of each other. 

Because all directors running the same firmware are essentially equivalent, even though they might be of different hardware generations, you can almost predict what the non-disruptive hardware upgrade looks like. It's more or less the same procedure as if you were to replace a defective director. The only difference is that the old VS2 hardware is now cross-connected to the new VS6 hardware, which enables the new VS6 directors to take over I/O and replication from the old directors one at a time.

The only thing the frontend hosts and the backend storage ever notice is temporarily losing half their storage paths. So naturally, you need to have the multipathing software on your hosts in order. This will most likely be EMC PowerPath, which handles this scenario flawlessly.
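As a sanity check around each director swap, the standard PowerPath CLI on a host is enough (the exact output format varies per version):

powermt display dev=all

lists every device with its path states, so a path stuck in a dead state after a swap stands out immediately, and

powermt restore

forces PowerPath to re-test failed paths once the new director is serving I/O again.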

The most impressive trick of this transfer, however, is that the new directors will seamlessly take over the entire ‘identity’ of the old directors. This includes -everything- unique about the director, including, crucially, the WWNs. This is important because transferring the WWNs is the very thing that makes the transition seamless. It does of course require you to have ‘soft zoning’ in place in the case of FC, because a director port WWN will suddenly, in the space of about a minute, vanish from 1 port and pop up on another port. But if you have your zoning set up correctly, you do not have to touch your switches at all.
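For illustration, this is what WWN-based (soft) zoning looks like in Brocade FOS syntax; all aliases, WWNs and config names below are made up. Because the zone members are WWNs rather than physical switch ports, the zone automatically follows the director identity to whatever port it shows up on:

alicreate "vplex_dirA_fe00", "50:00:14:42:aa:bb:cc:00"
alicreate "esx01_hba0", "10:00:00:90:fa:12:34:56"
zonecreate "z_esx01_vplex_dirA", "esx01_hba0; vplex_dirA_fe00"
cfgadd "prod_cfg", "z_esx01_vplex_dirA"
cfgenable "prod_cfg"

Had the zoning been port-based (‘hard’ zoning) instead, every zone touching a VPLEX port would have needed rework in the middle of the upgrade.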

And yes, that does mean you need double cabling, at least temporarily. The old VS2 is of course still connected to your SAN switches, and the new VS6 will need to be connected simultaneously on all its ports during the upgrade process.

So have fun cabling those 😉

That might be a bit of a hassle, but it's a small price to pay for such a smooth and seamless transition.

To enable the old VS2 hardware (which used FC to talk to its partner director over local-com) to talk to the new VS6 directors (which use InfiniBand) during the migration, it is necessary to temporarily insert an extra FC module into the VS6 directors. During a specific step in the upgrade process, the VS2 is connected to the VS6, and for a brief period your I/O is being served by a combination of a VS2 and a VS6 director that are sharing volumes and cache with each other. This is a neat trick.

Inserting the temp IO modules:

As a final step, the old VS2 management server settings are imported into the new redundant VS6 management modules. In the VS6, these management modules are integrated into the director chassis and act in an active-passive failover mode. This is a great improvement over the single, non-redundant VS2 management server, with its single power supply (!)

 

Old Management Server:

New management modules:

The new management server hardware completely takes over the identity and settings of the old management server. This even includes the IP address, customer cluster names and the cluster serial numbers. The VS6 will adopt the serial numbers of your VS2 hardware. This is important to know from an EMC support point-of-view, as it may confuse people.

The great advantage is that all local settings and accounts, and all monitoring tools and alerting mechanisms, work flawlessly with the new hardware. For example, we have a PowerShell script that uses the API to check the health status. This script worked immediately with the VS6 without having to change anything. Also, ViPR SRM only needed a restart of the VPLEX collector, after which it continued collecting without any changes. The only things I have found that did not get transferred were the SNMP trap targets.
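Our script happens to be PowerShell, but the gist is just a couple of authenticated GETs against the REST interface on the management server. A minimal shell equivalent, assuming the GeoSynchrony REST endpoint and its header-based login (host name and password here are placeholders; check the REST API guide for your release):

curl -sk -H "Username: service" -H "Password: <your-password>" https://vplex-mgmt.example.com/vplex/clusters

Because the management identity (IP address, cluster names, serials) carries over to the VS6, anything pointed at that address simply keeps working.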
After the upgrade, the benefit of the new VS6 hardware was immediately noticeable. Here is a graph of average aggregate director CPU use, from EMC ViPR SRM:

As this kind of product is fundamental to your storage layer, its stability and reliability, especially during maintenance work like firmware and hardware upgrades, is paramount, and EMC takes that seriously. Unlike with other EMC products such as VNX, you are not expected, or indeed allowed, to upgrade this hardware yourself, unless you are a certified partner. Changes that need to be made to your VPLEX platform go through a part of EMC called the ‘Remote Pro-active’ team.

There is a process that has to be followed, which involves getting them involved early, a round of pre-validation health-checks, and the hands-on execution of the maintenance job, either remotely via WebEx, or on site by local EMC engineers if that is required. A hardware upgrade will always require onsite personnel, so make sure they deliver pizza to the datacenter! If an upgrade goes smoothly, expect it to take 4-5 hours. That includes all the final pre-checks, hardware work, cabling, transfer of the management identity to the VS6, and decommissioning of the VS2 hardware.

In the end the upgrade was a great success, and our customer experienced zero impact. Pretty impressive for a complete hardware replacement of such a vital part of your storage infra.

Finally, here is the text of the September 2016 VPLEX Uptime bulletin with some additional information about the upgrade requirements. Be aware that this may be outdated; please consult EMC support for the latest info.

https://support.emc.com/docu79516_Uptime-Bulletin:-VPLEX-Edition-Volume-23,-September-2016.pdf?language=en_US

There is an EMC community thread where people have been leaving their experiences with the upgrade; have a look here: https://community.emc.com/message/969664

 

My journey to find out how to set EMC VPLEX DNS settings, and how to change your default root password.

Tuesday, December 22nd, 2015

Warning: This is kind of a rant.

Sometimes I really have to wonder if the engineers who build hardware ever even talk to people who use their products.

Though I love the EMC VPLEX, I get this feeling of a ‘disconnect’ between design and use more strongly with this product than with many others.

This post is a typical example.

I noticed that one of my VPLEX clusters apparently does not have the correct DNS settings configured.

Now, a disclaimer: I am not a Linux guy. But even if I were, my first thought when dealing with hardware is not to treat it as an ordinary Linux distro. Those kinds of assumptions can be fatal. When it is a complete, vendor-provided solution, I assume, and it is mostly the case, that the vendor supplies specific configuration commands or environments to configure the hardware. It is always best practice to follow vendor guidelines before you start messing around yourself. Messing around yourself is often not even supported.

 

So, let's start working the problem:

 

My first go-to for most things is of course Google:

 

Now I really did try to find anything, any post by anyone, that could tell me how to set up DNS settings. I spent a whole 5 minutes at least on Google :p

But alas, no: lots of informative blog posts, but nothing about DNS.

Ok, to the manuals. I keep a folder of VPLEX documentation handy for exactly this kind of thing:

 

 

 

docu52651_VPLEX-Command-Reference-Guide MARCH2014.pdf

 

 

Uhh.. nope.

docu52646_VPLEX-Administration-Guide MARCH2014.pdf

AHA!

 

 

Uhh.. nope.

 

docu34005_VPLEX-Configuration-Guide MARCH2014.pdf

Nope

🙁

 

 

 

Ok, something more drastic:

docu52707_VPLEX-5.3-Documentation-Portfolio.pdf

3 hits. THREE.. really?

 

Yes.. I know the management server uses DNS. *sigh*

 

 

 

Oh.. well, at least I know that it uses standard BIND now, great!

 

 

 

 

oh, hi again!

 

 

Ok, let's try the EMC Support site next:

Uhhmm..    the only interesting one here is:

( https://support.emc.com/docu34006_VPLEX-with-GeoSynchrony-5.0-and-Point-Releases-CLI-Guide.pdf?language=en_US )

director dns-settings create, eh??

Ok then!

Getting excited now!


 

‘Create a new DNS settings configuration’

Uhmm.. you mean like… where I can enter my DNS servers, right? Riiiiight?

 

Oh.. uh.. what?  I guess they removed it in or prior to GeoSynchrony 5.3?    :p

🙁

Back to EMC support

Nope.

 

 

Nope.

So… there is NO DNS knowledge anywhere in the EMC documentation?  At all???  Anywhere??

 

Wait! Luke, there is another!

 

SolVe (seriously, who comes up with these names) is the replacement for the good ole ‘Procedure Generator’ that used to be on SupportLink.

Hmm… I don't see DNS listed?

Change IP addresses maybe??

Hmm…  not really.. however, I do see an interesting command: management-server

Oh… I guess you are too good to care for plain old DNS eh?

 

And this is the point where I have run out of options to try within the EMC support sphere.

And as you can see, I really, really did try!

 

So…   the management server is basically a SUSE Linux distro, right?

vi /etc/resolv.conf

Uhm… well fuck.

Now, I am logged into the management server with the ‘service’ account, the highest-level account that is mentioned in any of the documentation. Of course, it is not the root account.

sudo su - …  and voila:

There we go!
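So for the record, the eventual fix is just the standard Linux one, done as root; the search domain and nameserver addresses below are examples, use your own:

service@ManagementServer:~> sudo su -
ManagementServer:~ # cat /etc/resolv.conf
search example.local
nameserver 10.0.0.53
nameserver 10.0.1.53

Edit that file with vi and DNS resolution works. Whether this survives an upgrade, or is even supported, is anyone's guess, which is rather the point of this rant.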

 

Which brings me to another thing I might as well address right now.

The default root account password for the VPLEX management server is easily Googlable. That is why you should change it. There actually is a procedure for this: https://support.emc.com/kb/211258
Which I am sure hardly anyone, anywhere, has ever followed.. that at least is usually the case with this sort of thing.

Here is the text from that KB article:

The default password should be changed by following the below procedure. EMC recommends following the steps in this KB article and downloading the script mentioned in the article from EMC On-Line Support.

Automated script: 

The VPLEX cluster must be upgraded to code version 5.4.1 Patch 3 or to 5.5 Patch 1 prior to running the script.

Note: VS1 customers cannot upgrade to 5.5, since only VS2 hardware is capable of running 5.5. VS1 customers must upgrade to 5.4 SP1 P3, and VS2 customers can go to either 5.4 SP1 P3, or 5.5 Patch 1.

The script, “VPLEX-MS-patch-update-change-root_password-2015-11-21-install” automates the workaround procedure and can be found at EMC’s EMC Online Support.

Instructions to run the script: 

Log in to the VPLEX management-server using the service account credentials and perform the following from the management-server shell prompt:

  1. Pull down a copy of the “VPLEX-MS-patch-update-change-root_password-2015-11-21-install” script from the specified location above and then, using SecureCopy (scp), copy the script into the “/tmp/VPlexInstallPackages/” directory on the VPLEX management server.
  2. The permissions need to be changed to allow execution of the script using the command chmod +x.

service@ManagementServer:~> chmod +x /tmp/VPlexInstallPackages/VPlex-MS-patch-update-root_password-2015-11-21-install

  3. Run the script as shown below.

Sample Output:

This script will perform following operation:
– Search and insert the IPMI related commands in /etc/sudoers.d/vplex-mgmt.
– Prompt for the mgmt-server root password change.
Run the script with the "--force" option to execute it

service@ManagementServer:~> sudo /tmp/VPlexInstallPackages/VPlex-MS-patch-update-root_password-2015-11-21-install --force

Running the script…

– Updating sudoers
– Change root password
Choose password of appropriate complexity.

Enter New Password:
Reenter New Password:

Testing password strength…

Changing password for root.

Patch Applied

NOTE: In the event that the password is not updated, run the script again with proper password complexity.

  4. Following running of the script, from the management server, verify that the password change is successful.

Sample output:

service@ManagementServer:~> sudo -k whoami
root’s password:
root

***Contact EMC Customer Service with the new root password to verify that EMC can continue to support your VPLEX installation. Failure to update EMC Customer Service with the new password may prevent EMC from providing timely support in the event of an outage.

Notice how convoluted this is. Also notice how you need to have at least 5.4.1 Patch 3 in order to even run it.

While EMC KB articles have an attachment section, the script in question is of course not attached there.

Instead, you have to go look for it yourself; helpfully, they link you to: https://support.emc.com/products/29264_VPLEX-VS2/Tools/

And it's right there, for now at least.

What I find interesting here is that it appears both the article and the script were last edited… today?
Coincidental, but also a little scary. Does this mean that prior to 5.4.1 Patch 3 there really was no supported way to change the default VPLEX management server root password? The one that every EMC and VPLEX support engineer knows and that is easily Googlable? Really?

I think the most troubling part of all this is that final phrase:

Failure to update EMC Customer Service with the new password may prevent EMC from providing timely support in the event of an outage.

Have you ever tried changing vendor default backdoor passwords, to see if their support teams can deal with it? Newsflash: they can not. We tried this once with EMC CLARiiON support. We changed the default passwords and dutifully informed EMC support that we had changed them. They assured us this was noted down in their administration for our customer.

You can of course guess what happened. Every single time, EMC support would try to get in and complain that they could not. You had to tell them, every single time, about the new passwords you had set up. I am sure that somewhere in the EMC administrative system there is a notes field that could contain our non-default passwords. But no EMC engineer I have ever spoken to would look there, or even know to look there.

If you build an entire hardware-support infrastructure around the assumption of a built-in default password that everyone and their mother knows, you make it fundamentally harder to properly support users who ‘do the right thing’ and change it. And you build in vulnerability by default.

Instead, design your hardware and appliances to generate new, unique, strong default passwords on first deployment, or have the user provide them (enforcing complexity); many VMware appliances now do this. But do NOT bake in backdoor default passwords that users and Google will find out about eventually.

EMC VPLEX Performance Monitor v1.0

Friday, November 13th, 2015

EMC have released an OVF appliance that is meant to allow you to store and browse 30 days' worth of VPLEX performance statistics. Version 1 is limited to just a few metrics, but it is a very welcome addition to the VPLEX monitoring tools that are available! Requires GeoSynchrony 5.5.

—-

Today I was looking up some information on VPLEX on the EMC support site when my eye was drawn to the following entries:

I have seen no mention of this at all on either Twitter or on the VPLEX community space at EMC: https://community.emc.com/community/products/vplex

This is typical of EMC in my experience; they are terribad at disseminating support information and making new stuff ‘discoverable’.

So what is this thing?

Up till now, you had several ways to monitor, save, and analyze VPLEX statistics.

  • The GUI, but that only shows live data, no history, and it exposes only a very few metrics, at a high level
  • VPLEXCLI: monitor create, monitor collect, etc. Powerful CLI commands; any statistic can be saved, and they can produce exportable CSV files. But they are hard to use and understand, and for live monitoring the implementation is truly horrible, scrolling across your screen in a disruptive way, with no ‘top’ kind of function here or anything (see the sketch after this list)
  • EMC ViPR SRM, EMC's statistics and analytics suite. Good for all kinds of EMC products; uses a ‘perpetual’ version of the above-mentioned monitor construct. But definitely not a free tool.
  • If you have VMware vROPS: EMC Storage Analytics, an adapter for vROPS, but again not free. v3 of this adapter supports vROPS 6.x
  • SNMP. VPLEX comes with a MIB, but my experience with it so far is that it has some serious compliance (and syntax) issues that prevent it from working in, for example, the vROPS SNMP adapter (this was my attempt at a ‘cheapo’ EMC Storage Analytics 😉 )
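To give an idea of what that CLI route looks like, here is a rough sketch of the monitor workflow, written from memory; treat the stat names and arguments as approximate and check the CLI guide for your GeoSynchrony release:

VPlexcli:/> monitor create --name vvol_perf --director director-1-1-A --stats virtual-volume.ops,virtual-volume.read,virtual-volume.write
VPlexcli:/> monitor add-file-sink --monitor director-1-1-A_vvol_perf --file /var/log/VPlex/cli/vvol_perf.csv
VPlexcli:/> monitor collect director-1-1-A_vvol_perf

The CSV file lands on the management server, and graphing it is entirely your own problem. Powerful, but a far cry from a browsable GUI.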

So, nothing we had so far ‘just worked’ as a fast and free GUI-based way of seeing some deep statistics. There was something to be said for this not being available in the product itself. It looks like with the “EMC VPLEX Performance Monitor”, which is a free OVF download, they are attempting to address this concern.

Let's check the release notes.

Product description

VPLEX Performance Monitor is a stand-alone, customer installable tool that allows you to collect virtual volume metrics from a VPLEX Local or VPLEX Metro system. It allows Storage Administrators to see up to 30 days of historical virtual volume performance data to troubleshoot performance issues and analyze performance trends.

The VPLEX Performance Monitor tool is delivered as an OVA (Open Virtualization Format Archive) file that you deploy as a VMware virtual appliance. The virtual appliance connects to one VPLEX system and collects performance metrics for all virtual volumes that are in storage views. Historical virtual volume metrics are stored in a database within the virtual appliance for 30 days. The virtual appliance has a web application which allows you to view the data in charts that show all 30 days of data at once, or allows you to zoom in on data down to the minute.

The VPLEX Performance Monitor charts the following key virtual volume metrics:

Throughput (total read and write IOPS)
Read Bandwidth (KB/s)
Write Bandwidth (KB/s)
Read Latency (usec)
Write Latency (usec)

Note: The VPLEX Performance Monitor can connect to one VPLEX Local or Metro system at a time. To monitor additional VPLEX systems, deploy a new instance of the tool for each VPLEX Local or Metro system you want to monitor.

Ok, so admittedly, for a version 1, there is not all that much here; no port statistics or backend storage metrics, for example. But in most cases you are gonna be interested in your virtual volumes most of all anyway, so it is a good start.

Only 1 VPLEX system at a time? We have 2 Metro-Cluster setups in our environment, which translates to 4 engines in total. Going by the note above, though, a ‘system’ means a complete Local or Metro setup, not an engine, so I would need 2 of these appliances, one per Metro. Oh well.

30 days is a nice sweet spot for metric retention as far as I am concerned. This appliance uses an embedded database, so don't expect options to save your data for years. Get ViPR SRM if you want that.

IMPORTANT Version 1.0 cannot be upgraded. When the next release is available, you must delete the current VPLEX Monitor virtual appliance and deploy the new one. All performance data and user information will be lost.

  • The VPLEX Performance Monitor requires a VPLEX Local or Metro system running GeoSynchrony release 5.5 (VS2 hardware only).
  • The VPLEX Performance Monitor is not supported for use with VS1 hardware.
  • This version supports connection to a VPLEX system with a maximum of 4,000 virtual volumes.
  •  This release of the VPLEX Performance Monitor is not FIPS compliant. Contact EMC Customer Support if you encounter any issues installing or using the VPLEX Performance Monitor tool.

 

Take note of the GeoSynchrony 5.5 requirement. This version only came out recently, so I don't expect many people to be running it yet.
We don't in any case, so I can't provide you with an install demo, yet :p

If you have GeoSynchrony 5.5, go give this a try:

https://download.emc.com/downloads/DL62040_VPLEX_Performance_Monitor_1.0.ova
https://support.emc.com/docu62030_VPLEX_Performance_Monitor_1.0_Release_Notes.pdf?language=en_US

(EMC Support account required)

Update 3:03 pm: I was googling for “EMC VPLEX Performance Monitor” to see if anyone else had mentioned it yet, and came across this video (with 20 views so far, wow!) that showcases the new tool: https://www.youtube.com/watch?v=HiJgmbLkeTU

Lenovo T500 Won't PXE boot

Friday, May 7th, 2010

One of our batch of Lenovo T500 laptops refuses to boot from our Windows Deployment PXE server.
So it is not easy to get our default company image onto it, and it is hard to explain to Lenovo support what is wrong because it is so very specific. All other network functions are just fine. My colleague ran a trace on the server side, and the laptop doesn't even seem to contact it.

If I had to guess, I would say it's a network driver issue and this laptop has a very specific revision of the network board… or something.

It seems I am not the only one running into PXE boot issues with a Lenovo.

I don't really have that much time to dig deeply into it. I would like to see what the conversation is between our WDS server, the DHCP server and the laptop. Something weird is going on there.
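If I do find the time, the capture itself is simple enough. On the server side (or a mirrored switch port), a filter like this, for tcpdump or as a Wireshark capture filter, shows whether the laptop's DHCP/PXE traffic arrives at all; the interface name is just an example:

tcpdump -n -i eth0 port 67 or port 68 or port 69 or port 4011

Ports 67/68 carry DHCP, 69 is the TFTP transfer of the boot image, and 4011 is the proxy-DHCP port WDS answers on when it shares a host with a DHCP server.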

Big Bang server move successful

Monday, July 7th, 2008

This Saturday, we switched the subnet over and moved all the remaining critical systems to a new server room across the country.

I didn't get to take as many pics as I would have liked, and no video; this was mainly because I was so busy, of course. 😀

Half the pics below are courtesy of Arnold.

IMG_3656
Back of the Digital Alpha box, reference shot for recabling.

IMG_3657
Packing up the servers in special locked crates. You could see these movers were the right stuff: big burly guys, but they handled the servers like feathers.

IMG_3658
Justin felt at home at the new location.

Big Bang Photo 005
x3800 waiting to be converted.

Big Bang Photo 001
We laid out the servers in the hall in the order we were building them into the racks.

IMG_3660
Our project operation center, around the corner from the server room. From here the managers of the project coordinated the downtime and the on-site business testing.

IMG_3659
Tom, our WAN guy, on the phone with Mohamed, our Unix guy, who was supporting us remotely.

IMG_3661

Big Bang Photo 004

Big Bang Photo 003
Very good food and snacks were provided by Arnold and Jan. Thank you guys! You know the best way to an engineer's heart is through his stomach!

IMG_3658

Big Bang Photo 006
Justin loves Legos, so this picture seems to make sense.

Big Bang Photo 007
Justin working on the rack conversion kit for the IBM System x3800.

Big Bang Photo 008
Tom and me discussing the proxy server and outbound internet line. An old proxy, a new internet line, and some firewall rules were needed.

Big Bang Photo 010
Arnold: “2 pizzas please!” ;  Mustafa is thinking: “I don't like pizza”…

Big Bang Photo 011
Paul being technical. We hold our collective breath.

Big Bang Photo 009
Sliding a server into its new home.

IMG_3662
Bottom of racks SR3 and SR4

IMG_3666
Top Left rack, SR4

IMG_3665
A surprise box. This whitebox FTP server turned out to be running 4 essential customer FTP/EDI flows. We had space for it thankfully, resting on the IBM x3800. Hopefully it will be gone in 2 weeks, but nothing is as permanent as temporary. Took this picture of it for the Visio rack diagram.

IMG_3668


IMG_3669
One of the 2 redundant FTP servers failed in transport. Justin and Paul spent 2 hours putting a new one together out of spare parts from old ones. HP NetServers here, P2 machines, a decade old.

IMG_3670
Behind locked doors and locked racks, all servers hum quietly, content in their new home.

A day later, back in the first location:

IMG_3674
Empty racks moved out of the server rooms

IMG_3675
Mail cluster about to be sent back to the UK

IMG_3677

IMG_3678
Gertjan visibly enjoying dismantling the place. Six years of the mental burden of supporting this stuff being dealt with here 😉


IMG_3680
The NAS, all the data for the Netherlands, with all volumes deleted and now unplugged, ready for storage.

Videos of “demolition”:


Mustafa and Gertjan have taken apart the entire second server room in 1 day. (Click here for the link if you can't see the embed above.)


Oooh.. Knobs! Lovely knobs!!  (Click here for the link if you can't see the embed above.)