Archive for the ‘Software and Tools’ Category

Metro-Cluster SDRS datastore tagging and the EnforceStorageProfiles advanced setting

Monday, October 31st, 2016

When running a vSphere Metro Storage Cluster, the shared storage layer usually has a ‘fallback’ side: the LUN that becomes authoritative for reads and writes in case of a site failure or a split brain.

This makes VM storage placement on the correct Datastores rather important from an availability perspective.

Up till now, you had to manage intelligent VM storage-placement decisions yourself. And if you wanted to align ‘compute’ (where the VM is running) with where its storage falls back, you also had to take care of that yourself through some kind of automation or scripting.
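
To make that concrete, here is a minimal pyVmomi sketch of the kind of home-grown check I mean: it walks all VMs and flags any whose disks do not all map to a single storage ‘side’. The vCenter name, credentials, and the datastore-name prefixes are placeholder assumptions for illustration; your own naming convention will differ.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def datastore_side(ds_name):
    # Assumed convention: 'store1*' datastores fall back to site A, 'store2*' to site B.
    if ds_name.startswith('store1'):
        return 'site-A'
    if ds_name.startswith('store2'):
        return 'site-B'
    return 'unknown'

si = SmartConnect(host='vcenter.lab.local', user='administrator@vsphere.local',
                  pwd='******', sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.VirtualMachine], True)

for vm in view.view:
    sides = {datastore_side(ds.name) for ds in vm.datastore}
    # Flag VMs that span both sides, or sit on datastores the convention cannot classify.
    if len(sides) != 1 or 'unknown' in sides:
        print('%s: check placement, datastores map to %s' % (vm.name, sides))

view.DestroyView()
Disconnect(si)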

This problem would be compounded if you also wanted to logically group these storage ‘sides’ into SDRS clusters, which you often do, especially if you have many datastores.

In the past few years, mostly in regard to vSAN and vVols, VMware have been pushing the use of Storage Policies, getting us to think in terms of policy-based storage management for VMs.

Wouldn’t it be great if you could leverage the new Storage Policies, to take care of your metro-cluster datastore placement? For example, by tagging datastores, and building a policy around that.

And what if you could get SDRS to automate and enforce these policy-based placement rules?

The EnforceStorageProfiles advanced setting, introduced in vCenter Server 6.0.0b, seemed to promise exactly this.

However, while messing around with Storage Policies, tagging, and in particular that EnforceStorageProfiles advanced setting, I encountered some inconsistent and unexpected GUI and enforcement behavior that shows we are just not quite there yet.

This post details my findings from the lab.

————————————-

The summary is as follows:

It appears that if you mix different self-tagged storage capabilities inside a storage-cluster, the cluster itself will not pass the Storage Policy compatibility check on any policy that checks for a tag that is not applied to all datastores in that cluster.

Only if all the datastores inside the storage-cluster share the same tag will the cluster report itself as compatible.

This is despite applying that tag to the storage-cluster object itself! It appears that adding or not adding these tags to the storage-cluster object has no discernible effect on the Storage Compatibility check of the policy.

This contradicts the stated purpose and potential usefulness of the EnforceStorageProfiles advanced setting.

However, individual datastores inside the storage-cluster will correctly be detected as compliant or non-compliant based on custom tags.

The failure of the compatibility check on the storage-cluster will not stop you from provisioning a new VM to that datastore cluster, but the compatibility warnings you get apply only to the one or more underlying non-compatible datastores. It does not tell you which ones, though, so that can be confusing.

The advanced setting EnforceStorageProfiles will affect storage-cluster initial placement recommendations, but will not result in SDRS movements on its own when the value is set to 1 (soft enforcement).
Even EnforceStorageProfiles=2 (hard enforcement) does not make SDRS automatically move a VM’s storage from non-compatible to compatible datastores within the datastore cluster. It seems to affect initial placement only. This appears to contradict the way the setting is described to function.

However, even soft enforcement will stop you from manually moving a VM to a non-compliant datastore within that storage-cluster, even when you specify an SDRS override for that VM. That is unexpected, and the kind of behavior one would only expect from a ‘hard’ enforcement.

This may mean that while SDRS will not move an already-placed VM to correct storage of its own accord after the fact, it will at least prevent the VM from moving to incorrect storage.

Summed up, that means that as long as you get your initial placement right, EnforceStorageProfiles will make sure the VM’s storage at least stays there. But it won’t leverage SDRS to fix placements, as the setting appears to have been meant to.

 

Now for the details and examples:
————–

I have 4 Datastores in my SDRS cluster:

I have applied various tags to these datastore objects. For example, the datastores starting with ‘store1’ received the following tags:

The datastores starting with ‘store2’ received the following tags:

The crucial difference here is the tag “Equalogic Store 1” vs. “Equalogic Store 2”.

In this default situation, the SDRS Datastore Cluster itself has no storage tags applied at all.
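
For the record, the tag assignment itself can also be scripted rather than clicked together in the Web Client. Below is a rough sketch using the vSphere Automation SDK for Python; note that this SDK-based approach is newer than the 6.0-era Web Client workflow I used in the lab, and the server name, credentials, and the assumption that the tag already exists are placeholders.

import requests
import urllib3
from com.vmware.vapi.std_client import DynamicID
from vmware.vapi.vsphere.client import create_vsphere_client

urllib3.disable_warnings()
session = requests.session()
session.verify = False
client = create_vsphere_client(server='vcenter.lab.local',
                               username='administrator@vsphere.local',
                               password='******', session=session)

# Look up the existing tag named 'Equalogic Store 2'.
store2_tag = None
for tag_id in client.tagging.Tag.list():
    tag = client.tagging.Tag.get(tag_id)
    if tag.name == 'Equalogic Store 2':
        store2_tag = tag
        break

# Attach it to every datastore whose name starts with 'store2'.
for ds in client.vcenter.Datastore.list():
    if ds.name.startswith('store2'):
        client.tagging.TagAssociation.attach(
            tag_id=store2_tag.id,
            object_id=DynamicID(type='Datastore', id=ds.datastore))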

 

I have created a Storage Policy that is meant to match datastores carrying the “Equalogic Store 2” tag.  The idea here is that I can assign this policy to VMs, so that inside that datastore cluster those VMs will always reside on ‘Store2’ datastores and not on ‘Store1’ datastores.

I plan to have SDRS (soft) enforce this placement using the advanced option EnforceStorageProfiles=1, introduced in vCenter Server 6.0.0b.
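
As an aside, the advanced option can also be set through the API instead of the Web Client dialog. Here is a hedged pyVmomi sketch; my assumption, based on how the advanced options dialog behaves, is that the setting lands as a plain key/value option in the storage pod’s SDRS config, and the cluster name and credentials are placeholders.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.lab.local', user='administrator@vsphere.local',
                  pwd='******', sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

# Find the datastore cluster (StoragePod) by name.
view = content.viewManager.CreateContainerView(content.rootFolder, [vim.StoragePod], True)
pod = next(p for p in view.view if p.name == 'EqualogicCluster')  # placeholder cluster name
view.DestroyView()

# Build a partial SDRS reconfigure spec that only adds/edits the advanced option.
opt = vim.storageDrs.OptionSpec()
opt.operation = 'add'  # use 'edit' if the key is already present on the pod
opt.option = vim.OptionValue(key='EnforceStorageProfiles', value='1')  # 0=off, 1=soft, 2=hard

spec = vim.storageDrs.ConfigSpec()
spec.podConfigSpec = vim.storageDrs.PodConfigSpec()
spec.podConfigSpec.option = [opt]

task = content.storageResourceManager.ConfigureStorageDrsForPod_Task(pod=pod, spec=spec, modify=True)
# Wait for 'task' to complete with your usual task helper before disconnecting.
Disconnect(si)

The same ConfigureStorageDrsForPod_Task call, with spec.podConfigSpec.enabled set to False, is also the API equivalent of the “temporarily turn off SDRS” workaround I mention further down.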

 

 

The match for ‘Equalogic Store 2’  is the only rule in this policy.

 

But when I check the storage compatibility, neither the datastores that have that tag nor the datastore cluster object shows up under the ‘Compatible’ listing.

However, under the ‘Incompatible’ listing, the Cluster shows up as follows:

Notice how the SDRS Cluster object appears to have ‘inherited’ the error conditions of both datastores that do not have the tag.

This was unexpected.

In the available documentation for VM Storage Policies, I have not found any reference to SDRS Clusters directly. My main reference here is Chapter 20 of the vsphere-esxi-vcenter-server-601-storage-guide.  Throughout the documentation, only datastore objects themselves are referenced.

The end of chapter 8 of the vsphere-esxi-vcenter-server-601-storage-guide, ‘Storage DRS Integration with Storage Profiles’, explains the use of the EnforceStorageProfiles advanced setting.

 

 

The odd thing is, the documentation for the PbmPlacementSolver data object (which I assume the Storage Policy placement checker is utilizing) even explicitly states that storage pods (SDRS clusters) are a valid ‘hub’ to check against.

But it seems that when the ‘hub’ is an SDRS cluster, it will produce an error for every underlying datastore that throws one. With mixed-capability datastores in a single SDRS cluster, depending on how specific your storage profile is, chances are it will always throw an error.

So this seems contradictory! How can we have an SDRS advanced setting that operates on a per-datastore basis, while the cluster object will likely always stop the compatibility check from succeeding?

 

As a possible workaround for these errors, I tried applying tags to the SDRS Cluster itself. I applied both the “Equalogic Store 1” and “Equalogic Store 2” tags to the SDRS Cluster object, the idea being that the compatibility check of the storage policy would then never fail to match on either of these tags.

 

 

But alas, it seems to ignore tags you set on the SDRS Cluster itself.

Anyway, it’s throwing an error, but is it really stopping SDRS from taking the policy into account, or not?

 

Testing SDRS Behaviors

 

 

Provision a new VM

Selecting the SDRS Cluster, it throws the compatibility warning twice, without telling you which underlying datastores it is warning you about. That is not very useful!

However, it will deploy the VM without any issue.

When we check the VM, we can see that it has indeed been placed on a compatible datastore.

 

 

Manual Storage vMotion to a non-compliant datastore

In order to force a specific target datastore inside an SDRS Cluster, check the ‘Disable Storage DRS for this virtual machine’ checkbox. This will create an override rule for this VM specifically.  When we do this and select a non-compatible datastore, it throws a warning, as we might expect. But as I have chosen to override SDRS recommendations completely here, I expect to be able to just power on through this selection.

 

No such luck. Remember that EnforceStorageProfiles is still set to only ‘1’, which is a soft enforcement. This is not the kind of behavior I expect from a ‘soft’ enforcement, especially not when I just specified that I wanted to ignore SDRS placement recommendations altogether!

I should be able to ignore these warnings, for the reasons stated above. It’s a bit inconsistent that I am still prevented from overriding!

There are 2 ways around this.

First of all, you can temporarily turn off SDRS completely.

You must now choose a datastore manually. Selecting the non-compatible datastore will give the warning, as expected.

But now no enforcement takes place and we are free to move the VM wherever we want.

The other workaround, which is not so much a workaround as it is the correct way of dealing with policy-based VM placement, is to change the policy.
If you put the VM’s policy back to the default, it doesn’t care where you move it.

 

Storage DRS Movement Behaviors

With EnforceStorageProfiles=1, SDRS does not seem to move the VM, even if it is non-compliant.

Unfortunately, EnforceStorageProfiles=2 (hard enforce) does not change this behavior. I was really hoping here that it would automatically move the VM to the correct storage, but it does not, even when manually triggering SDRS recommendations.
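
One way to trigger and inspect an SDRS run programmatically (rather than clicking ‘Run Storage DRS’) is sketched below with pyVmomi. The pod lookup mirrors the earlier snippet and the names are placeholders; in my testing the list never contained a policy-driven move.

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.lab.local', user='administrator@vsphere.local',
                  pwd='******', sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(content.rootFolder, [vim.StoragePod], True)
pod = next(p for p in view.view if p.name == 'EqualogicCluster')  # placeholder cluster name
view.DestroyView()

# Ask SDRS to recompute its recommendations for this datastore cluster...
content.storageResourceManager.RefreshStorageDrsRecommendation(pod=pod)

# ...and print whatever it currently recommends.
for rec in pod.podStorageDrsEntry.recommendation:
    print(rec.reason, '-', rec.reasonText)

Disconnect(si)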

Manual Storage vMotion to a compliant datastore

When the VM is already inside the storage-cluster, but on a non-compliant datastore, you would think it would be easy to get it back onto a compliant datastore.
It is not. When you select the datastore-cluster object as the target, it fails with the same error as the manual move in the previous example: explicit movements inside an SDRS-enabled cluster always require an override.

Create the override by selecting the checkbox again.

Don’t forget to remove the override again afterwards.

Manual Storage vMotion from an external datastore to the storage-cluster

Here, SDRS will respect the storage policy and recommend initial placement on the correct compliant datastores.


 

Conclusion.

Tag-based storage policies, and their use in combination with SDRS Clusters, appear to be buggy and underdeveloped. The interface feedback is inconsistent and unclear. As a result, the behavior of the EnforceStorageProfiles setting becomes unreliable.

It’s hard to think of a better use case for EnforceStorageProfiles than the self-tagged SDRS datastore scenario I tried in the lab. Neither vSAN nor vVol datastores benefit from this setting; it really only applies to ‘classic’ datastores in an SDRS cluster.

I have seen that self-tagging does not work correctly. But I have not yet gone back to the original use case of Storage Profiles: VASA properties. However, with VASA-advertised properties you are limited to what the VASA endpoint is advertising. Self-tagging is far more flexible, and currently the only way I can give datastores a ‘side’ in a shared-storage metro-cluster design.

Nothing I have read about vSphere 6.5 so far leads me to believe this situation has been improved. But I will have to wait for the bits to become available.

 

New HA and DRS features in vSphere 6.5 #vmworld2016

Tuesday, October 18th, 2016

Among all the great new features and improvements made to vSphere 6.5, some of the ones I am most excited about are the improvements to DRS and HA. So let’s zoom in on those briefly.

This information comes mostly from VMware pre-sales marketing material and should be considered preliminary. I hope to try out some of these features in our lab once the bits become available.

vCenter Server Appliance (VCSA) now supports an HA mode + witness.

This appears to be similar in some respects to the NSX Edge HA function. But with one seriously important addition: a witness.
In any high-availability, clustering, or other kind of continuous-uptime solution where data integrity or ‘state’ is important, you need a witness or ‘quorum’ function to determine which of the two HA ‘sides’ becomes the master of the function, and thus may make authoritative writes to data or configuration. This is important if you encounter a ‘split’ in your vSphere environment, where both HA members become isolated from each other. The witness helps decide which of the two members must ‘yield’ to the other; I expect the loser turns its function off. The introduction of a witness also helps the metro-cluster design: in case of a metro-cluster network split, the witness now makes sure you cannot get a split-brain vCenter.

The HA function uses its own private network with a dedicated adapter that is added during configuration. There is a basic config and an advanced option to configure; I assume the latter lets you tweak the knobs a bit more.

There are some caveats. At release this feature only works if you are using an external Platform Services Controller. So assume this will not work if you run all the vSphere functions inside 1 appliance. At least not at GA.

It should be noted that the new integrated vSphere Update Manager for the VCSA will also fail over as part of this HA feature. It should also be noted that this feature is only available in Enterprise+.

 

Simplified HA Admission Control

vSphere 6.5 sees some improvements to HA admission control. As with many of the vSphere 6.5 enhancements, the aim here is to simplify or streamline the configuration process.
The various options have now been hidden behind a general pulldown menu, combined with the Host Failures Cluster Tolerates number, which now acts as input to whatever mode you select. In some ways this is now more like the vSAN Failures To Tolerate setting. You can, of course, still tweak the knobs if you so wish.
In addition to this, the HA config will give you a heads-up if it expects your chosen reservation will potentially impact performance during HA restarts. You are now also able to guard against this by reserving a resource percentage that HA must guarantee during HA restarts. These options give you a lot more flexibility.
Admission control now also listens to the new levels of HA restart priority, where it might not restart the lowest levels if they would violate the constraints. These two options together give you great new flexibility in controlling HA restarts and the resources they take (or would take).

 

vSphere HA Restart Priorities

At long last, vSphere now supports more than three priority levels. This adds a lot more flexibility to your HA design. In our own designs, we already assigned infrastructure components to the previous ‘high’ level, customer production workloads to ‘medium’, and everything else to ‘low’. What I was missing at the time was the ability to differentiate between the infra components. For example, I would want Active Directory to start -before- many other infra services that rely on AD authentication. Syslogging is another service you want back up as soon as possible. And of course vCenter should ideally come back before the many other VMware products that rely on it. It also allows you to make some smart sequencing decisions in regard to NSX components; I would restart the NSX Controllers and the Edge DLR and Edge tenant routers first, for example. I am sure you can think of your own favorite examples.
As mentioned previously, these new expanded restart levels go hand-in-hand with the new admission control options.

 

vSphere HA Orchestrated Restart

This is another option that I have wanted to see for a very long time. I have seen many HA failovers in my time, and the most time is always spent afterwards by the application owners, putting the pieces back together again because things came up in the wrong order.

vSphere HA Orchestrated Restart allows you to create VM dependency rules that allow an HA failover to restart the VMs in the order that best serves the application. This is similar to the rule sets we know from SRM.

 

Naturally you will need to engage your application teams to determine these rules. I do wonder about the limits here. In some of the environments we manage, there could potentially be hundreds of these kinds of rules. But you don’t want to make it too hard for HA to calculate all this, right?

 

Proactive HA

This is a ‘new’ feature, insofar as it is a new, deeper level of integration natively in vCenter that can leverage the new ‘quarantine mode’ for ESXi hosts. Similar behavior has for years been a feature of the Dell Management Plug-in for vCenter, for example, where a ‘maintenance mode’ action was triggered as a script action from a vCenter alert. By leveraging ‘quarantine mode’, new ways of dealing with partially failed hosts are enabled, for example proactively migrating off VMs, but based on specific failure rules instead of an all-or-nothing approach.

 

Quarantine Mode

For years we have only ever had 2 possible host states: Maintenance and.. well, not in maintenance 🙂

Quarantine Mode is the new middle ground. It can be leveraged tightly with the new Proactive HA feature mentioned above and integrates with DRS, but is above all just a useful mode to employ operationally.

The most important thing to bear in mind is that quarantine mode does not, by default, guarantee that VMs cannot or will not land on the host. An ESXi host in quarantine can and will still be used to satisfy VM demand where needed; think of reservations and HA failover. DRS, however, will try to avoid placing VMs on this host if possible.
Operationally, this is very similar to what we would already do in many ‘soft’ failure scenarios for hosts: we put DRS to semi-auto and slowly start to evacuate the host, usually ending up putting it in maintenance mode at the end of the day.
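
For reference, that manual routine is easy enough to script. Below is a rough pyVmomi sketch, with the cluster and host names as placeholders, covering only the two obvious steps (no VM-evacuation logic):

import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host='vcenter.lab.local', user='administrator@vsphere.local',
                  pwd='******', sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

view = content.viewManager.CreateContainerView(content.rootFolder, [vim.ClusterComputeResource], True)
cluster = next(c for c in view.view if c.name == 'Cluster01')  # placeholder cluster name
view.DestroyView()

# Step 1: DRS to 'semi-auto' (partially automated), so it only recommends migrations
# instead of executing them automatically.
drs_config = vim.cluster.DrsConfigInfo(defaultVmBehavior='partiallyAutomated')
cluster.ReconfigureComputeResource_Task(spec=vim.cluster.ConfigSpecEx(drsConfig=drs_config),
                                        modify=True)

# Step 2 (later, once the suspect host has been evacuated): put it in maintenance mode.
host = next(h for h in cluster.host if h.name == 'esx03.lab.local')  # placeholder host name
host.EnterMaintenanceMode_Task(timeout=0, evacuatePoweredOffVms=True)

Disconnect(si)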

 

DRS Policy Enhancements

Again more streamlining. For us vSphere admins with a case of OCD, the new ‘even distribution’ model is quite relaxing. VMware describes this, endearingly, as the ‘peanut butter’ model. Personally I will refer to it as the Nutella model, because Nutella is delicious!

This of course refers to the ‘even spread’ of VMs across all hosts in your cluster.
This, and the other options added to DRS, are interesting from both a performance and a risk point of view. You avoid the ‘all your eggs in one basket’ issue, for example. Naturally, the CPU over-commitment setting is especially interesting in VDI environments, or any other deployment that would benefit from consistently good CPU response.

 

Network-aware DRS

DRS will now also attempt to balance load based on the network saturation level of each host, in addition to looking at CPU and RAM. However, it will prioritize CPU and RAM above all else; the network part is on a best-effort basis, so no guarantees.


Slow boot time on Veracrypt

Thursday, September 22nd, 2016

Re-encrypting my work laptop hard drive.
VeraCrypt is the successor to TrueCrypt, and its code has been community-vetted to ensure there are no ‘back doors’ in it (and its security can be independently verified).

The only downside it has is that, by default, it uses a rather high header key derivation iteration count (a lot higher than TrueCrypt), meaning that it can take several minutes to boot your laptop. This is a frequent complaint from new VeraCrypt users.

The workaround is simple. As long as you use a password that is longer than 20 characters, you are allowed to reduce the number of iterations substantially by using a lower multiplier value (called a PIM) that you type in at boot time after your password. The multiplier may be as low as 1, which will more or less instantly mount your boot partition.
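
To put some rough numbers on that (my reading of the VeraCrypt documentation; double-check it against the docs for your version): for system/boot encryption the key derivation iteration count is derived from the PIM as PIM x 2048, so a PIM of 1 gives you a couple of thousand iterations instead of the several hundred thousand you get by default.

# Hedged sketch of the documented PIM-to-iterations mapping for VeraCrypt boot/system encryption.
def boot_iterations(pim):
    return pim * 2048

print(boot_iterations(98))   # ~200,000 iterations, in the ballpark of the boot-encryption default
print(boot_iterations(1))    # 2,048 iterations: this is why a PIM of 1 mounts almost instantly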

For the purposes of theft-risk-reduction by common criminals, this is probably more than enough protection. However, if you are seeking to thwart the NSA which may try to brute-force your password using a server farm for 5 years, it may not be 😉

EMC VPLEX Performance Monitor v1.0

Friday, November 13th, 2015

EMC have released an OVF appliance that is meant to allow you to store and browse 30 days’ worth of VPLEX performance statistics. Version 1 is limited to just a few metrics, but it is a very welcome addition to the VPLEX monitoring tools that are available! It requires GeoSynchrony 5.5.

—-

Today I was looking up some information on VPLEX on the EMC support site when my eye was quickly drawn to the following entries:

I have seen no mention of this at all on either Twitter or on the VPLEX community space at EMC: https://community.emc.com/community/products/vplex

This is typical of EMC in my experience: they are terribad at disseminating support information and making new stuff ‘discoverable’.

So what is this thing?

Up till now, you had several ways to monitor, save, and analyze VPLEX statistics.

  • The GUI, but that only shows live data, no history, and only shows very few metrics, and only at a high level.
  • VPLEXCLI: monitor create, monitor collect, etc. Powerful CLI commands; any statistic can be saved, and you can create exportable CSV files. But it is hard to use and understand, and for live monitoring the implementation is truly horrible, scrolling across your screen in a disruptive way, with no ‘top’-like function or anything.
  • EMC ViPR SRM, EMC’s statistics and analytics suite. Good for all kinds of EMC products; it uses a ‘perpetual’ version of the above-mentioned monitor construct. But definitely not a free tool.
  • If you have VMware vROPS: EMC Storage Analytics, an adapter for vROPS, but again not free. v3 of this adapter supports vROPS 6.x.
  • SNMP. VPLEX comes with a MIB, but my experience with it so far is that it has some serious compliance (and syntax) issues that prevent it from working in, for example, the vROPS SNMP adapter (this was my attempt at a ‘cheapo’ EMC Storage Analytics 😉).

So, nothing we had so far ‘just worked’ as a fast and free GUI-based way of seeing some deep statistics. There was something to be said for this not being available in the product itself. It looks like with the “EMC VPLEX Performance Monitor”, which is a free OVF download, they are attempting to address this concern.

Let’s check the release notes.

Product description

VPLEX Performance Monitor is a stand-alone, customer installable tool that allows you to collect virtual volume metrics from a VPLEX Local or VPLEX Metro system. It allows Storage Administrators to see up to 30 days of historical virtual volume performance data to troubleshoot performance issues and analyze performance trends.

The VPLEX Performance Monitor tool is delivered as an OVA (Open Virtualization Format Archive) file that you deploy as a VMware virtual appliance. The virtual appliance connects to one VPLEX system and collects performance metrics for all virtual volumes that are in storage views. Historical virtual volume metrics are stored in a database within the virtual appliance for 30 days. The virtual appliance has a web application which allows you to view the data in charts that show all 30 days of data at once, or allows you to zoom in on data down to the minute.

The VPLEX Performance Monitor charts the following key virtual volume metrics:

Throughput (total read and write IOPS)
Read Bandwidth (KB/s)
Write Bandwidth (KB/s)
Read Latency (usec)
Write Latency (usec)

Note: The VPLEX Performance Monitor can connect to one VPLEX Local or Metro system at a time. To monitor additional VPLEX systems, deploy a new instance of the tool for each VPLEX Local or Metro system you want to monitor.

OK, so admittedly, for a version 1 there is not all that much here: no port statistics or back-end storage metrics, for example. But in most cases you are going to be interested in your virtual volumes most of all anyway, so it’s a good start.

Only 1 VPLEX system at a time? We have 2 Metro-Cluster setups in our environment, which translates to 4 engines in total. Does a ‘system’ equate to an engine? I think so, which means I would need 4 of these appliances. Oh well.

30 days is a nice sweet spot for metric saving as far as I am concerned. This appliance is using an embedded database, so don’t expect options to save your data for years. Get VIPR SRM if you want that.

IMPORTANT Version 1.0 cannot be upgraded. When the next release is available, you must delete the current VPLEX Monitor virtual appliance and deploy the new one. All performance data and user information will be lost.

  • The VPLEX Performance Monitor requires a VPLEX Local or Metro system running GeoSynchrony release 5.5 (VS2 hardware only).
  • The VPLEX Performance Monitor is not supported for use with VS1 hardware.
  • This version supports connection to a VPLEX system with a maximum of 4,000 virtual volumes.
  •  This release of the VPLEX Performance Monitor is not FIPS compliant. Contact EMC Customer Support if you encounter any issues installing or using the VPLEX Performance Monitor tool.

 

Take note of the GeoSynchrony 5.5 requirement. This version only came out recently, so I don’t expect many people to be running this yet.
We don’t in any case, so I can’t provide you with an install demo, yet :p

If you have GeoSynchrony 5.5, go give this a try:

https://download.emc.com/downloads/DL62040_VPLEX_Performance_Monitor_1.0.ova
https://support.emc.com/docu62030_VPLEX_Performance_Monitor_1.0_Release_Notes.pdf?language=en_US

(EMC Support account required)

Update 03:03 pm: I was googling for “EMC VPLEX Performance Monitor” to see if anyone else had mentioned it yet, and came across this video (with 20 views so far, wow!) that showcases the new tool.  https://www.youtube.com/watch?v=HiJgmbLkeTU

“Multiple Connections” error when importing a Physical machine into VMware

Wednesday, November 4th, 2009

 

Like many before us, we ran into the following error in Virtual Center when we tried to P-to-V a server:

“multiple connections to a server or shared resource by the same user”

This is not an uncommon error, and you might recognise it from other scenarios that involve remotely connecting to a Windows host.
I found quite a few posts that mention this problem, even a mention in the release notes of VMware Converter itself, and a VMware knowledge base article, and they go something like this:

An Inter-Process Communication (IPC) named-pipe network connection is already open from the local Windows Redirector to the remote host with different credentials than you are specifying in Converter. The Windows Redirector does not allow IPC connections with multiple different credentials from the same user session. This restriction also applies to mapped network drives as they are also a type of named-pipe connection.

To ensure the Converter agent connection succeeds, perform the following actions on the computer running Converter:

  1. Close any application views or Windows Explorer windows showing files, ActiveX components, or Microsoft Management Console (MMC) snap-ins from the server you are trying to convert.
  2. Open a command prompt. For more information, see Opening a command or shell prompt (1003892).
  3. Type net use \\<remote_hostname>\ * /delete and press Enter.

    Note
    : This disconnects any mapped drives to the remote host.

  4. Check My Computer for any mapped network drives to the remote host and disconnect them.
  5. Log off the server running Converter and log on again. This disconnects any open IPC named-pipe connections established by any remaining applications.
  6. If the problem persists, restart the server running Converter.
  7. If the problem still persists, and you are using the VirtualCenter Converter plug-in, restart the VirtualCenter server.

So we tried all of the above, but to no avail. Try as we might, from both our admin workstations as well as on the Virtual Center server itself, we could not get it to run.

In the end, my colleague tried to run the task using the IP address of the server, instead of its hostname!

That did the trick! But don’t ask me why! I suspect it has something to do with the named pipe actually being named differently when you do this.