Thefluffyadmin on March 17th, 2017

Here is a summary of my experience of speaking at the NLVMUG for the first time.

For someone who always takes pride in knowing just that little bit more than the next guy, it is not surprising that a longstanding desire of mine was to speak at a public event about some kind of unique knowledge. Public conferences, even vendor-specific conferences like VMware's VMUGs and of course VMworld, are very interesting to me precisely because of this. They tend to attract and concentrate some of the most knowledgeable people, and some of the most cutting-edge technological knowledge and experiences.

Last year I was invited by @gekort, a great public speaker in his own right, to present a session at the VMware Summer School in Utrecht, at VMware's Dutch main office. Having never previously spoken publicly like that, this was a pretty big deal for me. The sheer fear of being publicly scrutinized on my knowledge of a subject sends me into fits of anxiety 😉
But it was a great experience, and for me personally a great success. It boosted my confidence in my speaking and presentation abilities quite a bit. The feedback that I got was valuable and I took as much of the experience and advice on board as I could. In any case, I knew I wanted to do more of this! But the main advantage I had was that I was speaking about subjects I was quite comfortable and knowledgeable about, in that case Metro-Cluster and HA.

When it was time to submit a paper to the NLVMUG, the largest VMware user conference in the world besides VMworld, it was obvious to myself and Alexander, our co-founder, that we should speak about our NSX experiences over the last 3 years. It is currently our biggest asset as an infrastructure partner, as we are in a rather unique position with it, and to be blunt, we really cannot advertise it enough. I am not in essence a 'network guy', so I was a bit nervous about the material. I made doubly sure I had fact-checked every single thing I wanted to talk about. I probably spent over 40 hours doing just that.

Simultaneously, my colleague Robin van Altena also submitted a talk about vRealize Network Insight.

We submitted the NSX talk and the vRNI talk as 'lightning sessions', which are only about 20 minutes each. (My talk at the Summer School was an hour.) There were many, many of those slots available at the NLVMUG. In retrospect, I think we could equally well have pitched a full break-out session of 50 minutes with the material we had.
As it turned out, there was already a full break-out session just before mine by one of the NLVMUG leaders, Joep Piscaer, on OGD's experience with NSX over the last 3 years. The NLVMUG leaders reached out to all new speakers to help coach them a bit, and Robin and I gratefully accepted.
This was quite a valuable Skype session, and the key point that was imparted on us was the non-commercial nature of the talks. We were there to talk about our own, personal experiences. While we could acknowledge our companies, it would be bad form to explicitly pitch our company or product. This is relatively easy for me, as having to engage in 'sales talk' causes a fair bit of cognitive dissonance in my brain, even though I can do it quite well when needed :p

Practicing your talk is essential, as is getting feedback early. We occasionally have 'knowledge sessions' at Redlogic, where people do little presentations of whatever it is they want to share. This was a perfect opportunity to get early feedback on our sessions.

My talk was pretty dense with NSX information. It took me a few personal practice runs, timing myself on the different parts, to get it all under 20 minutes. And you want a minute or two for questions.

The day itself was awesome. I was quite nervous of course. My talk was at 11:00, and that is a great time slot. Anything after lunch, and you risk the chance that people are either falling asleep, or have left. Joep Piscaer's talk about NSX at OGD was just prior to mine. I knew I would want to refer to his talk in mine, so I made sure to attend it.

His talk was indeed very interesting. There was a lot of overlap with mine, but our talks were also highly complementary, each touching on unique aspects and experiences. He called me and my talk out specifically as a follow-up, which was very gracious, and his final slide even referenced me. As I was going to briefly discuss NSX-T, he mentioned that specifically. This made me somewhat nervous, as I was only going to spend maybe half a minute on that. I made it a point to give that subject a little more time at the end of my talk, which I did.

If you want to learn more about OGD's hosted IAAS platform with NSX, check out http://vmwareemeablog.com/nl/ogd-biedt-klanten-maximale-vrijheid-met-eigen-iaas-platform/ and https://ogd.nl/blog/post/2016/08/samen-slimmer-met-ogds-eigen-iaas-platform/ (both in Dutch).

The 'Dexter' rooms reserved for the lightning talks are all quite small, only fitting about 40-45 people. As there were a record number of sessions at the NLVMUG this year, the logistics of the venue had a bit of trouble keeping up. Also, all talks were about 10 minutes behind schedule, so I ended up in line for my own talk 🙂
It is both incredibly encouraging and nerve-racking to see the room filled to capacity, and then another 15 or so people trying to get in. It was standing-room only at the back, and the same was true for Robin's vRNI talk.
Getting started is always the hardest part, but once I was into the swing of it, I forgot about time and nerves and just went all-in on the knowledge. I didn't even watch the timer counting down. My talk was pretty dense and I feel I have a pretty intense style of speaking. I try to scan the room and look people in the eye. I hope that keeps people's attention. One thing I regret is not having some humorous moments in my slide deck. I need to take a page from Joep and include some memes next time :p
I tend to move around a lot, but the size of the room did not allow for much lateral pacing. Probably a good thing. You don't want to remain hidden behind the lectern, but you don't want to obscure the beamer either. I will take this into account with my slides next time; leave some space for my 'shadow' if needed. I was very happy the venue had provided fresh water behind the lectern. But a bottle would have been more practical than the glasses we had. I will take a bottle with me next time. Your mouth will dry out :p

To my surprise, I seemed to stay inside the time perfectly, but I was not entirely sure. I was expecting (and dreading) questions, but I only got one, which was customer-related and kinda drew a blank for me in the moment (why did our customer choose NSX). It was not the kind of question I had been expecting, and regretfully I had to admit on the spot that I did not know. I actually did remember later, but my mind was focused on product facts, not customer politics.

I asked the room for more questions.. silence. "Ideas?" .. "Did you like it?!" .. and the whole room made enthusiastic and acknowledging noises. That was the best moment of the day 🙂
I heard later, via others, that it had indeed been very well received. It also reminded me that there is really not enough NSX experience out there right now, and many people are curious.

Robin's talk about vRNI, just after and right next door to mine, was also very well attended, with lots of interest. Again a packed and overcrowded room. He managed to cram in slides and material and exposition, and 4 demo movies, and stayed right inside 20 minutes. Very impressive! And demos of a product are always very popular, even if they are recorded. It should be noted he recorded these himself, in our own lab. They were not VMware-provided.

The rest of the day was much like any other conference day: attending sessions, shaking hands, live-tweeting, getting plied by vendors, hunting for food and snacks, and networking. I had been invited to a vExpert lunch with Frank Denneman, but I totally forgot about it. We did have a nice buffet afterwards with the other speakers, and I had some great convos there with folks from ITQ. The day was exhausting but a huge amount of fun, the best NLVMUG I have been to, and higher on my list than even VMworld so far. I will certainly want to speak again next year, and perhaps at other places and events; my mind is already churning with what my next talk will be about!

I will be writing some upcoming blog posts about our NSX experiences, based on my presentation.


Thefluffyadmin on March 15th, 2017

—Update 21st March: the vBrownBag episode on AWS that these slides are from is now posted on YouTube: https://www.youtube.com/watch?v=u8rWI5tuSq8

— Update 15th March ~5pm CET, added some extra info and clarified some points—

More details regarding VMware Cloud on AWS are starting to come out of VMware. Tonight I attended an awesome #vBrownbag webinar on #VMWonAWS, hosted by Chris Williams (@mistwire) and Ariel Sanchez (@arielsanchezmor).

Presenting were Adam Osterholt (@osterholta), Eric Hardcastle (@CloudGuyVMware) and Paul Gifford (@cloudcanuck).

Here are some of the slides and highlights that stood out for me. The information is not under NDA, and permission was given to repost the slides.

—–

VMware Cross-Cloud Architecture. A nice slide that summarises the VMware strategy going forward. Expect VMware Cloud to pop up in more places, like IBM Cloud. More info about the VMware cloud strategy here.

Important to note here is that this is a complete service offering, meaning it's fully licensed. You do not need to bring your own licenses to the table. So you get the full benefit of technologies like vSAN and NSX as part of the offering.

Skillsets.. this is a huge selling point. Many native cloud deployments require your admins to know AWS or cloud-native specific tools and automation scripting languages. VMware Cloud on AWS (VMWonAWS) removes that barrier-to-entry completely. If you can administer a VMware-based cloud stack today, you can administer VMware Cloud on AWS.

You have access to AWS sites around the world to host VMWonAWS. Note, however, that because these are vSphere clusters on bare metal, you are bound in certain ways to the location where you instantiate your VMware environment.

Initial rollout will be in Oregon, followed by an EMEA location, sometime around mid-2017. (From announcement to GA in about a year.. not bad!!)

With the recent S3 outage in mind, I asked specifically about things like stretched-cluster and other advanced high-availability features inside AWS, and these will not initially be part of the offering. However, you can always move your VMs off and onto VMWonAWS via x-vMotion. More on that later.

VMWonAWS will use customized HTML interfaces throughout. No Flash here! 🙂

But if you are a bit of a masochist and you like the Flash/Flex client, it will be available to you anyway.

The frontend provisioning component will include its own API interface. What you see below is a mockup and subject to change.

Administering your cluster uses a custom and locked-down version of the already available HTML5 client.

It's important to note here that VMware will administer and upgrade their software inside these environments themselves. They will keep n-1 backward compatibility, but if you have a lot of integration talking to this environment, operationally you will have to keep up with updating your stuff. Think of vRA/vRO workflows and other automation you might have talking to your VMWonAWS instances. This may be a challenge for customers.

Demonstrated below is a feature unique to VMWonAWS: the ability to resize your entire cluster on the fly.

Again, above screenshots are mockups/work-in-progress

Your VMware environment is neatly wrapped up in an NSX Edge gateway, which you cannot touch. However, inside your environment, you are able to provision your own NSX networks, manage DFW, edges, etc, and with that all the functionality they offer you. However initially NSX API access will be limited or not available, so it may be hard to automate NSX actions out of the gate.

The Virtual Private Cloud (VPC) you get is divided into 2 pools of resources. Management functions are separated from compute.

Remember that all of this is bare-metal, managed and patched by VMware directly.

VMware manages the VPC with their stuff in it. You get access to it via your own VPC, and the two are then linked together.

They give you a snazzy web frontend interface with its own API to do the basic connectivity config and provisioning.

So how do you connect up your new VMWonAWS instance with your on-premises infrastructure?

End-to-end, you are bridging via Edges.. but there is obviously a little more involved. Here are the high-level steps that the customer and VMware/Amazon take to hook it all up.

 

The thing to remember here is that your traffic to the VMware VPC is routed through your customer VPC. It 'fronts' the VMware VPC.

Link the vCenters together, and now you can use x-vmotion to move VMs back and forth. And remember, no NSX license is required on-prem to do this.

If you already have NSX, you can of course stretch your NSX networks across. This allows live x-vMotions (cross-vCenter vMotion).

If you do not have NSX on-premises, you will deploy a locked-down NSX Edge for bridging, but vMotions would be 'cold'.
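For those already scripting their migrations, this kind of cross-vCenter move can also be driven from PowerCLI (6.5 and up) with Move-VM. Below is a minimal sketch, assuming both vCenters are connected in the same session; all server, VM, portgroup and datastore names are placeholders of my own, and whether VMWonAWS will expose exactly this workflow is my assumption.

```powershell
# Minimal cross-vCenter vMotion sketch (PowerCLI 6.5+).
# All server names, VM names, portgroups and datastores below are placeholders.

# Connect to both vCenters in the same PowerCLI session
$srcVC = Connect-VIServer -Server 'vcenter-onprem.example.local'
$dstVC = Connect-VIServer -Server 'vcenter-vmwonaws.example.local'

$vm       = Get-VM -Name 'app01' -Server $srcVC
$destHost = Get-VMHost -Server $dstVC | Select-Object -First 1
$destDS   = Get-Datastore -Name 'WorkloadDatastore' -Server $dstVC
$destPG   = Get-VDPortgroup -Name 'stretched-ls-web' -Server $dstVC

# Cross-vCenter (x-vMotion) move: relocate compute, storage and networking in one go
Move-VM -VM $vm -Destination $destHost -Datastore $destDS `
        -NetworkAdapter (Get-NetworkAdapter -VM $vm) -PortGroup $destPG
```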

Encryption will be available between the Edge endpoints. No details on this yet.

As standard NSX Edges are being used on both ends, you can do things like NAT, so you can have overlapping IP spaces if you so choose. That is not something native AWS VPCs allow you to do.

Because you always have your own native AWS VPC, you can leverage any other native AWS service.

But you can do some crazy-cool things too, that will be familiar to native AWS users. You can, for example, leverage regional native AWS services, such as S3, inside VMWonAWS VMs. These resources are connected inside AWS, using their own internal routing, so this kind of traffic does not need to go back out over the internet.

VMs inside VMWonAWS can make use of the Amazon breakout for their internet connectivity. Or you can backflow it through your own on-premises internet.

Some additional notes on APIs:

There is no backup function built into this, so you are expected to back up your own VMs hosted inside VMWonAWS. To facilitate this, the VADP API for backups is available to leverage, as per normal.

Some notes on vSAN:

vSAN is used as the underlying storage, all-flash. VMware does not yet know what the default setup of this will be in terms of FTT (Failures To Tolerate) level or dedupe. But you will have control over most of it, to decide for yourself what you want.

 


Thefluffyadmin on March 13th, 2017

It's been a very interesting year so far, career-wise. Late last year I figured I had a good shot at vExpert status. I had written some in-depth blog posts, covered a lot of ground with GSS and even discovered some unique bugs (some of which I have yet to blog about), and tweeted a fair bit. But last year's highlight was definitely speaking at the VMware Summer School in Utrecht, on our experiences with Metro-Cluster. I reached out to over 15 VMware employees across GSS, PSO and the NSBU and every one was willing to sponsor me, which was very nice 🙂

Having gotten to know some vExperts over the last year, one theme that kept drawing my attention was the exclusive Slack community they are given access to. Yes, all the free stuff is nice of course, and you can see many a blog post commenting that it should not all 'be about the swag', but I can honestly say I don't care all that much for that. For me it's the networking that is by far the most interesting opportunity. And being somewhere lightly on the 'spectrum' and generally shy around people, a chat room seemed more or less a perfect way to connect to other people with high-quality knowledge. And so it has indeed proven to be!
I was slightly worried that with over 1500 vExperts being selected this year (some people are none too happy about this), the chat would be a constant buzz of activity. Which could be good, or bad. But it's actually relatively quiet most of the time. At least so far. To give you an idea, there are only about 550 people that are 'in' the main vExpert channel. But many of the other channels have far fewer participants.. vSAN: 200, NSX: 238, AWS: 100 :p Of those, 80% are lurkers… it's just like IRC 😀

I have already had some very interesting discussions, and it was nice to see that even while feeling like a complete amateur amid all the 'big names' that frequent that chat, I still have actual valuable field experience to bring to the table, especially in regard to aspects of NSX. When the crème de la crème of the VMware community has next to no experience with, say, NSX load-balancing, then even lowly me can add value. And that makes me a lot less shy about participating!

Generally though, once you are part of the VMware community, and especially if you are a vExpert, that community aspect of it really starts to become important. Seeing what people are talking and posting and tweeting about, being on the inside track of a lot of those talks, mingling with thought leaders and VMware product owners, pushes you to become even more involved. One place where this has really ignited in me is podcasts. I used to listen to podcasts a lot, and over the last few months that interest has revived, and it's revolving around the VMware community. I now even try to take the time to attend the live recordings of several. I will dedicate a separate post to my favorite podcasts to follow.

Ok, let's talk about swag anyway :p
The most interesting things in this lineup are a year of free Pluralsight subscription, 35% off VMware Press titles, and advance previews and webinars about unreleased or upcoming technology. (For a complete list of the kind of benefits, this post is good.) Certain companies like Rubrik, Pluralsight and Veeam really put an effort into supporting vExperts and offer software, training and other goodies for free. It seems like you get a lot of extra benefit from visiting VMworld, not just from a 'stuff' perspective, but mainly from a networking angle. But unfortunately it is by no means certain I can attend every year.

As for Pluralsight, their catalog is intimidating. I am looking to get more into Docker and associated things like Kubernetes, so these will be the first things I will look into. For example Getting Started with Docker ( https://www.pluralsight.com/courses/docker-getting-started ), a course given by @nigelpoulton, who has also produced a short book on Docker that I highly recommend!

It seems that once you get into the vExpert community, it is pretty straightforward to stay in it, year after year, as the momentum of participation carries you forward. Whether it be through blog posts or speaking at events, I have a feeling such things will tend to become a natural and expected part of being at 'this level'. Let's hope I can keep it up. Well, speaking for the first time this year at the NLVMUG should certainly help 😉

 


Thefluffyadmin on February 28th, 2017

This post describes our experience with upgrading from EMC VPLEX VS2 to VS6 hardware, in a seamless non-disruptive fashion.

EMC VPLEX is a powerful storage virtualization product and I have had several years of experience with it in an active-active metro-storage-cluster deployment. I am a big fan. It's rock-solid, very intuitive to use and very reliable if set up correctly. Check out these 2 videos to learn what it does.

Around August 2016, EMC released VPLEX VS6, the next generation of hardware for the VPLEX platform. In many aspects it is, generally, twice as fast, utilizing the latest Intel chipset and 16Gb FC, with an InfiniBand interconnect between the directors and a boatload of extra cache.

One of our customers recently wanted their VS2 hardware either scaled-out or replaced by VS6 for performance reasons. Going for a hardware replacement was more cost-effective than scaling out by adding more VS2 engines.

Impressively, the in-place upgrade of the hardware could be done non-disruptively. This is achievable through the clever way the GeoSynchrony firmware is 'loosely coupled' from the hardware. The VS6 hardware is a significant upgrade over the VS2, yet they are able to run the same firmware version of GeoSynchrony without the different components of VPLEX being aware of the fact. This is especially useful if you have VPLEX deployed in a metro-cluster.
So to prepare for a seamless upgrade from VS2 to VS6, your VS2 hardware needs to be on the exact same GeoSynchrony release as the VS6 hardware you will be transitioning to.

VPLEX consists of 'engines' that house 2 'directors'. You can think of these as broadly analogous to the service processors in an array, with the main difference being that they are active-active. They share a cache and are able to handle I/O for the same LUNs simultaneously. If you add another engine with 2 extra directors, you now have 4 directors all servicing the same workload and load-balancing the work.

Essentially the directors form a cluster together, directly over their InfiniBand interconnect, or in a metro-cluster also partially over Fibre Channel across the WAN. Because they are decoupled from the management plane, they can continue operating even when the management plane is temporarily not available. It also means that, if their firmware is the same, they can still form a cluster together even though the underlying hardware is a generation apart, without any of them noticing. This is what makes the non-disruptive upgrade possible, even in a metro-cluster configuration. It also means that you can upgrade one side of the VPLEX metro-cluster separately, a day or even a week apart from the other side. This makes planning an upgrade more flexible. There is a caveat however: a possible slight performance hit on your wan-com replication between the VS2 and VS6 sides, so you don't want to stay in that state for too long.

 

VPLEX VS2 hardware. 1 engine consisting of 2 directors.


VS6 hardware. Directors are now stacked on top of each other. 

Because all directors running the same firmware are essentially equivalent, even though they might be of different hardware generations, you can almost predict what the non-disruptive hardware upgrade looks like. It's more or less the same procedure as if you were to replace a defective director. The only difference is that the old VS2 hardware is now short-circuited to the new VS6 hardware, which enables the new VS6 directors to take over I/O and replication from the old directors one at a time.

The only thing the frontend hosts and the backend storage ever notice is temporarily losing half their storage paths. So naturally, you need to have the multipathing software on your hosts in order. This will most likely be EMC PowerPath, which handles this scenario flawlessly.
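A quick way to sanity-check the paths from the vSphere side before (and during) the director swap is a small PowerCLI loop like the one below. This is a generic sketch, not VPLEX-specific; the vCenter name is a placeholder and the 'expected path count' obviously depends on your own fabric and zoning design.

```powershell
# Rough path-health check before/during the director swap (PowerCLI).
# Expected path count per LUN depends on your own fabric/zoning design.
$expectedPaths = 4

Connect-VIServer -Server 'vcenter.example.local' | Out-Null

Get-VMHost | ForEach-Object {
    $vmhost = $_
    Get-ScsiLun -VmHost $vmhost -LunType disk | ForEach-Object {
        $paths  = Get-ScsiLunPath -ScsiLun $_
        $active = ($paths | Where-Object { $_.State -eq 'Active' }).Count
        if ($paths.Count -lt $expectedPaths -or $active -eq 0) {
            # Flag LUNs that have lost more paths than a single director swap explains
            [pscustomobject]@{
                Host   = $vmhost.Name
                Lun    = $_.CanonicalName
                Paths  = $paths.Count
                Active = $active
            }
        }
    }
}
```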

The most impressive trick of this transfer, however, is that the new directors will seamlessly take over the entire 'identity' of the old directors. This includes -everything- unique about the director, including, crucially, the WWNs. This is important because transferring the WWNs is the very thing that makes the transition seamless. It does of course require you to have 'soft zoning' in place in the case of FC, as a director port WWN will suddenly, in the space of about a minute, vanish from one port and pop up on another port. But if you have your zoning set up correctly, you do not have to touch your switches at all.

And yes, that does mean you need double cabling, at least temporarily. The old VS2 is of course connected to your i/o switches, and the new VS6 will need to be connected simultaneously on all its ports, during the upgrade process.

So have fun cabling those 😉

That might be a bit of a hassle, but it's a small price to pay for such a smooth and seamless transition.

To enable the old VS2 hardware (which used FC to talk to its partner director over local-com) to talk to the new VS6 directors (which use InfiniBand) during the migration, it is necessary to temporarily insert an extra FC module into the VS6 directors. During a specific step in the upgrade process, the VS2 is connected to the VS6, and for a brief period your I/O is being served from a combination of a VS2 and a VS6 director that are sharing volumes and cache with each other. This is a neat trick.

Inserting the temp IO modules:

As a final step, the old VS2 management server settings are imported into the new redundant VS6 management modules. In VS6, these management modules are integrated into the director chassis and act in an active-passive failover mode. This is a great improvement over the single, non-redundant VS2 management server, with its single power supply (!)

 

Old Management Server:

New management modules:

The new management server hardware completely takes over the identity and settings of the old management server. This even includes the IP address, customer cluster names and the cluster serial numbers. The VS6 will adopt the serial numbers of your VS2 hardware. This is important to know from an EMC support point of view and may confuse people.

The great advantage is that all local settings and accounts, and all monitoring tools and alerting mechanisms, work flawlessly with the new hardware. For example, we have a PowerShell script that uses the API to check the health status. This script worked immediately with the VS6 without having to change anything. Also, ViPR SRM only needed a restart of the VPLEX collector, whereafter it continued collecting without having to change anything. The only things I have found that did not get transferred were the SNMP trap targets.
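That health-check script boils down to little more than an Invoke-RestMethod call against the VPLEX Element Manager REST API. Here is a stripped-down sketch; the endpoint path, the header-based authentication and the response structure are assumptions based on how our own environment is set up, so verify them against the Element Manager API guide for your GeoSynchrony release.

```powershell
# Stripped-down health check against the VPLEX Element Manager REST API.
# Endpoint path, header auth and response layout are assumptions from our own
# setup; verify against the API guide for your GeoSynchrony release.
$mgmt    = 'vplex-mgmt.example.local'
$headers = @{ 'Username' = 'service'; 'Password' = 'changeme' }

# Ignore the self-signed certificate on the management server (lab use only)
[System.Net.ServicePointManager]::ServerCertificateValidationCallback = { $true }

$uri      = "https://$mgmt/vplex/health-check"
$response = Invoke-RestMethod -Uri $uri -Method Post -Headers $headers

# The interesting part for us is simply whether anything reports degraded
$response.response.'custom-data'
```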
After the upgrade, the benefit of the new VS6 hardware was immediately noticeable. Here is a graph of average aggregate director CPU use, from EMC ViPR SRM:

As this kind of product is fundamental to your storage layer, its stability and reliability, especially during maintenance work like firmware and hardware upgrades, are paramount, and are taken seriously by EMC. Unlike other EMC products like VNX, you are not expected or indeed allowed to update this hardware yourself, unless you are a certified partner. Changes that need to be made to your VPLEX platform go through a part of EMC called the 'Remote Pro-active' team.

There is a process that has to be followed, which involves getting them involved early, a round of pre-validation health checks, and the hands-on execution of the maintenance job either remotely via WebEx, or on site by local EMC engineers if that is required. A hardware upgrade will always require onsite personnel, so make sure they deliver pizza to the datacenter! If an upgrade goes smoothly, expect it to take 4-5 hours. That includes all the final pre-checks, hardware work, cabling, transfer of the management identity to the VS6, and decommissioning of the VS2 hardware.

In the end the upgrade was a great success, and our customer had zero impact. Pretty impressive for a complete hardware replacement of such a vital part of your storage infra.

Finally, here is the September 2016 VPLEX Uptime bulletin with some additional information about the upgrade requirements. Be aware that this may be out of date; please consult EMC support for the latest info.

https://support.emc.com/docu79516_Uptime-Bulletin:-VPLEX-Edition-Volume-23,-September-2016.pdf?language=en_US

There is an EMC community thread where people have been leaving their experiences with the upgrade, have a look here: https://community.emc.com/message/969664

 


When doing vSphere Metro Storage Cluster, on the shared storage layer you often have a 'fallback' side: the site where the LUN will become authoritative for reading and writing in case of a site failure or a split brain.

This makes VM storage placement on the correct Datastores rather important from an availability perspective.

Up till now, you had to manage intelligent VM storage-placement decisions yourself. And if you wanted to align 'compute' (aka where the VM is running) with where its storage falls back, you also had to take care of this yourself through some kind of automation or scripting.

This problem would be compounded if you also wanted to logically group these storage ‘sides’ into SDRS clusters, which you often do, especially if you have many datastores.

In the past few years, mostly in regard to vSAN and vVols, VMware has been pushing the use of Storage Policies, getting us to think towards a model of VM policy-based storage management.

Wouldn't it be great if you could leverage the new Storage Policies to take care of your metro-cluster datastore placement? For example, by tagging datastores and building a policy around that.

And what if you could get SDRS to automate and enforce these policy-based placement rules?

The EnforceStorageProfiles advanced setting introduced in 6.0U2 seemed to promise to do this.

However, messing around with Storage Policies, tagging and in particular that EnforceStorageProfiles advanced setting, I encountered some inconsistent and unexpected GUI and enforcement behavior that shows we are just not quite there yet.

This post details my findings from the lab.

————————————-

The summary is as follows:

It appears that if you mix different self-tagged storage capabilities inside a storage-cluster, the cluster itself will not pass the Storage Policy compatibility check on any policy that checks for a tag that is not applied to all datastores in that cluster.

Only if all the datastores inside the storage-cluster share the same tag will the cluster itself report as compatible.

This is despite applying that tag to the storage-cluster object itself! It appears that adding or not adding these tags to the storage-cluster object has no discernible effect on the Storage Compatibility check of the policy.

This contradicts the stated purpose and potential usefulness of the EnforceStorageProfiles advanced setting.

However, individual datastores inside the storage-cluster will correctly be detected as compliant or non-compliant based on custom tags.

The failure of the compatibility check on the storage-cluster will not stop you from provisioning a new VM to that datastore cluster, but the compatibility warnings you get only apply to one or more underlying non-compatible datastores. It does not tell you which ones though, so that can be confusing.

The advanced setting EnforceStorageProfiles will affect storage-cluster initial placement recommendations, but will not result in SDRS movements on its own when the value is set to 1 (soft enforcement).
Even EnforceStorageProfiles=2 (hard enforce) does not make SDRS automatically move a VM's storage from non-compatible to compatible datastores in the datastore-cluster. It seems to only affect initial placement. This appears to contradict the way the setting is described to function.

However, even soft enforcement will stop you from moving a VM manually to a non-compliant datastore within that storage-cluster, even though you specified an SDRS override for that VM. That is the kind of behavior one would only expect with a 'hard' enforce. Again, this is unexpected.

This may mean that while SDRS will not move an already-placed VM to correct storage of its own accord after the fact, it will at least prevent the VM from moving to incorrect storage.

Summed up, that means that as long as you get your initial placement right, EnforceStorageProfiles will make sure the VM's storage at least stays there. But it won't leverage SDRS to fix placements, as the setting appears to have been meant to.

 

Now for the details and examples:
————–

I have 4 Datastores in my SDRS cluster:

I have applied various tags to these datastore objects. For example, the datastores starting with 'store1' received the following tags:

The datastores starting with 'store2' received the following tags:

The crucial difference here is the tag "Equalogic Store 1" vs "Equalogic Store 2".

In this default situation, the SDRS datastore cluster itself has no storage tags applied at all.

 

I have created a Storage Policy that is meant to match datastores with the "Equalogic Store 2" tag. The idea here is that I can assign this policy to VMs, so that inside that datastore cluster those VMs will always reside on 'Store 2' datastores and not on 'Store 1' datastores.
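For reference, the equivalent setup can also be scripted with PowerCLI's tag and SPBM cmdlets. A minimal sketch follows; the category name, policy name and datastore name patterns are my own lab choices, not anything mandated by the product.

```powershell
# Recreate the lab setup with PowerCLI: tag the datastores and build a
# tag-based storage policy. Names match my lab and are otherwise arbitrary.
$cat  = New-TagCategory -Name 'StorageSide' -EntityType Datastore
$tag1 = New-Tag -Name 'Equalogic Store 1' -Category $cat
$tag2 = New-Tag -Name 'Equalogic Store 2' -Category $cat

# Tag each datastore with the 'side' it falls back to
Get-Datastore 'store1*' | ForEach-Object { New-TagAssignment -Tag $tag1 -Entity $_ }
Get-Datastore 'store2*' | ForEach-Object { New-TagAssignment -Tag $tag2 -Entity $_ }

# Storage policy that matches only 'Store 2' datastores
$policy = New-SpbmStoragePolicy -Name 'EQL-Store2-Placement' `
    -AnyOfRuleSets (New-SpbmRuleSet -AllOfRules (New-SpbmRule -AnyOfTags $tag2))
```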

I plan to have SDRS (soft) enforce this placement using the advanced option EnforceStorageProfiles=1, introduced in vSphere vCenter Server 6.0.0b
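As far as I know there is no dedicated PowerCLI cmdlet for SDRS advanced options, so setting EnforceStorageProfiles from a script means going through the StorageResourceManager API. Here is an untested sketch of how that could look; the datastore-cluster name is from my lab, and the spec wrangling is my own assumption, so double-check it against the vSphere API reference before relying on it.

```powershell
# Sketch: set the SDRS advanced option EnforceStorageProfiles=1 (soft enforce)
# on a datastore cluster via the vSphere API. Treat this as an untested outline.
$pod = Get-DatastoreCluster -Name 'EQL-SDRS-Cluster'          # name from my lab
$srm = Get-View -Id (Get-View ServiceInstance).Content.StorageResourceManager

$spec = New-Object VMware.Vim.StorageDrsConfigSpec
$spec.PodConfigSpec = New-Object VMware.Vim.StorageDrsPodConfigSpec

$opt = New-Object VMware.Vim.OptionValue
$opt.Key   = 'EnforceStorageProfiles'
$opt.Value = '1'            # 0 = disabled, 1 = soft, 2 = hard
$spec.PodConfigSpec.Option = @($opt)

# Apply the change to the existing SDRS configuration (modify = $true)
$srm.ConfigureStorageDrsForPod_Task($pod.ExtensionData.MoRef, $spec, $true)
```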

 

 

The match for 'Equalogic Store 2' is the only rule in this policy.

 

But when I check the storage compatibility, neither the datastores that have that tag nor the datastore cluster object shows up under the 'Compatible' listing.

However, under the ‘Incompatible’ listing, the Cluster shows up as follows:

Notice how the SDRS cluster object appears to have 'inherited' the error conditions of both datastores that do not have the tag.

This was unexpected.
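The same compatibility check can be run from PowerCLI, which makes it a bit easier to see exactly which objects pass and which don't. A small sketch, reusing the lab names from above:

```powershell
# Which datastores (and clusters) does SPBM consider compatible with the
# tag-based policy? Names are from my lab.
$policy = Get-SpbmStoragePolicy -Name 'EQL-Store2-Placement'

# Compatible storage according to SPBM
Get-SpbmCompatibleStorage -StoragePolicy $policy

# Per-datastore view inside the SDRS cluster, to see which members carry which tag
Get-DatastoreCluster 'EQL-SDRS-Cluster' | Get-Datastore | ForEach-Object {
    [pscustomobject]@{
        Datastore = $_.Name
        Tags      = (Get-TagAssignment -Entity $_).Tag.Name -join ', '
    }
}
```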

In the available documentation for VM Storage Policies, I have not found any reference to SDRS Clusters directly. My main reference here is Chapter 20 of the vsphere-esxi-vcenter-server-601-storage-guide.  Throughout the documentation, only datastore objects themselves are referenced.

The end of chapter 8 of the vsphere-esxi-vcenter-server-601-storage-guide, 'Storage DRS Integration with Storage Profiles', explains the use of the EnforceStorageProfiles advanced setting.

 

 

The odd thing is, the documentation for the PbmPlacementSolver data object (which I assume the Storage Policy placement checker is utilizing) even explicitly states that storage PODs (SDRS clusters) are a valid 'hub' for checking against.

But it seems as if the 'hub', in the case of an SDRS cluster, will produce an error for every underlying datastore that throws an error. In the case of mixed-capability datastores in a single SDRS cluster, depending on how specific your storage profile is, chances are it will always throw an error.

So this seems contradictory! How can we have an SDRS advanced setting that operates on a per-datastore basis, while the cluster object will likely always stop the compatibility check from succeeding?

 

As a possible workaround for these errors, I tried applying tags to the SDRS cluster itself. I applied both the "Equalogic Store 1" and "Equalogic Store 2" tags to the SDRS cluster object. The idea being that the compatibility check of the storage policy would never fail to match on either of these tags.

 

 

But alas, it seems to ignore tags you set on the SDRS Cluster itself.

Anyway, it's throwing an error, but is it really stopping SDRS from taking the policy into account, or not?

 

Testing SDRS Behaviors

 

 

Provision a new VM

Selecting the SDRS cluster, it throws the compatibility warning twice, without telling you which underlying datastores it is warning you about. That is not very useful!

However, it will deploy the VM without any issue.

When we check the VM, we can see that it has indeed placed the VM on a compatible datastore.
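You can double-check the placement and policy compliance from PowerCLI as well; a minimal sketch, where the VM name is just a placeholder:

```powershell
# Verify where the new VM landed and whether it is compliant with its policy.
# VM name is a placeholder.
$vm = Get-VM -Name 'testvm01'

# Datastore the VM actually ended up on
Get-Datastore -RelatedObject $vm | Select-Object Name

# SPBM view: assigned policy and compliance status
$vm | Get-SpbmEntityConfiguration |
    Select-Object Entity, StoragePolicy, ComplianceStatus
```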

 

 

Manual Storage-vmotion to non-compliant datastore

In order to force a specific target datastore inside an SDRS cluster, check the 'Disable Storage DRS for this virtual machine' checkbox. This will create an override rule for this VM specifically. When we do this and select a non-compatible datastore, it throws a warning, as we might expect. But as I have chosen to override SDRS recommendations completely here, I expect to be able to just power on through this selection.

 

No such luck. Remember that EnforceStorageProfiles is still set to only '1', which is a soft enforcement. This is not the kind of behavior I expect from a 'soft' enforcement, especially not when I just specified that I wanted to ignore SDRS placement recommendations altogether!

I should be able to ignore these warnings, for the above-stated reasons. It's a bit inconsistent that I am still prevented from overriding!

There are 2 ways around this.

First of all, you can temporarily turn off SDRS completely.

You must now choose a datastore manually. Selecting the non-compatible datastore will give the warning, as expected.

But now no enforcement takes place and we are free to move the VM wherever we want.

The other workaround, which is not so much a workaround as it is the correct way of dealing with policy-based VM placement, is to change the policy.
If you put the VM's policy back to default, it doesn't care where you move it.

 

Storage DRS Movement Behaviors

When EnforceStorageProfiles=1, SDRS does not seem to move the VM, even if it is non-compliant.

Unfortunately, EnforceStorageProfiles=2 (hard enforce) does not change this behavior. I was really hoping here that it would automatically move the VM to the correct storage, but it does not, even when manually triggering SDRS recommendations.

Manual Storage-vmotion to compliant datastore

When the VM is already inside the storage-cluster, but on a non-compliant datastore, you would think it would be easy to get it back onto a compliant datastore.
It is not. When you select the datastore-cluster object as the target, it will fault on the same error as manually moving it in the previous example – explicit movements inside an SDRS-enabled cluster always require an override.

Create the override by selecting the checkbox again.

Don't forget to remove the override again afterwards.

Manual Storage-vmotion from external datastore to the storage-cluster

Here, SDRS will respect the storage policy and recommend initial placement on the correct compliant datastores.


 

Conclusion.

Tag-based storage policies, and their use in combination with SDRS Clusters, appears to be buggy and underdeveloped. The interface feedback is inconsistent and unclear. As a result, the behavior of the EnforceStorageProfiles setting becomes unreliable.

It's hard to think of a better use case for EnforceStorageProfiles than the self-tagged SDRS datastore scenario I tried in the lab. Both vSAN and vVol datastores do not benefit from this setting. It really only applies to 'classic' datastores in an SDRS cluster.

I have seen that self-tagging does not work correctly. But I have not yet gone back to the original use case of Storage Profiles: VASA properties. However, with VASA-advertised properties you are limited to what the VASA endpoint is advertising. Self-tagging is far more flexible, and currently the only way I can give datastores a 'side' in a shared-storage metro-cluster design.

Nothing I have read about vSphere 6.5 so far, leads me to believe this situation has been improved. But I will have to wait for the bits to become available.

 
