VMworld final day and recap

I am a little delayed in posting this final recap, but better late than never. The only reason I am posting this information is to encourage those who would like to go to VMworld next year and to help out those who could not attend. As you can see from my previous posts, there is a lot of value in attending, and the volume of knowledge you gain is tremendous.

My final day at VMworld started off by attending Chris Wahl and Jason Nash’s session ‘vSphere Distributed Switches – Deep Dive’. I have put out a separate blog post on it; I was looking for more technical depth, but it was still a great session. I headed over to the VMUG Leader Lunch thereafter, where VMUG leaders from various geographies met and were joined by the top brass of VMware – Pat Gelsinger, Raghu Raghuram, and Ben Fathi.

They took questions during the lunch, and Mr. Gelsinger gave us insight into where VMware is going in terms of future innovations and how they would further participate in VMUG activities. A new thing that will probably get announced tomorrow is the launch of VMTN – yes, you read that right. The scope and depth of the program is unknown to us at this time, but we are told it will be comprehensive in nature, so I look forward to that announcement either tomorrow or in the coming days. This is not very confidential information by any means, and VMware has been focussed on launching VMTN as soon as they can.

The VMware executives also spoke about the future of vCloud Director and vCloud Automation Center. They clarified that vCD is not going away – it will continue to be available, but only to service providers, while vCAC is aimed at enterprise environments. All features of vCD except for multi-tenancy are available in vSphere 6.0, which is in public beta, so if you are interested in trying it out, go for it.

A couple of VMUG leaders enquired about the vCloud Air (formerly vCHS) announcement; hopefully, as the infrastructure scales up, more people will be able to leverage it in different ways. Further conversation happened around the VMworld announcements of EVO Rail and EVO Rack. As is publicly known now, there is some level of overlap with vendors that support VMware platforms, but that is a common industry trend: if VMware does not push innovation in that area, either the vendors will be slow to innovate or competitors will eat into that market. So look forward to some new things happening in the EVO area.

We finally ended the VMUG Leader Lunch with awards, and a member of the VMUG Board of Directors and someone I have known for a while – Ravi Venkatasubbaiah – won the VMUG President’s Award for exemplary leadership. Congrats to Ravi on this achievement.

I then headed over to STO 1153 – Storage Performance Best Practices for Tier 1 Applications on Virtual SAN. However, as was unfortunately common at VMworld this year, the room had been switched again and was in a different building a couple of blocks away. Getting there would have wasted a further 15 minutes, so I sat in on another session instead – EUC 2551 – Architecture for Next Gen Desktops. They spoke about enhancements to the Horizon Suite, VMware’s acquisition of CloudVolumes, and their strategy of further simplifying desktop deployment. I was more interested in listening to the presenter, so I didn’t take any notes on this one.

The final session I attended was INF3037 – How to build and deploy a well run Hybrid Cloud. The presenters spoke about hybrid cloud strategies – enhancements to architectural products, automation tools, and deployment software. Look forward to the presentations being shared post VMworld for all attendees.

The day ended with the VMware Canada Customer Reception party just across from Moscone West at Jillian’s. I also headed out later for a private dinner with one of our vendor SEs who was also in town for VMworld.

With a great level of learning and a lot more insight into VMware technologies I am satisfied and pleased that the conference was a success and brought valuable content to its attendees. I also networked with a few great individuals and am returning more enlightened on VMware and vendor technologies.

If you didn’t attend VMworld but would like to view the content – you can sign up for a subscription (last year it was $600) to get access to all VMworld content (presentations, sessions, etc). Not sure about VMworld lab content but I believe that will be available as well. VMUG Advantage membership ($200) last year also provided free access to VMworld content. So check out what’s available and go for it.

To all my friends who met me at VMworld – a shout out to at least a few of you – Mathew Brender, Sean Thulin, Mark Browne, Jonathan Frappier, Angelo Luciani, Ravi Venkatasubbaiah, Irfan Ahmad, Rob Kyle, Peter Chang, Dwayne Lessner, Chris Halverson, Avram Woroch, Manjeet Bavage, Brandi Collins, Dave Henry – hope to see you again next year.

 

 

STO2496 – vSphere Storage Best Practices: Next-Gen Storage Technologies

This was a panel-style session that wasn’t vendor specific but broadly gave pointers on newer storage technologies – vSAN, SDRS, VVOLs, all-flash arrays, datastore types, jumbo frame usage, and so on. It truly lived up to its name – not just in content but also in duration: the session ran over its scheduled hour and actually finished in about an hour and a half, but no one was complaining since there was a lot of interesting material.

Presenters – Rawlinson Rivera (VMware), Chad Sakac (EMC),  Vaughn Stewart (Pure Storage)

The session kicked off by talking about enabling simplicity in the storage environment. Some key points discussed were -

1) Use large datastores

  • NFS 16 TB and VMFS 64 TB (a quick way to check current datastore sizes from the ESXi shell is sketched after this list)
  • Backup and restore times and objectives should be considered

2) Limit use of RDMs to when required for application support

3) Use datastore clusters and SDRS

  • Match service levels across all datastores in each datastore cluster
  • Disable the SDRS IO metric on all-flash arrays and arrays with storage tiering

4) Use automated storage array services

  • Auto tiering for performance
  • Auto grow/extend for datastores

5) Avoid Jumbo frames for iSCSI and NFS

  • Jumbo frames offer performance gains at the cost of added complexity, and improvements in storage technology mean they are no longer required
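
As a quick aside on point 1, here is a minimal way to see which datastores an ESXi host currently mounts and how big they are, so you know where you stand before consolidating onto larger volumes. These are standard ESXi shell commands; the output columns can vary slightly between vSphere versions.

  # List mounted datastores (VMFS and NFS) with total size and free space
  esxcli storage filesystem list

  # Show the extents backing each VMFS datastore (useful before growing a volume)
  esxcli storage vmfs extent list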

They spoke about the forms of Hybrid Storage and categorized them based on their key functionality -

  • Hybrid arrays – Nimble, Tintri, All modern arrays
  • Host Caches – PernixData, vFRC, SanDisk
  • Converged Infrastructure – Nutanix, vSAN, Simplivity

Benchmark Principles

Good benchmarking is NOT easy:

  • You need to benchmark over time – most arrays have some degree of behaviour variability over time
  • You need to look at lots of hosts, VMs – not a ‘single guest’ or ‘single datastore’
  • You need to benchmark mixed loads – in practice, all forms of IO will be flinging at the persistence layer
  • If you use good tools like SLOB or IOmeter – recognize that they are still artificial workloads, and make sure to configure them to drive a lot of different workloads
  • With modern systems (particularly AFAs or all-flash hyper-converged), it’s really, REALLY hard to drive sufficient load to saturate the system. Have a lot of workload generators (generating more than 20K IOPS out of a single host isn’t easy)
  • Absolute performance more often than not is not the only design consideration

(Slide: virtual disk format can be an IO bottleneck)

Storage Networking Guidance

VMFS and NFS provide similar performance

  • FC, FCoE and NFS tend to provide slightly better performance than iSCSI

Always separate guest VM traffic from storage and VMkernel network

  • Converged infrastructures require similar separation as data is written to 1+ remote nodes

Recommendation: avoid jumbo frames, as the risk from human error outweighs any gain (a quick way to check the MTU currently configured on a host is shown after the list below)

  • The goal is to increase IO while reducing host CPU
  • Standard Ethernet is 1500 MTU
  • Jumbo frames are often viewed as 9000 MTU (9216)
  • FCoE auto-negotiates to ‘baby jumbo’ frames of 2112 MTU (2158)
  • Jumbo frames provide modest benefits in mixed workload clouds
  • TOE adapters can produce issues uncommon in software stacks
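
To see where a host currently stands on MTU before deciding on jumbo frames, a couple of standard ESXi shell commands are enough; nothing below changes any settings.

  # MTU configured on each VMkernel interface (vmk0, vmk1, ...)
  esxcli network ip interface list

  # MTU configured at the virtual switch level (standard vSwitches)
  esxcli network vswitch standard list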

(Slide: jumbo frame performance example)

Jumbo Frame summary – Is it worth it?

Large environments may derive the most benefit from jumbo frames, but they are also where maintaining compliance is most difficult

- All the settings need to align – on every device

Mismatched settings can severely hinder performance

- A simple human error can result in a significant storage issue for a large environment

Isolate jumbo frame iSCSI traffic (e.g. backup/replication) and apply CoS/QoS

Unless you have control over all host/network/storage settings, the best practice is to use the standard 1500 MTU

The future – Path MTU Discovery (PMTUD) – it works at the IP layer (L3 routers), whereas jumbo frames are a Layer 2 (switch) concern

It is part of the ICMP protocol (the same protocol behind ping, traceroute, etc.) and is available on all modern operating systems.
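
If you do run jumbo frames, the classic end-to-end sanity check from an ESXi host is vmkping with the don’t-fragment flag. The IP address and vmk interface below are made-up examples; substitute your own storage target and VMkernel port. 8972 bytes accounts for the 28 bytes of IP/ICMP headers on a 9000-byte MTU path.

  # Verify that a 9000 MTU path really works end to end (host, switches, array)
  vmkping -d -s 8972 192.168.50.10

  # On newer ESXi builds you can also pin the test to a specific VMkernel interface
  vmkping -I vmk2 -d -s 8972 192.168.50.10

If the large ping fails while a normal vmkping succeeds, something in the path is still at 1500 MTU – exactly the human-error scenario the panel warned about.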

The speakers then got into data reduction technologies – they are the new norm (especially deduplication in arrays)

Deduplication is generally good at reducing VM binaries (OS and application files). Deduplication block sizes vary by vendor, and dedupe efficiency can be impacted by guest OS file system fragmentation:

  • 512B – Pure Storage
  • 4KB – NetApp FAS
  • 4KB – XtremIO
  • 16KB – HP 3Par
There is a major operational difference between inline deduplication (Pure Storage, XtremIO) and post-process (NetApp FAS, EMC VNX)
- The advice they provided: try it yourself or talk to another customer (use the VMUGs) – don’t take vendor claims at face value.

Compression is generally good at reducing the storage capacity consumed by applications
- Inline compression tends to provide moderate savings (2:1 is common) but there are CPU/latency tradeoffs
- Post-process compression tends to provide additional savings (3:1 is common)

Data reduction in virtual disks
Thin, thick, and eager-zeroed thick VMDKs all reduce to the same size (a quick sketch of creating each format with vmkfstools is shown below)
- Differences exist between array vendors, but not between the various disk types
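
For reference, here is a minimal sketch of creating the three VMDK formats being compared, using vmkfstools from the ESXi shell. The datastore path and size are made up for illustration, and the target directory must already exist.

  # Thin provisioned
  vmkfstools -c 20G -d thin /vmfs/volumes/datastore1/test/test-thin.vmdk
  # Lazy-zeroed thick
  vmkfstools -c 20G -d zeroedthick /vmfs/volumes/datastore1/test/test-thick.vmdk
  # Eager-zeroed thick (EZT)
  vmkfstools -c 20G -d eagerzeroedthick /vmfs/volumes/datastore1/test/test-ezt.vmdk

On a deduplicating array the zeroes written by the thick formats dedupe away, which is why all three end up consuming roughly the same space on the back end.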
T10 UNMAP is still not here in vSphere 5.5 – at least not in the way people ‘expect’. UNMAP is a SCSI command that allows space to be reclaimed from blocks that have been deleted by a virtual machine.
- It is one of the rare cases where Windows is still ahead – but only in Windows Server 2012 R2
- A manual ‘vmkfstools -k’ option is available for vSphere 5.1 – see Cormac Hogan’s blog post for details
- The manual ‘esxcli storage vmfs unmap’ in vSphere 5.5 can handle > 2 TB volumes (a diagram depicting an UNMAP of 15 TB over 2 hours was displayed); a hedged example is shown right after this list
- Not all guest OSes zero properly, which means you may not reclaim space fully via UNMAP
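
Here is a hedged example of the manual reclaim on vSphere 5.5; the datastore name is fictional, and the reclaim unit (the number of VMFS blocks processed per pass, default 200) can normally be left alone. Run it during a quiet window, given the 15 TB / 2 hour example shown on the slide.

  # vSphere 5.5: reclaim dead space on the thin-provisioned LUN backing a VMFS datastore
  esxcli storage vmfs unmap -l Datastore-Prod01

  # Optionally tune how many VMFS blocks are reclaimed per iteration
  esxcli storage vmfs unmap -l Datastore-Prod01 -n 200
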
There is an entire set of Horizon-specific and Citrix-specific best practices to follow (vSphere config and guest OS config)

Rawlinson, who had stepped away from the stage while Chad and Vaughn covered the earlier storage material, then came on to talk about VMware vSAN best practices.

Network Connectivity
- 10GbE is the preferred speed (previously 1Gb connectivity used to be good enough, but vSAN works best with 10GbE – specifically because of the volume of data that travels over the network)
- Leverage vSphere Distributed Switches (vDS) – NIOC is not commonly used in most organizations, but it acts like SIOC in that it applies QoS to network traffic and throttles it to offer the best performance. The vDS offers the best flexibility and control over network performance, with the feature set required in enterprise environments (a quick check of the vSAN VMkernel interface is shown below)
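
A quick way to confirm which VMkernel interface is carrying vSAN traffic on a host; the interface names in the output are environment specific.

  # Show the vmknic(s) tagged for Virtual SAN traffic on this host
  esxcli vsan network list
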
Storage Controller Queue Depth – The queue depth is not something you should tune manually anymore unless you are observing performance issues. VMware has reviewed this and calls for controllers that support a queue depth of 256. Some environments may have a genuine requirement to change it, but don’t change it just for the sake of changing something – let the defaults run uninterrupted and monitor them first (a quick way to check device queue depths is shown after this list).
- Queue depth support of 256 or higher
- A higher storage controller queue depth improves:
  • Performance
  • Resynchronization
  • Rebuilding operations
- Pass-through mode preferred
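
To check what your storage controller and devices actually report before worrying about tuning, the standard device listing is enough; the device identifier in the second command is illustrative.

  # 'Device Max Queue Depth' is reported per device in this output
  esxcli storage core device list

  # Or narrow the output to a single device
  esxcli storage core device list -d naa.600508b1001c4d21

Outstanding IOs and queue usage can also be watched interactively in esxtop (press 'd' for the adapter view, 'u' for devices).
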
Disks and Disk Groups
  • Don’t mix disk types in a cluster for predictable performance
  • More disk groups are better than one

The session finally concluded at 6:30 pm, and after a few handshakes everyone was on their way. It was completely worthwhile, though, and goes to show why attending VMworld offers insights that you cannot get from a four-day course – the structure and content of these sessions is not limited in any way.

 

NET2745 – Technical Deep dive on vSphere Distributed Switch

Presenters: Chris Wahl and Jason Nash

 

Both Chris and Jason are very well known in the virtualization industry for their IT expertise, and it was great to receive some deep dive information on the vSphere Distributed Switch. They both also hold dual vCDX (VMware Certified Design Expert) certifications.

Chris and Jason dived right into their session, truly making it a deep dive. For professionals who have worked extensively with vSphere distributed switches there wasn’t too much new material, but the best practices and design tips were still worth paying attention to.

Each ESXi host keeps a local database that describes the vDS (/etc/vmware/dvsdata.db), which allows ESXi to keep the vDS running like a simple vSwitch when vCenter is down. A quick way to see this local cache is shown below.
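
Nothing here is required for normal operation, but it is useful to know where the data lives:

  # The binary cache of the vDS configuration kept on every member host
  ls -l /etc/vmware/dvsdata.db

  # net-dvs (an unsupported diagnostic tool bundled with ESXi) can dump the cached
  # switch and port configuration; treat its output as read-only troubleshooting data
  net-dvs -l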

Recommendation – Use elastic ports – don’t set port counts manually. For example, some people prefer to configure their vDS port groups with a specific number of ports instead of using elastic ports; unless you have a good technical reason, this is not required.

vDS Quick tips

  • Use 802.1Q tags for port groups (don’t use native tagging)
  • At least 2 vmnics (uplinks) per vDS (the uplink layout can be confirmed per host as shown after this list)
  • A 2 x 10GbE configuration can work fine
  • Put QoS tagging in the vDS or the physical network, not both
  • Use descriptive naming everywhere (e.g. include VLAN, subnet, and possibly application)
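
The uplink layout mentioned above can be confirmed from each host’s ESXi shell, which is handy when checking that every host really does have two vmnics on the vDS:

  # Lists each distributed switch the host participates in, with its uplinks and MTU
  esxcli network vswitch dvs vmware list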

 

Real world use cases

  • Migrating from VSS to vDS
  • Mixing 1Gb and 10Gb links inside one distributed switch
  • Handling vMotion saturation
  • Controlling vSphere Replication bandwidth
  • Doing QoS tagging
  • Load based teaming vs Link Aggregation

 

Don’t try to pin any traffic to one specific uplink

Rename uplinks and use all uplinks in the same way – e.g. so they map clearly to the physical links

Multiple vMotion host saturation

In the vDS port group settings —> Traffic shaping —> ingress and egress shaping can help avoid saturation (DRS-initiated vMotions can cause this saturation).

  • Ingress – traffic entering the vDS, i.e. leaving the host
  • Egress – traffic leaving the vDS towards the host

 

Set the average bandwidth and peak bandwidth to the same value. Using QoS we can control traffic shaping

NIOC —> use this feature, which is available in the web client under Networking —> select the distributed switch —> Manage —> Resource Allocation.

If you have more bandwidth available during the evening and less during the day, you can adjust the shaping to match.

Priority-based Flow Control (PFC, 802.1Qbb) – try to use it with UCS (and use it within the vDS)

QoS tips (ideal to use on 10Gb network)

KISS – it solves contention

Pick a place to tag traffic – virtual or physical (don’t do it at both places)

Don’t enforce QoS in many ways

Use clearly defined tagging

 

Layer 2 QoS tagging is available when you edit the network resource pool. Traffic filtering and marking are also available in the port group settings in the web client.

A few slide pictures that I took at the session -

(Slides: vSphere Distributed Switch, vDS with mixed NIC speeds, Segmenting Port Groups)

 

VMworld Day 2 recap

Based on how Day 1 went, it was clear that the entire VMworld conference was going to be hectic. I began my Tuesday with the session STO 3161 – What can VVOLs do for you?

It was a good session but not as technical as I expected. Usually Matt Cowger and Suzy Viswanathan get very technical, but this time around they just skimmed the surface and covered only the essential information.

They spoke about current challenges like -

  • Extensive manual bookkeeping to match VMs to LUNs
  • LUN granularity hinders per VM SLA
  • Over provisioning
  • Wasted resources, time, and high costs
  • Frequent Data migrations

They covered the VVOL architecture:

  • Out-of-band lifecycle operations (create, delete, etc.) between ESXi and the VASA provider
  • VASA Provider handles virtual volumes namespace and mapping for array
  • Virtual volume presented for Block or File IO

High Level Architecture

  • No Filesystem
  • ESX manages array through VASA APIs.
  • Arrays are logically partitioned into containers called storage containers
  • VM disks called Virtual volumes stored natively on the storage containers
  • IO from ESX to array is addressed through an access point called Protocol Endpoint (PE)
  • Data services are offloaded to the array
  • Managed through storage policy based management framework

They spoke about what VASA and VVOLs do together (a hedged sketch of inspecting these objects from the ESXi shell follows this list) -

  • VASA does not provision any LUNs – it only interacts with the array – supported protocols are FC, NFS, iSCSI, and FCoE
  • We configure a VMware protocol endpoint – no storage is associated with it. The storage pools host VVOL storage containers.
  • With VVOL – we can offload a hypervisor managed snapshot to the array (per VM snapshots)
  • Array Managed cloning – Even better than VAAI XCopy
  • We can take an entire VVOL and duplicate just that part of the volume
  • Manage applications by service level objectives via policy based automation -
  • Provision VASA storage provider
  • Create VVOL datastore
  • Define storage policy
  • Provision a VM
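
For those playing with the vSphere 6 beta, here is a hedged sketch of verifying the pieces described above from the ESXi shell. I am assuming the esxcli storage vvol namespace that the 6.0 release exposes, so the exact sub-commands may differ between builds.

  # Registered VASA providers visible to this host
  esxcli storage vvol vasaprovider list

  # Storage containers presented by the array (these back the VVOL datastores)
  esxcli storage vvol storagecontainer list

  # Protocol endpoints (PEs) the host uses as IO access points
  esxcli storage vvol protocolendpoint list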

Then I headed off to the Solutions Pavilion to speak to a couple of experts, including Ramesh Venkatasubramaniam, and we spoke about vSAN, vCOps, Log Insight, Desktone, and ITBM.

I also met with Mukesh Hira to talk about vSphere Distributed Switches. By the time I was done with all this I had already missed lunch, but I survived until I could get my hands on some food. Walking a lot at VMworld makes you really hungry and saps your energy.

Another problem from my standpoint this year at VMworld was the number of sessions that were rescheduled to other rooms or locations. The trouble was that I had all the rooms sorted out in my calendar, and at the last minute I had to go around finding out which rooms the sessions had been relocated to. Some of them were still in the same building, but others were moved two blocks away to the Marriott Hotel, and walking back and forth was a pain.

As a result I missed my session, STP3229 – Guide to protect your cloud investment. Luckily this session was only 30 minutes long, so I hope to catch up on it once the presentations are posted to the VMworld website.

After lunch, I headed over to the Solutions Pavilion and met more vendors – especially new ones I hadn’t heard of or met before – so that I could understand their products and see if anything innovative came out of it. One product that I really liked was Atlantis USX – they won a ‘Best of VMworld 2014’ award, so their technology was worth reviewing. I sat through their product demonstration and spoke to the Canada rep as well.

I then scrambled across to attend #INF1864 – Software Defined Storage (What’s Next) by Chad Sakac. I have made it a tradition to listen to Chad’s forward-looking sessions because they are extremely informative, and as an EMC executive he has a lot more awareness of upcoming technologies that he likes to share. I have blogged on that in a separate post, so feel free to check it out.

The final session of the day was #STO2496 – vSphere Storage Best Practices: Next-Gen Storage Technologies – with Chad Sakac (EMC), Vaughn Stewart (Pure Storage), and Rawlinson Rivera (VMware vSAN). There was also a brief statement from someone at Cisco about avoiding jumbo frames – the point that jumbo frames are no longer important coming straight from Cisco made it worth noting.

All of them gave a very informative presentation, which I am going to share by way of slides. I did not write much in that session and focussed on just listening – this was a session where you had to understand and grasp more than take notes.

The one-hour session stretched by another 30 minutes, but I didn’t want to miss any of it. I was happy to delay my reception parties in favour of this conversation, which is exactly the kind of discussion that makes VMworld worthwhile. Chad and Vaughn kept the focus on non-marketing, vendor-independent storage technology, and Rawlinson spoke about vSAN and its improvements.

All in all it was a great day. I ended it by going to the vExpert/vCDX reception. The party was attended by VMware CEO Pat Gelsinger, who gave a brief note of thanks to the vExpert and vCDX community. John Arrasjid was honoured by Pat for his role in growing the vCDX community; John is also leaving VMware to work with the EMC CTO office. Pat announced a new vCDX certification on the networking side (I believe it was NSX-related), with more details to come soon. They honoured the first vCDX as well, and after that the speeches ended. I also went to the Veeam customer event and ended my day thereafter.

 

(Photo credit – Sean Thulin)

INF1864 – Software Defined Storage (What’s Next)

Chad Sakac is a very dynamic EMC executive with a reputation in the industry as a highly knowledgeable, technical, and innovative professional. He talked about the next steps in software-defined storage in session INF1864, and I was right there to capture some key information to expand my storage architecture knowledge.

He started off his talk by highlighting a few things and then spoke about the upcoming new technologies.

What is software defined

  • Decoupling and abstracting control and policy from physical stuff that does the work
  • Where the physical stuff that does work (data plane) can be software on commodity hardware – e.g. VSAN, VASA
  • Programmable infrastructure APIs: automate everything

 

Four ‘Data Plane’ architectures

  • Clustered Scale Up & Down – e.g. Nimble, VNX, Nexenta
  • Tightly Coupled Scale-Out clusters – e.g. Hitachi, 3Par
  • Loosely Coupled Scale-Out – e.g. vSAN, ScaleIO, Nutanix, Simplivity
  • Distributed Shared-Nothing – e.g. Swift, AWS S3

 

What to expect next

  • SDS control planes maturing
  • VVOL
  • ViPR 2.0
  • Cinder
  • SDS data services moving to real world use
  • Some Data Services become ‘features’
  • Acceleration of ‘Old App’ Hyper-Converged Stacks
  • Acceleration of ‘New App’ Hyper-Converged Stacks

VVOL EMC update

  • VNXe3200 will be the first to use VVOLs
  • VMAX3 will be right behind VNXe
  • VPLEX will support VVOLs (for VERY async vMotion)
  • ViPR Controller will support VVOL control
  • XtremIO will support VVOL
  • ScaleIO will support VVOL
  • VNX will get VVOL through same path as VNXe
  • C4 – storage containerization adopted by VNX as a code stack, used by the entire VNX family

If you are testing the vSphere 6 beta release and trying out VVOLs then feel free to forward any feedback to veena.joshi@emc.com

EMC now offers ScaleIO as a vCenter plugin to deploy it quickly. After some routine configuration steps, the plugin automatically deploys the nodes and clients needed to access the underlying storage.

If you need to store more than just VMDKs, then you can use ScaleIO rather than vSAN (a lot of people think it is the same as vSAN).

RecoverPoint for VMs

A fully functional version of EMC RecoverPoint for VMs will be available for download at no charge, with no time limit – October 2014.

For image archives – an S3-compliant data store is the best option for storing the data

‘Phoenix’ server hardware – designed for EVO Rail – is the only server that EMC sells, and since EMC is not in the server business there is usually not much discussion around it.

It is a 2U server with internal disks – you can Google it for more details.

Big Trends

  • Increasing application diversity
  • The world of how apps run is getting more diverse
  • SDS + commodity HW is awesome and belongs in many places but tightly coupled architectures will bias to appliance

The session ran slightly over time (Chad is passionate when presenting and takes questions, so his sessions usually run long) and we only got a quick glimpse of the last slide.