Welcome 2015. Welcome RecoverPoint for VMs

EMC has now formally raised the curtain, in a big way, on their RecoverPoint for VMs software. Those familiar with the traditional RecoverPoint appliances know how robust the replication appliances are. I am hoping it’s the same with the software. I will be trying it out very soon in my own lab environment and will post the results after that.

But right now, what is important is that RecoverPoint for VMs is free. That’s right – it is free for all non-production environments. This is not a trial – it is the full-blown version for unlimited time. This provides sufficient time for testing and ensuring it meets your needs – before you buy.

That means a ton of user requirements can be tested as long as you can provide the required disk space (for replication journaling). This is usually 5-10% of the source volume during testing, and in production it is usually 15-20% of the source volume, depending on how long a journal protection window you want – i.e. PVR-like functionality.
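Those journal ratios are easy to turn into numbers for planning. A minimal sketch (the function name and example figures are my own, not EMC's):

```python
# Rough journal sizing helper for RecoverPoint-style replication.
# The 5-10% (test) and 15-20% (production) ratios come from the text;
# the function and defaults below are purely illustrative.

def journal_size_gb(source_gb, pct):
    """Return the journal capacity to provision for a source volume."""
    if not 0 < pct < 1:
        raise ValueError("pct must be a fraction, e.g. 0.15 for 15%")
    return source_gb * pct

# A 2 TB source volume protected in production at 15%:
print(journal_size_gb(2048, 0.15))  # about 307 GB
```

Longer journal retention (a bigger PVR window) simply scales the percentage up.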

The link to the software download page is here

You will need to register to download the software.

The RecoverPoint for VMs download package contains the following content:

• Complete binary image of the product
• The binary image of the Deployment Manager
• RecoverPoint for VMs Documentation Kit which includes

• FAQs,
• Try Before You Buy Readme,
• Preliminary Sizing Guide,
• Installation & Deployment Guide,
• Quick Start Poster, and
• Administrator’s Guide

Environmental Requirements:

VMware Infrastructure: VMware vCenter and ESXi Servers, release 5.1 U1 or later
Network Infrastructure: 4 virtual networks (LAN, WAN, 2 iSCSI); 1 software iSCSI adapter per ESXi node with 2 VMKernel ports
RecoverPoint for VMs: Minimum of two virtual appliances per cluster
Virtual Appliance Configuration Options:

• 2 vCPUs / 4 GHz, 4 GB RAM, 80 GB storage
• 4 vCPUs / 9 GHz, 4 GB RAM, 80 GB storage
• 8 vCPUs / 17 GHz, 8 GB RAM, 80 GB storage

vSphere Performance Concepts and Troubleshooting at MBVMUG

Do you know enough about vSphere performance concepts and troubleshooting to handle performance problems? Are you able to identify the specific metrics that should be reviewed when disk latency goes high?

Have you started testing vSphere Flash Read Cache, or do you know how to troubleshoot issues related to it? Want to learn about vNUMA, or learn how best to use esxtop?

Get these and a lot of other questions answered when you attend the upcoming Manitoba VMUG Meeting on October 9, 2014 at Delta Winnipeg.

Click here for details and to register - http://www.vmug.com/p/cm/ld/fid=7843

VMworld final day and recap

I am a little delayed in posting this final recap, but better late than never. The only reason I am posting this information is to encourage those who would like to go to VMworld next year, and to help out those who could not attend. As you can see from my previous posts, there is a lot of value in attending and the volume of knowledge you gain is tremendous.

My final day at VMworld started off by attending Chris Wahl & Jason Nash’s session ‘vSphere Distributed Switches – Deep Dive’. I have put out a separate blog post on it; I was looking for more technical depth, but it still was a great session. I headed over to the VMUG Leader Lunch thereafter, where the VMUG leaders from various geographies met together and were also joined by the top brass of VMware – Pat Gelsinger, Raghu Raghuram, and Ben Fathi.

They took questions during the lunch, and Mr. Gelsinger gave us insight into where VMware is going in terms of future innovations and how they would further participate in VMUG activities. A new thing that will probably get announced tomorrow is the launch of VMTN – yes, you read that right. The scope and depth of the program is unknown to us at this time, but we are told it will be comprehensive in nature. So I look forward to that announcement either tomorrow or in the coming days. This is not very confidential information by any means, and VMware has been focused on launching VMTN as soon as they could.

The VMware executives also spoke about the future of vCloud Director and vCloud Automation Center. They clarified that vCD is not going away – it will be available only to service providers, while vCAC is for the enterprise environment. All features of vCD except for multi-tenancy are available in vSphere 6.0, which is in public beta. So if you are interested in trying it out, go for it.

A couple of VMUG leaders enquired about the vCloud Air (formerly vCHS) announcement, and hopefully as the infrastructure scales up more people will be able to leverage it in different ways. Further conversation occurred around the VMworld announcements of EVO: RAIL and EVO: RACK. As publicly known now, there is some level of overlap with vendors that support VMware platforms, but that is a common industry trend. If VMware does not push the innovation in that area, either the vendors will be slow to innovate or competitors will eat into that market. So look forward to some new stuff happening in the EVO area.

We finally ended the VMUG Leader Lunch with awards, and one of the VMUG Board of Directors and a person I have known for a while – Ravi Venkatasubbaiah – won the VMUG President’s Award for exemplary leadership. Congrats to Ravi for this achievement.

I then headed over to STO1153 – Storage Performance Best Practices for Tier 1 Applications on Virtual SAN. However, as was common at VMworld this year (unfortunately), the room was switched again to a different building a couple of blocks away. Getting there would have wasted a further 15 minutes, so I sat in on another session instead – EUC2551 – Architecture for Next-Gen Desktops. They spoke about enhancements to the Horizon Suite, VMware’s acquisition of CloudVolumes, and their strategy of further simplifying desktop deployment. I was more interested in listening to the presenter, so I didn’t take any notes on this one.

The final session I attended was INF3037 – How to Build and Deploy a Well-Run Hybrid Cloud. The presenters spoke about hybrid cloud strategies – enhancements to architectural products, automation tools, and deployment software. Look forward to the presentations being shared with all attendees post-VMworld.

The day ended with attending the VMware Canada Customer Reception party just across from Moscone West at Jillian’s. I also headed out later for a private dinner with one of our vendor SEs who was also in town for VMworld.

With a great level of learning and a lot more insight into VMware technologies I am satisfied and pleased that the conference was a success and brought valuable content to its attendees. I also networked with a few great individuals and am returning more enlightened on VMware and vendor technologies.

If you didn’t attend VMworld but would like to view the content – you can sign up for a subscription (last year it was $600) to get access to all VMworld content (presentations, sessions, etc). Not sure about VMworld lab content but I believe that will be available as well. VMUG Advantage membership ($200) last year also provided free access to VMworld content. So check out what’s available and go for it.

To all my friends who met me at VMworld – a shout out to at least a few of you – Mathew Brender, Sean Thulin, Mark Browne, Jonathan Frappier, Angelo Luciani, Ravi Venkatasubbaiah, Irfan Ahmad, Rob Kyle, Peter Chang, Dwayne Lessner, Chris Halverson, Avram Woroch, Manjeet Bavage, Brandi Collins, Dave Henry – hope to see you again next year.



STO2496 – vSphere Storage Best Practices: Next-Gen Storage Technologies

This was a panel-like session that wasn’t vendor specific but broadly gave pointers on new types of arrays and technologies – vSAN, SDRS, VVOLs, all-flash arrays, datastore types, jumbo frame usage, etc. It truly lived up to its name – not just in content but also in duration. The session ran over its scheduled 1 hour and actually wrapped up at the 1.5-hour mark, but no one was complaining since there was a lot of interesting stuff.

Presenters – Rawlinson Rivera (VMware), Chad Sakac (EMC),  Vaughn Stewart (Pure Storage)

The session kicked off by talking about enabling simplicity in the storage environment. Some key points discussed were -

1) Use large datastores

  • NFS 16 TB and VMFS 64 TB
  • Backup and restore times and objectives should be considered

2) Limit use of RDMs to when required for application support

3) Use datastore clusters and SDRS

  • Match service levels on all datastores in each datastore cluster
  • Disable SDRS IO Metric on all flash arrays and arrays with storage tiering

4) Use automated storage array services

  • Auto tiering for performance
  • Auto grow/extend for datastores

5) Avoid Jumbo frames for iSCSI and NFS

  • Jumbo frames provide performance gains at the cost of increased complexity, and improvements in storage technology mean jumbo frames are no longer required

They spoke about the forms of Hybrid Storage and categorized them based on their key functionality -

  • Hybrid arrays – Nimble, Tintri, All modern arrays
  • Host Caches – PernixData, vFRC, SanDisk
  • Converged Infrastructure – Nutanix, vSAN, Simplivity

Benchmark Principles

Good benchmarking is NOT easy.

  • You need to benchmark over time – most arrays have some degree of behaviour variability over time

  • You need to look at lots of hosts, VMs – not a ‘single guest’ or ‘single datastore’
  • You need to benchmark mixed loads – in practice, all forms of IO will be hitting the persistence layer
  • If you use good tools like SLOB or IOmeter – recognize that they are still artificial workloads, and make sure to configure them to drive a lot of different workloads
  • With modern systems (particularly AFAs or all-flash hyper-converged), it’s really, REALLY hard to drive sufficient load to saturate the system. Have a lot of workload generators (generating more than 20K IOPS out of a single host isn’t easy)
  • Absolute performance, more often than not, is not the only design consideration
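To make the mixed-load point concrete, here is a toy multi-worker generator. It is purely illustrative (the file size, duration, and read ratio are my own) and mostly exercises the page cache rather than an array – real benchmarking should use purpose-built tools like IOmeter:

```python
# Toy mixed-workload generator, in the spirit of the advice above:
# drive several workers with a blend of random reads and writes rather
# than a single stream against a single file.
import os, random, tempfile, threading, time

BLOCK = 4096          # 4K IOs, a common benchmark block size
FILE_SIZE = 4 << 20   # 4 MiB scratch file (tiny, for illustration only)
DURATION = 0.5        # seconds per worker

def worker(path, read_pct, counts):
    ios = 0
    end = time.monotonic() + DURATION
    with open(path, "r+b") as f:
        while time.monotonic() < end:
            f.seek(random.randrange(0, FILE_SIZE - BLOCK))
            if random.random() < read_pct:
                f.read(BLOCK)                 # random read
            else:
                f.write(os.urandom(BLOCK))    # random overwrite
            ios += 1
    counts.append(ios)

def run_mixed_load(n_workers=4, read_pct=0.7):
    with tempfile.NamedTemporaryFile(delete=False) as tf:
        tf.write(b"\0" * FILE_SIZE)
        path = tf.name
    counts = []
    threads = [threading.Thread(target=worker, args=(path, read_pct, counts))
               for _ in range(n_workers)]
    for t in threads: t.start()
    for t in threads: t.join()
    os.unlink(path)
    return sum(counts) / DURATION  # aggregate IOPS across workers

if __name__ == "__main__":
    print(f"~{run_mixed_load():.0f} IOPS (page-cache speed, not array speed)")
```

Scaling `n_workers` up (and across hosts) is exactly the "lots of workload generators" advice from the session.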

(Slide: virtual disk format can be an IO bottleneck)


Storage Networking Guidance

VMFS and NFS provide similar performance

  • FC, FCoE and NFS tend to provide slightly better performance than iSCSI

Always separate guest VM traffic from storage and VMkernel network

  • Converged infrastructures require similar separation as data is written to 1+ remote nodes

Recommendation: avoid Jumbo frames as risk via human error outweighs any gain

  • Goal is to increase IO while reducing host CPU
  • Standard Ethernet is 1500 MTU
  • Jumbo frames are often viewed as 9000 MTU (9216)
  • FCoE auto-negotiates to ‘baby’ jumbo frames of 2112 MTU (2158)
  • Jumbo frames provide modest benefits in mixed-workload clouds
  • TOE adapters can produce issues uncommon in software stacks

(Slide: jumbo frame performance example)
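A quick back-of-the-envelope calculation shows why the jumbo-frame gain is modest. This is my own arithmetic using standard Ethernet/TCP/IP header sizes, not a figure from the session:

```python
# Protocol efficiency for standard vs jumbo frames: payload bytes
# delivered per byte on the wire, using textbook header sizes.

ETH_OVERHEAD = 18 + 20 + 12   # L2 header+FCS, preamble, inter-frame gap
IP_TCP = 20 + 20              # IPv4 + TCP headers (no options)

def efficiency(mtu):
    payload = mtu - IP_TCP
    wire = mtu + ETH_OVERHEAD
    return payload / wire

for mtu in (1500, 2112, 9000):
    print(f"MTU {mtu}: {efficiency(mtu):.1%} payload efficiency")
```

Standard 1500 MTU already delivers roughly 94% payload efficiency, so a 9000 MTU only adds about five points – hence the advice that the gain rarely justifies the compliance risk.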


Jumbo Frame summary – Is it worth it ?

Large environments may derive the most benefit from jumbo frames but are also the environments where it is hardest to maintain compliance

- All the settings need to align – on every device

Mismatched settings can severely hinder performance

- A simple human error will result in significant storage issue for a large environment

Isolate jumbo-frame iSCSI traffic (e.g. backup/replication) and apply CoS/QoS

Unless you have control over all host/network/storage settings, best practice is to use standard 1500 MTU

The future – Path Maximum Transmission Unit Discovery (PMTUD). It operates at L3 (IP, routers), whereas jumbo frames are an L2 (switch) concern.

It is part of the ICMP protocol (the same protocol behind ping, traceroute, etc.) and is available on all modern operating systems.

The speakers then got into data reduction technologies – they are the new norm (especially deduplication in arrays).

Deduplication is generally good at reducing VM binaries (OS and application files). Deduplication block sizes vary by vendor, and dedupe effectiveness can be impacted by GOS file system fragmentation:

  • 512B – Pure Storage
  • 4KB – NetApp FAS
  • 4KB – XtremIO
  • 16KB – HP 3Par

There is a major operational difference between inline deduplication (Pure Storage, XtremIO type) and post-process deduplication (NetApp FAS, EMC VNX).

- The advice they provided: try it yourself or talk to another customer (use VMUGs) – don’t take vendor claims at face value.

Compression is generally good at reducing the storage capacity of applications.

- Inline compression tends to provide moderate savings (2:1 common), but there are CPU/latency tradeoffs
- Post-process compression tends to provide additional savings (3:1 common)
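The block-size figures matter because a single changed byte dirties the entire block containing it. A toy illustration of that effect (my own, and far simpler than what real arrays do):

```python
# Two "VMs" share mostly identical data; one byte differs. Smaller
# dedupe blocks isolate the change, larger blocks store more duplicate
# data around it. Data and sizes are contrived for illustration.

def unique_blocks(data: bytes, block_size: int) -> int:
    """Count distinct fixed-size blocks in `data`."""
    return len({data[i:i + block_size]
                for i in range(0, len(data), block_size)})

vm1 = bytes(range(256)) * 1024             # 256 KiB of repeating data
vm2 = bytearray(vm1); vm2[100_000] ^= 0xFF # identical except one byte
pool = vm1 + bytes(vm2)

for bs in (512, 4096, 16384):
    u = unique_blocks(pool, bs)
    print(f"{bs:>5}B blocks: {u} unique, {u * bs} bytes stored")
```

Every block size sees the same two unique blocks here, but the capacity actually stored after dedupe grows with the block size – the granularity effect the panel was pointing at.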
Data Reduction in Virtual Disks

Thin, thick, and EZ-Thick VMDKs reduce to the same size.

- Differences exist between array vendors, but not between the various disk types

T10 UNMAP is still not here in vSphere 5.5 – at least not in the way people ‘expect’. UNMAP is a SCSI command that allows the array to reclaim space from blocks that have been deleted by a virtual machine.

- It is one of the rare cases where Windows is still ahead – but only in Windows Server 2012 R2
- A manual ‘vmkfstools -k’ option is available for vSphere 5.1. See Cormac Hogan’s blog post by clicking on this link
- Manual ‘esxcli storage vmfs unmap’ in vSphere 5.5 can handle > 2 TB volumes (a diagram depicting UNMAP of 15 TB over 2 hours was displayed)
- Not all guest operating systems zero properly, which means you may not reclaim space fully via UNMAP

There is also an entire set of Horizon-specific and Citrix-specific best practices to follow (vSphere config and GOS config).
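On the zeroing point: space can only be reclaimed if the guest's freed blocks are actually zeroed in a way the storage stack can see, which is why admins run tools like Microsoft's sdelete in Windows guests first. A rough sketch of the zero-fill idea (the function name, paths, and sizes are illustrative, not any vendor's tool):

```python
# Guest-side zero-fill sketch: write a file of zeros over free space,
# sync it to disk, then delete it, leaving the freed blocks zeroed so a
# later UNMAP/reclaim pass can recognize them as reclaimable.
import os

def zero_free_space(directory, limit_bytes, chunk=1 << 20):
    """Write zeros into `directory` up to limit_bytes, then delete."""
    path = os.path.join(directory, "zerofill.tmp")
    written = 0
    try:
        with open(path, "wb") as f:
            while written < limit_bytes:
                n = min(chunk, limit_bytes - written)
                f.write(b"\0" * n)
                written += n
            f.flush()
            os.fsync(f.fileno())   # make sure the zeros hit disk
    finally:
        os.remove(path)            # the freed blocks are now zeroed
    return written

# zero_free_space("/mnt/guest-disk", limit_bytes=10 * 1024**3)  # example
```

A real run would size `limit_bytes` to (most of) the free space on the guest volume rather than a fixed figure.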
Rawlinson, who had stepped away from the stage while Chad and Vaughn covered the storage material, then came on to talk about VMware vSAN best practices.

Network Connectivity

- 10GbE is the preferred speed (previously 1Gb connectivity used to be good enough, but vSAN works best with 10GbE – specifically because of the volume of data that travels over the network)
- Leverage vSphere Distributed Switches (vDS) – NIOC is not commonly used in most organizations, but it acts like SIOC for the network: it applies QoS to network traffic and throttles it to offer the best performance. The vDS also offers the best flexibility and control over network performance, with the feature set required in enterprise environments.

Storage Controller Queue Depth – the queue depth setting should no longer be set manually unless you are observing performance issues. VMware has specifically reviewed it and officially set it at 256. In some environments you may have a requirement to change it; just don’t change it for the sake of manual tuning without first allowing uninterrupted operation and monitoring the default values.

- Queue depth support of 256 or higher is recommended
- A higher storage controller queue depth will improve
  • Performance
  • Resynchronization
  • Rebuilding operations
- Pass-through mode is preferred for the controller
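The queue-depth guidance follows from Little's law: the IOPS a controller path can sustain is bounded by outstanding IOs divided by per-IO latency. A worked example (my own numbers, not VMware's):

```python
# Little's law applied to storage: max IOPS = queue_depth / latency.
# Shows why a 256-deep controller has far more headroom than a 32-deep one.

def max_iops(queue_depth, latency_ms):
    """Upper bound on IOPS for a given queue depth and per-IO latency."""
    return queue_depth / (latency_ms / 1000.0)

# At 1 ms per IO:
print(max_iops(32, 1.0))    # 32000.0
print(max_iops(256, 1.0))   # 256000.0
```

The same arithmetic explains why resync and rebuild operations, which pile extra IO onto the controller, benefit most from the deeper queue.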
Disks and Disk Groups
  • Don’t mix disk types in a cluster for predictable performance
  • More disk groups are better than one

The session finally concluded at 6:30pm, and after a few handshakes everyone was on their way. It was completely worthwhile and goes to show why attending VMworld offers insights that you cannot get in a 4-day course. The structure and content of these sessions is not limited in any way.


NET2745 – Technical Deep dive on vSphere Distributed Switch

Presenters: Chris Wahl and Jason Nash


Both Chris and Jason are very well known in the virtualization industry for their IT expertise, and it was great to receive some deep-dive information on the vSphere distributed switch. They both also hold dual VCDX (VMware Certified Design Expert) certifications.

Chris and Jason dived right into their session, truly making it a deep dive. For professionals who have worked with vSphere distributed switches a lot there wasn’t too much new, but they could always pay attention to the best practices and design tips.

Each ESXi host keeps a local database that describes the vDS (/etc/vmware/dvsdata.db), which allows ESXi to keep running the vDS as a simple vSwitch when vCenter is down.

Recommendation – use elastic ports; don’t set port counts manually. For example, some people prefer to configure their vDS port groups with a specific number of ports instead of using elastic ports. Unless you have a pretty good technical reason, this is not required.

vDS Quick tips

  • Use 802.1Q tags for port groups (don’t use native VLAN tagging)
  • At least 2 vmnics (uplinks) per vDS
  • A 2 x 10GbE configuration can work fine
  • Put QoS tagging in the vDS or the physical network, not both
  • Use descriptive naming everywhere (e.g. include VLAN, subnet, and possibly application)


Real world use cases

  • Migrating from VSS to vDS
  • Mixing 1Gb and 10Gb links inside one distributed switch
  • Handling vMotion saturation
  • Controlling vSphere Replication bandwidth
  • Doing QoS tagging
  • Load based teaming vs Link Aggregation


Don’t try to pin any traffic to one specific uplink

Rename uplinks and use all uplinks in the same way – e.g. matching them to their physical links

Multiple vMotion host saturation

In the vDS port group settings —> Traffic Shaping —> ingress and egress shaping can help avoid saturation (DRS may cause vMotion saturation).

  • Ingress – traffic entering the vDS, i.e. leaving the host/VM
  • Egress – traffic leaving the vDS, i.e. going toward the host/VM


Set the average bandwidth and peak bandwidth to the same value. Using QoS we can control traffic shaping
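To see why setting average and peak bandwidth to the same value works, consider a token-bucket shaper: the bucket refills at the average rate, and capping peak at the average means traffic can never sustain bursts above the configured rate. A toy sketch (entirely my own illustration of the shaping concept, not VMware's implementation):

```python
# Minimal token-bucket shaper: admits bytes only while tokens are
# available; tokens refill at the configured average rate, bounded by
# the burst size. Rates and sizes below are arbitrary demo values.

class TokenBucket:
    def __init__(self, avg_bps, burst_bytes):
        self.rate = avg_bps / 8.0        # refill rate in bytes/sec
        self.capacity = burst_bytes      # burst size caps the bucket
        self.tokens = burst_bytes
        self.t = 0.0

    def allow(self, nbytes, now):
        # refill at the average rate, capped by the burst size
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.t) * self.rate)
        self.t = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False

bucket = TokenBucket(avg_bps=8_000, burst_bytes=1_000)  # 1 KB/s, 1 KB burst
print(bucket.allow(1_000, now=0.0))   # True  – initial burst allowance
print(bucket.allow(1_000, now=0.5))   # False – only 500 bytes refilled
print(bucket.allow(1_000, now=1.5))   # True  – a full second refilled
```

Shrinking the burst size toward zero is the flat-cap behaviour you get when peak equals average.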

NIOC – use this feature, which is available in the web client under Networking —> select the port group —> Manage —> Resource Allocation.

If you have more bandwidth available during the evening and less during the day, you can adjust the traffic shaping accordingly.

Priority-based Flow Control (PFC, 802.1Qbb) – try to use it with UCS (use within the vDS)

QoS tips (ideal to use on 10Gb network)

KISS – it solves contention

Pick a place to tag traffic – virtual or physical (don’t do it at both places)

Don’t enforce QoS in many ways

Use clearly defined tagging


Layer 2 QoS is available by editing the resource pool. Traffic filtering and marking are also available in the port group settings in the web client.

A few slide pictures that I took at the session -

(Slides: vSphere Distributed Switch; vDS with mixed NIC speeds; Segmenting Port Groups)