large cluster using some virtualization?

Postby dragn-fire » Mon Jul 10, 2017 8:40 am

I know it's not recommended to use virtualization in production, but is any part of a cluster setup safe to virtualize? We're looking at setting up a 1000-seat dialer (actually 2, at 2 locations for redundancy). I know the DB server will need to be quite the beast, with multiple load-balanced web servers, etc., but instead of looking at a 40+ server setup, can any portion be virtualized to save on space while keeping the other components physical? I would be using ViciBox 7.04.

Thanks

DF
dragn-fire
 
Posts: 6
Joined: Mon Aug 11, 2014 1:53 pm

Re: large cluster using some virtualization?

Postby mflorell » Mon Jul 10, 2017 9:48 am

You will actually need more hardware (and spend more money for it) when you virtualize the OS in order to reliably get the same capacity in VICIdial.

Virtualization adds more points of failure and more underlying system load, as well as adding a CPU blocking mechanism to the mix.

We've consulted with a lot of companies that have tried, and they all run into the same roadblocks. The ones that push forward with it anyway end up with much lower capacity per dialer and a lot more machines. They also have a much less stable system that is susceptible to a lot more problems with voice quality and with VICIdial actions being executed reliably.
mflorell
Site Admin
 
Posts: 18379
Joined: Wed Jun 07, 2006 2:45 pm
Location: Florida

Re: large cluster using some virtualization?

Postby buns » Fri Apr 13, 2018 8:26 am

Hi,

I use VMware and Proxmox for up to 450 seats with no problems.
Vicidial Consultant in France and Indian Ocean. http://vicidial.fr
buns
 
Posts: 14
Joined: Thu Nov 23, 2017 1:59 am

Re: large cluster using some virtualization?

Postby frequency » Tue Apr 17, 2018 2:32 pm

Let me tell you about my experience regarding virtualization.

I usually use a KVM environment only for small clients, who will mostly be running 10 agents, a maximum of 100 calls, and an all-in-one server.

I have used a similar setup for multiple 400+ seat clusters, with the web servers virtualized and the phones on dedicated servers, dialing mostly on an NVMe "private" OpenStack cloud. The cloud is a setup of 128+ core Skylake and Kaby Lake servers. I learned the following from all this experience on a KVM environment:

1. They work best for dialers only, not for phone, dedicated, or slave reporting servers. Web servers only if they are on the same network as the DB and at most about 2 ms away.
2. A 100-call load on a virtualized 8-core AMD server sits close to a 4.00 load average, meaning half the capacity of the server is used, while a comparable dedicated 4c/8t server can make 180-200 calls at the same load average (see the quick sketch after this list).
3. Virtualized servers usually show Asterisk as %sy load, while dedicated servers won't.
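
As a quick sketch of what I mean by "half the capacity" in point 2 (just the load-average arithmetic with the rough numbers above, not a re-measurement):

[code]
# Rough utilization implied by a load average (numbers from the post, not re-measured).
def utilization(load_avg, cores):
    # Fraction of CPU capacity a 1-minute load average roughly corresponds to.
    return load_avg / cores

# Virtualized 8-core guest at ~100 calls:
print(utilization(4.0, 8))   # 0.5 -> about half the box already used
# By comparison, a dedicated 4c/8t server at a similar load average handles 180-200 calls.
[/code]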

In short, you won't be reducing the number of servers, you'll be increasing it at the end of the day.
It is better to use some old Haswell servers; they are almost as powerful as 6th-gen chips (within roughly +/- 15%), they are cheap and readily available, and spares are cheap on the market. You cannot go with low-end SSDs; use only enterprise-level SSDs and NVMe drives that can cope with high writes and IOPS at all times (especially for the DB and slave servers).

If you are using a public cloud, I would not recommend it at all, because you don't control the server, the other customers on it, or even the bandwidth. Public clouds are good for some things, but experimenting with 1000 seats on a public cloud would not be great.
frequency
 
Posts: 117
Joined: Mon Jun 13, 2016 11:18 am

Re: large cluster using some virtualization?

Postby Kumba » Wed Apr 18, 2018 8:10 pm

Pretty much what Frequency just described. The first issue you run into with virtualization is with the RTP audio. This works on a 20 ms timing period, or P-Time. This means that the audio stream is chopped up into 20 ms chunks of audio and sent one chunk per packet to the far end. You are also receiving a packet every 20 ms from the far end with their audio stream. This is why you can get one-way audio: one side of the stream is making it to the other side, but that side isn't receiving a stream back. This results in roughly 100 packets per second per remote channel. Each agent is a remote channel, and each customer call attempt is a remote channel.
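
A quick back-of-the-envelope version of that packet math (just a sketch with the numbers above, nothing VICIdial-specific):

[code]
# RTP packet rate for one remote channel, assuming a 20 ms ptime
# and one packet sent plus one packet received per ptime interval.
PTIME_MS = 20
pps_one_direction = 1000 / PTIME_MS          # 50 packets/sec each way
pps_per_channel = 2 * pps_one_direction      # ~100 packets/sec send + receive
print(pps_per_channel)                       # 100.0
[/code]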

So if you have 20 agents dialing 50 lines, you have the potential for 100,000 packets per second. On outbound this is skewed somewhat, and it's probably more like 60,000 packets per second due to pre-answer and ringing, but on inbound this is exactly the kind of traffic you'll be seeing on a single dialer. Now let's say you have 4 guests on that host, each doing 20 agents at 50 lines; that means you have the host OS receiving and handling 400,000 packets per second. Since all computer networks inherently have some measurable form of lag on them, you have to buffer this transfer with a jitter buffer. A good jitter buffer will be 80 to 100 ms or less. This means you have 80 to 100 ms of lag that you can compensate for from end to end on this call before your agent or your customer will start to have audio quality issues. Bad jitter generally sounds like either skips/cut-outs in the audio or like the person is under water. Packet loss usually sounds like chirps or blips or stutters. This isn't always universal, but it's a pretty good rule of thumb when you hear audio quality issues.
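
Extending the same arithmetic to the host (assuming the ~100 packets/sec per channel figure above, and reading "20 agents dialing 50 lines" as roughly 1,000 remote channels per guest, which is what makes the totals work out):

[code]
# Aggregate RTP packet rates, using the per-channel figure from above.
PPS_PER_CHANNEL = 100

channels_per_guest = 20 * 50                            # ~1,000 remote channels per guest
pps_per_guest = channels_per_guest * PPS_PER_CHANNEL    # 100,000 packets/sec
pps_at_host = 4 * pps_per_guest                         # 4 guests -> 400,000 packets/sec at the host
print(pps_per_guest, pps_at_host)                       # 100000 400000
[/code]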

Now let's go back to this VM host. ALL virtualization, by its very nature, operates on a sort of time-slice CPU-sharing mechanism. So when the CPU is under-utilized, everything can run at near realtime, almost like it was on bare-metal hardware. This works great for development, where you are more focused on features and general testing since you have no real load to speak of. I myself develop ViciBox and other VICIdial things using VirtualBox because it's fast and easy.

The problem comes into play when you start introducing real load to the guests in production. What happens is that once the host reaches saturation (host overhead + VM overhead), you will start seeing your guests put into small suspended states so that the other guests can run. Now we have an 80 to 100 ms jitter buffer that can be used to help buffer this, but that assumes the guest VM is suspended for less than 80 to 100 ms, and that when it does start running again the CPU can play catch-up fast enough before it gets suspended again. So for every 20 ms a guest is suspended, that's 140 packets (20 agents plus 50 customers) the CPU hasn't processed, which it now has to process in addition to the 140 packets behind it. This also reduces your allowable jitter to 60 ms (the 100 ms jitter buffer minus the 20 ms of suspended packets and the normal delivery of the next 20 ms of packets). If the CPU and I/O subsystem in the guest/host don't have enough power to deliver those 280 packets, reassemble them into a coherent stream, do codec translation, and deliver them to the endpoints in 60 ms, you get audio issues.
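
Here's the same budget laid out as a quick sketch (all numbers are the assumed ones from the paragraph above):

[code]
# Jitter-budget math for one 20 ms guest suspension (assumed figures from the post).
jitter_buffer_ms = 100            # total end-to-end lag the buffer can absorb
suspension_ms = 20                # the guest missed one 20 ms slice
channels = 20 + 50                # 20 agents plus 50 customers
packets_per_20ms = 2 * channels   # 140 packets normally due every 20 ms

backlog = packets_per_20ms                                # packets missed while suspended
to_catch_up = backlog + packets_per_20ms                  # 280: backlog plus the next interval
budget_left_ms = jitter_buffer_ms - suspension_ms - 20    # ~60 ms remaining

# If the guest can't handle `to_catch_up` packets within `budget_left_ms`,
# callers start hearing audio problems.
print(to_catch_up, budget_left_ms)                        # 280 60
[/code]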

This also results in a sort of load oscillation, because now you go from processing a more or less steady stream of 140 packets every 20 ms to processing twice that number of packets, which causes the guest to use MORE CPU, which causes the host to try to allocate more CPU, which can result in other guests being put into a suspended state to provide that CPU power. Now add to this scenario Asterisk, which is known for being somewhat troublesome on its own at high loads. When Asterisk gets loaded up, it tends to lock up and start causing CPU spikes on its own. CPU spikes and high load generally do not work well under virtualization, whereas bare-metal hardware will just keep running instead of potentially being stopped, and will eventually catch up transparently by virtue of the jitter buffer.

After a long enough time, the spikes can get bad enough that you progress from audio issues to packet loss and strange program behaviour. Usually, once the realtime clock starts to skew, you see issues with Asterisk missing or dropping calls, actions not happening (like hanging up a call), the web interface delaying or not seeing a click action from the user's browser, etc. Basically you get all the tell-tale signs of an overloaded system, but it happens at a significantly lower load. You lose anywhere from 5 to 15% of your capacity in just overhead between the host and guests, depending upon the hardware architecture and VM environment. Once you introduce high steady loads to the VM and then have bursts of activity, you compound the issue, resulting in the host being over-utilized. This is why, at load, in all of our testing and the testing other clients have done, your best-case scenario is 50% of normal bare-metal capacity, due to the extra overhead you need to ensure you don't have any issues. The other issue with Asterisk is that it does not scale much past a quad-core CPU, whether it's virtualized or bare metal.

So, long story short, VMs generally do not work well for high-load environments with load bursts. VMs do work great for HA setups and for workloads where hardware can be vertically scaled to meet your load, or clustered out so more servers are utilized to meet your requirements.

You can safely virtualize an archive server, since it only needs to provide FTP and HTTP, so literally anything can be an archive server. You could potentially virtualize the web server or database, but your performance will be noticeably better on bare-metal hardware if you run any significant load on them. If your web and database load is more consistent, then you might have decent results. It is worth noting that doing something as simple as loading leads on a virtualized web or database server can cause a significant enough load spike that the guest wobbles a little while the host tries to allocate resources to it.
Kumba
 
Posts: 939
Joined: Tue Oct 16, 2007 11:44 pm
Location: Florida

