Virtual machines? Clusters? Nodes?
Confused? Lucky you! Not so long ago I went to Amsterdam, not for a long weekend, holiday or fun trip, but for a workshop filled with sweat (the air conditioning didn’t work, so ice cream took its place) and determination. The content of this workshop might be interesting for you too. So what was it all about?
Well, a lot about data science and Microsoft Azure functionality. I’d like to share some of the knowledge I gained there. Today, we’ll talk about virtual machines (VMs), clusters and nodes. Key questions: What are they? What can we use them for? And how do we create them?
To start off:
What are VMs, clusters and nodes?
A virtual machine is basically an on-demand, scalable computing resource. This means that you can run computations on it, and the computation power you get is based on your demand. If you need to do some heavy lifting, you can get lots of computation power. Of course, you’ll then also have to level up your pricing plan, because you are basically renting a part of Microsoft’s data center to do your computations. Big advantage: you don’t need to buy or maintain the physical hardware yourself.
A cluster is a network of virtual machines. This is handy, because you cannot endlessly scale up the computing power of a single virtual machine. You can, however, run several VMs in parallel and scale up your computing power that way.
A node is simply one VM in a cluster.
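To make the idea of spreading work over nodes a bit more concrete, here is a minimal Python sketch. It is only an analogy and rests on some assumptions: the threads on a single machine stand in for the nodes of a cluster, and the names `partial_sum` and `cluster_sum_of_squares` are made up for this illustration. A real cluster would run each chunk on a separate VM.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(bounds):
    """Work done by one 'node': sum the squares in its own chunk."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def cluster_sum_of_squares(n, nodes):
    """Split the range 0..n-1 into one chunk per node and combine the results."""
    step = max(1, -(-n // nodes))  # ceiling division, at least 1
    chunks = [(start, min(start + step, n)) for start in range(0, n, step)]
    # Threads only imitate nodes here; they illustrate the split-and-combine idea.
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        return sum(pool.map(partial_sum, chunks))


print(cluster_sum_of_squares(1000, nodes=4))
```

Each chunk is independent of the others, which is exactly why adding nodes helps: the partial results only have to be combined at the very end.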
What can we use them for?
Virtual machines are ideal for heavy computations. What are heavy computations? We typically call a computation heavy if there is a very large amount of data, if the algorithm itself requires a lot of computation power, or both. Because you outsource the computations to a data center, you spare your own laptop. That’s one crash prevented!
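A quick back-of-the-envelope check can tell you whether a dataset is "heavy" for your laptop. The sketch below assumes a purely numeric table with 8 bytes per value (a 64-bit number) and an arbitrary rule of thumb that the data should use at most half of your RAM. Both the function names and the `headroom` factor are assumptions for illustration, not a standard.

```python
def estimated_size_gb(rows, columns, bytes_per_value=8):
    """Rough in-memory size of a numeric table, in GiB."""
    return rows * columns * bytes_per_value / 1024**3


def fits_in_memory(rows, columns, ram_gb, headroom=0.5):
    """True if the table uses at most `headroom` of the available RAM."""
    return estimated_size_gb(rows, columns) <= ram_gb * headroom


# 100 million rows with 50 numeric columns on a laptop with 16 GB of RAM
print(round(estimated_size_gb(100_000_000, 50), 2))
print(fits_in_memory(100_000_000, 50, ram_gb=16))
```

If the check fails, that is a good hint that the job belongs on a VM or a cluster rather than on your own machine.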
How to create a virtual machine?
A VM can be created through the Azure portal, the Azure CLI or PowerShell. Extensive documentation and tutorials can be found here.
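For the CLI route, a rough idea of what the call looks like is sketched below in Python. It only builds and prints an `az vm create` command, it does not run it. The helper name `build_vm_create_command` is made up, and every value (resource group, VM name, image, size, user name, location) is a placeholder you would have to replace. In particular, the image and size are left as unfilled placeholders on purpose; look them up in the documentation.

```python
import shlex


def build_vm_create_command(resource_group, name, image, size,
                            admin_username, location):
    """Return the arguments for an `az vm create` call without running it."""
    return [
        "az", "vm", "create",
        "--resource-group", resource_group,
        "--name", name,
        "--image", image,
        "--size", size,
        "--admin-username", admin_username,
        "--location", location,
    ]


# All values below are placeholders; replace them with your own.
command = build_vm_create_command(
    resource_group="my-resource-group",
    name="my-dsvm",
    image="<data-science-vm-image>",
    size="<vm-size>",
    admin_username="azureuser",
    location="westeurope",
)
print(shlex.join(command))
```

Once the Azure CLI is installed and you are logged in with `az login`, a list like this could be handed to `subprocess.run(command, check=True)`. Keeping the command as a list avoids quoting problems with names that contain spaces.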
Let’s now create a data science virtual machine together.
Prerequisites:
- Basic familiarity with creating resources in Azure
- An active Azure subscription. A free trial won’t do.
Steps:
- Go to “create resource” in the Azure portal and select “Data Science Virtual Machine — Windows 2016”.
- Provide the inputs that Microsoft requires: a name for your VM, the VM disk type (choose HDD if you want to be able to use a GPU-based machine), a user name, a password and its confirmation, your Azure subscription, a new or existing resource group, and a location for your VM. Click OK.
- Then select the size you prefer. There is a trade-off between power and cost, so choose carefully. You may get an error in this step, often because your subscription doesn’t have access to NC-series VMs or because you hit a quota limit. Choosing a smaller VM size usually solves the problem.
- Under Settings you can customize your VM. The defaults are often fine.
- Next you’ll find yourself on the create page. Review the summary and then click “create”.
- It takes up to 15 minutes to provision the VM in Azure.
- When the VM is ready to be used, you’ll get a notification in the Azure portal. Select “Go to resource” in the notification.
- On the resource blade, select Connect and then select Download RDP File.
- Run the downloaded RDP file, make sure that Clipboard is checked and click continue.
- Enter the user name and password that you provided in the second step and click “connect”.
- Select “connect” when prompted to accept the certificate and to connect.
- Your Data Science VM will pop up on your screen, ready to be used. There are several programs already installed on your VM. Simply open one of those tools and let the data science begin!
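The size step above is a trade-off between power and cost, and you can reason about it with a few lines of Python. The sketch below picks the cheapest size that still meets your requirements. Note the assumptions: the catalogue `SIZES`, its names and its prices are completely made up for illustration and are not real Azure sizes or prices, and `cheapest_fit` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VmSize:
    name: str
    cores: int
    ram_gb: int
    price_per_hour: float  # made-up number, not a real Azure price


# Hypothetical catalogue, for illustration only.
SIZES = [
    VmSize("small", cores=2, ram_gb=8, price_per_hour=0.10),
    VmSize("medium", cores=4, ram_gb=16, price_per_hour=0.20),
    VmSize("large", cores=8, ram_gb=32, price_per_hour=0.40),
]


def cheapest_fit(sizes, min_cores, min_ram_gb) -> Optional[VmSize]:
    """Return the cheapest size that meets both requirements, or None."""
    candidates = [
        s for s in sizes
        if s.cores >= min_cores and s.ram_gb >= min_ram_gb
    ]
    return min(candidates, key=lambda s: s.price_per_hour, default=None)


print(cheapest_fit(SIZES, min_cores=3, min_ram_gb=10))
```

If nothing fits, the function returns `None`, which is your cue to either relax the requirements or think about a cluster instead of one bigger machine.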
This blog was originally posted here.