I recently completed Stanford’s Machine Learning course on Coursera, taught by Andrew Ng. It is a fantastic course for anyone interested, plus it’s free! The course uses MATLAB or Octave for the programming assignments. With the release of Azure Machine Learning, reimplementing these assignments in Azure ML seemed like a good exercise for learning the tool.
Creating an Azure ML Workspace
To get started, you will need an Azure subscription. From the Azure Management Portal, click ‘+ NEW’ in the lower left to create a workspace, then link to or create an associated storage account for use with the data.
Once that is done, you can visit https://studio.azureml.net to interact with the workspace and conduct your ‘experiments’.
Programming Exercise 1: Linear Regression
I completed the course as ml-006, and the exercises here relate to that instance of the course. The first programming exercise starts out pretty straightforward: linear regression with one variable.
Data is provided as ex1data.txt: a comma separated value (csv) list of profits and city populations for a restaurant franchise. This can be uploaded into your workspace by clicking ‘+ NEW’ in the lower left and choosing Dataset -> From Local File. The data set is actually a csv file without a header, so choose that type for the dataset and provide a name and description. (Choosing csv simplifies use of the data later vs a generic txt file.)
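For reference, the same format can be parsed outside Azure ML as well. A minimal Python sketch of reading headerless comma-separated rows; the sample values here are made up for illustration, not taken from the real file:

```python
import csv
import io

# Made-up rows in the same shape as ex1data.txt: population, profit,
# comma separated, no header row. Values are illustrative only.
sample = "7.0,15.5\n5.2,9.1\n8.4,13.6\n"

populations, profits = [], []
for row in csv.reader(io.StringIO(sample)):
    populations.append(float(row[0]))  # city population
    profits.append(float(row[1]))      # franchise profit
```

In practice you would pass an open file handle to `csv.reader` instead of the inline string.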
Now create a new experiment, again from the ‘+ NEW’ in the lower left. Using the experiment items selector on the left, you can click through or search for components. Setting up the experiment with the components and connections as below will train our linear regression model. Find all the components, drag them onto the canvas, and then drag and drop from each of the nodes to connect the components.
There is some configuration required before we can train the model. To configure each component, click on the component in the canvas and the properties will be visible in the pane on the right. As we are imitating the code from ml-006, the required properties are:
- Metadata Editor Item:
  - Select All columns
  - Set New column names to: Population, Profit
- Linear Regression Item:
  - Solution method: Online Gradient Descent
  - Normalize features: unchecked
  - Average final hypothesis: unchecked
  - Learning rate: 0.01
  - Number of training epochs: 1500
  - Decrease learning rate: unchecked
- Train Model Item:
  - Label column: click ‘Launch column selector’ and select the single column Profit (here is where the Metadata Editor labelling the columns comes in handy, rather than the column being arbitrarily labelled col2)
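The learning rate of 0.01 and 1500 epochs mirror the ml-006 implementation. As a rough sketch of what that training loop does for one variable, here is the course-style batch gradient descent in Python (note that the ‘Online Gradient Descent’ solution method updates per example rather than per batch, so Azure’s internals differ from this sketch; the toy data in the usage example is made up):

```python
def train(x, y, alpha=0.01, epochs=1500):
    """Univariate linear regression by batch gradient descent."""
    theta0, theta1 = 0.0, 0.0  # intercept and slope, initialised to zero
    m = len(x)
    for _ in range(epochs):
        # errors of the current hypothesis h(x) = theta0 + theta1 * x
        errors = [theta0 + theta1 * xi - yi for xi, yi in zip(x, y)]
        # average gradients over the whole batch
        g0 = sum(errors) / m
        g1 = sum(e * xi for e, xi in zip(errors, x)) / m
        # simultaneous update of both parameters
        theta0 -= alpha * g0
        theta1 -= alpha * g1
    return theta0, theta1
```

For example, `train([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0], alpha=0.05, epochs=20000)` converges to an intercept near 1 and a slope near 2, the line the toy data was drawn from.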
Everything should now be ready to run the experiment. Click RUN in the bottom toolbar and let it do its thing.
Once it is done, you should see green ticks across the board and we can see what has happened. Clicking on any of the dots on each item in the canvas gives options to view information at that point. Clicking the output after the metadata editor, as shown below, and choosing ‘visualise’ gives the summary statistics of our data. Clicking on the Population column and plotting it against Profit will visualise the data and confirm from visual inspection that a linear model seems like a good fit.
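The kind of per-column summary shown in the visualise view (mean, standard deviation, minimum, maximum) is straightforward to reproduce by hand. A minimal sketch, not an exact match for Azure’s output:

```python
import statistics

def summarise(values):
    """Basic per-column summary statistics for a list of numbers."""
    return {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }
```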
Doing the same on the ‘Train Model’ item will give us the coefficients of the trained linear model. In this case:
- Bias: 1.41206
- Population: 20.1685
(Comparatively, running the solution in MATLAB resulted in values of -3.630291 and 1.166362. I have yet to determine why the values are different.)
These represent the intercept and gradient of the linear model. The second line, from the metadata editor to the Score Model item, allows us to score the trained model against the original data. Visualising the output from the ‘Score Model’ item will give the summary statistics of this scored model.
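In other words, scoring applies the linear hypothesis to each row. A sketch using the coefficients reported above (the population value passed in is illustrative):

```python
# Coefficients as reported by the Train Model item above.
bias = 1.41206         # intercept
coefficient = 20.1685  # weight on Population

def score(population):
    # Linear hypothesis: predicted profit for a given population
    return bias + coefficient * population
```

So, for example, `score(3.5)` predicts a profit of about 72.0 under this model.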
More interesting, though, for evaluating the model is the ‘Evaluate Model’ item. Visualising its output gives the errors of our model compared to the original data and provides insight into how well the model fits.
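As a point of comparison, two standard error metrics for a regression model, mean absolute error and root mean squared error, are simple to compute directly; a sketch:

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average magnitude of the residuals."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error: penalises large residuals more heavily."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )
```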