Sunday, August 12, 2018

DS and ML study - my collection

Working with Data Science and Machine Learning requires continuous education. Usually, when I start reading a new paper or just google some question, I stumble upon a good course, book or other resource, which I skim to grasp the concept and bookmark to come back to later.
*sometimes I even DO come back later :)
Today I happened to be organising my bookmarks, and then decided to share the resources I prefer for online education - most of them offer free education, at least in audit mode.
NB! The order is arbitrary and shows no preference.

edX https://courses.edx.org
Very popular, with a lot of content. Mostly free, and one can buy a certificate upon course completion.

Coursera https://www.coursera.org/
Another very popular one. Unfortunately, some courses do not give access to practical exercises unless paid.

LinkedIn https://www.linkedin.com/learning
Some interesting content there. Still kind of new, so I am trying it out now.

Microsoft Virtual Academy https://mva.microsoft.com/
A lot of free content regarding Microsoft technologies. The quality is also excellent.

Microsoft AI School https://aischool.microsoft.com/
Somewhat similar to the previous one, but filtered down to AI stuff.

Pluralsight https://app.pluralsight.com/library/
The really good ones are not free. But maybe your work could pay for access?

Udacity https://classroom.udacity.com/
Another famous resource. Lectures are interactive and playful.

DataCamp https://www.datacamp.com/community/open-courses/
I find the courses here very simple and interactive. Sometimes it is useful to repeat the basics.

Yandex Data School https://yandexdataschool.ru/edu-process/courses (this one is in Russian)
This one gives good insights, with more mathematics and underlying theory compared to other ML courses. If you speak Russian - absolutely recommended and truly enjoyable. Russians love math, don't we? :)

Youtube https://www.youtube.com/
Lots of recordings of lectures and courses from the most famous teachers and universities. No homework, but excellent for boosting theoretical understanding.

Monday, June 11, 2018

NDC Oslo

This year is the 3rd time I have the honor of speaking at the NDC Conference. NDC Oslo is my favourite one - great content, cool venue and fantastic mood :)

Speaking about Machine Learning on several occasions, I have often received comments that there is a lot of theoretical knowledge around about the concept, but hardly anything about real-life experience.
So, this year we (me and @KatyaGeek) have decided to talk about how running a Machine Learning project is different from running a typical Software project.

https://ndcoslo.com/talk/not-a-magic-what-to-expect-from-machine-learning-projects/

Come along!

Monday, December 11, 2017

AzureML supported R packages

You write custom R code, execute it in AzureML... and suddenly get an error saying a package is not supported. Bummer!
To avoid that situation, there is a list of supported R packages: https://msdn.microsoft.com/en-us/library/mt741980.aspx

Save yourself some trouble, have a look first!

Monday, August 28, 2017

Getting started with Machine Learning?

I am often asked what to look at if somebody wants to get started with Machine Learning. Usually I send people to the Coursera "Machine Learning" class by Andrew Ng. It's like a litmus test - after taking that one, people usually get a feeling for whether Machine Learning is something they want to continue with or not.

However, if you decide to continue, what's next? What kind of knowledge and skills to look at? What are the buzzwords in all those learning materials?

And then I found this blog post which, IMHO, summarizes it all pretty well! It even addresses the necessity to understand Linear Algebra (I usually take it for granted and never mention it, but in fact one has to know Linear Algebra. It's a cornerstone of any engineering skill). So here it is, enjoy:
http://abhijitannaldas.com/getting-started-with-machine-learning-in-one-hour/

Do not despair if it seems too much to deal with at once. The learning approach can be bottom-up, i.e. from theory to practice... but top-down works as well! I.e. get yourself a case and work down through the limited scope of theory necessary to understand and develop the solution.

Friday, January 27, 2017

Understanding AzureML Web Services pricing

So, you have created a predictive experiment in Azure ML Studio, and it is time to go to production.
I am not going to explain how to deploy a web service - there are enough tutorials for that.
The question you need to ask yourself is: how much is it going to cost me? And here comes the confusion. Most people just click deploy and go with the proposed default. I haven't found any article explaining the differences and options available. So, this post is about that.

First, if we go to the Azure Pricing calculator and add Machine Learning, there is nothing there about the cost of web services deployed from AzureML Studio.

But click on the small information button and choose "Machine Learning pricing details", and the curtain lifts! It redirects us to a page explaining what is what and how much it costs. Let's focus on the part "Production Web API pricing".

First of all, everybody understands that a web service requires some resources to run. Those resources need to be allocated - and that is what we pay for. For a standard web application those resources are allocated by creating an App Service Plan. App Service Plans come in many cost options, defined by how much compute power and disk space they include. Behind an App Service Plan there is a virtual machine, so be aware - it's gonna cost you no matter if you use it or not (unless the chosen plan is "Free"). Same as a monthly membership at a fitness club - you pay for it, and nobody cares if you actually show up.

When deploying a web service in AzureML Studio, there is no such thing to choose as an "App Service Plan". Those are for standard web services only. For AzureML web services there are 2 options:

  • "classic" Web Service 
  • Machine Learning Web Service

What's the difference?
Classic is pretty much "pay-as-you-go" with cost:
And a Machine Learning Web Service Plan is basically the same as an "App Service Plan", but for AzureML. Prepaid, pre-allocated resources (there is a Free option - with 2 hours and room for 2 services).




Also, when creating a new ML Workspace we must now create a "Machine Learning Web Service Plan". And the Free option can be used only once per geo-location. So, when creating a new Workspace, pay attention to what you choose for the Web Service Plan: choosing the default suggestion will create a new Web Service Plan, and if the Free one has already been used, the new one is gonna cost you.

A few words about scaling. As it says here, 20 concurrent requests are configured by default. If you need more - add more endpoints. The top limit is 200.

And yes, there is a new portal for just managing ML web services now - with the view for both deployment options:

I hope pricing options for AzureML web services are clear now :)

Tuesday, November 8, 2016

3 things I really miss in Azure Machine Learning

Azure Machine Learning is a handy tool, absolutely. If I need to run some model quickly to justify a gut feeling or to get a simple overview of the data, it fits really well. Or, for example, setting up a web service from a Machine Learning experiment is really easy, so kudos for that!

But there are some things which annoy me time after time, which I really want to be implemented or done differently. Here is my top 3 "wish-list":

1. Navigation inside the experiment

 

I mean, honestly... I can kind of accept the zoom button, but for navigating inside the experiment window I really expect dragging the scene to work! As of now it requires moving the mouse to a side every time and scrolling up or down.
Update: yey, there is a possibility to drag the scene in AzureML! It just needs to be enabled by clicking this button

2. Delete several datasets together
I load and save a lot of datasets, and deleting them one by one as they become irrelevant is a time killer.

3. Ability to script the experiment
Yes, if only I could upload an experiment as a script... that opens up so many possibilities. Like an essential one - VERSION CONTROL my experiments. Oh, don't even get me started...

Please, Microsoft. Christmas is coming, and I have been a nice girl :)

PS: do you have such things that annoy the hell out of you? Comment!

Thursday, October 27, 2016

Convert order lines to weighted graph

Let's say we have an order history with some products. We need to perform community detection as a part of market basket analysis. Order lines look like OrderId - ProductId.

The first thing that needs to be done is to convert the order lines into a weighted graph, where Nodes are products and Edges connect nodes if the products were purchased in the same order.
Something like this:
Weighted means that an Edge "weight" between two products is equal to the number of times those products were bought together.

Having tried to find existing code for this task without success, I have created a code snippet in R, which does the conversion from order lines to a weighted graph using an adjacency matrix.
For the graph above, the adjacency matrix can look like this:

The idea is simple - convert the order lines into an N x N adjacency matrix, where N = number of products (all columns are products, all rows are products, and edge weight = number of times two products were bought in the same order). And an adjacency matrix is easily convertible to a graph.
This approach happens to work relatively fast.
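
The R snippet itself is not reproduced here, so purely as an illustration, below is a minimal Python sketch of the same idea. It assumes the order lines arrive as (order_id, product_id) pairs and that product ids are sortable; the function names and the sample data are hypothetical, not taken from the original code.

```python
from collections import defaultdict
from itertools import combinations


def order_lines_to_adjacency(order_lines):
    """Build a symmetric N x N co-purchase matrix from (order_id, product_id) pairs."""
    # Group distinct products by order; a product repeated within one order counts once.
    baskets = defaultdict(set)
    for order_id, product_id in order_lines:
        baskets[order_id].add(product_id)

    # Give every product a fixed row/column position.
    products = sorted({p for basket in baskets.values() for p in basket})
    index = {p: i for i, p in enumerate(products)}

    n = len(products)
    matrix = [[0] * n for _ in range(n)]
    for basket in baskets.values():
        # Every pair of products in the same order adds 1 to their edge weight.
        for a, b in combinations(sorted(basket), 2):
            i, j = index[a], index[b]
            matrix[i][j] += 1
            matrix[j][i] += 1
    return products, matrix


def adjacency_to_edges(products, matrix):
    """List each undirected edge once as (product_a, product_b, weight)."""
    n = len(products)
    return [
        (products[i], products[j], matrix[i][j])
        for i in range(n)
        for j in range(i + 1, n)
        if matrix[i][j] > 0
    ]


# Hypothetical sample data: four orders over four products.
order_lines = [
    (1, "A"), (1, "B"), (1, "C"),
    (2, "A"), (2, "B"),
    (3, "B"), (3, "C"),
    (4, "D"),
]
products, matrix = order_lines_to_adjacency(order_lines)
edges = adjacency_to_edges(products, matrix)
```

In this sample, products A and B appear together in orders 1 and 2, so their edge gets weight 2, while product D is only ever bought alone and ends up as an isolated node. The resulting edge list (or the matrix itself) can then be handed to whatever graph library is used for community detection.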


Hope that saves somebody's time :)