I want to start a new series where I summarize the various meetups I go to. For each one, I want to include the following information – 

 

1) A link to the meetup, for anyone who wants to go to the next one.

2) Video/Slides of the presentation

3) My summary of things

 

Caveats – 1) My summary is going to be far from thorough 2) Cross-verify any claims/facts

 

The first meetup I want to cover was called “New Developments in Scalable Machine Learning”.

1) Meetup – http://www.meetup.com/SF-Bay-Areas-Big-Data-Think-Tank/events/196881922/

2) Video – https://www.youtube.com/watch?v=CupFvYJ-ZuQ

Summary – 

1) This was a panel discussion instead of a traditional presentation
2) The only way to upgrade your Hadoop version is to start a new company (Ted Dunning)
3) Everyone is excited and talking about Spark – mainly because we have reached a point where clusters can do most of the computation in-memory (0xdata is an in-memory ML computing engine)
4) Data ingestion/cleaning/munging is 80-90% of the ML pipeline, according to all the panelists
5) At production scale, the focus is on how long an ML model takes to score, and on version control and hot swapping of ML models
6) Deep learning has helped a lot of customers do things that were not feasible in a reasonable amount of time. Progression: logistic regression -> GBM -> deep learning
7) The panelists are not too excited about GPU computing just yet – GPU computing is hard, and it improves performance only in very specific applications (dense matrix multiplication)
 
But most important of all,
8) The technologies are changing rapidly. It is important to learn a technology and use it in production for a while, but then be ready to move on to a better, newer one. This is going to be the norm in the immediate future. There will be a lot of relearning involved when it comes to technologies.

 
