Keynotes – nothing substantial; a couple of them were pitches by sponsors. Salesforce – how the mobile Internet is changing lives (a couple of interesting points). Rubicon – in-the-moment analytics (check out). Data drill – data exploration for non-IT users. Data science getting its due (open government data) – DJ Patil, first US Chief Data Scientist. New government data – open and machine readable. Big Data report?

Spark talk – hall of fame. Alibaba Taobao. Zebrafish: microscope the fish. Spark for mapping the brain. Laser to activate individual neurons. Scala resource optimization, better shuffle, DataFrames!

TSAR – Google Analytics for Twitter

Streaming design patterns – kappa architecture. External lookup.
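
Note to self: a minimal sketch of how these two patterns might fit together, not taken from the talk. It assumes a Kappa-style setup where the only processing path is a replay of an append-only log, and an external-lookup step that enriches each event from an outside store with a small in-memory cache. The names USER_COUNTRY, Enricher, build_view and LOG are all hypothetical, and a plain dict stands in for the external store.

```python
from collections import defaultdict

# Hypothetical stand-in for an external store (e.g. a key-value database).
USER_COUNTRY = {"u1": "BE", "u2": "IN", "u3": "BE"}


class Enricher:
    """External lookup: fetch reference data per event, caching results."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.lookups = 0  # number of times the external store was actually hit

    def country(self, user_id):
        if user_id not in self.cache:
            self.lookups += 1
            self.cache[user_id] = self.store.get(user_id, "unknown")
        return self.cache[user_id]


def build_view(log, enricher, from_offset=0):
    """Kappa-style: a view is always derived by replaying the log.

    Reprocessing after a logic change is just another replay from offset 0;
    there is no separate batch code path.
    """
    view = defaultdict(int)
    for event in log[from_offset:]:
        view[enricher.country(event["user"])] += event["clicks"]
    return dict(view)


# Hypothetical append-only event log.
LOG = [
    {"user": "u1", "clicks": 2},
    {"user": "u2", "clicks": 1},
    {"user": "u3", "clicks": 4},
    {"user": "u1", "clicks": 1},
    {"user": "u9", "clicks": 5},  # not present in the external store
]

if __name__ == "__main__":
    enricher = Enricher(USER_COUNTRY)
    print(build_view(LOG, enricher))
    print("external lookups:", enricher.lookups)
```

The cache matters because a per-event round trip to the external store is usually the bottleneck in this pattern; how stale the cached values may get is a design choice this sketch ignores.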

Belgian MoD – @stevenbeeckman
Sex and cash theory
Startup bus. Civilian data analyst.

Quid – contextual vs global models (see pic).

Raster maps. GeoTrellis – geographic data processing.

Keynote – cyber operations room, Stafford Beer, Chile

Open data platform

New data visualizations – length better than area. Edward Tufte.

The connected cow. Estrus

Netflix. Hadoop + S3 instead of HDFS.

Crunch – faster than Cascading. Hive on Tez fastest. Spark – easy maintenance.

Fastest SQL – Hive on Tez. HAWQ/Hive

Cloudera presentation – click stream data

Adobe presentation – middle America. TB is standard, PB is the limit. Most of the time people are mining structured data. Old organizations (>100 years) using data.

MapR – Myriad. The day YARN was announced, Mesos had been in production for a year. Actor-based bidi RPC(?). Mesos creates virtual clusters. Omega paper/Google – single scheduler framework not viable. Slider

Spark – shuffle (common inefficiency), job on driver vs worker. rdd.toDebugString. collect transfers data from the workers to the driver. Only the driver can perform operations on RDDs (no RDDs within RDDs). For converting batch to stream – transform/foreachRDD. Testing: instead of SparkContext.stop() use LocalSparkContext.stop(). spark-packages.
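
Note to self on the shuffle point: a toy simulation in plain Python, not Spark code and not from the talk. It assumes the usual textbook case, where aggregating per partition before the shuffle (what reduceByKey does) moves fewer records across the network than shipping every record (what groupByKey does). The function names and the PARTITIONS data are made up for illustration.

```python
from collections import defaultdict


def shuffle_without_combine(partitions):
    """Ship every (key, value) record across the shuffle, then aggregate."""
    shuffled = [record for part in partitions for record in part]
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


def shuffle_with_combine(partitions):
    """Pre-aggregate inside each partition, then ship one record per key."""
    shuffled = []
    for part in partitions:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        shuffled.extend(local.items())
    totals = defaultdict(int)
    for key, value in shuffled:
        totals[key] += value
    return dict(totals), len(shuffled)


# Two hypothetical partitions of (word, 1) pairs, as in a word count.
PARTITIONS = [
    [("a", 1), ("b", 1), ("a", 1), ("a", 1)],
    [("b", 1), ("b", 1), ("c", 1), ("b", 1)],
]

if __name__ == "__main__":
    totals_plain, moved_plain = shuffle_without_combine(PARTITIONS)
    totals_comb, moved_comb = shuffle_with_combine(PARTITIONS)
    print(totals_plain, "records shuffled:", moved_plain)
    print(totals_comb, "records shuffled:", moved_comb)
```

Both paths give the same totals; only the number of records crossing the shuffle boundary differs, and the gap grows with the number of repeated keys per partition.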

Cybernetic Revolutionaries.