Big Data news from Strata
By Andy Ormsby
27 Sep 2011

The Big Data world continues to grow, and O'Reilly's "Strata" conference in New York provided plentiful evidence of this. The conference ran for a full five days: a one-day "Strata Jumpstart" session, a two-day Summit aimed at a business perspective and the final two-day Strata Conference itself.
I attended the last two days. These were supposed to focus on the "nuts and bolts" of Big Data, but such a description fails to do justice to the wide range of people present. There were significant numbers of data scientists, both current and aspiring. In case you are wondering what skills an aspiring data scientist needs, John Rauser's inspirational keynote talk ("What is a Career in Big Data?") listed maths, engineering, writing, scepticism and curiosity. You can see his talk here: http://www.youtube.com/watch?v=0tuEEnL61HM
Another talk that seemed to engage the audience, and that I enjoyed hugely, was Martin Maden's "First, Firster, Firstest", which managed to be compelling, amusing and informative. It covered a range of issues concerned with data storage and management from the Elizabethan era onwards, touching on library classifications, taxonomies and schemas. Who else could get Francis Bacon and NoSQL in the same presentation? And who knew that page rank was invented in the 1930s? http://www.youtube.com/watch?v=Qv0yF47L8WE
From a technology standpoint, it is clear that Hadoop remains at the top of a lot of people's lists. It has instant name recognition now and lots of commercial take-up beyond Cloudera, the company that first brought commercial support for Hadoop to market. The downside of widespread recognition is that the Hadoop marketplace is now quite crowded, and understanding the nuances of what each of the players is offering is a little complex.
Why the fuss? It's simply that Hadoop helps people solve a whole series of big data analytics problems relatively easily and very cost-effectively. The talk by Abhishek Mehta from Tresata was a great example. Abhishek focused on what Hadoop means for banking: a mere 1-2% of the 10-50 petabytes of data in a typical bank is subject to analysis, and Hadoop enables more of this data to be used effectively. As a result, banks have an opportunity to price assets more effectively, to offer real personalisation of services rather than coarse segmentation, and, by looking at populations rather than samples, to use outliers to inform models rather than stress them.
Even Tresata would agree that in the real-time area, Hadoop is not the answer. In his talk on the "Big Data Pipeline", Acunu's CEO Tim Moreton talked more generally about how emerging requirements for real-time analytics call for solutions that are distinct from the current generation of Hadoop solutions, which tend to focus on batch analytics. Tim's slides are here: http://assets.en.oreilly.com/1/event/63/Navigating%20the%20Data%20Pipeline%20Presentation.pdf
There's also an interview with Tim Moreton here: http://www.youtube.com/watch?v=pXcxG1ItgxM
The next Strata Conference is scheduled to start on February 28th in Santa Clara and if it is anything like as good as the New York edition, it should be worth attending. You can find out more here: http://strataconf.com/strata2012