I was pleased to attend Enterprise 2.0 Innovate on the West Coast for the first time. It is taking place November 12–15 at the Santa Clara Convention Center. (Here are my notes from this year’s Enterprise 2.0 2012 conference in Boston.) Below are my notes from the session Can You See Me Now? Tools and Techniques for Visualizing Big Data, led by Puneet Piplani, Geography Head, Mu Sigma. Here is the workshop description.
“Big Data is creating big headaches, forcing enterprises to find new ways to harness data to make decisions. Big Data is producing a positive wave of disruption in social data analysis and having a major impact on businesses. Third-parties are developing analytical methods and visualization techniques to help companies manage and interpret all data types. Some may be surprised to learn that there’s a shortage of data scientists and the importance of recruiting analytical talent specifically in the Big Data space is critical.”
Puneet said that Mu Sigma is a decision science and analytics company founded in 2004. It has over 2,000 scientists working across 10 verticals. He said that data science is simply a means to an end: making better decisions. Data is growing rapidly. Google processes 24 terabytes of data per day. By comparison, in 1993 total Internet traffic was about 100 terabytes.
The challenge is not just the volume but also the variety: people to people, people to machines, and machines to machines. Then there is velocity. 2.9 million emails are sent every second, 20 hours of video are uploaded to YouTube every minute, and 50 million tweets are posted every day.
Dealing with this explosion of data requires new applications. The new technology in turn generates more data, and all of it is being stored. Is big data just hype?
Data is an asset. It can be structured or unstructured. It can be inside or outside the firewall. How do you combine all this data to get useful information? For example, a sports clothing company wanted to forecast how many team jerseys to make and for which players, so it looked at social media to make its projections.
A competitive differentiator is how you use the data to make decisions. The consumption of analytics is the differentiator, not the creation. Big data technologies are becoming commoditized, bringing down the cost of these tools. It is the organizations that learn from the data that will win.
Visualization plays a key role. A Danish scientist looked at how we use our senses. We process sensory data at different speeds; in order of increasing speed: taste, smell, touch, sight. However, the brain processes this data more slowly than it receives it, so the brain makes assumptions. The unconscious part of the brain depends more on the senses than on language. Sight is the fastest sense, so visualization is key. He compared a list of figures to a graph. You can see much more in the graph, much faster, and form assumptions much more quickly. The graph frees the brain from processing numbers so it can see trends.
I was a cognitive psychologist in a former life and saw this power of visualization firsthand. However, there are also times when language trumps visuals. You need to match the characteristics of the medium to the cognitive processes that people use to solve problems. Some tasks are best solved through the precision and dichotomies that language offers. Other tasks are best solved with the big picture that visualization offers.
Tools can now provide dynamic visualizations that update when the data changes and when you ask questions. There can also be personal or role-based visualizations that adjust depending on who you are. You can get visual alerts and notifications based on changing data. You need to see the outliers and exceptions, because that is where the data that matters most for decision making is found.
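To make the idea of alerts on outliers a little more concrete, here is a minimal Python sketch. It is not something Puneet showed, and real visualization tools may use very different detection methods. It simply flags points whose z-score (distance from the mean, measured in standard deviations) exceeds a threshold. The function name flag_outliers, the 2.0 threshold and the hourly mention counts are all hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev


def flag_outliers(values, threshold=2.0):
    """Return (index, value) pairs whose absolute z-score exceeds threshold.

    A simple z-score rule; the 2.0 default is an illustrative assumption,
    not a value recommended in the session.
    """
    if len(values) < 2:
        return []  # need at least two points to estimate spread
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# Hypothetical hourly mention counts for a topic; hour 6 contains a spike.
hourly_mentions = [120, 115, 130, 125, 118, 122, 410, 127, 119, 121]

for hour, count in flag_outliers(hourly_mentions):
    print(f"ALERT: hour {hour} had {count} mentions")
```

In a dashboard, a flagged point like this would be what gets highlighted or pushed out as a notification. A fixed z-score cutoff is only one simple choice; it can miss outliers in small samples or skewed data, so production tools often rely on more robust statistics.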
He showed some interesting examples. First, there was a hierarchical tree map, which is useful for visualizing topics. One example was foodmood.com.in, which showed what people are eating in different places and how much they like it, and let you make comparisons (see below).
He then showed thematic rivers, which display trends in data over time. He went to a site that charted tweets about this conference over time. Another example is the History of Everything – ChronoZoom (see below).
A third is CNN Ecosphere #RIO20, about a conference in Rio. It is a big set of networked dots that you can zoom into and explore (see below). These are public examples, and I checked out each Web site. They are fun.
Puneet also mentioned an example his firm created for an airline to minimize the effects of flight delays on over flights. There are many vendors in this space. He showed a 2006 Gartner hype cycle in which big data visualization was heading into the trough of disillusionment. Now it is reaching the slope of enlightenment: it is starting to take hold and be accepted as real and valuable.
This area interests me because I am involved with Darwin Ecosystems, a firm that does data visualization. It produces the Darwin Awareness Engine(TM), which looks at what is happening in social media, uses Chaos Theory-based algorithms to let the content self-organize, and produces visualizations of the findings. There is also Tweather, which focuses specifically on Twitter and shows topic trends over time.