Our data scientists and industry experts offer insights, examples, and advice on machine learning and data science.

"No Better Magic than Machine Learning": Giving the Gift of ML Education

Posted by Jennifer Bolton on Nov 23, 2015 7:47:00 AM

Topics: Machine Learning, Coursera

Interview: Gabriel Menegatti, Founder and CEO, Simbiose Technology Ventures

At Dato, we love to hear about the ways our customers are innovating with machine learning. However, there is still a shortage of talent needed to realize the value of machine learning more widely for businesses, customers, and social good.

Recently we were inspired to hear about a Brazilian company piloting an initiative to cover 100% of the costs for selected applicants to take the Coursera Machine Learning Specialization, a program developed by Dato in partnership with the University of Washington and Coursera. We reached out to learn more about the company and its initiative to champion machine learning education in Brazil.

Read More

Any Given Sunday: Football and a Machine Learning Rookie

Posted by Susan Romero on Nov 6, 2015 11:59:09 AM

Topics: GraphLab Create, Machine Learning Use Cases, NFL Data

I love football more than engineers love coffee, and all my Dato friends know it. Throughout an NFL season I have fantasy teams, point-spread pools, survivor pools, and non-stop pontification on the latest terrible play call, awful refs, and amazing plays (GO HAWKS!). Despite hearing about all the neato, life-changing, world-saving, cutting-edge, inspirational things machine learning could be used for, my first thought was "Huh. I should learn that and use it for football."

Read More

Calculating Churn Prediction with Customer Usage Data

Posted by Antoine Atallah on Oct 23, 2015 9:10:00 AM

Topics: Churn Prediction, Machine Learning Use Cases, Log Data

Usage logs are a treasure trove of information, often very large and spanning vast periods of time. Tapping into this data can be a daunting task, but fear not! An exciting new feature of GraphLab Create 1.6 is support for time series data! One example of the information we can extract from customer usage data is customer churn: calculating the probability that a customer will come back to our website, application, or business given their past behavior.

In this post we will take a look at the new Churn Prediction toolkit that is now part of GraphLab Create.
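As a rough illustration of the idea, here is a minimal plain-Python sketch of how usage logs can be turned into churn labels and a simple feature. It does not use the GraphLab Create Churn Prediction toolkit, whose API is not shown in this excerpt. The function names `churn_labels` and `days_since_last_activity`, the 30-day window, and the sample events are all assumptions made for the example.

```python
from datetime import datetime, timedelta


def churn_labels(events, cutoff, window_days=30):
    """Label each user active on or before `cutoff` as churned or retained.

    events: iterable of (user_id, datetime) pairs from a usage log.
    A user counts as churned (True) if they have no event in the
    `window_days` days that follow the cutoff (an assumed definition).
    """
    window_end = cutoff + timedelta(days=window_days)
    seen_before = set()
    seen_after = set()
    for user_id, ts in events:
        if ts <= cutoff:
            seen_before.add(user_id)
        elif ts <= window_end:
            seen_after.add(user_id)
    # Only users observed before the cutoff receive a label.
    return {user: user not in seen_after for user in seen_before}


def days_since_last_activity(events, cutoff):
    """A simple feature: whole days between a user's last event and the cutoff."""
    last_seen = {}
    for user_id, ts in events:
        if ts <= cutoff and (user_id not in last_seen or ts > last_seen[user_id]):
            last_seen[user_id] = ts
    return {user: (cutoff - ts).days for user, ts in last_seen.items()}


# Hypothetical usage log: (user_id, timestamp of activity).
sample_events = [
    ("a", datetime(2015, 9, 1)),
    ("a", datetime(2015, 10, 10)),
    ("b", datetime(2015, 9, 15)),
    ("c", datetime(2015, 10, 5)),
]
sample_cutoff = datetime(2015, 9, 30)

labels = churn_labels(sample_events, sample_cutoff)
features = days_since_last_activity(sample_events, sample_cutoff)
```

In a real pipeline, labels and features like these would feed a classifier that outputs a probability per customer. The sketch stops at the data preparation step.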

Read More

GraphLab Integration with Spark Open Source Release

Posted by Emad Soroush on Oct 15, 2015 10:08:00 AM

Topics: GraphLab Create, Open Source, Spark, SFrame

Due to its ability to support a wide variety of data engineering tasks across a growing range of data sources, Apache Spark has become an integral part of the Hadoop ecosystem. In this post, we introduce the new spark-sframe package, which unites the data ingestion and processing capabilities of Apache Spark with the sophisticated machine learning tools of GraphLab Create, enabling simplified development of rich machine learning models on a wide variety of data sources.

Often the most challenging part of machine learning is getting the right data in the right form. Apache Spark provides rich Java, Scala, SQL, and Python APIs for bulk data and leverages fault-tolerant distributed processing to accelerate I/O- and CPU-intensive operations. However, once the data has been cleaned and transformed, training models is often most efficiently done with specialized ML tools that leverage the structure of ML algorithms.

Over the past several years we have been developing SFrame, a column-based data frame that is specifically optimized for ML algorithms. A few weeks ago, we announced the open source release of SFrame, and today we are excited to announce the open source release of the spark-sframe package. The spark-sframe package unifies the bulk data processing capabilities of Apache Spark with the optimized open-source machine learning SFrame data structure by providing a simple and efficient API to move between SFrame and RDD representations of data.
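To make the idea of moving between representations concrete, here is a minimal plain-Python sketch that converts row-oriented records (the shape of data in an RDD of rows) to a column-oriented layout (the shape used by a column-based data frame) and back. It does not use the spark-sframe API, which is not shown in this excerpt. The function names `rows_to_columns` and `columns_to_rows` and the sample records are assumptions made for the example.

```python
def rows_to_columns(rows):
    """Convert a list of dict records into a dict of equal-length column lists.

    Fields missing from a record are filled with None.
    """
    columns = {}
    for i, row in enumerate(rows):
        # Register columns seen for the first time, back-filling earlier rows.
        for name in row:
            if name not in columns:
                columns[name] = [None] * i
        # Append this row's value (or None) to every known column.
        for name, values in columns.items():
            values.append(row.get(name))
    return columns


def columns_to_rows(columns):
    """Convert a dict of equal-length column lists back into dict records."""
    names = list(columns)
    length = len(columns[names[0]]) if names else 0
    return [{name: columns[name][i] for name in names} for i in range(length)]


# Hypothetical records, as they might come out of a row-oriented collection.
sample_rows = [
    {"user": "a", "clicks": 3},
    {"user": "b", "clicks": 5},
]

sample_columns = rows_to_columns(sample_rows)
round_trip = columns_to_rows(sample_columns)
```

A column layout keeps all values of one field together, which is part of what makes per-column compression and typed storage practical.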

Read More

Why Personalisation is the Future of eCommerce

Posted by Neil Hughes (Guest Tech Blogger) on Oct 9, 2015 11:59:00 AM

Topics: Personalization, Machine Learning for eCommerce, Machine Learning Use Cases

Amazon and eBay have been leading the way in e-commerce for many years. Many will attribute their success to offering the consumer simplicity with their one-click ordering process. However, the reality is there is a whole heap of incredible work that goes on underneath the hood to ensure sales are steadily increasing.

Although the decisions that online shoppers make often seem focused and unprompted, the reality is that we are often given a gentle nudge in the right direction without even realising it. Machine learning acts like a virtual butler of sorts who ensures that the customers’ needs are always met without them ever noticing the butler is in the room.

Read More

Machine Learning at Strata + Hadoop World

Posted by Eduardo Rosini on Oct 6, 2015 5:30:00 AM

Last week we went to Strata + Hadoop World in NY to join the industry conversation, and we learned a lot. Strata has always been a great conference for all things data, a place where established and new technologies come together to provide a fresh look at what works, what doesn’t, and what’s promising. To those of you who stopped by our booth, attended our tutorials, and gave us your dedicated 1:1 time, thank you so much for the insights. We look forward to continuing the conversation and driving the industry forward together with you.

Machine learning was everywhere at Strata. This is a big deal. More companies than ever are using machine learning technology to build their products. Once a niche conversation, it has now taken center stage. The industry has spent many years nailing the infrastructure that allows data to be well managed. With that largely in place now, the focus is shifting to unleashing the power of machine learning on that data and letting it drive new customer experiences and business innovation.

Read More

Data Science Salaries, Tools, and Trends: O’Reilly Media Report

Posted by Jennifer Bolton on Sep 30, 2015 12:02:00 PM

Topics: Hadoop, Data Science, Data Science Trends, Spark, Python

Hot off the press in time for Strata NYC, O’Reilly Media released its third annual report on salary ranges for data scientists and engineers, time spent by task type, use of tools, and correlation of tool usage and salary.

For their 2015 Data Science Salary Survey report, O’Reilly Media compiled anonymous survey results from 820 respondents in 47 countries and 38 states across a range of industries.

Read More

SFrame Open Source Release

Posted by Carlos Guestrin on Sep 25, 2015 5:29:00 AM

Topics: Open Source, SFrame

At Dato we’re big believers in open source software. So it’s with great pleasure that today I announce the open source release of SFrame, our highly scalable column-based data frame.

With SFrame you can easily interact with data that is larger than the amount of RAM on your system. SFrame is a column-based data frame that is compressed and disk-backed. It’s optimized for data science and machine learning. It supports strictly typed columns (int, float, str, datetime), weakly typed columns (schema-free lists, dictionaries), as well as specialized types such as Image. For more on the design architecture, see the data processing architecture blog post by our Chief Architect, Yucheng Low.

Read More

Coursera Specialization in Machine Learning: A New Way to Learn Machine Learning

Posted by Emily Fox on Sep 22, 2015 11:11:00 AM

Machine learning is transforming how we experience the world as intelligent applications have become more pervasive over the past five years. Following this trend, there is an increasing demand for ML experts. To help meet this demand, Carlos and I were excited to team up with our colleagues at the University of Washington and Dato to develop a Coursera specialization in Machine Learning. Our goal is to avoid the standard prerequisite-heavy approach used in other ML courses. Instead, we motivate concepts through intuition and real-world applications, and solidify concepts with a very hands-on approach. The result is a self-paced, online program targeted at a broad audience and offered through Coursera with the first course available today.

Read More

Dancing Babies, Interns, and Algorithms

Posted by Jennifer Bolton on Sep 16, 2015 5:28:00 PM

“Algorithm” is not a common word on news radio. Yet, there it was during my morning commute listening to The Takeaway’s story on “How a Dancing Baby Video Could Change Copyright Law.”

Eight years ago, an epic copyright battle started over a video of a toddler dancing to 29 seconds of the Prince song “Let’s Go Crazy.” Universal Music Group charged the mom with copyright infringement under the Digital Millennium Copyright Act. In turn, she countersued Universal for misrepresentation, maintaining that the music in her video constituted fair use.

Read More

