Dato Blog

Our data scientists and industry experts offer insights, examples, and advice on machine learning and data science.

Is Capacity the Main Purpose for Deploying LTE?

Posted by Francesco Malandrino on Feb 5, 2016 6:30:00 AM

Topics: Big Data Analytics, Data Science Tools, Big Data, Data Visualization, SFrames, SFrame, Log Data

It is a widespread belief that LTE (Long Term Evolution) networks, also called 4G in some countries, are primarily a way to deal with consumers' increasing demand for data. In this blog post we'll put that widespread belief to the test.


Elementary, My Dear Watson! An Introduction to Text Analytics Using Sherlock Holmes Stories

Posted by Michael Fire on Jan 29, 2016 6:29:00 AM

Topics: Text Classification, Text Analysis, NLP, Natural Language Processing


As a data scientist, I find analyzing text corpora one of the most interesting tasks I get to do. By analyzing various text sources, we can learn a lot about the world around us. In this blog post you'll find some of my favorite resources for learning about text analysis, along with my own example tutorial using the Sherlock Holmes stories.


What's Coming in 2016 for Machine Learning

Posted by Pablo Serrano on Jan 27, 2016 9:58:03 AM

Topics: Machine Learning Use Cases, Machine Learning Coursera


Dato kicked off the year with a "fireside chat" with Carlos Guestrin, CEO of Dato and the Amazon Professor of Machine Learning in Computer Science & Engineering at the University of Washington. Here are some of the key takeaways from his presentation and live Q&A with participants. The slides are below, or you can register to watch the recording, which addresses questions such as:

  • How will different industries incorporate machine learning in the next 3 years?
  • Should I get a PhD in machine learning?
  • What is the difference between statistics and machine learning?
  • What techniques apply to stream data with a temporal component?
  • Do you see AI-as-a-Service eventually becoming a large offering like SaaS has become?

Best TV Shows with Computer Science, Data Science, and Machine Learning

Posted by Jennifer Bolton on Dec 28, 2015 12:28:00 PM

To celebrate the new year, our team is excited to share our favorite TV series with plotlines enriched by computer science, data analytics, machine learning, and technology innovation. So grab a holiday beverage and curl up with great storytelling and engaging characters on your screen of choice.

Friendly warning: These series may be addictive. As such, you may experience pure geeky enjoyment followed by severe drowsiness as you realize it is way past your bedtime.


How Fast Are Out-of-Core Algorithms?

Posted by Alon Palombo on Dec 10, 2015 8:58:00 AM

Topics: GraphLab Create, SFrame, scikit-learn, Pandas

In recent years, as the amount of stored data keeps growing, there is a need for algorithms that can process large datasets. Out-of-core algorithms (sometimes also known as external memory algorithms) are designed to process data too large to fit into a computer's main memory. In this blog post, we introduce out-of-core algorithms, show how to benefit from them, and explain some basic principles that influence out-of-core algorithm design.
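
A minimal sketch of the core idea, assuming the input is a text stream with one number per line (a hypothetical format chosen only for illustration): read the data in fixed-size chunks and keep only running aggregates in memory, so peak memory depends on the chunk size rather than on the size of the dataset. This is plain Python, not the SFrame or GraphLab Create implementation, and the function names are made up for the example.

```python
import io
from typing import Iterable, Iterator, TextIO


def iter_chunks(lines: Iterable[str], chunk_size: int) -> Iterator[list[float]]:
    """Yield lists of at most chunk_size parsed values; only one chunk is held in memory."""
    chunk: list[float] = []
    for line in lines:
        line = line.strip()
        if not line:  # skip blank lines
            continue
        chunk.append(float(line))
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, chunk
        yield chunk


def out_of_core_mean(stream: TextIO, chunk_size: int = 100_000) -> float:
    """Compute a mean by streaming: only the running sum and count persist between chunks."""
    total = 0.0
    count = 0
    for chunk in iter_chunks(stream, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("empty input")
    return total / count


if __name__ == "__main__":
    # In practice `stream` would be an open file far larger than RAM;
    # an in-memory buffer stands in for it here.
    data = io.StringIO("\n".join(str(i) for i in range(1, 11)) + "\n")
    print(out_of_core_mean(data, chunk_size=3))  # 5.5
```

The same pattern (stream, update a small state, discard the chunk) works for any aggregate that can be combined incrementally, such as sums, counts, minima and maxima.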


The Quest for Composable Data Vis (React and ECharts)

Posted by Dick Kreisberg on Dec 4, 2015 7:37:00 AM

Topics: GraphLab Canvas, Front End Development, Data Visualization

In the adventure of data science, visualization plays an important role. Happily, there are many options for Python, like matplotlib, seaborn, ggplot, bokeh, and vispy. R offers libraries like ggplot2, ggvis, and shiny. Each library is well designed for scripted analysis, and some provide interactive interfaces.

Front-end web developers have a great set of choices as well. Bokeh, Plot.ly and Shiny have lovely interactive components. D3 offers both a toolbelt of utility functions and an insightful API for encoding data as the presentation and animation of DOM/SVG elements. Given time and effort, a bespoke visualization perfectly suited to your data and requirements is just a project away.


Beginner's Guide to Click-Through Rate Prediction with Logistic Regression

Posted by Kevin Markham on Nov 30, 2015 11:39:48 AM

Topics: Logistic Regression

Let's say that you're a major search engine, and you need to decide which ad to display at the top of your search results. How would you do it?

Your first thought might be to narrow the scope to ads "related" to the search, and then choose whichever ad offers the greatest revenue. Companies have already bid on how much they will pay you, so it seems easy to maximize your revenue by choosing the highest paying ad. But is that the right approach?

Many ads are actually sold on a "pay-per-click" (PPC) basis, meaning the company pays only for ad clicks, not ad views. Thus your optimal approach (as a search engine) is actually to choose an ad based on "expected value", meaning the price of a click times the likelihood that the ad will be clicked. For example, a £1.00 ad with a 5% probability of being clicked has an expected value of £0.05, whereas a £2.00 ad with a 1% probability of being clicked has an expected value of only £0.02. In this case, you would choose to display the first ad.
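
The expected-value rule above can be sketched in a few lines of Python. The two ads use the prices and click probabilities from the example; in a real system the click probability would be estimated by a model such as logistic regression, whereas here it is simply given. The `Ad` type and helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ad:
    name: str
    price_per_click: float    # what the advertiser pays per click (GBP)
    click_probability: float  # estimated probability the ad is clicked when shown


def expected_value(ad: Ad) -> float:
    """Expected revenue per impression = price of a click x probability of a click."""
    return ad.price_per_click * ad.click_probability


def choose_ad(ads: list[Ad]) -> Ad:
    """Pick the ad with the highest expected value, not simply the highest bid."""
    return max(ads, key=expected_value)


ads = [
    Ad("first", price_per_click=1.00, click_probability=0.05),
    Ad("second", price_per_click=2.00, click_probability=0.01),
]
for ad in ads:
    print(ad.name, round(expected_value(ad), 2))  # first 0.05, then second 0.02
print("display:", choose_ad(ads).name)  # display: first
```

Note that the higher bid loses: the £2.00 ad would need a click probability above 2.5% to beat the £1.00 ad.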


"No Better Magic than Machine Learning": Giving the Gift of ML Education

Posted by Jennifer Bolton on Nov 23, 2015 7:47:00 AM

Topics: Machine Learning Coursera

Interview: Gabriel Menegatti, Founder and CEO, Simbiose Technology Ventures

At Dato, we love to hear about the ways our customers are innovating with machine learning. However, there is still a shortage of talent needed to realize the value of machine learning more widely for businesses, customers, and social good.

Recently we were inspired to hear about a Brazilian company piloting an initiative to cover 100% of the costs for selected applicants to take the Coursera Machine Learning Specialization, a program developed by Dato in partnership with the University of Washington and Coursera. We reached out to learn more about the company and its initiative to champion machine learning education in Brazil.


Any Given Sunday: Football and a Machine Learning Rookie

Posted by Susan Romero on Nov 6, 2015 11:59:09 AM

Topics: GraphLab Create, Machine Learning Use Cases, NFL Data

I love football more than engineers love coffee, and all my Dato friends know it. Throughout the course of an NFL season I have fantasy teams, point-spread pools, survivor pools and non-stop pontification on the latest terrible play call, awful refs and amazing plays (GO HAWKS!). Despite hearing about all the neato, life-changing, world-saving, cutting-edge, inspirational things machine learning could be used for, my first thought was "Huh. I should learn that and use it for football."


Calculating Churn Prediction with Customer Usage Data

Posted by Antoine Atallah on Oct 23, 2015 9:10:00 AM

Topics: Churn Prediction, Machine Learning Use Cases, Log Data

Usage logs are a treasure trove of information, often very large and spanning long periods of time. Tapping into this data can be a daunting task, but fear not! An exciting new feature of GraphLab Create 1.6 is support for time series data! One example of the information we can extract from customer usage data is customer churn: the probability that a customer will not come back to our website, application, or business, given their past behavior.

In this post we will take a look at the new Churn Prediction toolkit that is now part of GraphLab Create.
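
Before any churn model can be trained, each customer needs a label saying whether they churned. A minimal sketch of one common way to derive that label from a usage log, assuming a customer counts as churned if they were active before a cutoff date but show no activity within a chosen window after it. The `(user, date)` event format, the `churn_labels` function and the 30-day horizon are hypothetical choices for illustration; this is not the GraphLab Create Churn Prediction toolkit's API.

```python
from datetime import date, timedelta


def churn_labels(
    events: list[tuple[str, date]], cutoff: date, horizon_days: int
) -> dict[str, bool]:
    """Label customers seen before `cutoff`: True (churned) if they have no
    activity in the window [cutoff, cutoff + horizon_days)."""
    end = cutoff + timedelta(days=horizon_days)
    seen_before: set[str] = set()
    active_after: set[str] = set()
    for user, day in events:
        if day < cutoff:
            seen_before.add(user)
        elif day < end:
            active_after.add(user)
    # Customers who first appear after the cutoff have no history, so they are not labeled.
    return {user: user not in active_after for user in sorted(seen_before)}


events = [
    ("alice", date(2015, 9, 1)),
    ("alice", date(2015, 10, 5)),
    ("bob", date(2015, 9, 15)),
    ("carol", date(2015, 10, 2)),
]
labels = churn_labels(events, cutoff=date(2015, 10, 1), horizon_days=30)
print(labels)  # {'alice': False, 'bob': True}
```

Features computed from activity before the cutoff, paired with these labels, can then be fed to any classifier that outputs a churn probability.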

