Dato Blog

Our data scientists and industry experts offer insights, examples, and advice on machine learning and data science.

Anomaly Detected! Earthquakes...in Oklahoma

Posted by Brian Kent on Apr 8, 2016 11:00:00 AM

Topics: anomaly detection, time series

In a truly ground-shaking turn of events, Oklahoma has now become the earthquake capital of the continental United States: a report issued by the USGS in late March says parts of Oklahoma and Kansas now face the same risk of earthquake damage as California.

A March New York Times article reports that USGS seismologists have warned that an increase in the number of earthquakes could indicate growing risk for a larger, more destructive temblor, so early detection of growing seismic activity is highly valuable. As a result, the Oklahoma government would like to know as soon as possible if its new regulations are impacting earthquake frequency and severity.

With this in mind, let's build an online machine learning system to alert us when there are anomalies in Oklahoma's seismic activity.
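To make the idea of an online alerting system concrete, here is a minimal sketch that scores each new observation against a rolling window of recent values. It is an illustration under stated assumptions, not the method used in the full post: the `RollingZScoreDetector` class, the window size, the threshold, and the daily counts are all hypothetical.

```python
from collections import deque
from math import sqrt


class RollingZScoreDetector:
    """Flag a value as anomalous when it is far from the recent window mean."""

    def __init__(self, window=30, threshold=3.0):
        self.values = deque(maxlen=window)  # only the most recent values are kept
        self.threshold = threshold

    def update(self, x):
        """Score x against the window seen so far, then add x to the window."""
        is_anomaly = False
        n = len(self.values)
        if n >= 2:
            mean = sum(self.values) / n
            var = sum((v - mean) ** 2 for v in self.values) / (n - 1)
            std = sqrt(var)
            if std > 0:
                is_anomaly = abs(x - mean) / std > self.threshold
        self.values.append(x)
        return is_anomaly


# Hypothetical daily earthquake counts (made-up numbers, for illustration only).
daily_counts = [3, 2, 4, 3, 2, 3, 4, 2, 3, 3, 25, 3, 2]

detector = RollingZScoreDetector(window=10, threshold=3.0)
flags = [detector.update(count) for count in daily_counts]

for day, (count, flagged) in enumerate(zip(daily_counts, flags)):
    if flagged:
        print(f"day {day}: count {count} looks anomalous")
```

Because each value is scored before it joins the window, the detector can run on a live feed one observation at a time. A single large spike also inflates the window's standard deviation afterward, which is one reason a production system would likely use something more robust than this sketch.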

Read More

Gravitational Waves Exist and It’s Possible to Predict Next Day Forex Exchange Rates

Posted by Tigran Sargsyan on Mar 23, 2016 11:15:00 AM

In physics, gravitational waves are ripples in the curvature of spacetime which propagate as waves, traveling outward from the source. Predicted in 1916 by Albert Einstein on the basis of his theory of general relativity, gravitational waves transport energy as gravitational radiation. For much of the century since, skeptics believed it would not be possible to detect gravitational waves in the foreseeable future. However, in mid-February 2016, scientists announced that they had directly detected gravitational waves from a pair of merging black holes using advanced detectors. Technical progress and improvements in measurement and processing tools made this a reality. Given that, one may ask: if it’s possible to measure everything in today’s world, is it possible to predict the future? I hope to provide an affirmative answer for the small matter of “exchange rate forecasting” at the very least.

Read More

Shoulders of Giants: Women in Technology Who Inspire Us

Posted by Jennifer Bolton on Mar 8, 2016 4:08:00 PM

Topics: Women in Technology

In honor of International Women's Day, we would like to pay tribute to women in technology who have inspired us.

Technology is an incredibly exciting field, but it is not particularly diverse. We are grateful to have these female role models since we learn from their stories - challenges, failures, successes, and perseverance. We are also grateful to have colleagues - both men and women - who support, mentor, and provide new opportunities in our own journeys.

Read More

State of the SFrame

Posted by Yucheng Low on Feb 8, 2016 2:08:00 PM

Topics: SFrame, Numpy

2016 Roadmap for Dato’s Scalable DataFrame Datastructure

The SFrame repository was first open sourced at the DS4DS workshop about 4 months ago, and much of our development now happens there. Since then, there have been 9705 lines of code added and 3517 lines of code deleted, along with 100 forks and 249 stars. While much of the development in the last few months has been around incremental refactoring and bug fixes, with the new year comes a long wish list that we will be working through, including the following.

Read More

Is Capacity the Main Purpose for Deploying LTE?

Posted by Francesco Malandrino on Feb 5, 2016 6:30:00 AM

Topics: Big Data Analytics, Data Science Tools, Big Data, Data Visualization, SFrames, SFrame, Log Data

It is a widespread belief that LTE (Long Term Evolution) networks, also called 4G in some countries, are primarily a way to deal with consumers' increasing demand for data. In this blog post, we'll validate this belief.

Read More

Elementary, My Dear Watson! An Introduction to Text Analytics Using Sherlock Holmes Stories

Posted by Michael Fire on Jan 29, 2016 6:29:00 AM

Topics: Text Classification, Text Analysis, NLP, Natural Language Processing

As a data scientist, I find analyzing text corpora one of the more interesting tasks I get to do. By analyzing various text sources, we can learn a lot about the world around us. In this blog post, you'll find some of my favorite resources for learning about text analysis and my own example tutorial using the Sherlock Holmes stories.

Read More

What's Coming in 2016 for Machine Learning

Posted by Pablo Serrano on Jan 27, 2016 9:58:03 AM

Topics: Machine Learning Use Cases, Machine Learning Coursera

Dato kicked off the year with a "fireside chat" with Carlos Guestrin, CEO of Dato and the Amazon Professor of Machine Learning in Computer Science & Engineering at the University of Washington. Here are some of the key takeaways from his presentation and live Q&A with participants. The slides are below, or you can register to watch the recording, which addresses questions such as:

  • How will different industries incorporate machine learning in the next 3 years?
  • Should I get a PhD in machine learning?
  • What is the difference between statistics and machine learning?
  • What techniques apply to stream data with a temporal component?
  • Do you see AI-as-a-Service eventually becoming a large offering like SaaS has become?

Read More

Best TV Shows with Computer Science, Data Science, and Machine Learning

Posted by Jennifer Bolton on Dec 28, 2015 12:28:00 PM

To celebrate the new year, our team is excited to share our favorite TV series with plotlines enriched by computer science, data analytics, machine learning, and technology innovation. So grab a holiday beverage and curl up with great storytelling and engaging characters on your screen of choice.

Friendly warning: These series may be addictive. As such, you may experience pure geeky enjoyment followed by severe drowsiness as you realize it is way past your bedtime.

Read More

How Fast Are Out-of-Core Algorithms?

Posted by Alon Palombo on Dec 10, 2015 8:58:00 AM

Topics: GraphLab Create, SFrame, scikit-learn, Pandas

In recent years, as the amount of data being stored has grown, so has the need for algorithms that can process large datasets. Out-of-core algorithms (sometimes also known as external memory algorithms) are designed to process data too large to fit into a computer's main memory. In this blog post, we introduce out-of-core algorithms, show how to benefit from them, and explain some basic principles that influence out-of-core algorithm design.
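As a small illustration of the out-of-core idea, the sketch below computes a mean and variance in a single pass over a file, holding only three numbers in memory no matter how large the file is. It is a minimal example under assumptions of our own, not code from the full post: the `streaming_mean_variance` helper and the one-number-per-line file layout are hypothetical, and the running update used is Welford's method.

```python
import os
import tempfile


def streaming_mean_variance(path):
    """One pass over a file of numbers (one per line) using Welford's update."""
    count, mean, m2 = 0, 0.0, 0.0
    with open(path) as f:
        for line in f:  # the file object yields one line at a time
            x = float(line)
            count += 1
            delta = x - mean
            mean += delta / count
            m2 += delta * (x - mean)  # running sum of squared deviations
    variance = m2 / (count - 1) if count > 1 else 0.0  # sample variance
    return count, mean, variance


# Write a small demo file; a real out-of-core input would be far larger than RAM.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    for i in range(1, 1001):
        tmp.write(f"{i}\n")
    demo_path = tmp.name

count, mean, variance = streaming_mean_variance(demo_path)
os.remove(demo_path)
print(count, mean, variance)
```

The design choice to notice is that memory use depends on the size of the summary (a count, a mean, and a sum of squares), not on the size of the data, so the same loop works unchanged on a file of any length.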

Read More

The Quest for Composable Data Vis (React and ECharts)

Posted by Dick Kreisberg on Dec 4, 2015 7:37:00 AM

Topics: GraphLab Canvas, Front End Development, Data Visualization

In the adventure of Data Science, visualization plays an important role. Happily, there are many options for Python, like matplotlib, seaborn, ggplot, bokeh, and vispy. R offers libraries like ggplot2, ggvis, and shiny. Each library is well designed for scripted analysis and some provide interactive interfaces.

Front-end web developers have a great set of choices as well. Bokeh, Plot.ly, and Shiny have lovely interactive components. D3 has both a toolbelt of utility functions and an insightful API to encode data as the presentation and animation of DOM/SVG elements. Given time and effort, a bespoke visualization, perfectly suited to your data and requirements, is just a project away.

Read More
