Normalize repeats for Anomaly Detection

Improve the anomalies found in your dataset with the new "Normalize repeats" parameter. When this parameter is enabled, the Anomaly Detector takes into account not only the distinct values in your data but also the frequency of repeated (or very similar) data points.

For example, if your data had many missing values, the Anomaly Detector could flag the instances with missing values as highly anomalous, regardless of whether missing values were the rule rather than the exception. With "Normalize repeats" enabled, instances with missing data will not appear among the top anomalies in that case.

Select date-time fields for Time Series

You can now select the date-time field from the dataset to plot your time series data in the Dashboard. Be aware that your instances must be chronologically sorted in the dataset to select the date-time field.
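
Because the date-time field can only be selected when the instances are already in chronological order, it can help to check and sort your data before uploading it. The following is a minimal Python sketch, assuming hypothetical rows stored as (date string, value) pairs in `YYYY-MM-DD` format; it is a local preprocessing step, not part of the BigML Dashboard or API.

```python
from datetime import datetime
from typing import List, Tuple

# Hypothetical rows as (date-time string, value); the ISO format is an assumption.
rows: List[Tuple[str, float]] = [
    ("2017-03-03", 120.0),
    ("2017-03-01", 100.0),
    ("2017-03-02", 110.0),
]

def parse(ts: str) -> datetime:
    """Parse a date string in the assumed YYYY-MM-DD format."""
    return datetime.strptime(ts, "%Y-%m-%d")

def is_chronological(data: List[Tuple[str, float]]) -> bool:
    """True if each timestamp is not earlier than the one before it."""
    stamps = [parse(ts) for ts, _ in data]
    return all(a <= b for a, b in zip(stamps, stamps[1:]))

# Sort on the parsed timestamp rather than the raw string before uploading.
sorted_rows = sorted(rows, key=lambda r: parse(r[0]))

print(is_chronological(rows))         # False
print(is_chronological(sorted_rows))  # True
```

Sorting on the parsed value rather than the string matters as soon as the date format is not lexicographically ordered (for example `DD/MM/YYYY`).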

Deepnet Predictions

One of the main goals of any BigML resource is making predictions, and Deepnets are no exception. Deepnets can be used to predict categorical or numeric values. Deepnets have hidden layers of nodes between the input and output layers, and the values computed by the output layer are the network’s prediction. For categorical objective fields, an array of per-class probabilities is returned, while for regression problems a single real value is predicted. You can make single predictions to predict just one instance, or batch predictions to predict multiple instances at the same time.
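
The difference between the two kinds of output can be sketched with the output layer of a generic neural network: a softmax turns one score per class into per-class probabilities, while a regression network ends in a single linear node. This is a simplified illustration with hypothetical class names, activations and weights, not BigML's Deepnet implementation.

```python
import math
from typing import List

def softmax(logits: List[float]) -> List[float]:
    """Turn raw output-layer scores into per-class probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def linear_output(hidden: List[float], weights: List[float], bias: float) -> float:
    """A single linear output node, as used for regression."""
    return sum(h * w for h, w in zip(hidden, weights)) + bias

# Classification: one output node per class (hypothetical activations).
classes = ["setosa", "versicolor", "virginica"]
probs = softmax([2.0, 1.0, 0.1])
prediction = classes[probs.index(max(probs))]
print({c: round(p, 3) for c, p in zip(classes, probs)}, prediction)

# Regression: one real-valued output (hypothetical last hidden layer and weights).
value = linear_output([0.5, -1.2], [2.0, 0.5], 0.1)
print(round(value, 3))  # 0.5
```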

Deepnet Evaluations

Evaluate the performance of your Deepnets to estimate how well your model will predict new data, and easily interpret the results with BigML evaluation visualizations. As with other supervised learning models on BigML, Deepnet evaluations for classification problems include the confusion matrix and the ROC and Precision-Recall curves. You can also quickly compare the performance of your different Deepnets, and of models built with other algorithms, using the BigML evaluation comparison tool.
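
To make the confusion matrix concrete, here is a minimal sketch that counts (actual, predicted) pairs and derives accuracy, precision and recall for a chosen positive class. The labels are hypothetical, and the metrics follow the standard textbook definitions rather than reproducing BigML's evaluation output.

```python
from collections import Counter
from typing import List, Tuple

def confusion_matrix(actual: List[str], predicted: List[str]) -> Counter:
    """Count how often each (actual, predicted) pair occurs."""
    return Counter(zip(actual, predicted))

def precision_recall(cm: Counter, positive: str) -> Tuple[float, float]:
    """Precision and recall for the chosen positive class."""
    tp = cm[(positive, positive)]
    fp = sum(n for (a, p), n in cm.items() if p == positive and a != positive)
    fn = sum(n for (a, p), n in cm.items() if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical test-set labels and model predictions.
actual = ["yes", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "no", "yes", "yes", "no"]
cm = confusion_matrix(actual, predicted)
accuracy = sum(n for (a, p), n in cm.items() if a == p) / len(actual)
precision, recall = precision_recall(cm, "yes")
print(dict(cm))
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```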

Deepnets

BigML is proud to announce Deepnets, an optimized version of Deep Neural Networks, the machine-learned models loosely inspired by the neural circuitry of the human brain. Deepnets are state-of-the-art in many important supervised learning applications. To spare you the difficult and time-consuming work of hand-tuning the algorithm, BigML’s unique implementation of Deep Neural Networks offers first-class support for automatic network search and parameter optimization: BigML searches over many candidate networks for your dataset and returns the best network found for your problem. As a result, non-experts can train deep learning models with results matching those of top-level data scientists.

Time Series: Model Decomposition

BigML recently launched Time Series, a sequentially indexed representation of your historical data commonly used for predicting stock prices, forecasting sales, analyzing website traffic, production, and inventory, and forecasting the weather, among other use cases.

BigML implements exponential smoothing methods that learn multiple models from the training data by using different combinations of the three essential model components: the level, the trend, and the seasonality. Now, you can decompose your models to display each of these components in a separate chart. For a detailed explanation of each component, please visit the dedicated release page, where you will find a series of six blog posts about Time Series, the BigML Dashboard and API documentation, the webinar slideshow, and the full webinar recording.
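
To make the three components concrete, here is a minimal sketch of additive Holt-Winters exponential smoothing, which tracks a level, a trend and a seasonal index and updates each of them as new observations arrive. It is the textbook recursion with a simple initialisation, not BigML's implementation (which fits several model variants); the series and the smoothing parameters `alpha`, `beta` and `gamma` are hypothetical.

```python
from typing import List, Tuple

def holt_winters_additive(
    y: List[float], m: int, alpha: float, beta: float, gamma: float
) -> Tuple[List[float], List[float], List[float]]:
    """Additive Holt-Winters smoothing with season length m. Returns the level,
    trend and seasonal component estimated at each time step."""
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    # Simple initialisation from the first two seasons (an assumption of this sketch).
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] - level for i in range(m)]

    levels, trends, seasonals = [], [], []
    for t, obs in enumerate(y):
        s_prev = season[t % m]  # seasonal index from one period ago
        new_level = alpha * (obs - s_prev) + (1 - alpha) * (level + trend)
        new_trend = beta * (new_level - level) + (1 - beta) * trend
        season[t % m] = gamma * (obs - new_level) + (1 - gamma) * s_prev
        level, trend = new_level, new_trend
        levels.append(level)
        trends.append(trend)
        seasonals.append(season[t % m])
    return levels, trends, seasonals

# Synthetic series: upward trend plus a period-4 seasonal pattern.
series = [10 + 0.5 * t + [3, -1, -3, 1][t % 4] for t in range(24)]
level, trend, seasonal = holt_winters_additive(series, m=4, alpha=0.4, beta=0.2, gamma=0.3)
print(round(level[-1], 2), round(trend[-1], 2), [round(s, 2) for s in seasonal[-4:]])
```

Plotting `level`, `trend` and `seasonal` on separate axes gives the same kind of per-component view as a decomposition chart.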

Time Series Forecasts

You can use your Time Series models to make predictions, which are called Forecasts. With Time Series Forecasts, you can easily forecast events over short or long time horizons. You can also use a single Time Series model to forecast the future values of multiple objective fields. Along with each forecasted data point, BigML generates an interval as a measure of the uncertainty of your forecast: its lower and upper bounds delimit the range within which the actual value is expected to fall with 95% confidence.
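
As a rough illustration of how such an interval can be built, here is simple exponential smoothing (level only) with an approximate 95% interval that widens with the horizon, using the h-step error variance of the additive-error, no-trend, no-seasonality model. The series and the smoothing parameter `alpha` are hypothetical, and BigML's models and intervals may be computed differently.

```python
import math
from typing import List, Tuple

def ses_forecast(y: List[float], alpha: float, horizon: int) -> List[Tuple[float, float, float]]:
    """Simple exponential smoothing; returns (forecast, lower, upper) for h = 1..horizon,
    with approximate 95% prediction intervals."""
    level = y[0]  # initialise the level with the first observation
    residuals = []
    for obs in y[1:]:
        residuals.append(obs - level)  # one-step-ahead forecast error
        level = alpha * obs + (1 - alpha) * level
    sigma2 = sum(e * e for e in residuals) / len(residuals)
    results = []
    for h in range(1, horizon + 1):
        # h-step error variance for the level-only model: sigma^2 * (1 + (h - 1) * alpha^2)
        half_width = 1.96 * math.sqrt(sigma2 * (1 + (h - 1) * alpha ** 2))
        results.append((level, level - half_width, level + half_width))
    return results

# Hypothetical weekly sales.
for h, (f, lo, hi) in enumerate(ses_forecast([10, 12, 11, 13, 12], alpha=0.5, horizon=3), start=1):
    print(f"h={h}: forecast={f:.2f}, 95% interval=[{lo:.2f}, {hi:.2f}]")
```

The point forecast stays flat for this level-only model, but the interval widens with each step ahead, which is why longer horizons come with less certain forecasts.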

Time Series Evaluations

You can easily evaluate the performance of your Time Series models. For this, you need to use two different subsets of data: one for training and the other one for testing. BigML represents your test data and the model forecasts in a chart, so you can visually analyze the goodness-of-fit of your Time Series models.

You will also see multiple performance metrics: the Mean Absolute Error (MAE), the Mean Squared Error (MSE), R squared, the Symmetric Mean Absolute Percentage Error (SMAPE), the Mean Absolute Scaled Error (MASE), and the Mean Directional Accuracy (MDA). You can find an explanation of each metric in the 6th chapter of the Time Series documentation.
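
These metrics can be sketched in a few lines of Python. Definitions vary slightly across references (SMAPE is sometimes scaled from 0 to 200, and MDA is sometimes computed against the previous forecast), so treat this as one common set of formulas, not necessarily the exact ones in the documentation. The training and test values are hypothetical.

```python
from typing import Dict, List

def sign(x: float) -> int:
    return (x > 0) - (x < 0)

def ts_metrics(train: List[float], actual: List[float], forecast: List[float]) -> Dict[str, float]:
    """Common forecast-accuracy metrics for a hold-out period that follows `train`."""
    n = len(actual)
    errors = [a - f for a, f in zip(actual, forecast)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    mean_a = sum(actual) / n
    r2 = 1 - sum(e * e for e in errors) / sum((a - mean_a) ** 2 for a in actual)
    # SMAPE in percent, using the (|A| + |F|) / 2 denominator.
    smape = 100 / n * sum(abs(f - a) / ((abs(a) + abs(f)) / 2) for a, f in zip(actual, forecast))
    # MASE: scale the MAE by the in-sample MAE of the naive "previous value" forecast.
    naive_mae = sum(abs(train[i] - train[i - 1]) for i in range(1, len(train))) / (len(train) - 1)
    mase = mae / naive_mae
    # MDA: share of steps where forecast and actual move in the same direction
    # relative to the previous actual value.
    prev = [train[-1]] + actual[:-1]
    mda = sum(sign(a - p) == sign(f - p) for a, f, p in zip(actual, forecast, prev)) / n
    return {"mae": mae, "mse": mse, "r2": r2, "smape": smape, "mase": mase, "mda": mda}

train = [10.0, 12.0, 11.0, 13.0]
actual = [14.0, 13.0, 15.0]
forecast = [13.0, 13.0, 16.0]
for name, val in ts_metrics(train, actual, forecast).items():
    print(f"{name}: {val:.3f}")
```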

Time Series Models

BigML is proud to launch Time Series, a sequentially indexed representation of your historical data that can be used to forecast future values of numerical properties. This is a versatile method often used for predicting stock prices, sales forecasting, website traffic, production and inventory analysis, and weather forecasting, among many other use cases.

A Time Series model needs to be trained with numeric fields containing a time-ordered sequence of regularly spaced data points in time. BigML implements exponential smoothing methods which are able to forecast time-based data with complex trends and seasonal patterns. BigML generates multiple models behind the scenes so you can select the best-performing ones. You can find a detailed explanation in the Time Series documentation.

Compare Multiple Evaluations

Any classification problem can be solved with different supervised learning algorithms, and with different configurations of each algorithm, as you iteratively improve your models. BigML brings to the Dashboard an easy, visual way to compare your models and decide which one performs better. You can select the models, the positive class, and the metrics, and choose the ROC curve, the precision-recall curve, the gain curve, or the lift curve for your comparison. You can also rank your models by the Area Under the Curve (AUC), the K-S statistic, Kendall's Tau, or Spearman's Rho.
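
Ranking models by AUC can be sketched with the rank-based (Mann-Whitney) formulation: the AUC is the probability that a randomly chosen positive instance gets a higher positive-class probability than a randomly chosen negative one. The model names and scores below are hypothetical.

```python
from typing import Dict, List

def auc(labels: List[int], scores: List[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 1, 0, 0]  # 1 = positive class
# Hypothetical positive-class probabilities from three competing models.
model_scores: Dict[str, List[float]] = {
    "ensemble": [0.9, 0.8, 0.3, 0.7, 0.2, 0.4],
    "logistic": [0.6, 0.7, 0.5, 0.4, 0.3, 0.2],
    "single_tree": [0.6, 0.6, 0.6, 0.4, 0.4, 0.4],
}
ranking = sorted(model_scores, key=lambda name: auc(labels, model_scores[name]), reverse=True)
for name in ranking:
    print(f"{name}: AUC = {auc(labels, model_scores[name]):.3f}")
```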

Evaluation Curves for Classification Models

Evaluating the performance of your Machine Learning models is one of the most important steps in the predictive process. BigML is releasing a new Dashboard visualization for evaluations, which includes new performance metrics that make it easier to assess your classification models. Now, you can use the popular ROC curve to understand the trade-off between sensitivity and specificity at each possible threshold, as well as the precision-recall curve, the gain curve, and the lift curve. Moreover, you can find new metrics that measure the overall predictive performance of your models for the selected positive class, such as the Area Under the Curve (AUC), the Area Under the Convex Hull (AUCH), the K-S statistic, and the Kendall's Tau and Spearman's Rho coefficients.
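
The ROC curve and two of its summary metrics can be computed by sweeping the probability threshold over every distinct score. The sketch below uses the trapezoidal rule for the AUC and the maximum of TPR minus FPR for the K-S statistic, which is one common definition; the labels and scores are hypothetical, and the AUCH, Kendall's Tau and Spearman's Rho are not shown.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)

def roc_points(labels: List[int], scores: List[float]) -> List[Point]:
    """Sweep the threshold over every distinct score, from strictest to loosest."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def trapezoid_auc(points: List[Point]) -> float:
    """Area under the piecewise-linear ROC curve."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))

def ks_statistic(points: List[Point]) -> float:
    """Largest gap between TPR and FPR over all thresholds."""
    return max(tpr - fpr for fpr, tpr in points)

labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # hypothetical positive-class probabilities
pts = roc_points(labels, scores)
print(pts)
print(f"AUC = {trapezoid_auc(pts):.3f}, K-S = {ks_statistic(pts):.3f}")
```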

Resource Configuration Information

To solve Machine Learning problems, you usually need several iterations that employ different algorithms and configurations to build your final models and workflows. Now, BigML makes it even easier and faster for you to find the right resources at a glance from among many that belong to the same project by listing the values of your configured parameters for each resource.

Select the Consequent for Associations

Associations are a powerful means of finding strong correlations among your dataset values. However, depending on your case, you may not be interested in the strongest relationships overall, but only in the rules that meet certain conditions. Specifying your conditions to zero in on the rules of interest is now easier than ever: simply select your field of interest and one or more of its values for the consequent part of the rule, and you will obtain the relevant associations in no time.
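
The effect of fixing the consequent can be illustrated with a brute-force search over a hypothetical list of shopping baskets: keep only the rules whose right-hand side is the chosen item, and report their support, confidence and lift. This is a toy sketch; BigML's association discovery uses its own, far more efficient algorithm and offers further measures.

```python
from itertools import combinations
from typing import FrozenSet, Iterator, List, Set, Tuple

# Hypothetical shopping baskets.
transactions: List[Set[str]] = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
    {"milk", "butter"},
]

def support(itemset: FrozenSet[str]) -> float:
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def rules_for_consequent(
    consequent: str, min_confidence: float = 0.5, max_size: int = 2
) -> Iterator[Tuple[List[str], str, float, float, float]]:
    """Yield (antecedent, consequent, support, confidence, lift) for rules X -> consequent."""
    items = sorted(set().union(*transactions) - {consequent})
    consequent_support = support(frozenset([consequent]))
    for size in range(1, max_size + 1):
        for ante in combinations(items, size):
            a = frozenset(ante)
            s_rule = support(a | {consequent})
            if s_rule == 0:
                continue  # antecedent never co-occurs with the consequent
            confidence = s_rule / support(a)
            if confidence >= min_confidence:
                yield sorted(a), consequent, s_rule, confidence, confidence / consequent_support

for ante, cons, sup, conf, lift in rules_for_consequent("milk"):
    print(f"{ante} -> {cons}: support={sup:.2f} confidence={conf:.2f} lift={lift:.2f}")
```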

Boosted Trees Predictions

The ultimate goal of creating any supervised learning model is to get predictions for new instances. Like other supervised models, Boosted Trees offer Single Predictions to predict a single given instance and Batch Predictions to predict multiple instances simultaneously. Instead of returning a single class along with its confidence, Boosted Trees return a set of probabilities for all the classes in the objective field, which you can see in the predictions histogram.

Boosted Trees

The BigML team is proud to announce Boosted Trees, the third ensemble-based strategy that BigML provides to help you easily solve your classification and regression problems. Together with Bagging and Random Decision Forests, Boosted Trees make for a powerful combination available both via the BigML Dashboard and our REST API. This well-known technique builds an ensemble of single models in which each new tree is trained to correct the errors made by the trees grown before it. It is one of the best-performing Machine Learning methods for solving complex real-world problems.
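
The idea that each new tree corrects the errors of the ones before it can be seen in a minimal gradient-boosting sketch for regression with squared loss, using one-split trees (stumps) on a single numeric field. This is the textbook algorithm, not BigML's implementation; the data, learning rate and number of trees are arbitrary assumptions.

```python
from typing import List, Tuple

Stump = Tuple[float, float, float]  # (threshold, value if x <= threshold, value otherwise)

def fit_stump(x: List[float], r: List[float]) -> Stump:
    """Least-squares one-split tree on a single numeric feature."""
    best = None
    for t in sorted(set(x))[:-1]:  # candidate thresholds with points on both sides
        left = [ri for xi, ri in zip(x, r) if xi <= t]
        right = [ri for xi, ri in zip(x, r) if xi > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return t, lv, rv

def boost(x: List[float], y: List[float], n_trees: int = 50, learning_rate: float = 0.3):
    """Gradient boosting with squared loss: each stump fits the current residuals."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps: List[Stump] = []
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # errors of the trees so far
        t, lv, rv = fit_stump(x, residuals)
        stumps.append((t, lv, rv))
        pred = [pi + learning_rate * (lv if xi <= t else rv) for xi, pi in zip(x, pred)]
    return base, learning_rate, stumps

def predict(model, xi: float) -> float:
    base, lr, stumps = model
    return base + lr * sum(lv if xi <= t else rv for t, lv, rv in stumps)

x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = boost(x, y)
print(round(predict(model, 2.0), 3), round(predict(model, 5.0), 3))
```

With squared loss the residuals are exactly the negative gradient of the loss, which is why fitting each stump to them counts as gradient boosting; the learning rate shrinks each correction so that no single tree dominates.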

Partial Dependence Plots for Models

As a complement to our popular decision tree visualization and the sunburst view, we are launching a third view for your models: the Partial Dependence Plot. This heatmap chart lets you analyze the marginal impact of each input field on predictions for classification and regression models built with ensembles and logistic regressions.

Batch Deletion for Resources

Solving a Machine Learning problem is an iterative process that requires creating a great number of intermediate datasets, models, evaluations and predictions before you get the final model. BigML now makes it easier to keep your account organized and up-to-date by allowing you to delete multiple resources at the same time. Just click the deletion icon in the resource listing in the Dashboard, and select the resources to be deleted.

E-mail Notifications for Scripts

Asynchronous WhizzML script executions can take some time to finish, depending on the complexity of the Machine Learning workflows they implement. You no longer need to check your execution repeatedly to see whether your results are ready: this new option lets you request an e-mail notification once the execution finishes, so you can concentrate on other tasks.

Scriptify: Reify Complex Workflows

Furthering our obsession with speeding up your Machine Learning processes, we have incorporated Scriptify into your 1-click menu options. Now, you can automatically regenerate any BigML resource (models, evaluations, predictions, etc.) with a single click. Scriptify creates a script that contains all the workflow information end-to-end (from configuration parameters to resources created). You can precisely repeat the processing steps behind any original Machine Learning resource to your heart's desire!

Shared Resources Cloning

Now, you can easily clone datasets, models, and scripts from other users into your BigML account. Provided that a user shares a resource using the sharing link and the cloning capability is enabled, any other user with access to the link will be able to include this resource in their BigML account.

This new feature allows you to make full use of shared resources. For example, a dataset that another user shares through the sharing link is in "view only" mode, so you cannot perform actions such as creating new models from it, exporting it, or sampling it. By cloning it, you will be able to perform all BigML actions available for datasets.

Stats Computation for Logistic Regression

These new Dashboard statistics allow you to introspect the predictive power of your model by revealing the significance of each coefficient estimate. BigML computes the likelihood ratio to test how well the model fits your data, along with the p-value, confidence interval, standard error, and Z score for each coefficient.
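
For a single coefficient, the Z score, p-value and confidence interval follow the standard Wald test: divide the estimate by its standard error and compare the result with a standard normal distribution. The coefficient and standard error below are hypothetical, the sketch is not BigML's code, and the likelihood ratio test is not reproduced here.

```python
import math
from typing import Tuple

def coefficient_stats(
    coef: float, std_err: float, z_crit: float = 1.959964
) -> Tuple[float, float, Tuple[float, float]]:
    """Wald Z score, two-sided p-value and 95% confidence interval for one coefficient."""
    z = coef / std_err
    # Two-sided normal p-value: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    ci = (coef - z_crit * std_err, coef + z_crit * std_err)
    return z, p_value, ci

# Hypothetical coefficient estimate and standard error.
z, p, (low, high) = coefficient_stats(0.8, 0.4)
print(f"z = {z:.2f}, p = {p:.4f}, 95% CI = [{low:.3f}, {high:.3f}]")
# z = 2.00, p = 0.0455, 95% CI = [0.016, 1.584]
```

A confidence interval that excludes zero corresponds to a p-value below 0.05, so both views tell the same story about whether the coefficient is significant.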

Learn more about the Logistic Regression statistics in the Dashboard documentation.

Association Predictions: Association Sets

BigML is bringing predictions for Associations to the Dashboard. Association Sets allow you to pinpoint the items which are most strongly associated with your input data. For example, given a set of products purchased by a person, what other products are most likely to be bought?

All the predicted items will be ranked according to a similarity score, and they will be displayed in a table view. You can also visualize each predicted rule in a Venn diagram to get a sense of the correlation strength between the input data and the predicted items. Read more about Association Sets in the 8th chapter of the Associations documentation.

BigML Certifications

We are happy to announce BigML Certifications, for organizations and professionals who want to master BigML to successfully deliver real-life Machine Learning projects. These courses are ideal for software developers, system integrators, analysts, or scientists who want to boost their skill set and deliver sophisticated data-driven solutions. We offer two separate courses, each consisting of 4 weekly online classes of 3 hours each:

  • Certified Engineer: all you need to know about advanced modeling, advanced data transformations, and how to use the BigML API (and its wrappers) in combination with WhizzML to build and automate your Machine Learning workflows.

  • Certified Architect: learn how to implement your Machine Learning solutions so they are scalable, impactful, capable of being integrated with third-party systems, and easy to maintain and retrain.

If you pass the certification exam, BigML will award you a diploma. In addition, BigML Certified Partners will receive business referrals that help them source new Machine Learning projects.

Partial Dependence Plot for Ensembles

This new visualization for ensembles, commonly known as a Partial Dependence Plot, allows you to visualize the impact that a set of fields has on predictions. You will be able to determine which fields are most relevant for ensemble predictions and how sensitive your ensemble predictions are to their different values.

The chart displays a heatmap representation of your predictions based on different values of the two selected fields in the axes regardless of the rest of the fields used to train your ensemble. You can select any categorical or numeric field for the axes and configure the values for the rest of the input fields by using the fields inspector panel on the right.
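
The textbook partial dependence (Friedman's definition) holds the two chosen fields at each grid value and averages the model's predictions over the rows of the data. Because the Dashboard chart instead lets you set the other inputs in the fields inspector, treat this as a sketch of the general idea, not the exact computation behind the heatmap. The `toy_model` function stands in for a trained ensemble and is hypothetical.

```python
from itertools import product
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]

def partial_dependence_2d(
    model: Callable[[Row], float], data: List[Row],
    f1: str, grid1: List[float], f2: str, grid2: List[float],
) -> Dict[Tuple[float, float], float]:
    """Average prediction over `data` with f1 and f2 forced to each pair of grid values."""
    heat = {}
    for v1, v2 in product(grid1, grid2):
        preds = [model({**row, f1: v1, f2: v2}) for row in data]
        heat[(v1, v2)] = sum(preds) / len(preds)
    return heat

def toy_model(row: Row) -> float:
    """Hypothetical stand-in for a trained ensemble."""
    return 2 * row["age"] + row["income"] / 1000 + row["tenure"]

data = [
    {"age": 30, "income": 40000, "tenure": 2},
    {"age": 50, "income": 60000, "tenure": 10},
]
heat = partial_dependence_2d(toy_model, data, "age", [20, 40], "income", [30000, 50000])
for (age, income), val in heat.items():
    print(age, income, val)
```

Each `(age, income)` cell of `heat` corresponds to one colored cell of a heatmap; the other field (`tenure`) is averaged out over the data rather than fixed.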

Batch Field Importances

This feature enables you to include field importances in your batch predictions, i.e., a set of percentages indicating how much each field in your dataset contributed to the prediction for a given instance. You can include those values in your output file and dataset with either the BigML Dashboard or the API. This gives you a better understanding of your predictions, as it reveals which fields were most relevant to each prediction.

Topic Distributions

Topic Models assume that each document exhibits a mixture of topics. The main goal of creating a Topic Model is to discover the topic importances for a given document. For example, a document may be 70% about "Machine Learning", 20% about "stock market" and 10% about "startups".

Topic Distributions allow you to make predictions for a single data instance, and Batch Topic Distributions help predict the same for multiple instances simultaneously. Based on a given Topic Model, BigML Topic Distributions provide a set of probabilities for each data instance (one probability per topic), which indicate the relative relevance of all topics for that instance.
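
In Latent Dirichlet Allocation, a document's topic distribution can be estimated from the topics assigned to its words, smoothed by a Dirichlet prior `alpha` so that no topic gets exactly zero probability. The sketch below assumes those per-word assignments are already known (in practice they come from an inference procedure such as Gibbs sampling) and uses hypothetical values that reproduce the 70/20/10 example above; it shows one standard estimator, not necessarily BigML's.

```python
from collections import Counter
from typing import List

def topic_distribution(assignments: List[int], n_topics: int, alpha: float = 0.1) -> List[float]:
    """Estimate a document's topic proportions from per-word topic assignments,
    smoothed with a symmetric Dirichlet prior `alpha`."""
    counts = Counter(assignments)
    total = len(assignments) + n_topics * alpha
    return [(counts[k] + alpha) / total for k in range(n_topics)]

# Hypothetical assignments for a 20-word document:
# topic 0 = "Machine Learning", 1 = "stock market", 2 = "startups".
doc = [0] * 14 + [1] * 4 + [2] * 2
print(topic_distribution(doc, 3, alpha=0.0))  # [0.7, 0.2, 0.1]
print([round(p, 3) for p in topic_distribution(doc, 3)])
```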

Topic Models

The BigML team has brought Topic Models to the API and the Dashboard as part of the Fall 2016 release. Topic Models are an optimized implementation of Latent Dirichlet Allocation, a probabilistic unsupervised learning method that determines the topics underlying a collection of documents.

Topic Models' main application areas include browsing, organizing and understanding large amounts of unstructured text data, which can be very useful for information retrieval tasks, collaborative filtering or content recommendation use cases among others.

BigML provides two original visualizations that accompany its implementation so you can better inspect your Topic Model:

  • Topic Map: get an overview of your topic importances and their thematic closeness.
  • Term Chart: get an overview of the main terms that make up your found topics.

Flatliner code editor & evaluator

Flatline is BigML’s Lisp-like language that enables you to programmatically perform an array of data transformations, including filtering and new field generation. Flatliner is a handy code editor (available in our Labs section) that helps you test your Flatline expressions.

Evaluation Comparison

You can now compare multiple evaluations against a test set in a ROC space. The graph can then be downloaded as a .PNG image, and the performance measures can be exported as a .csv for further analyses.

Google Integration

With the Winter Release, you'll now be able to add sources to BigML through Google Cloud Storage and Google Drive, similar to our prior integrations with Dropbox and Azure Data Marketplace. You can also now log into BigML using your Google ID.

Projects

We're happy to introduce Projects to help you organize your machine learning resources. Simply create a new project using the web interface or the API, and upload a new source to it. All the new resources created from this source will be associated with the same project.

Dataset Comparison

This is another simple but useful application we have released in our new BigML Labs. It allows users to compare two different datasets side by side.

Sample Service

BigML's new Sample Service provides fast access to datasets that are kept in an in-memory cache which enables a variety of sampling, filtering and correlation techniques. We have leveraged this new service to create a Dynamic Scatterplot visualization that we've released into BigML Labs.

BigML Labs

Our team is constantly working on innovative applications built on top of BigML's API. We're now unveiling several of these in early access through our BigML Labs.

G-means Clusters

This latest addition to BigML's unsupervised learning algorithms is ideal when you do not know how many clusters you want to build from your dataset.

Cluster summary report

Now you can download a Summary Report for your BigML Clusters. This report informs you about the distribution of data across your clusters, as well as the associated features and data distances.

BigML Comes to Australia and New Zealand

BigML is very pleased to announce that we've launched this new website to better serve our customers in Australia & New Zealand. This site will contain all of the content and functionality of our https://bigml.com site, but will provide faster performance as well as some localized content (e.g., local events and local training opportunities). Read more about it in this blog post.

Anomaly Detector

BigML makes it easy to build a top-performing anomaly detector that will help you identify instances in your dataset that do not conform to a regular pattern.

Batch Anomaly Scores

You can quickly score multiple lines of data through BigML's Batch Anomaly Score. The output can be downloaded as a .csv and/or you can use it to automatically create a new dataset.

Anomaly Score

You can score individual data points against your anomaly detector by using the web interface. Simply input the field values, and BigML will provide you with an anomaly score expressed as a percentage (a higher score reflects a more anomalous instance).
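
As an illustration of how a path-based anomaly score maps to a value between 0 and 1, here is the standard Isolation Forest scoring formula, assuming the detector works in that style: instances that are isolated in few random splits get scores close to 1, while typical ones stay below 0.5. BigML's exact normalization may differ, and the path lengths below are hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649

def c(n: int) -> float:
    """Average path length of an unsuccessful search in a binary search tree of n points,
    used to normalise Isolation Forest path lengths."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2 * (math.log(n - 1) + EULER_GAMMA) - 2 * (n - 1) / n

def anomaly_score(mean_path_length: float, sample_size: int) -> float:
    """Isolation Forest score in (0, 1]: short average paths (easy to isolate) score high."""
    return 2 ** (-mean_path_length / c(sample_size))

for path in (2.0, c(256), 20.0):  # hypothetical average path lengths
    print(f"path {path:.2f} -> score {anomaly_score(path, 256):.0%}")
```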

New dataset from batch prediction output

Batch predictions are a powerful way to score likely outcomes on multiple rows of data. You can now create a new dataset directly from the batch prediction output (in addition to getting the output as a .csv file).

Models from clusters

Now you can automatically create a model for each cluster, which will not only help you better understand the cluster but can also be used to classify new instances.

Modeling with missing splits

Cleaning up data can be hard, and not all input values may be at hand at prediction time. That's why we have built a new option to create models that generate predicates that explicitly deal with missing values.
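
The difference can be illustrated with a hand-written, hypothetical one-split tree on an `income` field. Without missing splits, this sketch assumes the common strategy of stopping at the node whose split field is missing and returning that node's own prediction; with missing splits, the learned predicate itself includes the missing case, as if training had found that instances without an income value behave like high-income ones. Field names, thresholds and labels are all illustrative.

```python
from typing import Dict, Optional

Row = Dict[str, Optional[float]]

# Hypothetical one-split tree on "income". The root node's own prediction is used
# when a prediction has to stop early.
ROOT_PREDICTION = "review"

def predict_stop_at_missing(row: Row) -> str:
    """Without missing splits (assumed strategy): if the split field is missing,
    stop at the current node and return its prediction."""
    income = row.get("income")
    if income is None:
        return ROOT_PREDICTION
    return "approve" if income > 50000 else "reject"

def predict_with_missing_split(row: Row) -> str:
    """With missing splits: the learned predicate is 'income > 50000 or missing',
    so instances without an income value follow a definite branch."""
    income = row.get("income")
    if income is None or income > 50000:
        return "approve"
    return "reject"

for row in ({"income": 60000.0}, {"income": 20000.0}, {}):
    print(row, predict_stop_at_missing(row), predict_with_missing_split(row))
```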

Online predictions

New client-side predictions make it easier than ever to explore the influence of each field in your models, ensembles or clusters. In addition, we are open-sourcing the related JavaScript libraries so you can leverage this functionality to build powerful and dynamic apps and web services.

Fast ensembles

We have refined the way the models of an ensemble are built to greatly reduce the time spent on data transportation. This will dramatically speed up the creation of your ensembles.
