As a complement to our popular decision tree visualization and the sunburst, we are launching a third view for your models: the Partial Dependence Plot. This heatmap chart lets you analyze the marginal impact of each input field on predictions for classification and regression models built with ensembles and logistic regressions.
Solving a Machine Learning problem is an iterative process that requires creating a great number of intermediate datasets, models, evaluations, and predictions before arriving at the final model. Now, BigML keeps your account organized and up to date by letting you delete multiple resources at the same time. Just click the deletion icon in the resource listings in the Dashboard and select the resources to be deleted.
Asynchronous WhizzML script executions can take some time to finish, given the complex Machine Learning workflows they implement. You no longer need to check your execution repeatedly to see if your results are ready: this new option lets you specify that you want to be notified by e-mail once the execution finishes, while you concentrate on other tasks.
Furthering our obsession with speeding up your Machine Learning processes, we have incorporated Scriptify into your 1-click menu options. Now, you can automatically regenerate any BigML resource (models, evaluations, predictions, etc.) with a single click. Scriptify creates a script that contains all the workflow information end-to-end (from configuration parameters to resources created), so you can precisely repeat the processing steps of any original Machine Learning resource to your heart's desire!
Now, you can easily clone datasets, models, and scripts from other users into your BigML account. Provided that a user shares a resource using the sharing link and the cloning capability is enabled, any other user with access to the link will be able to include this resource in their BigML account.
This new feature lets you make full use of shared resources. For example, when another user shares a dataset using the sharing link, it is in "view only" mode, so you cannot perform actions such as creating new models, exporting it, or sampling it. By cloning it, you will be able to perform all the BigML actions available for datasets.
These new Dashboard statistics allow you to introspect the predictive power of your model by revealing the significance of each coefficient estimate. BigML computes the likelihood ratio to test how well the model fits your data, along with the p-value, confidence interval, standard error, and Z score for each coefficient.
Learn more about the Logistic Regression statistics in the Dashboard documentation.
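As a rough illustration of how these per-coefficient statistics relate to one another, here is a hedged sketch: the Z score is the coefficient estimate divided by its standard error, the two-sided p-value comes from the standard normal distribution, and the 95% confidence interval is the estimate plus or minus 1.96 standard errors. The numbers are illustrative, not real BigML output.

```python
import math

def coefficient_stats(estimate, std_error):
    """Z score, two-sided p-value, and 95% CI for one coefficient."""
    z = estimate / std_error
    # Standard normal CDF via the error function.
    phi = 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0)))
    p_value = 2.0 * (1.0 - phi)
    ci = (estimate - 1.96 * std_error, estimate + 1.96 * std_error)
    return z, p_value, ci

z, p, ci = coefficient_stats(estimate=0.8, std_error=0.2)
print(round(z, 2), round(p, 4))  # 4.0 0.0001
```

A small p-value (conventionally below 0.05) indicates the coefficient is significantly different from zero.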
BigML is bringing predictions for Associations to the Dashboard. Association Sets allow you to pinpoint the items which are most strongly associated with your input data. For example, given a set of products purchased by a person, what other products are most likely to be bought?
All the predicted items will be ranked according to a similarity score, and they will be displayed in a table view. You can also visualize each predicted rule in a Venn diagram to get a sense of the correlation strength between the input data and the predicted items. Read more about Association Sets in the 8th chapter of the Associations documentation.
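To make the ranking idea concrete, here is a toy sketch of how an association set could score predicted items: rules whose antecedent items are all contained in the input are matched, each matching rule contributes its score to its consequent item, and items are returned best-first. The rules and scores below are illustrative, not real BigML output.

```python
# Each rule: (antecedent item set, consequent item, similarity score).
rules = [
    ({"bread", "butter"}, "milk", 0.9),
    ({"bread"}, "jam", 0.6),
    ({"beer"}, "chips", 0.8),
]

def association_set(input_items, rules):
    """Rank consequent items of rules whose antecedents match the input."""
    scores = {}
    for antecedent, consequent, score in rules:
        if antecedent <= input_items and consequent not in input_items:
            scores[consequent] = max(scores.get(consequent, 0.0), score)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

print(association_set({"bread", "butter"}, rules))
# [('milk', 0.9), ('jam', 0.6)]
```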
We are happy to announce BigML Certifications, for organizations and professionals that want to master BigML to successfully deliver real-life Machine Learning projects. These courses are ideal for software developers, system integrators, analysts, or scientists, to boost their skill set and deliver sophisticated data-driven solutions. We offer two separate courses, each of them consisting of 4 weekly online classes of 3 hours each:
Certified Engineer: all you need to know about advanced modeling, advanced data transformations, and how to use the BigML API (and its wrappers) in combination with WhizzML to build and automate your Machine Learning workflows.
Certified Architect: learn how to implement your Machine Learning solutions so they are scalable, impactful, capable of being integrated with third-party systems, and easy to maintain and retrain.
If you successfully pass the certification exam, BigML will award you with a diploma. In addition, BigML Certified Partners will receive business referrals that help them source new Machine Learning projects.
This new visualization for ensembles, commonly known as Partial Dependence Plot, allows you to visualize the impact that a set of fields have on predictions. You will be able to determine which fields are most relevant for ensemble predictions and how sensitive your ensemble predictions are to their different values.
The chart displays a heatmap of your predictions across different values of the two fields selected for the axes, while the remaining fields used to train your ensemble are held at fixed values. You can select any categorical or numeric field for the axes and configure the values of the other input fields using the fields inspector panel on the right.
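The mechanics behind the chart can be sketched in a few lines: vary two chosen fields over a grid while holding every other input field fixed, and record the model's prediction for each cell. The `toy_model` below is a hypothetical stand-in for a real BigML ensemble, and the field names are made up.

```python
def toy_model(row):
    # Hypothetical regression model whose prediction depends on two fields.
    return 2.0 * row["age"] + (5.0 if row["plan"] == "premium" else 0.0)

def partial_dependence(model, x_values, y_values, x_name, y_name, fixed):
    """Return a {(x, y): prediction} grid suitable for a heatmap."""
    grid = {}
    for x in x_values:
        for y in y_values:
            row = dict(fixed)      # hold the remaining fields fixed
            row[x_name] = x
            row[y_name] = y
            grid[(x, y)] = model(row)
    return grid

heatmap = partial_dependence(
    toy_model,
    x_values=[20, 40], y_values=["basic", "premium"],
    x_name="age", y_name="plan",
    fixed={"country": "US"},
)
print(heatmap[(20, "premium")])  # 45.0
```

Cells that change sharply along one axis reveal a field the predictions are sensitive to.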
This feature enables you to include the field importances in your batch predictions, i.e., a set of percentages indicating how much each field in your dataset contributed to the prediction of a given instance. You can include those values in your output file and dataset with either the BigML Dashboard or the API. This will give you a better understanding of your predictions, as it reveals which fields factor most heavily into a given prediction.
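As a minimal sketch of the output shape, per-instance field contributions can be expressed as percentages that sum to 100. The raw contribution values below are made up for illustration.

```python
def importance_percentages(contributions):
    """Normalize raw per-field contributions into percentages."""
    total = sum(abs(v) for v in contributions.values())
    return {field: 100.0 * abs(v) / total for field, v in contributions.items()}

pct = importance_percentages({"age": 0.3, "salary": 0.6, "city": 0.1})
print(pct["salary"])
```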
Topic Models assume that each document exhibits a mixture of topics. The main goal of creating a Topic Model is to discover the topic importances for a given document. For example, a document may be 70% about "Machine Learning", 20% about "stock market" and 10% about "startups".
Topic Distributions allow you to make predictions for a single data instance, and Batch Topic Distributions help predict the same for multiple instances simultaneously. Based on a given Topic Model, BigML Topic Distributions provide a set of probabilities for each data instance (one probability per topic), which indicate the relative relevance of all topics for that instance.
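A sketch of the output shape of a Topic Distribution: one probability per topic, summing to 1, indicating each topic's relative relevance for the instance. The raw topic weights below are illustrative and mirror the 70/20/10 example above.

```python
def topic_distribution(weights):
    """Normalize per-topic weights into probabilities summing to 1."""
    total = sum(weights.values())
    return {topic: w / total for topic, w in weights.items()}

doc = topic_distribution({"Machine Learning": 7, "stock market": 2, "startups": 1})
print(doc["Machine Learning"])  # 0.7
```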
The BigML team has brought Topic Models to the API and the Dashboard as part of the Fall 2016 Release. Topic Models are an optimized implementation of Latent Dirichlet Allocation, a probabilistic unsupervised learning method that determines the topics underlying a collection of documents.
Topic Models' main application areas include browsing, organizing, and understanding large amounts of unstructured text data, which can be very useful for information retrieval tasks, collaborative filtering, or content recommendation use cases, among others.
BigML provides two original visualizations that accompany its implementation so you can better inspect your Topic Model:
- Topic Map: get an overview of your topic importances and their thematic closeness.
- Term Chart: get an overview of the main terms that make up the discovered topics.
Flatline is BigML’s Lisp-like language that enables you to programmatically perform an array of data transformations, including filtering and new field generation. Flatliner is a handy code editor (available in our Labs section) that helps you test your Flatline expressions.
You can now compare multiple evaluations against a test set in a ROC space. The graph can then be downloaded as a PNG image, and the performance measures can be exported as a CSV file for further analysis.
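For reference, each evaluation contributes a (false positive rate, true positive rate) point to the ROC space at a given decision threshold. This hedged sketch computes that point from predicted scores; the labels and scores are illustrative.

```python
def roc_point(labels, scores, threshold):
    """(FPR, TPR) for binary labels and scores at a decision threshold."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    return fp / (fp + tn), tp / (tp + fn)

fpr, tpr = roc_point([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2], threshold=0.5)
print(fpr, tpr)  # 0.5 0.5
```

Points closer to the top-left corner of the ROC space indicate better classifiers.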
With the Winter Release, you'll now be able to add sources to BigML through Google Cloud Storage and Google Drive, similar to our prior integrations with Dropbox and Azure Data Marketplace. You can also now log into BigML using your Google ID.
We're happy to introduce Projects to help you organize your Machine Learning resources. You only have to create a new project using the web interface or the API and upload a new source to this project. All the new resources created from this source will be associated with the same project.
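As a hedged sketch of the idea at the API level: a source creation request can carry a project id, and the resources derived from that source belong to the same project. The function name and the project id below are illustrative placeholders, not real BigML identifiers.

```python
def source_payload(file_name, project_id):
    """Build an illustrative source-creation payload tied to a project."""
    return {"file": file_name, "project": project_id}

payload = source_payload("sales.csv", "project/0123456789abcdef01234567")
print(payload["project"])
```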
This is another simple but useful application we have released into our new BigML Labs. It allows users to compare side by side two different datasets. Check it out here.
BigML's new Sample Service provides fast access to datasets that are kept in an in-memory cache which enables a variety of sampling, filtering and correlation techniques. We have leveraged this new service to create a Dynamic Scatterplot visualization that we've released into BigML Labs.
Our team is constantly working on innovative applications built on top of BigML's API. We're now unveiling several of these in early access through our BigML Labs.
This latest addition to BigML's unsupervised learning algorithms is ideal for when you may not know how many clusters you wish to build from your dataset.
Now you can download a Summary Report for your BigML Clusters. This report details the distribution of data across your clusters, as well as the associated features and data distances.
BigML is very pleased to announce that we've launched this new website to better serve our customers in Australia & New Zealand. This site will contain all of the content and functionality of our https://bigml.com site, but will provide faster performance as well as some localized content (e.g., local events and local training opportunities). Read more about it in this blog post.
BigML makes it easy to build a top-performing anomaly detector that will help you identify instances in your dataset that do not conform to a regular pattern.
You can quickly score multiple lines of data through BigML's Batch Anomaly Score. The output can be downloaded as a .csv and/or you can use it to automatically create a new dataset.
You can score individual data points against your anomaly detector by using the web interface. Simply input the variables and BigML will provide you with an anomaly percentage (a higher score reflects greater anomaly).
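To make the interpretation concrete (higher score, greater anomaly), here is a toy stand-in for a scoring function. BigML's actual detectors are not computed this way; this sketch simply scales an instance's distance from the training mean into a 0-100% range.

```python
def anomaly_percentage(value, data):
    """Toy 0-100% score: distance from the mean, scaled and clamped."""
    mean = sum(data) / len(data)
    spread = max(abs(x - mean) for x in data) or 1.0
    return min(100.0, 100.0 * abs(value - mean) / spread)

data = [10, 11, 9, 10, 12]
print(anomaly_percentage(30, data))  # 100.0 (far outside the training range)
```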
Batch predictions are a powerful way to score likely outcomes on multiple rows of data. You can now create a new dataset directly from the batch prediction output (in addition to getting the output as a .csv file).
Now you can automatically create a model for each cluster that will not only help you better understand the cluster, but also use it to classify new instances.
Because cleaning up data can be hard and having all the input data handy at prediction time is not always possible, we have built a new option to create models that generate predicates explicitly dealing with missing values.
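A minimal sketch of what such a predicate looks like inside a tree: an instance with a missing field is routed down an explicit branch rather than causing the prediction to fail. The field name, threshold, and branch choice below are illustrative, not taken from a real BigML model.

```python
def split(row):
    """One tree split with an explicit predicate for missing values."""
    value = row.get("income")
    if value is None:                  # explicit "is missing" predicate
        return "left"                  # branch trained to absorb missing income
    return "left" if value <= 50000 else "right"

print(split({"income": None}), split({"income": 80000}))  # left right
```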