Production mode is required
Topic Models assume that each document exhibits a mixture of topics. The main goal of creating a Topic Model is to discover the topic importances for a given document. For example, a document may be 70% about "Machine Learning", 20% about "stock market" and 10% about "startups".
Topic Distributions let you make predictions for a single data instance, while Batch Topic Distributions make the same predictions for many instances at once. Given a Topic Model, a BigML Topic Distribution provides a set of probabilities for each data instance (one probability per topic) that indicate the relative relevance of each topic for that instance.
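To make the shape of a Topic Distribution concrete, here is a minimal sketch in plain Python. It is not BigML's API or algorithm: the topic names, term lists, and both function names are hypothetical, and the scoring is a toy term count rather than real Topic Model inference. It only illustrates that each instance gets one probability per topic, that those probabilities sum to 1, and that a batch version applies the same step to many instances.

```python
# Hypothetical topics and terms; a real Topic Model learns these from data.
TOPIC_TERMS = {
    "Machine Learning": {"model", "training", "neural", "prediction"},
    "stock market": {"shares", "trading", "investors", "prices"},
    "startups": {"founder", "funding", "venture", "launch"},
}


def topic_distribution(text, smoothing=0.1):
    """Toy scorer (an assumption, not LDA inference): count term hits
    per topic, add smoothing, and normalize into probabilities."""
    tokens = text.lower().split()
    scores = {
        topic: smoothing + sum(token in terms for token in tokens)
        for topic, terms in TOPIC_TERMS.items()
    }
    total = sum(scores.values())
    return {topic: score / total for topic, score in scores.items()}


def batch_topic_distribution(texts):
    """Apply the single-instance step to every instance in a list."""
    return [topic_distribution(text) for text in texts]


if __name__ == "__main__":
    doc = "training a neural model for prediction of prices"
    for topic, probability in topic_distribution(doc).items():
        print(f"{topic}: {probability:.2f}")
```

The smoothing term keeps every topic's probability above zero, so a document with no matching terms still gets a valid (uniform) distribution.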
The BigML team has brought Topic Models to the API and the Dashboard as part of the Fall 2016 release. Topic Models are an optimized implementation of Latent Dirichlet Allocation, a probabilistic unsupervised learning method that determines the topics underlying a collection of documents.
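For readers who want to see the idea behind Latent Dirichlet Allocation, the following is a minimal sketch of the textbook collapsed Gibbs sampler in plain Python. It is not BigML's optimized implementation; the function name `lda_gibbs`, its parameters, and the default values for `alpha` and `beta` are assumptions chosen for illustration. Documents are assumed to be pre-tokenized lists of words.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Textbook collapsed Gibbs sampling for LDA (illustrative sketch only).

    docs: list of documents, each a list of word tokens.
    Returns (theta, topic_word): per-document topic proportions and
    per-topic word counts.
    """
    rng = random.Random(seed)
    vocab = sorted({word for doc in docs for word in doc})
    vocab_size = len(vocab)

    doc_topic = [[0] * num_topics for _ in docs]               # words in doc d assigned to topic k
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]  # count of word w in topic k
    topic_total = [0] * num_topics                              # total words assigned to topic k

    # Start from a random topic assignment for every word occurrence.
    assignments = []
    for d, doc in enumerate(docs):
        doc_assignments = []
        for word in doc:
            k = rng.randrange(num_topics)
            doc_assignments.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word] += 1
            topic_total[k] += 1
        assignments.append(doc_assignments)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, word in enumerate(doc):
                # Remove the current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][word] -= 1
                topic_total[k] -= 1

                # Resample the topic in proportion to
                # (doc-topic count + alpha) * (topic-word count + beta) / (topic total + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][word] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]

                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][word] += 1
                topic_total[k] += 1

    # Smoothed topic proportions for each document; each row sums to 1.
    theta = [
        [(count + alpha) / (len(doc) + num_topics * alpha) for count in counts]
        for doc, counts in zip(docs, doc_topic)
    ]
    return theta, topic_word
```

Each row of `theta` plays the role of a document's topic mixture, and sorting a topic's entry in `topic_word` by count gives that topic's most prominent terms. Production implementations typically use faster inference and more careful text preprocessing than this sketch.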
The main applications of Topic Models include browsing, organizing, and understanding large amounts of unstructured text data, which can be very useful for information retrieval, collaborative filtering, and content recommendation, among other use cases.
BigML provides two original visualizations that accompany its implementation so you can better inspect your Topic Model:
- Topic Map: get an overview of your topic importances and their thematic closeness.
- Term Chart: get an overview of the main terms that make up each topic in your model.