When you create a topic model in BigML, each topic gets a default name. Until now, those names were "Topic 00", "Topic 01", and so on. BigML now uses each topic's top term as its default name, so that your future topic models come with more descriptive topic names.
If two topics share the same top term, the next most likely term is also included in their names. You can configure this setting to return to the old naming or to increase the number of terms included in each topic name.
For details, see section 4.4, "Minimum Terms Per Topic Name", in the Topic Model documentation.
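As a rough illustration of the naming rule described above, here is a minimal Python sketch. It is not BigML's implementation: the function `default_topic_names`, its `min_terms` and `separator` parameters, and the way name clashes are resolved are all assumptions made for this example, and the option to return to the old "Topic 00" style is not modeled.

```python
from collections import Counter


def default_topic_names(topics, min_terms=1, separator=", "):
    """Build one name per topic from its most likely terms.

    `topics` is a list of term lists, each ordered from most to least likely.
    All names and parameters here are hypothetical, not BigML's API.
    """
    # Start every topic with the minimum number of terms (capped by its length).
    counts = [min(min_terms, len(terms)) for terms in topics]
    while True:
        names = [separator.join(terms[:n]) for terms, n in zip(topics, counts)]
        seen = Counter(names)
        # Topics whose name is shared and that still have a term left to add.
        clashing = [
            i for i, name in enumerate(names)
            if seen[name] > 1 and counts[i] < len(topics[i])
        ]
        if not clashing:
            return names
        # Extend each clashing topic's name with its next most likely term.
        for i in clashing:
            counts[i] += 1


example_topics = [
    ["learning", "machine", "model"],
    ["learning", "student", "school"],
    ["market", "stock", "price"],
]
print(default_topic_names(example_topics))
```

In this sketch the first two topics share the top term "learning", so each of them picks up its second term, while the third topic keeps a one-term name. Passing `min_terms=2` would make every name start with two terms.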
Topic Models assume that each document exhibits a mixture of topics. The main goal of creating a Topic Model is to discover how important each topic is in a given document. For example, a document may be 70% about "Machine Learning", 20% about "stock market" and 10% about "startups".
Topic Distributions let you make predictions for a single data instance, and Batch Topic Distributions do the same for multiple instances at once. Based on a given Topic Model, a BigML Topic Distribution provides a set of probabilities for each data instance (one probability per topic), which indicate the relative relevance of each topic for that instance.
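To make the idea concrete, a topic distribution can be pictured as a mapping from topic names to probabilities that sum to one. The sketch below uses plain Python and the example percentages mentioned earlier; the helper names `rank_topics` and `dominant_topics` are hypothetical and are not part of the BigML API.

```python
def rank_topics(distribution, tolerance=1e-6):
    """Return (topic, probability) pairs, most relevant topic first.

    `distribution` maps each topic name to its probability for one instance.
    """
    total = sum(distribution.values())
    if abs(total - 1.0) > tolerance:
        raise ValueError(f"probabilities sum to {total}, expected 1.0")
    return sorted(distribution.items(), key=lambda item: item[1], reverse=True)


def dominant_topics(distributions):
    """Batch counterpart: the single most relevant topic for each instance."""
    return [rank_topics(d)[0][0] for d in distributions]


document = {"Machine Learning": 0.7, "stock market": 0.2, "startups": 0.1}
for topic, probability in rank_topics(document):
    print(f"{topic}: {probability:.0%}")
```

The batch helper simply applies the same ranking to a list of instances, mirroring the single versus batch split described above.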
The BigML team has brought Topic Models to the API and the Dashboard as part of the Fall 2016 release. Topic Models are an optimized implementation of Latent Dirichlet Allocation, a probabilistic unsupervised learning method that determines the topics underlying a collection of documents.
The main applications of Topic Models include browsing, organizing, and understanding large amounts of unstructured text data, which can be very useful for information retrieval, collaborative filtering, and content recommendation, among other use cases.
BigML provides two original visualizations that accompany its implementation so you can better inspect your Topic Model:
- Topic Map: get an overview of your topic importances and their thematic closeness.
- Term Chart: get an overview of the main terms that make up each discovered topic.