API Reference
Plotting
upsetplot.plot(data, fig=None, **kwargs)
Make an UpSet plot of data on fig
Parameters:
- data : pandas.Series or pandas.DataFrame
Values for each set to plot. Should have a MultiIndex where each level is binary, corresponding to set membership. If a DataFrame, sum_over must be a string or False.
- fig : matplotlib.figure.Figure, optional
Defaults to a new figure.
- kwargs
Other arguments for UpSet
Returns:
- subplots : dict of matplotlib.axes.Axes
Keys are ‘matrix’, ‘intersections’, ‘totals’, ‘shading’
class upsetplot.UpSet(data, orientation='horizontal', sort_by='degree', sort_categories_by='cardinality', subset_size='legacy', sum_over=None, facecolor='black', with_lines=True, element_size=32, intersection_plot_elements=6, totals_plot_elements=2, show_counts='', sort_sets_by='deprecated')
Manage the data and drawing for a basic UpSet plot
Primary public method is plot().
Parameters:
- data : pandas.Series or pandas.DataFrame
Elements associated with categories (a DataFrame), or the size of each subset of categories (a Series). Should have a MultiIndex where each level is binary, corresponding to category membership. If a DataFrame, sum_over must be a string or False.
- orientation : {‘horizontal’ (default), ‘vertical’}
If horizontal, intersections are listed from left to right.
- sort_by : {‘cardinality’, ‘degree’}
If ‘cardinality’, subsets are listed from largest to smallest. If ‘degree’, they are listed in order of the number of categories intersected.
- sort_categories_by : {‘cardinality’, None}
Whether to sort the categories by total cardinality, or leave them in the provided order.
- subset_size : {‘auto’, ‘count’, ‘sum’}
Configures how to calculate the size of a subset. Choices are:
- ‘auto’
If data is a DataFrame, count the number of rows in each group, unless sum_over is specified. If data is a Series with at most one row for each group, use the value of the Series. If data is a Series with more than one row per group, raise a ValueError.
- ‘count’
Count the number of rows in each group.
- ‘sum’
Sum the value of the data Series, or the DataFrame field specified by sum_over.
Until version 0.4, the default is ‘legacy’, which uses sum_over to control this behaviour. From version 0.4, ‘auto’ will be default.
- sum_over : str or None
If subset_size='sum' or 'auto', then the intersection size is the sum of the specified field in the data DataFrame. If a Series, only None is supported and its value is summed.
If subset_size='legacy', sum_over must be specified when data is a DataFrame. If False, the intersection plot will show the count of each subset. Otherwise, it shows the sum of the specified field.
- facecolor : str
Color for bar charts and dots.
- with_lines : bool
Whether to show lines joining dots in the matrix, to mark multiple categories being intersected.
- element_size : float or None
Side length in pt. If None, size is estimated to fit figure
- intersection_plot_elements : int
The intersections plot should be large enough to fit this many matrix elements.
- totals_plot_elements : int
The totals plot should be large enough to fit this many matrix elements.
- show_counts : bool or str, default=False
Whether to label the intersection size bars with the cardinality of the intersection. When a string, this formats the number. For example, ‘%d’ is equivalent to True.
- sort_sets_by
Methods
add_catplot(self, kind[, value, elements])    Add a seaborn catplot over subsets when plot() is called.
make_grid(self[, fig])    Get a SubplotSpec for each Axes, accounting for label text width
plot(self[, fig])    Draw all parts of the plot onto fig or a new figure
plot_intersections(self, ax)    Plot bars indicating intersection size
plot_matrix(self, ax)    Plot the matrix of intersection indicators onto ax
plot_totals(self, ax)    Plot bars indicating total set size
plot_shading
add_catplot(self, kind, value=None, elements=3, **kw)
Add a seaborn catplot over subsets when plot() is called.
Parameters:
- kind : str
One of {“point”, “bar”, “strip”, “swarm”, “box”, “violin”, “boxen”}
- value : str, optional
Column name for the value to plot (i.e. y if orientation='horizontal'), required if data is a DataFrame.
- elements : int, default=3
Size of the axes counted in number of matrix elements.
- **kw : dict
Additional keywords to pass to seaborn.catplot().
Our implementation automatically determines ‘ax’, ‘data’, ‘x’, ‘y’ and ‘orient’, so these are prohibited keys in kw.
Returns:
- None
Dataset loading and generation
upsetplot.from_contents(contents, data=None, id_column='id')
Build data from category listings
Parameters:
- contents : Mapping (or iterable over pairs) of strings to sets
Keys are category names, values are sets of identifiers (int or string).
- data : DataFrame, optional
If provided, this should be indexed by the identifiers used in contents.
- id_column : str, default='id'
The column name to use for the identifiers in the output.
Returns:
- DataFrame
data is returned with its index indicating category membership, including a column named according to id_column. If data is not given, the order of rows is not assured.
Notes
The order of categories in the output DataFrame is determined from contents, which may have non-deterministic iteration order.
Examples
>>> from upsetplot import from_contents
>>> contents = {'cat1': ['a', 'b', 'c'],
...             'cat2': ['b', 'd'],
...             'cat3': ['e']}
>>> from_contents(contents)  # doctest: +NORMALIZE_WHITESPACE
                  id
cat1  cat2  cat3
True  False False  a
      True  False  b
      False False  c
False True  False  d
      False True   e
>>> import pandas as pd
>>> contents = {'cat1': [0, 1, 2],
...             'cat2': [1, 3],
...             'cat3': [4]}
>>> data = pd.DataFrame({'favourite': ['green', 'red', 'red',
...                                    'yellow', 'blue']})
>>> from_contents(contents, data=data)  # doctest: +NORMALIZE_WHITESPACE
                   id favourite
cat1  cat2  cat3
True  False False   0     green
      True  False   1       red
      False False   2       red
False True  False   3    yellow
      False True    4      blue
upsetplot.from_memberships(memberships, data=None)
Load data where each sample has a collection of category names
The output should be suitable for passing to UpSet or plot.
Parameters:
- memberships : sequence of collections of strings
Each element corresponds to a data point, indicating the sets it is a member of. Each category is named by a string.
- data : Series-like or DataFrame-like, optional
If given, the index of category memberships is attached to this data. It must have the same length as memberships. If not given, the series will contain the value 1.
Returns:
- DataFrame or Series
data is returned with its index indicating category membership. It will be a Series if data is a Series or 1d numeric array. The index will have levels ordered by category names.
Examples
>>> from upsetplot import from_memberships
>>> from_memberships([
...     ['cat1', 'cat3'],
...     ['cat2', 'cat3'],
...     ['cat1'],
...     []
... ])  # doctest: +ELLIPSIS, +NORMALIZE_WHITESPACE
cat1   cat2   cat3
True   False  True     1
False  True   True     1
True   False  False    1
False  False  False    1
Name: ones, dtype: ...
>>> # now with data:
>>> import numpy as np
>>> from_memberships([
...     ['cat1', 'cat3'],
...     ['cat2', 'cat3'],
...     ['cat1'],
...     []
... ], data=np.arange(12).reshape(4, 3))  # doctest: +NORMALIZE_WHITESPACE
                   0   1   2
cat1  cat2  cat3
True  False True   0   1   2
False True  True   3   4   5
True  False False  6   7   8
False False False  9  10  11
upsetplot.generate_counts(seed=0, n_samples=10000, n_categories=3)
Generate artificial counts corresponding to set intersections
Parameters:
- seed : int
A seed for randomisation
- n_samples : int
Number of samples to generate statistics over
- n_categories : int
Number of categories (named “cat0”, “cat1”, …) to generate
Returns:
- Series
Counts indexed by boolean indicator mask for each category.
See also
generate_samples : Generates a DataFrame of samples that these counts are derived from.
upsetplot.generate_samples(seed=0, n_samples=10000, n_categories=3)
Generate artificial samples assigned to set intersections
Parameters:
- seed : int
A seed for randomisation
- n_samples : int
Number of samples to generate
- n_categories : int
Number of categories (named “cat0”, “cat1”, …) to generate
Returns:
- DataFrame
Field ‘value’ is a weight or score for each element. Field ‘index’ is a unique id for each element. Index includes a boolean indicator mask for each category.
Note: Further fields may be added in future versions.
See also
generate_counts : Generates the counts for each subset of categories corresponding to these samples.