CAP.get_caps

CAP.get_caps(subject_timeseries, runs=None, n_clusters=5, cluster_selection_method=None, random_state=None, init='k-means++', n_init='auto', max_iter=300, tol=0.0001, algorithm='lloyd', standardize=True, n_cores=None, show_figs=False, output_dir=None, progress_bar=False, as_pickle=False, **kwargs)

Perform K-Means Clustering to Identify CAPs.

Concatenates the timeseries of each subject into a single NumPy array with dimensions (participants x TRs) x ROIs and uses sklearn.cluster.KMeans on the concatenated data. A separate KMeans model is fitted for each group.

Parameters:

subject_timeseries (SubjectTimeseries or str) – A dictionary mapping subject IDs to their run IDs and their associated timeseries (TRs x ROIs) as a NumPy array. Can also be a path to a pickle file containing this same structure. Refer to documentation for SubjectTimeseries in the “See Also” section for an example structure.
runs (int, str, list[int], list[str], or None, default=None) – Specific run IDs to perform the CAPs analysis with (e.g. runs=[0, 1] or runs=["01", "02"]). If None, all runs will be used.
n_clusters (int or list[int], default=5) – Number of clusters to use. Can be a single integer or a list of integers (if cluster_selection_method is not None).
cluster_selection_method ({“elbow”, “davies_bouldin”, “silhouette”, “variance_ratio”} or None, default=None) – Method to find the optimal number of clusters. Options are “elbow”, “davies_bouldin”, “silhouette”, and “variance_ratio”.
random_state (int or None, default=None) – Random state (seed) value to use.
init ({“k-means++”, “random”}, Callable, or ArrayLike, default=“k-means++”) – Method for choosing the initial cluster centroids. Options are “k-means++”, “random”, a callable, or an array-like of shape (n_clusters, n_features).
n_init ({“auto”} or int, default=“auto”) – Number of times k-means is run with different initial centroids. The model with the lowest inertia from these runs will be selected.
max_iter (int, default=300) – Maximum number of iterations for a single run of k-means.
tol (float, default=1e-4) – Convergence tolerance passed to sklearn.cluster.KMeans. A run stops before max_iter is reached once the change in cluster centers between two consecutive iterations falls below this relative tolerance.
algorithm ({"lloyd", "elkan"}, default="lloyd") – The algorithm to use. Options are “lloyd” and “elkan”.
standardize (bool, default=True) –
Standardizes the columns (ROIs) of the concatenated timeseries data. Uses sample standard deviation (n-1).

Note

Standard deviations below np.finfo(std.dtype).eps are replaced with 1 for numerical stability.
n_cores (int or None, default=None) – Number of cores to use for multiprocessing, with Joblib, to run multiple k-means models if cluster_selection_method is not None. The “loky” backend is used.
show_figs (bool, default=False) – Displays the plots for the specified cluster_selection_method for all groups.
output_dir (str or None, default=None) – Directory to save plots as png files if cluster_selection_method is not None. The directory will be created if it does not exist. If None, plots will not be saved.
progress_bar (bool, default=False) – If True and cluster_selection_method is not None, displays a progress bar.
as_pickle (bool, default=False) –
When output_dir and cluster_selection_method are both specified, plots are saved as pickle files, which can be further modified, instead of png images.

Added in version 0.26.5.
**kwargs –
Additional keyword arguments when cluster_selection_method is specified:
- S: int, default=1 – Adjusts the sensitivity of finding the elbow. Larger values are more conservative and less sensitive to small fluctuations. Passed to KneeLocator from the kneed package.
- dpi: int, default=300 – Dots per inch for the figure.
- figsize: tuple, default=(8, 6) – Adjusts the size of the plots.
- bbox_inches: str or None, default=“tight” – Alters size of the whitespace in the saved image.
- step: int or None, default=None – An integer value that controls the progression of the x-axis in plots.
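To make the standardize option and its note concrete, here is a small NumPy sketch of column-wise standardization with the sample standard deviation (n-1) and the epsilon guard. The function name standardize_columns is hypothetical, and subtracting the column mean before scaling is an assumption based on the usual meaning of standardization; this page does not show the library's actual implementation.

```python
import numpy as np


def standardize_columns(data):
    """Z-score each column (ROI) of a (frames x ROIs) array.

    Hypothetical helper for illustration only.
    """
    mean = data.mean(axis=0)
    # ddof=1 gives the sample standard deviation (divides by n-1).
    std = data.std(axis=0, ddof=1)
    # Near-zero standard deviations are replaced with 1 so that constant
    # columns do not cause a division by zero.
    std[std < np.finfo(std.dtype).eps] = 1.0
    return (data - mean) / std


# The second column is constant, so its standard deviation is 0 and the guard applies.
data = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])
scaled = standardize_columns(data)
```

In this toy input the first column has mean 2 and sample standard deviation 1, so it becomes -1, 0, 1, while the constant column is only centered and ends up as zeros.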