neurocaps.analysis.CAP.get_caps

CAP.get_caps(subject_timeseries, runs=None, n_clusters=5, cluster_selection_method=None, random_state=None, init='k-means++', n_init='auto', max_iter=300, tol=0.0001, algorithm='lloyd', standardize=True, n_cores=None, show_figs=False, output_dir=None, progress_bar=False, **kwargs)

Perform K-Means Clustering to Identify CAPs.

Concatenates the timeseries of all subjects into a single NumPy array with dimensions (participants x TRs) x ROIs and fits sklearn.cluster.KMeans to the concatenated data. A separate KMeans model is fitted for each group.
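The concatenation step can be pictured with a small NumPy sketch. This is an illustration only, not the library's internal code: the subject IDs, the run labels ("run-1", "run-2"), and the array sizes are made-up assumptions. It also shows the optional column-wise standardization that the standardize parameter describes (sample standard deviation, n-1).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical input: 2 subjects, 2 runs each, 50 TRs x 10 ROIs per run.
# The IDs and run labels are invented for this sketch.
subject_timeseries = {
    subj: {run: rng.standard_normal((50, 10)) for run in ("run-1", "run-2")}
    for subj in ("101", "102")
}

# Stack every run of every subject along the TR (row) axis.
concatenated = np.vstack(
    [ts for runs in subject_timeseries.values() for ts in runs.values()]
)

# Column-wise (per-ROI) standardization with the sample standard deviation.
standardized = (concatenated - concatenated.mean(axis=0)) / concatenated.std(
    axis=0, ddof=1
)

print(concatenated.shape)  # (200, 10): (2 subjects x 2 runs x 50 TRs) x 10 ROIs
```

Each row of the resulting array is one frame (TR) and each column is one ROI, which is the layout k-means expects for samples and features.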

Parameters:
  • subject_timeseries (SubjectTimeseries or str) – A dictionary mapping subject IDs to their run IDs and their associated timeseries (TRs x ROIs) as a NumPy array. Can also be a path to a pickle file containing this same structure. Refer to documentation for SubjectTimeseries in the “See Also” section for an example structure.

  • runs (int, str, list[int], list[str], or None, default=None) – Specific run IDs to perform the CAPs analysis with (e.g. runs=[0, 1] or runs=["01", "02"]). If None, all runs will be used.

  • n_clusters (int or list[int], default=5) – Number of clusters to use. Either a single integer, or a list of integers to evaluate when cluster_selection_method is not None.

  • cluster_selection_method ({“elbow”, “davies_bouldin”, “silhouette”, “variance_ratio”} or None, default=None) – Method to find the optimal number of clusters. Options are “elbow”, “davies_bouldin”, “silhouette”, and “variance_ratio”.

  • random_state (int or None, default=None) – Seed for the random number generator used for centroid initialization. Set an integer for reproducible results.

  • init ({“k-means++”, “random”}, Callable, or ArrayLike, default=“k-means++”) – Method for choosing the initial cluster centroids. Options are “k-means++”, “random”, a callable, or an array-like of shape (n_clusters, n_features).

  • n_init ({“auto”} or int, default=“auto”) – Number of times k-means is run with different initial centroids. The model with the lowest inertia from these runs is selected.

  • max_iter (int, default=300) – Maximum number of iterations for a single run of k-means.

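  • tol (float, default=1e-4) – Convergence tolerance. A run of k-means stops before max_iter is reached once the relative change in the cluster centers between two consecutive iterations (measured with the Frobenius norm) falls below this value.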

  • algorithm ({"lloyd", "elkan"}, default="lloyd") – The algorithm to use. Options are “lloyd” and “elkan”.

  • standardize (bool, default=True) – If True, standardizes the columns (ROIs) of the concatenated timeseries data using the sample standard deviation (n-1).

  • n_cores (int or None, default=None) – Number of cores to use for fitting multiple k-means models in parallel when cluster_selection_method is not None. Parallelization uses Joblib with the “loky” backend.

  • show_figs (bool, default=False) – Displays the plots for the specified cluster_selection_method for all groups.

  • output_dir (str or None, default=None) – Directory in which to save plots as PNG files if cluster_selection_method is not None. The directory is created if it does not exist. If None, plots are not saved.

  • progress_bar (bool, default=False) –

    If True and cluster_selection_method is not None, displays a progress bar.

    Added in version 0.21.5.

  • **kwargs

    Additional keyword arguments when cluster_selection_method is specified:

    • S: int, default=1 – Adjusts the sensitivity of finding the elbow. Larger values are more conservative and less sensitive to small fluctuations. Passed to KneeLocator from the kneed package.

    • dpi: int, default=300 – Dots per inch for the figure.

    • figsize: tuple, default=(8, 6) – Adjusts the size of the plots.

    • bbox_inches: str or None, default=“tight” – Alters the amount of whitespace in the saved image.

    • step: int or None, default=None – An integer value that controls the progression of the x-axis in plots.
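To make the role of cluster_selection_method more concrete, the sketch below evaluates a list of candidate cluster counts with the silhouette score using scikit-learn directly. It is an assumption-laden illustration, not the code this method runs: the synthetic data, the candidate list, and the variable names are invented, and only the “silhouette” criterion is shown (higher is better).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)

# Synthetic stand-in for concatenated data: three well-separated blobs,
# each with 100 frames and 4 ROIs.
centers = np.array(
    [[0, 0, 0, 0], [10, 10, 10, 10], [-10, 10, -10, 10]], dtype=float
)
data = np.vstack([c + rng.standard_normal((100, 4)) for c in centers])

# Fit one k-means model per candidate cluster count and score each one.
candidate_k = [2, 3, 4, 5]
scores = {}
for k in candidate_k:
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
    scores[k] = silhouette_score(data, labels)

# For the silhouette criterion, the best model has the highest score.
best_k = max(scores, key=scores.get)
print(best_k)
```

The other criteria follow the same fit-and-score loop with a different rule for picking the winner; for example, the Davies-Bouldin index is minimized rather than maximized.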

Returns:

self

Note

KMeans Algorithm: Refer to scikit-learn’s documentation for additional information about the KMeans algorithm used in this method.

The n_clusters, random_state, init, n_init, max_iter, tol, and algorithm parameters are passed to sklearn.cluster.KMeans.