CAP.get_caps

CAP.get_caps(subject_timeseries, runs=None, n_clusters=5, cluster_selection_method=None, random_state=None, init='k-means++', n_init='auto', max_iter=300, tol=0.0001, algorithm='lloyd', standardize=True, n_cores=None, show_figs=False, output_dir=None, progress_bar=False, as_pickle=False, **kwargs)

Perform K-Means Clustering to Identify CAPs.

Concatenates the timeseries of all subjects into a single NumPy array with dimensions (participants x TRs) x ROIs and fits sklearn.cluster.KMeans on the concatenated data. A separate KMeans model is fitted for each group.
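The concatenation step can be pictured with the minimal sketch below. It is not the library's internal code: the helper name concatenate_timeseries and the "run-1"-style run keys are assumptions made for illustration. It only shows how per-run arrays of shape (TRs x ROIs) stack row-wise into one array.

```python
import numpy as np

# Hypothetical input: subject ID -> run ID -> array of shape (TRs, ROIs).
# The "run-1"-style keys are illustrative only.
rng = np.random.default_rng(0)
subject_timeseries = {
    "01": {"run-1": rng.standard_normal((10, 4)), "run-2": rng.standard_normal((8, 4))},
    "02": {"run-1": rng.standard_normal((12, 4))},
}


def concatenate_timeseries(subject_timeseries):
    """Stack every run of every subject row-wise into one 2D array."""
    arrays = [
        timeseries
        for subject_id in sorted(subject_timeseries)  # lexicographic subject order
        for timeseries in subject_timeseries[subject_id].values()
    ]
    return np.vstack(arrays)


concatenated = concatenate_timeseries(subject_timeseries)
print(concatenated.shape)  # (30, 4): 10 + 8 + 12 TRs stacked, 4 ROIs
```

The row count is the total number of TRs across all subjects and runs, and the column count stays equal to the number of ROIs.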

Parameters:
  • subject_timeseries (SubjectTimeseries or str) – A dictionary mapping subject IDs to their run IDs and their associated timeseries (TRs x ROIs) as a NumPy array. Can also be a path to a pickle file containing this same structure. Refer to documentation for SubjectTimeseries in the “See Also” section for an example structure.

  • runs (int, str, list[int], list[str], or None, default=None) – Specific run IDs to perform the CAPs analysis with (e.g. runs=[0, 1] or runs=["01", "02"]). If None, all runs will be used.

  • n_clusters (int or list[int], default=5) – Number of clusters to use. Can be a single integer or a list of integers (if cluster_selection_method is not None).

  • cluster_selection_method ({“elbow”, “davies_bouldin”, “silhouette”, “variance_ratio”} or None, default=None) – Method to find the optimal number of clusters. Options are “elbow”, “davies_bouldin”, “silhouette”, and “variance_ratio”.

  • random_state (int or None, default=None) – Random state (seed) value to use.

  • init ({“k-means++”, “random”}, Callable, or ArrayLike, default="k-means++") – Method for choosing the initial cluster centroids. Options are “k-means++”, “random”, a callable, or an array-like of shape (n_clusters, n_features).

  • n_init ({“auto”} or int, default="auto") – Number of times k-means is run with different initial centroids. The model with the lowest inertia from these runs will be selected.

  • max_iter (int, default=300) – Maximum number of iterations for a single run of k-means.

  • tol (float, default=1e-4) – Relative tolerance on the change in cluster centers between two consecutive iterations. A run of k-means stops early, before max_iter is reached, once the change falls below this value.

  • algorithm ({"lloyd", "elkan"}, default="lloyd") – The algorithm to use. Options are “lloyd” and “elkan”.

  • standardize (bool, default=True) –

    Standardizes the columns (ROIs) of the concatenated timeseries data. Uses sample standard deviation (n-1).

    Note

    Standard deviations below np.finfo(std.dtype).eps are replaced with 1 for numerical stability.

  • n_cores (int or None, default=None) – Number of cores to use for multiprocessing, with Joblib, to run multiple k-means models if cluster_selection_method is not None. The “loky” backend is used.

  • show_figs (bool, default=False) – Displays the plots for the specified cluster_selection_method for all groups.

  • output_dir (str or None, default=None) – Directory to save plots as png files if cluster_selection_method is not None. The directory will be created if it does not exist. If None, plots will not be saved.

  • progress_bar (bool, default=False) – If True and cluster_selection_method is not None, displays a progress bar.

  • as_pickle (bool, default=False) –

    When output_dir and cluster_selection_method are both specified, plots are saved as pickle files, which can be further modified, instead of png images.

    Added in version 0.26.5.

  • **kwargs

    Additional keyword arguments when cluster_selection_method is specified:

    • S: int, default=1 – Adjusts the sensitivity of finding the elbow. Larger values are more conservative and less sensitive to small fluctuations. Passed to KneeLocator from the kneed package.

    • dpi: int, default=300 – Dots per inch for the figure.

    • figsize: tuple, default=(8, 6) – Adjusts the size of the plots.

    • bbox_inches: str or None, default="tight" – Alters the size of the whitespace in the saved image.

    • step: int or None, default=None – An integer value that controls the progression of the x-axis in plots.
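As a rough illustration of the standardize option described above, the sketch below z-scores each column with the sample standard deviation (n-1) and replaces near-zero standard deviations with 1. The function name standardize_columns is hypothetical and the code is an assumption about the described behavior, not the library's implementation.

```python
import numpy as np


def standardize_columns(data):
    """Z-score each column using the sample standard deviation (ddof=1)."""
    mean = data.mean(axis=0)
    std = data.std(axis=0, ddof=1)
    # Guard against division by (near) zero for constant columns.
    std[std < np.finfo(std.dtype).eps] = 1.0
    return (data - mean) / std


# Column 0 varies; column 1 is constant, so its standard deviation is 0.
data = np.array([[1.0, 5.0], [2.0, 5.0], [3.0, 5.0]])
scaled = standardize_columns(data)
print(scaled)
```

Without the replacement step, the constant column would produce a division by zero; with it, that column simply becomes all zeros after mean-centering.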

See also

neurocaps.typing.SubjectTimeseries

Type definition for the subject timeseries dictionary structure. Refer to the SubjectTimeseries documentation.

Returns:

self

Raises:

NoElbowDetectionError – Raised when cluster_selection_method is set to “elbow” but kneed’s KneeLocator does not detect an elbow in the convex curve. Refer to the NoElbowDetectionError documentation.

Notes

KMeans Algorithm: Refer to scikit-learn’s Documentation for additional information about the KMeans algorithm used in this method.

The n_clusters, random_state, init, n_init, max_iter, tol, and algorithm parameters are passed to sklearn.cluster.KMeans. Only n_clusters differs from scikit-learn’s default value, changing from 8 to 5.

Default Group Naming: When group is None during initialization of the CAP class, “All Subjects” is used as the default group name. On the first call of this method, the subject IDs in subject_timeseries are automatically detected and stored in self.group. This mapping persists until the CAP class is re-initialized.

Concatenated Timeseries: The concatenated timeseries is stored in self.concatenated_timeseries for user convenience and can be deleted using del self.concatenated_timeseries without disrupting any other function. Additionally, for versions >= 0.25.0, subjects are concatenated in lexicographic order of their subject IDs.
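Lexicographic order compares IDs as strings, which can differ from numeric order. The short example below uses made-up subject IDs and Python's built-in sorted to show the difference; it illustrates the ordering rule only, not the library's code.

```python
# Made-up subject IDs stored as strings.
subject_ids = ["10", "2", "1"]
ordered = sorted(subject_ids)
print(ordered)  # ['1', '10', '2']

# Zero-padded IDs sort the same way lexicographically and numerically.
padded_ids = ["10", "02", "01"]
print(sorted(padded_ids))  # ['01', '02', '10']
```

If row order in the concatenated array matters for downstream work, zero-padded subject IDs keep the string order aligned with the numeric order.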