neurocaps.analysis.CAP.get_caps

CAP.get_caps(subject_timeseries, runs=None, n_clusters=5, cluster_selection_method=None, random_state=None, init='k-means++', n_init='auto', max_iter=300, tol=0.0001, algorithm='lloyd', standardize=True, n_cores=None, show_figs=False, output_dir=None, progress_bar=False, **kwargs)

Perform K-Means Clustering to Identify CAPs.

Concatenates the timeseries of each subject into a single NumPy array with dimensions (participants x TRs) x ROIs and fits sklearn.cluster.KMeans on the concatenated data. Note that KMeans uses Euclidean distance. When cluster_selection_method is set, the elbow is located using KneeLocator from the kneed package, and the Davies-Bouldin, silhouette, and variance ratio scores are computed using scikit-learn's davies_bouldin_score, silhouette_score, and calinski_harabasz_score functions, respectively. If groups were given when the CAP class was initialized, separate KMeans models and plots are generated for each group.
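The concatenation step described above can be sketched with NumPy alone. This is an illustration on synthetic data, not the library's internal code: the subject IDs, array shapes, and variable names are made up, and the z-scoring follows the description of the standardize parameter (sample standard deviation, ddof=1).

```python
import numpy as np

rng = np.random.default_rng(0)
n_rois = 4  # hypothetical number of ROIs

# Synthetic input in the documented layout: subject -> run -> (TRs x ROIs) array.
subject_timeseries = {
    "101": {
        "run-0": rng.normal(size=(10, n_rois)),
        "run-1": rng.normal(size=(12, n_rois)),
    },
    "102": {
        "run-0": rng.normal(size=(8, n_rois)),
    },
}

# Stack every run of every subject along the time axis,
# giving an array of shape (participants x TRs) x ROIs.
concatenated = np.vstack(
    [ts for subject_runs in subject_timeseries.values() for ts in subject_runs.values()]
)

# Column-wise (per-ROI) z-scoring with Bessel's correction (n-1 in the denominator).
standardized = (concatenated - concatenated.mean(axis=0)) / concatenated.std(axis=0, ddof=1)
```

Here the 10, 12, and 8 TRs stack into 30 rows, and each ROI column of standardized has zero mean and unit sample standard deviation.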

Parameters:
  • subject_timeseries (dict[str, dict[str, np.ndarray]] or os.PathLike) --

    A dictionary mapping subject IDs to their run IDs and their associated timeseries (TRs x ROIs) as a NumPy array. Can also be a path to a pickle file containing this same structure. The expected structure is as follows:

    subject_timeseries = {
        "101": {
            "run-0": np.array([...]),  # Shape: TRs x ROIs
            "run-1": np.array([...]),  # Shape: TRs x ROIs
            "run-2": np.array([...]),  # Shape: TRs x ROIs
        },
        "102": {
            "run-0": np.array([...]),  # Shape: TRs x ROIs
            "run-1": np.array([...]),  # Shape: TRs x ROIs
        },
    }
    

  • runs (int, str, list[int], list[str], or None, default=None) -- The run numbers to include in the CAPs analysis (e.g. runs=[0, 1] or runs=["01", "02"]). If None, all runs in the subject timeseries are concatenated into a single array and subjected to k-means clustering.

  • n_clusters (int or list[int], default=5) -- The number of clusters to use for sklearn.cluster.KMeans. Can be a single integer or, if cluster_selection_method is not None, a list of integers to evaluate.

  • cluster_selection_method ({"elbow", "davies_bouldin", "silhouette", "variance_ratio"} or None, default=None) -- Method to find the optimal number of clusters. Options are "elbow", "davies_bouldin", "silhouette", and "variance_ratio".

  • random_state (int or None, default=None) -- The random state to use for sklearn.cluster.KMeans. Ensures reproducible results.

  • init ({"k-means++", "random"}, Callable, or ArrayLike, default="k-means++") -- Method for choosing the initial cluster centroids for sklearn.cluster.KMeans. Options are "k-means++", "random", a callable, or an array-like of shape (n_clusters, n_features).

  • n_init ({"auto"} or int, default="auto") -- Number of times sklearn.cluster.KMeans is run with different initial centroids. The model with the lowest inertia from these runs is selected.

  • max_iter (int, default=300) -- Maximum number of iterations for a single run of sklearn.cluster.KMeans.

  • tol (float, default=1e-4) -- Stopping criterion for sklearn.cluster.KMeans if the change in inertia is below this value, assuming max_iter has not been reached.

  • algorithm ({"lloyd", "elkan"}, default="lloyd") -- The type of algorithm to use for sklearn.cluster.KMeans. Options are "lloyd" and "elkan".

  • standardize (bool, default=True) -- Standardizes the columns (ROIs) of the concatenated timeseries data. Uses sample standard deviation with Bessel's correction (n-1 in denominator).

  • n_cores (int or None, default=None) -- The number of cores to use when fitting multiple sklearn.cluster.KMeans models in parallel if cluster_selection_method is not None. Multiprocessing is handled by Joblib with the "loky" backend.

  • show_figs (bool, default=False) -- Displays the plots for the specified cluster_selection_method for all groups if cluster_selection_method is not None.

  • output_dir (os.PathLike or None, default=None) -- Directory to save plots as png files if cluster_selection_method is not None. The directory will be created if it does not exist. If None, plots will not be saved.

  • progress_bar (bool, default=False) --

    If True and cluster_selection_method is not None, displays a progress bar.

    Added in version 0.21.5.

  • **kwargs --

    Keyword arguments that adjust certain parameters when cluster_selection_method is not None. Available parameters include:

    • S: int, default=1 -- Adjusts the sensitivity of finding the elbow. Larger values are more conservative and less sensitive to small fluctuations. Passed to KneeLocator from the kneed package.

    • dpi: int, default=300 -- Dots per inch for the figure.

    • figsize: tuple, default=(8, 6) -- Adjusts the size of the plots.

    • bbox_inches: str or None, default="tight" -- Alters size of the whitespace in the saved image.

    • step: int or None, default=None -- An integer value that controls the progression of the x-axis in plots for the specified cluster_selection_method. When set, only integer values will be displayed on the x-axis.
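As a rough illustration of what passing a list to n_clusters together with a cluster_selection_method involves, the sketch below fits one KMeans model per candidate and scores each with the three scikit-learn functions named in the description. The synthetic data, the fit_and_score helper, and the selection rules (highest silhouette, lowest Davies-Bouldin, highest variance ratio) are assumptions made for the example; they follow the usual interpretation of these scores and are not taken from the library's source. The elbow method is omitted because it relies on the separate kneed package.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

# Synthetic (frames x features) data with three well-separated groups.
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
data = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])


def fit_and_score(k, X):
    """Hypothetical helper: fit one KMeans model and return its three scores."""
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    return {
        "silhouette": silhouette_score(X, labels),
        "davies_bouldin": davies_bouldin_score(X, labels),
        "variance_ratio": calinski_harabasz_score(X, labels),
    }


candidates = [2, 3, 4, 5]
scores = {k: fit_and_score(k, data) for k in candidates}

# Higher is better for silhouette and variance ratio; lower is better for Davies-Bouldin.
best_silhouette = max(scores, key=lambda k: scores[k]["silhouette"])
best_davies_bouldin = min(scores, key=lambda k: scores[k]["davies_bouldin"])
best_variance_ratio = max(scores, key=lambda k: scores[k]["variance_ratio"])
```

On real data the three criteria do not always agree, which is one reason to inspect the plots enabled by show_figs or output_dir.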

Returns:

self

Note

KMeans Algorithm: Refer to scikit-learn's Documentation for additional information about the KMeans algorithm used in this method.