neurocaps.analysis.CAP.get_caps
- CAP.get_caps(subject_timeseries, runs=None, n_clusters=5, cluster_selection_method=None, random_state=None, init='k-means++', n_init='auto', max_iter=300, tol=0.0001, algorithm='lloyd', standardize=True, n_cores=None, show_figs=False, output_dir=None, progress_bar=False, **kwargs)
Perform K-Means Clustering to Identify CAPs.
Concatenates the timeseries of each subject into a single NumPy array with dimensions (participants x TRs) x ROI and uses sklearn.cluster.KMeans on the concatenated data. Note, KMeans uses Euclidean distance. Additionally, the Elbow method is determined using KneeLocator from the kneed package, and the Davies Bouldin, Silhouette, and Variance Ratio methods are calculated using scikit-learn's davies_bouldin_score, silhouette_score, and calinski_harabasz_score functions, respectively. Note, if groups were given when the CAP class was initialized, separate KMeans models and plots will be generated for all groups.
- Parameters:
subject_timeseries (dict[str, dict[str, np.ndarray]] or os.PathLike) -- A dictionary mapping subject IDs to their run IDs and their associated timeseries (TRs x ROIs) as a NumPy array. Can also be a path to a pickle file containing this same structure. The expected structure is as follows:

    subject_timeseries = {
        "101": {
            "run-0": np.array([...]),  # Shape: TRs x ROIs
            "run-1": np.array([...]),  # Shape: TRs x ROIs
            "run-2": np.array([...]),  # Shape: TRs x ROIs
        },
        "102": {
            "run-0": np.array([...]),  # Shape: TRs x ROIs
            "run-1": np.array([...]),  # Shape: TRs x ROIs
        },
    }

runs (int, str, list[int], list[str], or None, default=None) -- The run numbers to perform the CAPs analysis with (e.g. runs=[0, 1] or runs=["01", "02"]). If None, all runs in the subject timeseries will be concatenated into a single dataframe and subjected to k-means clustering.
n_clusters (Union[int, list[int]], default=5) -- The number of clusters to use for sklearn.cluster.KMeans. Can be a single integer or a list of integers (if cluster_selection_method is not None).
cluster_selection_method ({"elbow", "davies_bouldin", "silhouette", "variance_ratio"} or None, default=None) -- Method to find the optimal number of clusters. Options are "elbow", "davies_bouldin", "silhouette", and "variance_ratio".
random_state (int or None, default=None) -- The random state to use for sklearn.cluster.KMeans. Ensures reproducible results.
init ({"k-means++", "random"}, Callable, or ArrayLike, default="k-means++") -- Method for choosing the initial cluster centroids for sklearn.cluster.KMeans. Options are "k-means++", "random", a callable, or an array-like of shape (n_clusters, n_features).
n_init ({"auto"} or int, default="auto") -- Number of times sklearn.cluster.KMeans is run with different initial clusters. The model with the lowest inertia from these runs will be selected.
max_iter (int, default=300) -- Maximum number of iterations for a single run of sklearn.cluster.KMeans.
tol (float, default=1e-4) -- Stopping criterion for sklearn.cluster.KMeans if the change in inertia is below this value, assuming max_iter has not been reached.
algorithm ({"lloyd", "elkan"}, default="lloyd") -- The type of algorithm to use for sklearn.cluster.KMeans. Options are "lloyd" and "elkan".
standardize (bool, default=True) -- Standardizes the columns (ROIs) of the concatenated timeseries data. Uses sample standard deviation with Bessel's correction (n-1 in denominator).
n_cores (int or None, default=None) -- The number of cores to use for multiprocessing, with Joblib, to run multiple sklearn.cluster.KMeans models if cluster_selection_method is not None. The "loky" backend is used.
show_figs (bool, default=False) -- Displays the plots for the specified cluster_selection_method for all groups if cluster_selection_method is not None.
output_dir (os.PathLike or None, default=None) -- Directory to save plots as png files if cluster_selection_method is not None. The directory will be created if it does not exist. If None, plots will not be saved.
progress_bar (bool, default=False) -- If True and cluster_selection_method is not None, displays a progress bar. Added in version 0.21.5.
**kwargs -- Dictionary to adjust certain parameters when cluster_selection_method is not None. Additional parameters include:
  - S: int, default=1 -- Adjusts the sensitivity of finding the elbow. Larger values are more conservative and less sensitive to small fluctuations. Passed to KneeLocator from the kneed package.
  - dpi: int, default=300 -- Dots per inch for the figure.
  - figsize: tuple, default=(8, 6) -- Adjusts the size of the plots.
  - bbox_inches: str or None, default="tight" -- Alters size of the whitespace in the saved image.
  - step: int, default=None -- An integer value that controls the progression of the x-axis in plots for the specified cluster_selection_method. When set, only integer values will be displayed on the x-axis.
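
As an illustration of the input described for subject_timeseries, the following minimal sketch builds a dictionary with the expected nesting and round-trips it through a pickle file. The subject IDs, array shapes, random data, and file name are hypothetical, and the sketch assumes that a file written with the standard library's pickle module is an acceptable form of the "pickle file" mentioned above.

```python
import os
import pickle
import tempfile

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two subjects, 40 TRs per run, 10 ROIs.
subject_timeseries = {
    "101": {
        "run-0": rng.standard_normal((40, 10)),  # Shape: TRs x ROIs
        "run-1": rng.standard_normal((40, 10)),  # Shape: TRs x ROIs
    },
    "102": {
        "run-0": rng.standard_normal((40, 10)),  # Shape: TRs x ROIs
    },
}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Hypothetical file name; any writable path works.
    pickle_path = os.path.join(tmp_dir, "subject_timeseries.pkl")

    # Save the nested dictionary to disk.
    with open(pickle_path, "wb") as f:
        pickle.dump(subject_timeseries, f)

    # Load it back to confirm the structure survives the round trip.
    with open(pickle_path, "rb") as f:
        loaded = pickle.load(f)
```

Under that assumption, either the in-memory dictionary or the path to the saved file could be supplied as the subject_timeseries argument.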
- Returns:
self
Added in version 0.19.3.
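
The steps described for this method (concatenation, column-wise standardization, and K-Means with a cluster selection score) can be sketched with NumPy and scikit-learn alone. This is an illustrative approximation rather than the method's actual implementation: the random data, the candidate range of 2 to 5 clusters, the use of n_init=10, and all variable names are assumptions, and the elbow method with KneeLocator is omitted.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (
    calinski_harabasz_score,
    davies_bouldin_score,
    silhouette_score,
)

rng = np.random.default_rng(0)

# Hypothetical input: 3 subjects x 2 runs, each run is 50 TRs x 8 ROIs.
subject_timeseries = {
    str(sub): {f"run-{run}": rng.standard_normal((50, 8)) for run in range(2)}
    for sub in (101, 102, 103)
}

# Stack every run of every subject into one (participants x TRs) x ROIs array.
concatenated = np.vstack(
    [ts for runs in subject_timeseries.values() for ts in runs.values()]
)

# Standardize each column (ROI) using the sample standard deviation (ddof=1).
standardized = (concatenated - concatenated.mean(axis=0)) / concatenated.std(
    axis=0, ddof=1
)

# Fit one KMeans model per candidate cluster count and record its scores.
scores = {}
for k in range(2, 6):
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(standardized)
    scores[k] = {
        "inertia": model.inertia_,
        "silhouette": silhouette_score(standardized, model.labels_),
        "davies_bouldin": davies_bouldin_score(standardized, model.labels_),
        "variance_ratio": calinski_harabasz_score(standardized, model.labels_),
    }

# Higher silhouette and variance ratio are better; lower Davies-Bouldin is better.
best_by_silhouette = max(scores, key=lambda k: scores[k]["silhouette"])
best_by_davies_bouldin = min(scores, key=lambda k: scores[k]["davies_bouldin"])
best_by_variance_ratio = max(scores, key=lambda k: scores[k]["variance_ratio"])
```

With random data like this, the three scores need not agree on a single cluster count, which is one reason to inspect the plots rather than rely on a single number.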
Note
KMeans Algorithm: Refer to scikit-learn's Documentation for additional information about the KMeans algorithm used in this method.
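
To make the Euclidean distance remark concrete, the sketch below fits scikit-learn's KMeans on hypothetical random data and checks that each row is assigned to the centroid with the smallest Euclidean distance. The data shape, n_clusters=4, and n_init=10 are assumptions chosen for illustration, not settings taken from this method.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Hypothetical standardized data: 200 frames (rows) x 6 ROIs (columns).
data = rng.standard_normal((200, 6))

model = KMeans(
    n_clusters=4,
    init="k-means++",
    n_init=10,
    max_iter=300,
    tol=1e-4,
    random_state=0,
).fit(data)

# Euclidean distance from every row to every fitted centroid.
diffs = data[:, None, :] - model.cluster_centers_[None, :, :]
distances = np.linalg.norm(diffs, axis=2)  # Shape: rows x clusters

# The closest centroid per row, computed by hand.
nearest = distances.argmin(axis=1)

# Inertia is the sum of squared distances to the closest centroid.
manual_inertia = (distances.min(axis=1) ** 2).sum()

# Compare against scikit-learn's own assignment and (negated) score.
same_assignment = np.array_equal(nearest, model.predict(data))
same_inertia = np.isclose(manual_inertia, -model.score(data))
```

Because the assignment depends on Euclidean distance, columns with larger variance weigh more heavily, which is the usual motivation for standardizing the ROIs before clustering.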