neurocaps.analysis.CAP.get_caps
- CAP.get_caps(subject_timeseries, runs=None, n_clusters=5, cluster_selection_method=None, random_state=None, init='k-means++', n_init='auto', max_iter=300, tol=0.0001, algorithm='lloyd', standardize=True, n_cores=None, show_figs=False, output_dir=None, progress_bar=False, **kwargs)
Perform K-Means Clustering to Identify CAPs.
Concatenates the timeseries of all subjects into a single NumPy array with dimensions (participants x TRs) x ROIs and runs sklearn.cluster.KMeans on the concatenated data. A separate KMeans model is generated for each group.
- Parameters:
subject_timeseries (SubjectTimeseries or str) – A dictionary mapping subject IDs to their run IDs and their associated timeseries (TRs x ROIs) as a NumPy array. Can also be a path to a pickle file containing this same structure. Refer to the documentation for SubjectTimeseries in the “See Also” section for an example structure.
runs (int, str, list[int], list[str], or None, default=None) – Specific run IDs to perform the CAPs analysis with (e.g. runs=[0, 1] or runs=["01", "02"]). If None, all runs will be used.
n_clusters (int or list[int], default=5) – Number of clusters to use. Can be a single integer or a list of integers (if cluster_selection_method is not None).
cluster_selection_method ({"elbow", "davies_bouldin", "silhouette", "variance_ratio"} or None, default=None) – Method used to find the optimal number of clusters. Options are "elbow", "davies_bouldin", "silhouette", and "variance_ratio".
random_state (int or None, default=None) – Random state (seed) value to use.
init ({"k-means++", "random"}, Callable, or ArrayLike, default="k-means++") – Method for choosing the initial cluster centroids. Options are "k-means++", "random", a callable, or an array-like of shape (n_clusters, n_features).
n_init ({"auto"} or int, default="auto") – Number of times k-means is run with different initial centroids. The model with the lowest inertia from these runs is selected.
max_iter (int, default=300) – Maximum number of iterations for a single run of k-means.
tol (float, default=1e-4) – Relative tolerance on the change in cluster centers (Frobenius norm) between two consecutive iterations, used to declare convergence before max_iter is reached.
algorithm ({"lloyd", "elkan"}, default="lloyd") – The algorithm to use. Options are "lloyd" and "elkan".
standardize (bool, default=True) – Standardizes the columns (ROIs) of the concatenated timeseries data. Uses the sample standard deviation (n-1).
n_cores (int or None, default=None) – Number of cores to use for multiprocessing, with Joblib, to run multiple k-means models if cluster_selection_method is not None. The "loky" backend is used.
show_figs (bool, default=False) – Displays the plots for the specified cluster_selection_method for all groups.
output_dir (str or None, default=None) – Directory to save plots as png files if cluster_selection_method is not None. The directory will be created if it does not exist. If None, plots will not be saved.
progress_bar (bool, default=False) – If True and cluster_selection_method is not None, displays a progress bar. Added in version 0.21.5.
**kwargs – Additional keyword arguments when cluster_selection_method is specified:
- S: int, default=1 – Adjusts the sensitivity of finding the elbow. Larger values are more conservative and less sensitive to small fluctuations. Passed to KneeLocator from the kneed package.
- dpi: int, default=300 – Dots per inch for the figure.
- figsize: tuple, default=(8, 6) – Adjusts the size of the plots.
- bbox_inches: str or None, default="tight" – Adjusts the whitespace around the saved image.
- step: int or None, default=None – An integer value that controls the progression of the x-axis in plots.
See also
- Returns:
self
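The input structure and the concatenation and standardization steps described above can be sketched with NumPy alone. This is an illustration under stated assumptions, not neurocaps's internal code: the subject IDs, the "run-1"/"run-2" key names, the array sizes, and the helper concatenate_timeseries are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trs, n_rois = 40, 6  # hypothetical sizes

# Hypothetical input: subject ID -> run ID -> (TRs x ROIs) array.
# The "run-1"/"run-2" key names are assumed for illustration only.
subject_timeseries = {
    subject_id: {
        run_id: rng.normal(loc=2.0, scale=3.0, size=(n_trs, n_rois))
        for run_id in ("run-1", "run-2")
    }
    for subject_id in ("01", "02")
}


def concatenate_timeseries(timeseries, run_ids=None):
    """Stack every selected run of every subject along the TR axis."""
    arrays = [
        run_array
        for runs in timeseries.values()
        for run_id, run_array in runs.items()
        if run_ids is None or run_id in run_ids
    ]
    return np.vstack(arrays)


concatenated = concatenate_timeseries(subject_timeseries)

# Column-wise (per-ROI) z-scoring with the sample standard deviation (n - 1).
standardized = (concatenated - concatenated.mean(axis=0)) / concatenated.std(axis=0, ddof=1)

print(concatenated.shape)  # (160, 6): 2 subjects x 2 runs x 40 TRs, 6 ROIs
```

Passing ddof=1 to std is what makes this the sample (n-1) standard deviation; NumPy's default of ddof=0 would give the population value instead.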
Note
KMeans Algorithm: Refer to scikit-learn’s Documentation for additional information about the KMeans algorithm used in this method.
The n_clusters, random_state, init, n_init, max_iter, tol, and algorithm parameters are passed to sklearn.cluster.KMeans.