CAP.return_cap_labels

CAP.return_cap_labels(subject_timeseries, runs=None, continuous_runs=False, shift_labels=False)

Return CAP Labels for Each Subject.

Uses the group-specific k-means models in self.kmeans to assign each frame (TR) to a CAP for each subject in self.subject_table.

The process involves the following steps:

  1. Retrieve the timeseries for a specific subject’s run from subject_timeseries.

  2. Determine the subject’s group assignment using self.subject_table and, if standardize was set to True in self.get_caps(), scale the timeseries data using the means and standard deviations derived from the group-specific concatenated dataframes (self.means and self.stdev).

    Note

    This scaling ensures the subject’s data matches the distribution of the input data used for group-specific clustering, which is needed for accurate predictions when using group-specific k-means models.

  3. Use the group-specific k-means model (self.kmeans) and the predict() function from scikit-learn’s KMeans to assign each frame (TR) to a CAP.

  4. If shift_labels is True, add one to every label so that the minimum label is “1” instead of “0”.

  5. Repeat steps 1-4 for the subject’s remaining runs (all runs if runs is None, otherwise only the specified runs).

  6. If continuous_runs is True, stack the per-run NumPy arrays horizontally to create a single array containing the predicted labels for the subject.

  7. Repeat 1-6 for the remaining subjects.
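
The steps above can be sketched with NumPy and scikit-learn. This is a minimal illustration under stated assumptions, not the library's implementation: the function name sketch_return_cap_labels, the group name, the run keys, the random data, and the plain dictionaries standing in for self.subject_table, self.means, self.stdev, and self.kmeans are all hypothetical. For simplicity it also matches runs against the full run key, which may differ from how the method resolves run IDs.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical data: 2 subjects x 2 runs, each run is 20 TRs x 4 ROIs
subject_timeseries = {
    subj: {run: rng.normal(size=(20, 4)) for run in ("run-1", "run-2")}
    for subj in ("01", "02")
}

# Stand-ins for the fitted state (subject table, scaling statistics, models)
group_name = "All Subjects"
subject_table = {"01": group_name, "02": group_name}
concatenated = np.vstack(
    [ts for subj_runs in subject_timeseries.values() for ts in subj_runs.values()]
)
means = {group_name: concatenated.mean(axis=0)}
stdev = {group_name: concatenated.std(axis=0)}
kmeans = {
    group_name: KMeans(n_clusters=3, n_init=10, random_state=0).fit(
        (concatenated - means[group_name]) / stdev[group_name]
    )
}


def sketch_return_cap_labels(timeseries, runs=None, continuous_runs=False, shift_labels=False):
    predicted = {}
    for subj, subj_runs in timeseries.items():  # step 7: loop over subjects
        grp = subject_table[subj]  # step 2: group lookup
        per_run = {}
        for run, ts in subj_runs.items():  # step 5: loop over runs
            if runs is not None and run not in runs:
                continue
            scaled = (ts - means[grp]) / stdev[grp]  # step 2: group-level scaling
            labels = kmeans[grp].predict(scaled)  # step 3: one label per TR
            if shift_labels:
                labels = labels + 1  # step 4: labels start at 1
            per_run[run] = labels
        if continuous_runs and len(per_run) > 1:
            # step 6: combine the runs into one 1D array
            per_run = {"run-continuous": np.hstack(list(per_run.values()))}
        predicted[subj] = per_run
    return predicted


labels = sketch_return_cap_labels(subject_timeseries, continuous_runs=True)
print(labels["01"]["run-continuous"].shape)
```

Scaling each run with the group-level statistics, rather than the run's own mean and standard deviation, keeps the data in the same space the model was fitted in.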

Parameters:
  • subject_timeseries (SubjectTimeseries or str) – A dictionary mapping subject IDs to their run IDs and their associated timeseries (TRs x ROIs) as a NumPy array. Can also be a path to a serialized file containing this same structure. Refer to documentation for SubjectTimeseries in the “See Also” section for an example structure.

  • runs (int, str, list[int], list[str], or None, default=None) – The run IDs to return CAP labels for (e.g. runs=[0, 1] or runs=["01", "02"]). If None, CAP labels will be returned for all detected run IDs even if only specific runs were used during self.get_caps().

  • continuous_runs (bool, default=False) –

    If True, all runs will be treated as a single, uninterrupted run.

    # CAP assignment of frames for run_1 and run_2
    run_1 = [0, 1, 1]
    run_2 = [2, 3, 3]
    
    # The labels are returned as a single combined vector
    continuous_runs = [0, 1, 1, 2, 3, 3]
    

    Note

    • This parameter can be used together with runs to filter the runs to combine.

    • The run-ID for each subject in the dictionary will be converted to run-continuous to denote that runs were combined.

    • If only a single run is available for a subject, the original run ID (as opposed to “run-continuous”) will be used.

  • shift_labels (bool, default=False) –

    If True, shifts each label up by one unit so that the minimum CAP label is “1” instead of “0” (scikit-learn’s default).

    predicted_labels = [0, 2, 5]
    # Shift every label up by one
    predicted_labels = [1, 3, 6]
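
The two toy examples for continuous_runs and shift_labels can be reproduced with plain NumPy. This is only an illustration of the array operations the parameters describe, using the same values as the examples; it does not call the method itself.

```python
import numpy as np

# Per-run labels from the continuous_runs example
run_1 = np.array([0, 1, 1])
run_2 = np.array([2, 3, 3])

# Combining runs keeps run order and frame order
combined = np.hstack([run_1, run_2])
print(combined.tolist())  # [0, 1, 1, 2, 3, 3]

# Shifting adds one to every label; gaps between labels are preserved
predicted_labels = np.array([0, 2, 5])
shifted = predicted_labels + 1
print(shifted.tolist())  # [1, 3, 6]

# The shift is elementwise, so the order of shifting and combining does not matter
assert np.array_equal(np.hstack([run_1 + 1, run_2 + 1]), combined + 1)
```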
    

See also

neurocaps.typing.SubjectTimeseries

Type definition for the subject timeseries dictionary structure. (See: SubjectTimeseries Documentation)

Returns:

dict[str, dict[str, np.ndarray]] – Dictionary mapping each subject to their run IDs and a 1D numpy array containing the predicted CAP for each frame (TR).
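
As a rough illustration of working with the returned structure, the sketch below walks a hand-written dictionary of the same shape and counts how many frames were assigned to each CAP. The subject IDs, run keys, label values, and the n_caps variable are hypothetical and not produced by the method.

```python
import numpy as np

# Hypothetical return value: subject ID -> run ID -> 1D array of labels
cap_labels = {
    "01": {"run-1": np.array([0, 1, 1]), "run-2": np.array([2, 3, 3])},
    "02": {"run-continuous": np.array([0, 0, 2, 3, 1, 1])},
}

n_caps = 4  # assumed number of clusters in the fitted model

frame_counts = {}
for subj, runs in cap_labels.items():
    for run, labels in runs.items():
        # One count per CAP, including CAPs that never occur in this run
        frame_counts[(subj, run)] = np.bincount(labels, minlength=n_caps).tolist()

print(frame_counts[("01", "run-1")])           # [1, 2, 0, 0]
print(frame_counts[("02", "run-continuous")])  # [2, 2, 1, 1]
```

If the labels were obtained with shift_labels=True, index 0 of each count list would always be zero, since the smallest label is then 1.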