neurocaps.extraction.TimeseriesExtractor

class TimeseriesExtractor(space='MNI152NLin2009cAsym', parcel_approach={'Schaefer': {'n_rois': 400, 'resolution_mm': 1, 'yeo_networks': 7}}, standardize='zscore_sample', detrend=True, low_pass=None, high_pass=None, fwhm=None, use_confounds=True, confound_names='basic', fd_threshold=None, n_acompcor_separate=None, dummy_scans=None, dtype=None)

Timeseries Extractor.

Performs timeseries denoising, extraction, serialization (pickling), and BOLD visualization.

Parameters:
  • space (str, default="MNI152NLin2009cAsym") -- The standard template space that the preprocessed bold data is registered to. Used for querying with pybids to locate preprocessed BOLD-related files.

  • parcel_approach (dict or os.PathLike, default={"Schaefer": {"n_rois": 400, "yeo_networks": 7, "resolution_mm": 1}}) --

    The approach used to parcellate NIfTI images into distinct regions of interest (ROIs).

    To initialize a parcel_approach, the configuration requires a nested dictionary with:

    • First Level Key: The parcellation name ("Schaefer", "AAL", or "Custom").

    • Second Level Keys: Parameters specific to each parcellation method.

    Supported parcellation approaches and their parameters include:

    • "Schaefer":

      • "n_rois": Number of ROIs (100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000). Defaults to 400.

      • "yeo_networks": Number of Yeo networks (7 or 17). Defaults to 7.

      • "resolution_mm": Spatial resolution in millimeters (1 or 2). Defaults to 1.

    • "AAL":

      • "version": AAL parcellation version to use ("SPM5", "SPM8", "SPM12", or "3v2"). Defaults to "SPM12" if {"AAL": {}} is given.

    • "Custom" (user-defined):

      • "maps": Directory path to the location of the parcellation file.

      • "nodes": A list of node names in the order of the label IDs in the parcellation.

      • "regions": The regions or networks in the parcellation.
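
As an illustration of the nested-dictionary layout described above, the following sketch builds one configuration for each built-in parcellation using only the keys listed. The variable names and the small checking helper are hypothetical and not part of neurocaps; the "exactly one first-level key" rule in the helper is an assumption made for the example.

```python
# First-level key = parcellation name; second-level keys = its parameters.
schaefer_approach = {
    "Schaefer": {"n_rois": 100, "yeo_networks": 7, "resolution_mm": 2}
}

# Per the parameter description, {"AAL": {}} falls back to version "SPM12".
aal_default = {"AAL": {}}
aal_explicit = {"AAL": {"version": "3v2"}}

SUPPORTED = {"Schaefer", "AAL", "Custom"}


def parcellation_name(parcel_approach):
    """Return the first-level key of a parcel_approach dictionary.

    Hypothetical helper for illustration only: it assumes exactly one
    first-level key and checks it against the documented names.
    """
    if len(parcel_approach) != 1:
        raise ValueError("Expected exactly one first-level key.")
    name = next(iter(parcel_approach))
    if name not in SUPPORTED:
        raise ValueError(f"Unsupported parcellation: {name}")
    return name
```

Any of these dictionaries can then be passed as the parcel_approach argument.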


  • standardize ({"zscore_sample", "zscore", "psc", True, False}, default="zscore_sample") --

    Standardizes the timeseries.

    Note: Refer to nilearn.maskers.NiftiLabelsMasker for an explanation of each available option.

  • detrend (bool, default=True) -- Detrends the timeseries.

  • low_pass (float, int, or None, default=None) -- Filters out signals above the specified cutoff frequency.

  • high_pass (float, int, or None, default=None) -- Filters out signals below the specified cutoff frequency.

  • fwhm (float, int, or None, default=None) -- Applies spatial smoothing to data (in millimeters).

  • use_confounds (bool, default=True) --

    If True, performs nuisance regression during timeseries extraction using the default or user-specified confounds in confound_names.

    Note: Requires the confound TSV files to be in the same directory as the preprocessed BOLD images.

  • confound_names ({"basic"}, list[str], or None, default="basic") --

    Names of confounds extracted from the confound tsv files if use_confounds=True.

    If "basic", the following confounds are used by default:

    • All cosine-basis parameters.

    • Six head-motion parameters and their first-order derivatives.

    • First six combined aCompCor components.

    Notes:

    • Confound names follow fMRIPrep's naming scheme (versions >= 1.2.0).

    • Wildcards are supported: e.g., "cosine*" matches all confounds starting with "cosine".

    Changed in version 0.23.0: Changed default from None to "basic". The "basic" option provides the same functionality that None did in previous versions.
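
The wildcard behavior described above can be sketched with the standard library's fnmatch module. This is only an approximation for illustration: the select_confounds helper is hypothetical, and neurocaps' own matching logic may differ in detail. The column names follow fMRIPrep's naming scheme.

```python
from fnmatch import fnmatchcase


def select_confounds(columns, requested):
    """Return the columns matching any requested name or wildcard pattern.

    Hypothetical helper; the result keeps the order of the confounds file.
    """
    return [c for c in columns if any(fnmatchcase(c, pat) for pat in requested)]


# Example header of an fMRIPrep-style confounds file.
columns = [
    "cosine00",
    "cosine01",
    "trans_x",
    "trans_x_derivative1",
    "rot_z",
    "a_comp_cor_00",
    "framewise_displacement",
]

# "cosine*" expands to every column starting with "cosine";
# names without a wildcard must match exactly.
requested = ["cosine*", "trans_x", "a_comp_cor_00"]
selected = select_confounds(columns, requested)
```

Here "trans_x" does not pull in "trans_x_derivative1"; a pattern such as "trans_x*" would be needed for that.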

  • fd_threshold (float, dict[str, float], or None, default=None) --

    Threshold for volume censoring based on framewise displacement (FD).

    • If float, removes volumes where FD > threshold.

    • If dict, the following sub-keys are available:

      • "threshold": A float (Default=None). Removes volumes where FD > threshold.

      • "outlier_percentage": A float in the interval [0,1] (Default=None). Removes entire runs where the proportion of censored volumes exceeds this threshold. The proportion is calculated after dummy scan removal. A warning is issued when runs are flagged. If a condition is specified in self.get_bold, only the volumes associated with the condition are considered.

      • "n_before": An integer indicating the number of volumes to remove before each flagged volume (Default=None). For instance, if volume 5 is flagged and {"n_before": 2}, then volumes 3, 4, and 5 are discarded.

      • "n_after": An integer indicating the number of volumes to remove after each flagged volume (Default=False). For instance, if volume 5 is flagged and {"n_after": 2}, then volumes 5, 6, and 7 are discarded.

      • "use_sample_mask": A boolean (Default=False). If True, censors before nuisance regression using Nilearn's NiftiLabelsMasker. Also, sets clean__extrapolate=False to prevent interpolation of end volumes. If False, censors after nuisance regression.

      • "interpolate": A boolean (Default=None). If True, uses scipy's CubicSpline function with extrapolate=False to perform cubic spline interpolation on censored frames that are not at the ends of the timeseries. For example, given a censor_mask=[0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0] where "0" indicates censored volumes, only the volumes at indices 3, 5, 6, and 8 would be interpolated. When False or None (default behavior), no interpolation is performed and all censored frames are discarded.

        Added in version 0.22.3: "interpolate" key added.
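
To make the interaction of "threshold", "n_before", "n_after", and "outlier_percentage" concrete, here is a minimal pure-Python sketch. Both helpers are hypothetical and not part of neurocaps. Clipping the window at the run boundaries and counting the expanded set of volumes toward the proportion are assumptions made for the example, as are the FD values.

```python
def expand_censored(flagged, n_volumes, n_before=0, n_after=0):
    """Return sorted indices to discard: each flagged volume plus neighbors.

    Hypothetical helper; the window is clipped to the run (an assumption).
    """
    discard = set()
    for idx in flagged:
        start = max(idx - n_before, 0)
        stop = min(idx + n_after, n_volumes - 1)
        discard.update(range(start, stop + 1))
    return sorted(discard)


def exceeds_outlier_percentage(n_censored, n_volumes, outlier_percentage):
    """True when the proportion of censored volumes is above the limit."""
    return n_censored / n_volumes > outlier_percentage


# Illustrative framewise displacement values for a 10-volume run.
fd = [0.1, 0.2, 0.1, 0.1, 0.2, 0.9, 0.1, 0.1, 0.2, 0.1]
fd_threshold = {"threshold": 0.5, "outlier_percentage": 0.2, "n_before": 2}

# Only volume 5 has FD > threshold.
flagged = [i for i, value in enumerate(fd) if value > fd_threshold["threshold"]]

before = expand_censored(flagged, len(fd), n_before=2)  # volumes 3, 4, 5
after = expand_censored(flagged, len(fd), n_after=2)    # volumes 5, 6, 7

# 3 of 10 volumes censored (0.3) is above the 0.2 limit.
run_flagged = exceeds_outlier_percentage(
    len(before), len(fd), fd_threshold["outlier_percentage"]
)
```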

    Notes:

    • A column named "framewise_displacement" must be available in the confounds file.

    • use_confounds must be set to True.

    • Do not specify "framewise_displacement" in confound_names.

    • See Nilearn's documentation for details on censored volume handling.

    • When {"use_sample_mask": False} and standardize=True, applying an additional within-run standardization (using neurocaps.analysis.standardize) is recommended after outlier removal.

    • If {"interpolate": True}, then interpolation is only applied after the nuisance regression and parcellation steps have been completed. It is also applied prior to the condition being extracted from the timeseries.

    • See Scipy's documentation on their CubicSpline function.
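
The rule that only censored frames away from the ends of the timeseries are interpolated can be checked with a short sketch. The interior_censored helper below is hypothetical; it only identifies which indices would qualify and does not perform the spline interpolation itself.

```python
def interior_censored(censor_mask):
    """Return censored indices that lie between two retained volumes.

    In the mask, 0 marks a censored volume and 1 a retained one. Censored
    volumes before the first or after the last retained volume are excluded
    because they would require extrapolation.
    """
    kept = [i for i, keep in enumerate(censor_mask) if keep]
    if not kept:
        return []
    first, last = kept[0], kept[-1]
    return [
        i for i, keep in enumerate(censor_mask)
        if not keep and first < i < last
    ]


# Mask from the "interpolate" description above.
censor_mask = [0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0]
to_interpolate = interior_censored(censor_mask)
```

For this mask the helper returns indices 3, 5, 6, and 8, matching the documented example; indices 0, 1, 10, and 11 sit at the ends and are discarded.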

  • n_acompcor_separate (int or None, default=None) --

    Number of aCompCor components to extract separately from the white-matter (WM) and CSF masks. Uses first "n" components from each mask separately. For instance, if n_acompcor_separate=5, then the first 5 WM components and the first 5 CSF components (totaling 10 components) are regressed out.

    Notes:

    • use_confounds must be set to True.

    • If specified, this parameter overrides any aCompCor components listed in confound_names.

  • dummy_scans (int, dict[str, Union[bool, int]], or None, default=None) --

    Number of initial volumes to remove before timeseries extraction.

    • If int, removes first "n" volumes.

    • If dict, the following keys are supported:

      • "auto": A boolean (Default=None). If True, automatically determines dummy scans from the fMRIPrep confounds file by counting the number of "non_steady_state_outlier_XX" columns in the confounds.tsv file. For instance, if two columns are found, then the first two volumes are removed.

      • "min": An integer (Default=None). Minimum volumes to remove when auto is set to True. If "auto" finds 2 outliers but {"min": 3}, removes 3 volumes.

      • "max": An integer (Default=None). Maximum volumes to remove when auto=True. If "auto" finds 6 outliers but {"max": 5}, removes 5 volumes.

    Note: "min" and "max" keys only apply when "auto" is True.
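
The "auto", "min", and "max" rules can be summarized in a few lines of plain Python. The n_dummy_scans helper is hypothetical and only mirrors the behavior described above; returning 0 when "auto" is not True is an assumption, and columns are matched by prefix so that the exact numbering suffix does not matter.

```python
def n_dummy_scans(confound_columns, dummy_scans):
    """Return the number of initial volumes to drop (illustrative only)."""
    if dummy_scans is None:
        return 0
    if isinstance(dummy_scans, int):
        return dummy_scans
    if not dummy_scans.get("auto"):
        return 0  # assumption: "min"/"max" alone remove nothing
    # Count the non-steady-state outlier columns in the confounds header.
    n = sum(col.startswith("non_steady_state_outlier") for col in confound_columns)
    if dummy_scans.get("min") is not None:
        n = max(n, dummy_scans["min"])
    if dummy_scans.get("max") is not None:
        n = min(n, dummy_scans["max"])
    return n


two_outliers = ["trans_x"] + [f"non_steady_state_outlier_{i:02d}" for i in range(2)]
six_outliers = ["trans_x"] + [f"non_steady_state_outlier_{i:02d}" for i in range(6)]

auto_only = n_dummy_scans(two_outliers, {"auto": True})            # 2
with_min = n_dummy_scans(two_outliers, {"auto": True, "min": 3})   # 3
with_max = n_dummy_scans(six_outliers, {"auto": True, "max": 5})   # 5
fixed = n_dummy_scans(six_outliers, 4)                             # 4
```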

  • dtype (str or "auto", default=None) -- The NumPy dtype the NIfTI images are converted to when passed to Nilearn's load_img function.
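
A minimal sketch of how the constructor arguments fit together is shown below. The keyword names come from the signature above, while the specific values (for example low_pass=0.15, fwhm=6, and the 0.5 FD threshold) are illustrative choices rather than recommendations. The build_extractor wrapper is hypothetical; it imports neurocaps lazily so the configuration can be inspected without the package installed.

```python
# Keyword arguments mirroring the TimeseriesExtractor signature.
init_kwargs = {
    "space": "MNI152NLin2009cAsym",
    "parcel_approach": {
        "Schaefer": {"n_rois": 400, "yeo_networks": 7, "resolution_mm": 1}
    },
    "standardize": "zscore_sample",
    "detrend": True,
    "low_pass": 0.15,          # illustrative cutoff
    "high_pass": None,
    "fwhm": 6,                 # illustrative smoothing kernel, in millimeters
    "use_confounds": True,
    "confound_names": "basic",
    "fd_threshold": {"threshold": 0.5, "outlier_percentage": 0.3},
    "n_acompcor_separate": None,
    "dummy_scans": {"auto": True, "min": 3},
    "dtype": None,
}


def build_extractor(**overrides):
    """Create a TimeseriesExtractor from init_kwargs (hypothetical wrapper).

    Requires neurocaps to be installed; keyword overrides replace defaults.
    """
    from neurocaps.extraction import TimeseriesExtractor

    return TimeseriesExtractor(**{**init_kwargs, **overrides})
```

With neurocaps installed, build_extractor(fwhm=None) would return an instance with smoothing disabled and every other setting unchanged.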

Properties

space: str

The standard template space that the preprocessed BOLD data is registered to. The space can also be set after class initialization using self.space = "New Space" if the template space needs to be changed.

parcel_approach: dict

A dictionary containing information about the parcellation. Can also be used as a setter, which accepts a dictionary or a dictionary saved as a pickle file. If "Schaefer" or "AAL" was specified during initialization of the TimeseriesExtractor class, then nilearn.datasets.fetch_atlas_schaefer_2018 or nilearn.datasets.fetch_atlas_aal, respectively, will be used to obtain the "maps" and the "nodes". String splitting is then used on the "nodes" to obtain the "regions":

# Structure of Schaefer
{
    "Schaefer":
    {
        "maps": "path/to/parcellation.nii.gz",
        "nodes": ["LH_Vis1", "LH_SomMot1", "RH_Vis1", "RH_SomMot1"],
        "regions": ["Vis", "SomMot"]
    }
}

# Structure of AAL
{
    "AAL":
    {
        "maps": "path/to/parcellation.nii.gz",
        "nodes": ["Precentral_L", "Precentral_R", "Frontal_Sup_L", "Frontal_Sup_R"],
        "regions": ["Precentral", "Frontal"]
    }
}

Refer to the example for "Custom" in the Note section below for the expected structure.

signal_clean_info: dict[str, Union[bool, int, float, str]] or None

Dictionary containing parameters for signal cleaning specified during initialization of the TimeseriesExtractor class. This information includes standardize, detrend, low_pass, high_pass, fwhm, dummy_scans, use_confounds, n_acompcor_separate, and fd_threshold.

task_info: dict[str, Union[str, int]] or None

If self.get_bold() has been run, a dictionary containing all task-related information such as task, condition, session, runs, and tr (if specified); otherwise None.

subject_ids: list[str] or None

A list containing all subject IDs that have been retrieved from pybids and subjected to timeseries extraction.

n_cores: int or None

Number of cores used for multiprocessing with Joblib.

subject_timeseries: dict[str, dict[str, np.ndarray]] or None

A dictionary mapping subject IDs to their run IDs and their associated timeseries (TRs x ROIs) as a NumPy array. Can also be a path to a pickle file containing this same structure. If this property needs to be deleted due to memory issues, del self.subject_timeseries can be used to delete this property and only have it return None. The structure is as follows:

subject_timeseries = {
        "101": {
            "run-0": np.array([...]), # Shape: TRs x ROIs
            "run-1": np.array([...]), # Shape: TRs x ROIs
            "run-2": np.array([...]), # Shape: TRs x ROIs
        },
        "102": {
            "run-0": np.array([...]), # Shape: TRs x ROIs
            "run-1": np.array([...]), # Shape: TRs x ROIs
        }
    }

Note

Passed Parameters: The standardize, detrend, low_pass, high_pass, and fwhm parameters, as well as nuisance regression (confound_names), use nilearn.maskers.NiftiLabelsMasker. The dtype parameter is used by nilearn.image.load_img. For framewise displacement, if the "use_sample_mask" key is set to True in the fd_threshold dictionary, then a boolean sample mask is generated (setting indices corresponding to high motion volumes as False) and is passed to the sample_mask parameter in nilearn.maskers.NiftiLabelsMasker.

Custom Parcellations: If using a "Custom" parcellation approach, ensure that the parcellation is lateralized (where each region/network has nodes in the left and right hemisphere). This is due to certain visualization functions assuming that each region consists of left and right hemisphere nodes. Additionally, certain visualization functions in this class also assume that the background label is 0. Therefore, do not add a background label in the "nodes" or "regions" keys.

The recognized sub-keys for the "Custom" parcellation approach include:

  • "maps": Directory path containing the parcellation in a supported format (e.g., .nii or .nii.gz for NIfTI).

  • "nodes": A list or numpy array of all node labels arranged in ascending order based on their numerical IDs from the parcellation. The 0th index should contain the label corresponding to the lowest, non-background numerical ID.

  • "regions": A dictionary defining major brain regions or networks, with each region containing "lh" (left hemisphere) and "rh" (right hemisphere) sub-keys listing node indices.

Refer to the neurocaps Parcellation Documentation for more detailed explanations and example structures for the "nodes" and "regions" sub-keys.

Note: Different sub-keys are required depending on the function used. Refer to the Note section under each function for information regarding the sub-keys required for that specific function.
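
A small "Custom" configuration following the sub-keys above might look like the sketch below. The file path, node names, and region names are made up for illustration, and the unassigned_nodes helper is hypothetical (not a neurocaps function); it simply checks that every node index appears under some region's "lh" or "rh" key.

```python
custom_approach = {
    "Custom": {
        # Hypothetical path to a lateralized parcellation with background 0.
        "maps": "path/to/parcellation.nii.gz",
        # Ordered by ascending label ID; the background label is not listed.
        "nodes": [
            "LH_Vis1", "LH_Vis2", "LH_SomMot1",
            "RH_Vis1", "RH_Vis2", "RH_SomMot1",
        ],
        # Each region lists node indices per hemisphere.
        "regions": {
            "Vis": {"lh": [0, 1], "rh": [3, 4]},
            "SomMot": {"lh": [2], "rh": [5]},
        },
    }
}


def unassigned_nodes(parcel_approach):
    """Return node indices that no region claims (illustrative check)."""
    config = parcel_approach["Custom"]
    assigned = {
        index
        for hemispheres in config["regions"].values()
        for side in ("lh", "rh")
        for index in hemispheres[side]
    }
    return sorted(set(range(len(config["nodes"]))) - assigned)


missing = unassigned_nodes(custom_approach)  # empty when every node is covered
```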

Methods

get_bold(bids_dir, task[, session, runs, ...])

Retrieve Preprocessed BOLD Data from BIDS Datasets.

timeseries_to_pickle(output_dir[, filename])

Save the Extracted Subject Timeseries.

visualize_bold(subj_id, run[, roi_indx, ...])

Plot the Extracted Subject Timeseries.

get_bold(bids_dir, task, session=None, runs=None, condition=None, condition_tr_shift=0, tr=None, slice_time_ref=0.0, run_subjects=None, exclude_subjects=None, exclude_niftis=None, pipeline_name=None, n_cores=None, parallel_log_config=None, verbose=True, flush=False, progress_bar=False)

Retrieve Preprocessed BOLD Data from BIDS Datasets.

This function uses pybids for querying and requires the BOLD data directory (specified in bids_dir) to be BIDS-compliant, including a "dataset_description.json" file. It assumes the dataset contains a derivatives folder with BOLD data preprocessed using a standard pipeline, specifically fMRIPrep. The pipeline directory must also include a "dataset_description.json" file for proper querying.

The timeseries data of all subjects are appended to a single dictionary self.subject_timeseries. Additional information regarding the structure of this dictionary can be found in the "Note" section.

This pipeline is most optimized for BOLD data preprocessed by fMRIPrep. Refer to neurocaps' BIDS Structure and Entities Documentation for additional information on the expected directory structure and file naming scheme (entities) needed for querying.

Parameters:
  • bids_dir (os.PathLike) -- Path to a BIDS compliant directory. A "dataset_description.json" file must be located in this directory or an error will be raised.

  • task (str) -- Name of the task to extract timeseries data from (e.g., "rest" or "n-back").

  • session (int, str, or None, default=None) -- The session ID to extract timeseries data from. Only a single session can be extracted at a time and an error will be raised if more than one session is detected during querying. The value can be an integer (e.g. session=2) or a string (e.g. session="001").

  • runs (int, str, list[int], list[str], or None, default=None) -- List of run numbers to extract timeseries data from. Extracts all runs if unspecified. For instance, to extract only "run-0" and "run-1", use runs=[0, 1]. For non-integer run IDs, use strings: runs=["000", "001"].

  • condition (str or None, default=None) -- Isolates the timeseries data corresponding to a specific condition (listed in the "trial_type" column of the "events.tsv" file), only after the timeseries has been extracted and subjected to nuisance regression. Only a single condition can be extracted at a time.

  • condition_tr_shift (int, default=0) --

    Number of TR units to offset both the start and end scan indices of a condition. This parameter only applies when a condition is specified. For more details about how this offset affects the calculation of task conditions, see the "Extraction of Task Conditions" section below.

    Added in version 0.20.0.

  • tr (int, float or None, default=None) -- Repetition time (TR), in seconds, for the specified task. If not provided, the TR will be automatically extracted from the first BOLD metadata file found for the task, searching first in the pipeline directory, then in the bids_dir if not found.

  • slice_time_ref (int or float, default=0.0) --

    The reference slice expressed as a fraction of the tr that is subtracted from the condition onset times to adjust for slice time correction when condition is not None (onset - slice_time_ref * tr). Values can range from 0 to 1.

    Added in version 0.21.0.

  • run_subjects (list[str] or None, default=None) -- List of subject IDs to process (e.g. run_subjects=["01", "02"]). Processes all subjects if None.

  • exclude_subjects (list[str] or None, default=None) -- List of subject IDs to exclude (e.g. exclude_subjects=["01", "02"]).

  • exclude_niftis (list[str] or None, default=None) -- List of the specific preprocessed NIfTI files to exclude, preventing their timeseries data from being extracted. Used if there are specific runs across different participants that need to be excluded.

  • pipeline_name (str or None, default=None) -- The name of the pipeline folder in the derivatives folder containing the preprocessed data. Used if multiple pipeline folders exist in the derivatives folder or the pipeline folder is nested (e.g. "fmriprep/fmriprep-20.0.0").

  • n_cores (int or None, default=None) -- The number of cores to use for multiprocessing with Joblib. The "loky" backend is used.

  • parallel_log_config (dict[str, Union[multiprocessing.Manager.Queue, int]]) --

    Passes a user-defined managed queue and logging level to the internal timeseries extraction function when parallel processing (n_cores) is used. Additionally, this parameter must be a dictionary and the available keys are:

    • "queue": The instance of multiprocessing.Manager.Queue to pass to QueueHandler. If not specified, all logs will output to sys.stdout.

    • "level": The logging level (e.g. logging.INFO, logging.WARNING). If not specified, the default level is logging.INFO.

    Refer to neurocaps' Logging Documentation for a detailed example of setting up this parameter.

  • verbose (bool, default=True) -- If True, logs detailed subject-specific information including: subjects skipped due to missing required files, current subject being processed for timeseries extraction, confounds identified for nuisance regression in addition to requested confounds that are missing for a subject, and additional warnings encountered during the timeseries extraction process.

  • flush (bool, default=False) -- If True, flushes the logged subject-specific information produced during the timeseries extraction process.

  • progress_bar (bool, default=False) --

    If True, displays a progress bar.

    Added in version 0.21.5.

Returns:

self

Raises:

BIDSQueryError -- Subject IDs were not found during querying.
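
As a rough usage sketch, the call below passes a handful of the documented keyword arguments to get_bold on an already constructed extractor. The run_extraction wrapper and all argument values (task name, session, runs, excluded subject, TR) are hypothetical and depend on the dataset at hand.

```python
def run_extraction(extractor, bids_dir):
    """Call get_bold with documented keyword arguments (illustrative values).

    `extractor` is assumed to be a TimeseriesExtractor instance. Because
    get_bold returns self, the return value is the same object.
    """
    return extractor.get_bold(
        bids_dir=bids_dir,
        task="rest",
        session="001",            # only one session can be extracted per call
        runs=[0, 1],              # extracts "run-0" and "run-1"
        exclude_subjects=["02"],
        tr=2.0,                   # seconds; read from metadata if omitted
        n_cores=None,
        verbose=True,
        progress_bar=True,
    )
```

After the call, the extracted data is available in extractor.subject_timeseries.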

Note

Subject Timeseries Dictionary: This method stores the extracted timeseries of all subjects in self.subject_timeseries. The structure is a dictionary mapping subject IDs to their run IDs and their associated timeseries (TRs x ROIs) as a NumPy array:

subject_timeseries = {
        "101": {
            "run-0": np.array([timeseries]), # Shape: TRs x ROIs
            "run-1": np.array([timeseries]), # Shape: TRs x ROIs
            "run-2": np.array([timeseries]), # Shape: TRs x ROIs
        },
        "102": {
            "run-0": np.array([timeseries]), # Shape: TRs x ROIs
            "run-1": np.array([timeseries]), # Shape: TRs x ROIs
        }
    }

NIfTI Files Without "run-" Entity: By default, "run-0" is used as a placeholder if run IDs are not specified in the NIfTI file name.

Parcellation & Nuisance Regression: For timeseries extraction, nuisance regression, and spatial dimensionality reduction using a parcellation, Nilearn's NiftiLabelsMasker is used. If requested, dummy scans are removed from the NIfTI images and confound dataset prior to timeseries extraction. For volumes exceeding a specified framewise displacement (FD) threshold, if the "use_sample_mask" key in the fd_threshold dictionary is set to True, then a boolean sample mask is generated (where False indicates the high motion volumes) and passed to the sample_mask parameter in Nilearn's NiftiLabelsMasker. If the "use_sample_mask" key is False or not specified in the fd_threshold dictionary, then censoring is done after nuisance regression, which is the default behavior.

Extraction of Task Conditions: The formula used for computing the scan indices corresponding to a specific condition:

adjusted_onset = onset - slice_time_ref * tr
adjusted_onset = adjusted_onset if adjusted_onset >= 0 else 0
start_scan = int(adjusted_onset / tr) + condition_tr_shift
end_scan = math.ceil((adjusted_onset + duration) / tr) + condition_tr_shift
scans.extend(list(range(start_scan, end_scan)))
scans = sorted(list(set(scans)))

When partial scans are computed, int is used to round down for the beginning scan index and math.ceil is used to round up for the ending scan index. Negative adjusted onsets are set to 0 to avoid unintentional negative indexing. For simplicity, note that when slice_time_ref and condition_tr_shift are 0, the formula simplifies to:

start_scan = int(onset / tr)
end_scan = math.ceil((onset + duration) / tr)
scans.extend(list(range(start_scan, end_scan)))
scans = sorted(list(set(scans)))

Filtering a specific condition from the timeseries is done after nuisance regression and the indices are used to extract the TRs corresponding to the condition from the timeseries. Additionally, if the "use_sample_mask" key in the fd_threshold dictionary is set to True, then the truncated 2D timeseries is temporarily padded to ensure the correct rows corresponding to the condition are obtained.
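
The scan-index formula above can be wrapped into a small runnable function for experimentation. The condition_scans name and the (onset, duration) input format are assumptions made for this sketch; the arithmetic follows the formula as documented.

```python
import math


def condition_scans(events, tr, slice_time_ref=0.0, condition_tr_shift=0):
    """Return sorted, unique scan indices for one condition.

    `events` is an iterable of (onset, duration) pairs in seconds, as found
    in the rows of an events.tsv file for the chosen trial type.
    """
    scans = []
    for onset, duration in events:
        # Shift the onset by the slice timing reference and clamp at zero.
        adjusted_onset = max(onset - slice_time_ref * tr, 0)
        # Round the start down and the end up to cover partial scans.
        start_scan = int(adjusted_onset / tr) + condition_tr_shift
        end_scan = math.ceil((adjusted_onset + duration) / tr) + condition_tr_shift
        scans.extend(range(start_scan, end_scan))
    return sorted(set(scans))


# One event starting at 10 s and lasting 5 s, with a TR of 2 s.
plain = condition_scans([(10, 5)], tr=2)                           # [5, 6, 7]
slice_adjusted = condition_scans([(10, 5)], tr=2, slice_time_ref=0.5)  # [4, 5, 6]
shifted = condition_scans([(10, 5)], tr=2, condition_tr_shift=1)   # [6, 7, 8]
```

Overlapping events are merged by the final set operation, so each scan index appears at most once.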

timeseries_to_pickle(output_dir, filename=None)

Save the Extracted Subject Timeseries.

Saves the extracted timeseries stored in the self.subject_timeseries dictionary (obtained from running self.get_bold) as a pickle file. This allows for data persistence and easy conversion back into dictionary form for later use.

Parameters:
  • output_dir (os.PathLike) -- Directory to save self.subject_timeseries dictionary as a pickle file. The directory will be created if it does not exist.

  • filename (str or None, default=None) -- Name of the file with or without the "pkl" extension.

Returns:

self
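
The round trip that a pickle file enables can be sketched with the standard library alone. In this example nested lists stand in for the NumPy arrays, the file name is hypothetical, and it is assumed that the saved file is an ordinary pickle readable with pickle.load.

```python
import pickle
import tempfile
from pathlib import Path

# Stand-in for self.subject_timeseries (lists replace NumPy arrays here).
subject_timeseries = {
    "101": {"run-0": [[0.1, 0.2], [0.3, 0.4]]},  # Shape: TRs x ROIs
    "102": {"run-0": [[0.5, 0.6], [0.7, 0.8]]},
}

with tempfile.TemporaryDirectory() as output_dir:
    path = Path(output_dir) / "subject_timeseries.pkl"  # hypothetical name

    # Serialize the nested dictionary to disk.
    with open(path, "wb") as file:
        pickle.dump(subject_timeseries, file)

    # Load it back into dictionary form for later use.
    with open(path, "rb") as file:
        restored = pickle.load(file)
```

As with any pickle file, only load files from sources that are trusted.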

visualize_bold(subj_id, run, roi_indx=None, region=None, show_figs=True, output_dir=None, filename=None, **kwargs)

Plot the Extracted Subject Timeseries.

Uses self.subject_timeseries to visualize the extracted BOLD timeseries data of regions of interest (ROIs) or regions for a specific subject and run.

Parameters:
  • subj_id (str or int) -- The ID of the subject.

  • run (int or str) -- The run ID of the subject to plot.

  • roi_indx (int, str, list[int], list[str] or None, default=None) -- The indices of the parcellation nodes to plot. See "nodes" in self.parcel_approach for valid nodes.

  • region (str or None, default=None) -- The region of the parcellation to plot. If not None, all nodes in the specified region will be averaged and then plotted. See "regions" in self.parcel_approach for valid regions.

  • show_figs (bool, default=True) -- Display figures.

  • output_dir (os.PathLike or None, default=None) -- Directory to save plot as png image. The directory will be created if it does not exist. If None, plot will not be saved.

  • filename (str or None, default=None) -- Name of the file without the extension.

  • **kwargs --

    Keyword arguments used when saving figures. Valid keywords include:

    • dpi: int, default=300 -- Dots per inch for the figure.

    • figsize: tuple, default=(11, 5) -- Size of the figure in inches.

    • bbox_inches: str or None, default="tight" -- Alters size of the whitespace in the saved image.

Returns:

self

Note

Parcellation Approach: the "nodes" and "regions" sub-keys are required in parcel_approach.