neurocaps.extraction.TimeseriesExtractor.get_bold

TimeseriesExtractor.get_bold(bids_dir, task, session=None, runs=None, condition=None, condition_tr_shift=0, tr=None, slice_time_ref=0.0, run_subjects=None, exclude_subjects=None, exclude_niftis=None, pipeline_name=None, n_cores=None, parallel_log_config=None, verbose=True, flush=False, progress_bar=False)

Retrieve Preprocessed BOLD Data from BIDS Datasets.

This function uses PyBIDS for querying and requires the BOLD data directory (specified in bids_dir) to be BIDS-compliant, including a “dataset_description.json” file. It assumes the dataset contains a derivatives folder with BOLD data preprocessed using a standard pipeline, specifically fMRIPrep. The pipeline directory must also include a “dataset_description.json” file for proper querying.

The timeseries data of all subjects are appended to a single dictionary self.subject_timeseries. Additional information regarding the structure of this dictionary can be found in the “Note” section.

This method works best with BOLD data preprocessed by fMRIPrep. Refer to NeuroCAPs’ BIDS Structure and Entities Documentation for additional information on the expected directory structure and file naming scheme (entities) needed for querying.

Parameters:
  • bids_dir (str) – Path to a BIDS compliant directory. A “dataset_description.json” file must be located in this directory or an error will be raised.

  • task (str) – Name of the task to extract timeseries data from (e.g. “rest”, “n-back”).

  • session (int, str, or None, default=None) – The session ID to extract timeseries data from. Only a single session can be extracted at a time and an error will be raised if more than one session is detected during querying. The value can be an integer (e.g. session=2) or a string (e.g. session="001").

  • runs (int, str, list[int], list[str], or None, default=None) – List of run numbers to extract timeseries data from. Extracts all runs if unspecified. For instance, to extract only “run-0” and “run-1”, use runs=[0, 1]. For non-integer run IDs, use strings: runs=["000", "001"].

  • condition (str or None, default=None) – Isolates the timeseries data corresponding to a specific condition (listed in the “trial_type” column of the “events.tsv” file), only after the timeseries has been extracted and subjected to nuisance regression. Only a single condition can be extracted at a time.

  • condition_tr_shift (int, default=0) –

    Number of TR units by which to offset both the start and end scan indices of a condition. This parameter only applies when a condition is specified. For more details about how this offset affects the calculation of task conditions, see the “Extraction of Task Conditions” section below.

    Added in version 0.20.0.

  • tr (int, float or None, default=None) – Repetition time (TR), in seconds, for the specified task. If not provided, the TR will be automatically extracted from the first BOLD metadata file found for the task, searching first in the pipeline directory, then in the bids_dir if not found.

  • slice_time_ref (int or float, default=0.0) –

    The reference slice, expressed as a fraction of the tr. When condition is not None, slice_time_ref * tr is subtracted from the condition onset times to account for slice-timing correction (onset - slice_time_ref * tr). Values can range from 0 to 1.

    Added in version 0.21.0.

  • run_subjects (list[str] or None, default=None) – List of subject IDs to process (e.g. run_subjects=["01", "02"]). Processes all subjects if None.

  • exclude_subjects (list[str] or None, default=None) – List of subject IDs to exclude (e.g. exclude_subjects=["01", "02"]).

  • exclude_niftis (list[str] or None, default=None) – List of the specific preprocessed NIfTI files to exclude, preventing their timeseries data from being extracted. Used if there are specific runs across different participants that need to be excluded.

  • pipeline_name (str or None, default=None) – The name of the pipeline folder in the derivatives folder containing the preprocessed data. Used if multiple pipeline folders exist in the derivatives folder or the pipeline folder is nested (e.g. “fmriprep/fmriprep-20.0.0”).

  • n_cores (int or None, default=None) – The number of cores to use for multiprocessing with Joblib. The “loky” backend is used.

  • parallel_log_config (dict[str, multiprocessing.Manager.Queue | int] or None, default=None) –

    Passes a user-defined managed queue and logging level to the internal timeseries extraction function when parallel processing (n_cores) is used. This parameter must be a dictionary; the available keys are:

    • “queue”: The instance of multiprocessing.Manager.Queue to pass to QueueHandler. If not specified, all logs will output to sys.stdout.

    • “level”: The logging level (e.g. logging.INFO, logging.WARNING). If not specified, the default level is logging.INFO.

    Refer to NeuroCAPs’ Logging Documentation for a detailed example of setting up this parameter.

  • verbose (bool, default=True) – If True, logs detailed subject-specific information including: subjects skipped due to missing required files, current subject being processed for timeseries extraction, confounds identified for nuisance regression in addition to requested confounds that are missing for a subject, and additional warnings encountered during the timeseries extraction process.

  • flush (bool, default=False) – If True, flushes the logged subject-specific information produced during the timeseries extraction process.

  • progress_bar (bool, default=False) –

    If True, displays a progress bar.

    Added in version 0.21.5.
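
The parallel_log_config dictionary described above can be assembled with the standard library alone. The following is a minimal sketch, not NeuroCAPs’ own recipe: the helper name make_parallel_log_config and the use of a QueueListener with a StreamHandler in the parent process are assumptions made for illustration, and the get_bold call is left as a comment because it requires a real dataset.

```python
import logging
import logging.handlers
import multiprocessing


def make_parallel_log_config(queue, level=logging.INFO):
    """Assemble the dictionary from the two documented keys (hypothetical helper)."""
    return {"queue": queue, "level": level}


if __name__ == "__main__":
    # A managed queue can be shared with worker processes
    manager = multiprocessing.Manager()
    queue = manager.Queue()

    # Assumption: a listener in the parent process forwards queued records to a handler
    listener = logging.handlers.QueueListener(queue, logging.StreamHandler())
    listener.start()

    parallel_log_config = make_parallel_log_config(queue, logging.WARNING)
    # extractor.get_bold(..., n_cores=2, parallel_log_config=parallel_log_config)

    listener.stop()
    manager.shutdown()
```

The listener is stopped only after extraction finishes so that records still waiting in the queue are handled.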

Returns:

self

Raises:

BIDSQueryError – Subject IDs were not found during querying.

Note

Subject Timeseries Dictionary: This method stores the extracted timeseries of all subjects in self.subject_timeseries. The structure is a dictionary mapping subject IDs to their run IDs and their associated timeseries (TRs x ROIs) as a NumPy array:

subject_timeseries = {
    "101": {
        "run-0": np.array([timeseries]), # Shape: TRs x ROIs
        "run-1": np.array([timeseries]), # Shape: TRs x ROIs
        "run-2": np.array([timeseries]), # Shape: TRs x ROIs
    },
    "102": {
        "run-0": np.array([timeseries]), # Shape: TRs x ROIs
        "run-1": np.array([timeseries]), # Shape: TRs x ROIs
    }
}
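
As an illustration of working with this structure, the sketch below walks a stand-in dictionary and reports the shape of every run. The subject IDs, run counts, and random values are invented for the example; in practice the dictionary would come from self.subject_timeseries.

```python
import numpy as np

# Stand-in for extractor.subject_timeseries; random values replace real BOLD data
rng = np.random.default_rng(0)
subject_timeseries = {
    "101": {"run-0": rng.standard_normal((40, 4)), "run-1": rng.standard_normal((38, 4))},
    "102": {"run-0": rng.standard_normal((40, 4))},
}

# Collect (subject, run, n_TRs, n_ROIs) for every stored array
shapes = [
    (subject, run, *timeseries.shape)
    for subject, runs in subject_timeseries.items()
    for run, timeseries in runs.items()
]
for row in shapes:
    print(row)

# Stack one subject's runs along the TR axis (rows)
stacked = np.vstack(list(subject_timeseries["101"].values()))
```

Because every array shares the same number of ROIs (columns), runs can be stacked row-wise without reshaping.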

NIfTI Files Without “run-” Entity: If a NIfTI file name does not specify a run ID, “run-0” is used as a placeholder by default.

Parcellation & Nuisance Regression: For timeseries extraction, nuisance regression, and spatial dimensionality reduction using a parcellation, Nilearn’s NiftiLabelsMasker class is used. If requested, dummy scans are removed from the NIfTI images and confound dataset prior to timeseries extraction. For volumes exceeding a specified framewise displacement (FD) threshold, if the “use_sample_mask” key in the fd_threshold dictionary is set to True, then a boolean sample mask is generated (where False indicates the high motion volumes) and passed to the sample_mask parameter in Nilearn’s NiftiLabelsMasker. If the “use_sample_mask” key is False or not specified in the fd_threshold dictionary, then censoring is done after nuisance regression, which is the default behavior.
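
To make the boolean sample mask concrete, here is a minimal sketch with made-up FD values and an illustrative threshold of 0.35. It assumes a volume is censored only when its FD is strictly greater than the threshold, and it applies the mask directly to an array for demonstration; in the actual workflow the mask is handed to Nilearn rather than applied by hand.

```python
import numpy as np

# Hypothetical framewise displacement values (mm), one per volume
fd = np.array([0.00, 0.12, 0.48, 0.09, 0.61, 0.20])
fd_threshold = 0.35  # illustrative threshold, not a recommended value

# True keeps a volume; False marks a high-motion volume
sample_mask = fd <= fd_threshold

# Applying the mask to a (TRs x ROIs) array drops the high-motion rows
timeseries = np.arange(12, dtype=float).reshape(6, 2)
censored = timeseries[sample_mask]
print(sample_mask, censored.shape)
```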

Extraction of Task Conditions: The formula used to compute the scan indices corresponding to a specific condition:

adjusted_onset = onset - slice_time_ref * tr
adjusted_onset = adjusted_onset if adjusted_onset >= 0 else 0
start_scan = int(adjusted_onset / tr) + condition_tr_shift
end_scan = math.ceil((adjusted_onset + duration) / tr) + condition_tr_shift
scans.extend(list(range(start_scan, end_scan)))
scans = sorted(list(set(scans)))

When partial scans are computed, int is used to round down for the beginning scan index and math.ceil is used to round up for the ending scan index. Negative adjusted onsets are set to 0 to avoid unintentional negative indexing. When slice_time_ref and condition_tr_shift are both 0, the formula simplifies to:

start_scan = int(onset / tr)
end_scan = math.ceil((onset + duration) / tr)
scans.extend(list(range(start_scan, end_scan)))
scans = sorted(list(set(scans)))
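
The formulas above can be wrapped in a small standalone function to check the arithmetic by hand. This is a sketch only: the function name condition_scans and the (onset, duration) tuple input are choices made for this example and are not part of the NeuroCAPs API.

```python
import math


def condition_scans(events, tr, slice_time_ref=0.0, condition_tr_shift=0):
    """Return sorted, de-duplicated scan indices for (onset, duration) pairs in seconds."""
    scans = []
    for onset, duration in events:
        # Shift the onset for slice-timing correction, clamping negative values to 0
        adjusted_onset = max(onset - slice_time_ref * tr, 0)
        # Round down for the first scan, round up for the (exclusive) last scan
        start_scan = int(adjusted_onset / tr) + condition_tr_shift
        end_scan = math.ceil((adjusted_onset + duration) / tr) + condition_tr_shift
        scans.extend(range(start_scan, end_scan))
    return sorted(set(scans))


# Two events of one condition with TR = 2 s
events = [(3.0, 4.0), (10.0, 3.0)]
print(condition_scans(events, tr=2.0))  # [1, 2, 3, 5, 6]
print(condition_scans(events, tr=2.0, slice_time_ref=0.5, condition_tr_shift=1))  # [2, 3, 5, 6]
```

In the first call, the event at 3 s lasting 4 s spans 1.5 to 3.5 TRs, which rounds outward to scans 1 through 3. In the second call, the onsets are first reduced by 0.5 * 2 = 1 s and the resulting indices are then shifted forward by one TR.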

Filtering for a specific condition is done after nuisance regression: the computed scan indices are used to select the TRs (rows) corresponding to the condition from the timeseries. Additionally, if the “use_sample_mask” key in the fd_threshold dictionary is set to True, then the truncated 2D timeseries is temporarily padded to ensure the correct rows corresponding to the condition are obtained.
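
One way to picture the temporary padding is sketched below. This is an assumption about the general idea, not NeuroCAPs’ internal implementation: the helper name extract_condition_rows is hypothetical, censored volumes are restored as NaN rows so that scan indices line up with the original acquisition, and condition scans that were censored are skipped.

```python
import numpy as np


def extract_condition_rows(censored, sample_mask, scans):
    """Map condition scan indices onto a censored (kept TRs x ROIs) array."""
    sample_mask = np.asarray(sample_mask, dtype=bool)
    # Pad back to the original length; censored volumes become NaN rows
    padded = np.full((sample_mask.size, censored.shape[1]), np.nan)
    padded[sample_mask] = censored
    # Keep only condition scans that exist and were not censored
    keep = [i for i in scans if i < sample_mask.size and sample_mask[i]]
    return padded[keep]


sample_mask = [True, True, False, True, False, True]
full = np.arange(12, dtype=float).reshape(6, 2)
censored = full[np.asarray(sample_mask)]  # truncated timeseries: 4 rows remain
rows = extract_condition_rows(censored, sample_mask, scans=[1, 2, 3])
print(rows)
```

Without the padding step, index 3 would point at the fourth row of the truncated array (originally scan 5) rather than at scan 3.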