neurocaps.extraction.TimeseriesExtractor
- class TimeseriesExtractor(space='MNI152NLin2009cAsym', parcel_approach={'Schaefer': {'n_rois': 400, 'resolution_mm': 1, 'yeo_networks': 7}}, standardize='zscore_sample', detrend=True, low_pass=None, high_pass=None, fwhm=None, use_confounds=True, confound_names='basic', fd_threshold=None, n_acompcor_separate=None, dummy_scans=None, dtype=None)
Timeseries Extractor.
Performs timeseries denoising, extraction, serialization (pickling), and BOLD visualization.
- Parameters:
space (str, default="MNI152NLin2009cAsym") -- The standard template space that the preprocessed BOLD data is registered to. Used for querying with pybids to locate preprocessed BOLD-related files.
parcel_approach (dict or os.PathLike, default={"Schaefer": {"n_rois": 400, "yeo_networks": 7, "resolution_mm": 1}}) -- The approach used to parcellate NIfTI images into distinct regions of interest (ROIs).
To initialize a parcel_approach, the configuration requires a nested dictionary with:
First Level Key: The parcellation name ("Schaefer", "AAL", or "Custom").
Second Level Keys: Parameters specific to each parcellation method.
Supported parcellation approaches and their parameters include:
"Schaefer":
"n_rois": Number of ROIs (100, 200, 300, 400, 500, 600, 700, 800, 900, or 1000). Defaults to 400.
"yeo_networks": Number of Yeo networks (7 or 17). Defaults to 7.
"resolution_mm": Spatial resolution in millimeters (1 or 2). Defaults to 1.
"AAL":
"version": AAL parcellation version to use ("SPM5", "SPM8", "SPM12", or "3v2"). Defaults to "SPM12" if {"AAL": {}} is given.
"Custom" (user-defined):
"maps": Directory path to the location of the parcellation file.
"nodes": A list of node names in the order of the label IDs in the parcellation.
"regions": The regions or networks in the parcellation.
Notes:
Input can also be a pickle file containing a processed parcel approach with required keys ("maps", "nodes", and "regions").
For detailed parameter information, see:
Custom: See Notes section below for structure requirements.
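As a rough illustration of the nested dictionary layout described above, the sketch below builds a Schaefer and an AAL configuration and checks the Schaefer values against the options listed in this section. The helper check_schaefer_config is hypothetical and not part of neurocaps; it only mirrors the documented value ranges and defaults.

```python
# Allowed values, copied from the option lists in this section.
VALID_N_ROIS = set(range(100, 1001, 100))
VALID_YEO_NETWORKS = {7, 17}
VALID_RESOLUTION_MM = {1, 2}


def check_schaefer_config(parcel_approach):
    """Hypothetical helper: True if the Schaefer settings use documented values."""
    params = parcel_approach["Schaefer"]
    # Missing keys fall back to the documented defaults (400, 7, 1).
    return (
        params.get("n_rois", 400) in VALID_N_ROIS
        and params.get("yeo_networks", 7) in VALID_YEO_NETWORKS
        and params.get("resolution_mm", 1) in VALID_RESOLUTION_MM
    )


# First-level key is the parcellation name, second-level keys are its parameters.
schaefer_approach = {"Schaefer": {"n_rois": 200, "yeo_networks": 17, "resolution_mm": 2}}
aal_approach = {"AAL": {"version": "SPM12"}}

print(check_schaefer_config(schaefer_approach))
print(check_schaefer_config({"Schaefer": {"n_rois": 250}}))
```

Either dictionary would then be passed as the parcel_approach argument when constructing the class.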
standardize ({"zscore_sample", "zscore", "psc", True, False}, default="zscore_sample") --
Standardizes the timeseries.
Note: Refer to nilearn.maskers.NiftiLabelsMasker for an explanation of each available option.
detrend (bool, default=True) -- Detrends the timeseries.
low_pass (float, int, or None, default=None) -- Filters out signals above the specified cutoff frequency.
high_pass (float, int, or None, default=None) -- Filters out signals below the specified cutoff frequency.
fwhm (float, int, or None, default=None) -- Applies spatial smoothing to data (in millimeters).
use_confounds (bool, default=True) -- If True, performs nuisance regression during timeseries extraction using the default or user-specified confounds in confound_names.
Note: requires the confound tsv files to be in the same directory as the preprocessed BOLD images.
confound_names ({"basic"}, list[str], or None, default="basic") -- Names of confounds extracted from the confound tsv files if use_confounds=True.
If "basic", the following confounds are used by default:
All cosine-basis parameters.
Six head-motion parameters and their first-order derivatives.
First six combined aCompcor components.
Notes:
Confound names follow fMRIPrep's naming scheme (versions >= 1.2.0).
Wildcards are supported: e.g., "cosine*" matches all confounds starting with "cosine".
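To make the wildcard behaviour concrete, here is a minimal sketch that expands patterns against a list of confound column names using Python's fnmatch module. The matching semantics are an assumption (shell-style prefix matching), not neurocaps' actual implementation, and both expand_confound_names and the column list are illustrative.

```python
from fnmatch import fnmatchcase


def expand_confound_names(requested, available_columns):
    """Hypothetical helper: expand wildcard patterns against confound columns."""
    selected = []
    for pattern in requested:
        for column in available_columns:
            # Keep the first occurrence only so overlapping patterns do not duplicate.
            if fnmatchcase(column, pattern) and column not in selected:
                selected.append(column)
    return selected


# Names follow fMRIPrep's naming scheme; the list itself is made up for the example.
columns = ["cosine00", "cosine01", "trans_x", "trans_x_derivative1", "rot_z", "a_comp_cor_00"]

print(expand_confound_names(["cosine*", "trans_x"], columns))
```

A pattern without a wildcard, such as "trans_x", only matches the column with exactly that name, so derivatives have to be requested explicitly or with "trans_x*".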
Changed in version 0.23.0: Changed default from None to "basic". The "basic" option provides the same functionality that None did in previous versions.
fd_threshold (float, dict[str, float], or None, default=None) -- Threshold for volume censoring based on framewise displacement (FD).
If float, removes volumes where FD > threshold.
If dict, the following sub-keys are available:
"threshold": A float (Default=None). Removes volumes where FD > threshold.
"outlier_percentage": A float in interval [0,1] (Default=None). Removes entire runs where the proportion of censored volumes exceeds this threshold. The proportion is calculated after dummy scan removal. Issues a warning when runs are flagged. If condition is specified in self.get_bold, only considers volumes associated with the condition.
"n_before": An integer indicating the number of volumes to remove before each flagged volume (Default=None). For instance, if volume 5 is flagged and {"n_before": 2}, then volumes 3, 4, and 5 are discarded.
"n_after": An integer indicating the number of volumes to remove after each flagged volume (Default=False). For instance, if volume 5 is flagged and {"n_after": 2}, then volumes 5, 6, and 7 are discarded.
"use_sample_mask": A boolean (Default=False). If True, censors before nuisance regression using Nilearn's NiftiLabelsMasker. Also sets clean__extrapolate=False to prevent interpolation of end volumes. If False, censors after nuisance regression.
"interpolate": A boolean (Default=None). If True, uses SciPy's CubicSpline function with extrapolate=False to perform cubic spline interpolation on censored frames that are not at the ends of the timeseries. For example, given a censor_mask=[0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0] where "0" indicates censored volumes, only the volumes at index 3, 5, 6, and 8 would be interpolated. When False or None (default behavior), no interpolation is performed and all censored frames are discarded.
Added in version 0.22.3: "interpolate" key added.
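The "n_before", "n_after", and "outlier_percentage" keys can be pictured with the short sketch below. It is only an illustration of the behaviour described above: the helper names are hypothetical, and clipping at the run boundaries is an assumption rather than something this page states.

```python
def expand_censored_volumes(flagged, n_volumes, n_before=0, n_after=0):
    """Hypothetical helper: widen each flagged volume by its neighbours."""
    censored = set()
    for volume in flagged:
        # Assumption: indices are clipped so they stay inside the run.
        start = max(volume - n_before, 0)
        stop = min(volume + n_after, n_volumes - 1)
        censored.update(range(start, stop + 1))
    return sorted(censored)


def run_is_flagged(censored, n_volumes, outlier_percentage):
    """Hypothetical helper: True if the censored proportion exceeds the threshold."""
    return len(censored) / n_volumes > outlier_percentage


before = expand_censored_volumes([5], n_volumes=10, n_before=2)
after = expand_censored_volumes([5], n_volumes=10, n_after=2)
print(before)  # [3, 4, 5]
print(after)   # [5, 6, 7]
print(run_is_flagged(before, n_volumes=10, outlier_percentage=0.2))
```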
Notes:
A column named "framewise_displacement" must be available in the confounds file.
use_confounds must be set to True.
Do not specify "framewise_displacement" in confound_names.
See Nilearn's documentation for details on censored volume handling.
When {"use_sample_mask": False} and standardize=True, applying an additional within-run standardization (using neurocaps.analysis.standardize) is recommended after outlier removal.
If {"interpolate": True}, then interpolation is only applied after the nuisance regression and parcellation steps have been completed. It is also applied prior to the condition being extracted from the timeseries.
See SciPy's documentation on the CubicSpline function.
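The rule that only censored frames away from the ends of the timeseries are interpolated can be checked with a few lines of plain Python. The function below is a hypothetical helper that reproduces the censor_mask example given for the "interpolate" key; it selects indices only and does not perform the spline interpolation itself.

```python
def interpolated_indices(censor_mask):
    """Return censored indices lying between the first and last retained volume."""
    kept = [i for i, keep in enumerate(censor_mask) if keep]
    if not kept:
        return []
    first, last = kept[0], kept[-1]
    # Censored volumes before `first` or after `last` are at the ends and are dropped.
    return [i for i, keep in enumerate(censor_mask) if not keep and first < i < last]


censor_mask = [0, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0]  # 0 marks a censored volume
print(interpolated_indices(censor_mask))  # [3, 5, 6, 8]
```

Indices 0, 1, 10, and 11 are censored but sit outside the retained range, so they are discarded rather than interpolated.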
n_acompcor_separate (int or None, default=None) -- Number of aCompCor components to extract separately from the white-matter (WM) and CSF masks. Uses the first "n" components from each mask separately. For instance, if n_acompcor_separate=5, then the first 5 WM components and the first 5 CSF components (totaling 10 components) are regressed out.
Notes:
use_confounds must be set to True.
If specified, this parameter overrides any aCompCor components listed in confound_names.
dummy_scans (int, dict[str, Union[bool, int]], or None, default=None) -- Number of initial volumes to remove before timeseries extraction.
If int, removes first "n" volumes.
If dict, the following keys are supported:
"auto": A boolean (Default=None). If True, automatically determines dummy scans from the fMRIPrep confounds file by counting the number of "non_steady_state_outlier_XX" columns in the confounds.tsv file. For instance, if two columns are found, then the first two volumes are removed.
"min": An integer (Default=None). Minimum volumes to remove when "auto" is True. If "auto" finds 2 outliers but {"min": 3}, removes 3 volumes.
"max": An integer (Default=None). Maximum volumes to remove when "auto" is True. If "auto" finds 6 outliers but {"max": 5}, removes 5 volumes.
Note: "min" and "max" keys only apply when "auto" is True.
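The interaction between "auto", "min", and "max" might be sketched as follows. The function resolve_dummy_scans is a hypothetical illustration of the rules listed above, not the library's code, and returning 0 for a dictionary without "auto" set to True is an assumption.

```python
def resolve_dummy_scans(dummy_scans, confound_columns):
    """Hypothetical helper: number of initial volumes to remove."""
    if dummy_scans is None:
        return 0
    if isinstance(dummy_scans, int):
        return dummy_scans
    if not dummy_scans.get("auto"):
        return 0  # assumption: "min"/"max" are ignored unless "auto" is True
    # Count the fMRIPrep non-steady-state outlier columns.
    n = sum(col.startswith("non_steady_state_outlier") for col in confound_columns)
    if dummy_scans.get("min") is not None:
        n = max(n, dummy_scans["min"])
    if dummy_scans.get("max") is not None:
        n = min(n, dummy_scans["max"])
    return n


two_outliers = ["trans_x", "non_steady_state_outlier_00", "non_steady_state_outlier_01"]
six_outliers = [f"non_steady_state_outlier_{i:02d}" for i in range(6)]

print(resolve_dummy_scans({"auto": True}, two_outliers))
print(resolve_dummy_scans({"auto": True, "min": 3}, two_outliers))
print(resolve_dummy_scans({"auto": True, "max": 5}, six_outliers))
```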
dtype (str or "auto", default=None) -- The NumPy dtype the NIfTI images are converted to when passed to Nilearn's load_img function.
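Putting the parameters together, a configuration might look like the sketch below. The keyword names come from the class signature above; the specific values (filter cutoff, smoothing kernel, FD threshold, and so on) are arbitrary illustrations rather than recommendations. The constructor call is left commented out because it requires neurocaps to be installed.

```python
extractor_kwargs = {
    "space": "MNI152NLin2009cAsym",
    "parcel_approach": {"Schaefer": {"n_rois": 400, "yeo_networks": 7, "resolution_mm": 1}},
    "standardize": "zscore_sample",
    "detrend": True,
    "low_pass": 0.15,  # illustrative cutoff
    "high_pass": None,
    "fwhm": 6,  # illustrative smoothing kernel in millimeters
    "use_confounds": True,
    "confound_names": "basic",
    # Censor high-motion volumes plus one neighbour on each side (illustrative values).
    "fd_threshold": {"threshold": 0.35, "outlier_percentage": 0.30, "n_before": 1, "n_after": 1},
    "n_acompcor_separate": None,
    # Let the confounds file decide, but remove between 3 and 6 volumes.
    "dummy_scans": {"auto": True, "min": 3, "max": 6},
    "dtype": None,
}

# from neurocaps.extraction import TimeseriesExtractor
# extractor = TimeseriesExtractor(**extractor_kwargs)
print(sorted(extractor_kwargs))
```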
Properties
- space: str
The standard template space that the preprocessed BOLD data is registered to. The space can also be set after class initialization using self.space = "New Space" if the template space needs to be changed.
- parcel_approach: dict
A dictionary containing information about the parcellation. Can also be used as a setter, which accepts a dictionary or a dictionary saved as a pickle file. If "Schaefer" or "AAL" was specified during initialization of the TimeseriesExtractor class, then nilearn.datasets.fetch_atlas_schaefer_2018 or nilearn.datasets.fetch_atlas_aal, respectively, is used to obtain the "maps" and the "nodes". String splitting is then applied to the "nodes" to obtain the "regions":

    # Structure of Schaefer
    {
        "Schaefer": {
            "maps": "path/to/parcellation.nii.gz",
            "nodes": ["LH_Vis1", "LH_SomSot1", "RH_Vis1", "RH_SomSot1"],
            "regions": ["Vis", "SomSot"]
        }
    }

    # Structure of AAL
    {
        "AAL": {
            "maps": "path/to/parcellation.nii.gz",
            "nodes": ["Precentral_L", "Precentral_R", "Frontal_Sup_L", "Frontal_Sup_R"],
            "regions": ["Precentral", "Frontal"]
        }
    }

Refer to the example for "Custom" in the Note section below for the expected structure.
- signal_clean_info: dict[str, Union[bool, int, float, str]] or None
Dictionary containing parameters for signal cleaning specified during initialization of the TimeseriesExtractor class. This information includes standardize, detrend, low_pass, high_pass, fwhm, dummy_scans, use_confounds, n_acompcor_separate, and fd_threshold.
- task_info: dict[str, Union[str, int]] or None
If self.get_bold() has been run, a dictionary containing all task-related information such as task, condition, session, runs, and tr (if specified); otherwise None.
- subject_ids: list[str] or None
A list containing all subject IDs that have been retrieved from pybids and subjected to timeseries extraction.
- n_cores: int or None
Number of cores used for multiprocessing with Joblib.
- subject_timeseries: dict[str, dict[str, np.ndarray]] or None
A dictionary mapping subject IDs to their run IDs and their associated timeseries (TRs x ROIs) as a NumPy array. Can also be a path to a pickle file containing this same structure. If this property needs to be deleted due to memory issues, del self.subject_timeseries can be used to delete it, after which the property returns None. The structure is as follows:

    subject_timeseries = {
        "101": {
            "run-0": np.array([...]),  # Shape: TRs x ROIs
            "run-1": np.array([...]),  # Shape: TRs x ROIs
            "run-2": np.array([...]),  # Shape: TRs x ROIs
        },
        "102": {
            "run-0": np.array([...]),  # Shape: TRs x ROIs
            "run-1": np.array([...]),  # Shape: TRs x ROIs
        }
    }

Note
Passed Parameters: standardize, detrend, low_pass, high_pass, fwhm, and nuisance regression (confound_names) use nilearn.maskers.NiftiLabelsMasker. The dtype parameter is used by nilearn.image.load_img. For framewise displacement, if the "use_sample_mask" key is set to True in the fd_threshold dictionary, then a boolean sample mask is generated (setting indices corresponding to high motion volumes as False) and is passed to the sample_mask parameter in nilearn.maskers.NiftiLabelsMasker.
Custom Parcellations: If using a "Custom" parcellation approach, ensure that the parcellation is lateralized (where each region/network has nodes in the left and right hemisphere). This is due to certain visualization functions assuming that each region consists of left and right hemisphere nodes. Additionally, certain visualization functions in this class also assume that the background label is 0. Therefore, do not add a background label in the "nodes" or "regions" keys.
The recognized sub-keys for the "Custom" parcellation approach include:
"maps": Directory path containing the parcellation in a supported format (e.g., .nii or .nii.gz for NIfTI).
"nodes": A list or numpy array of all node labels arranged in ascending order based on their numerical IDs from the parcellation. The 0th index should contain the label corresponding to the lowest, non-background numerical ID.
"regions": A dictionary defining major brain regions or networks, with each region containing "lh" (left hemisphere) and "rh" (right hemisphere) sub-keys listing node indices.
Refer to the neurocaps Parcellation Documentation for more detailed explanations and example structures for the "nodes" and "regions" sub-keys.
Note: Different sub-keys are required depending on the function used. Refer to the Note section under each function for information regarding the sub-keys required for that specific function.
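A lateralized "Custom" configuration, following the sub-key descriptions above, could look like this sketch. The file path, node names, and region names are made up, and treating the "lh"/"rh" values as positions in the "nodes" list is an assumption based on the phrase "node indices". The helper is_lateralized is hypothetical.

```python
custom_approach = {
    "Custom": {
        "maps": "path/to/custom_parcellation.nii.gz",  # hypothetical path
        # Ordered by ascending label ID; the background label (0) is not listed.
        "nodes": ["LH_Vis1", "LH_Vis2", "LH_Motor1", "RH_Vis1", "RH_Vis2", "RH_Motor1"],
        "regions": {
            "Vis": {"lh": [0, 1], "rh": [3, 4]},
            "Motor": {"lh": [2], "rh": [5]},
        },
    }
}


def is_lateralized(parcel_approach):
    """Hypothetical check: every region lists both hemispheres."""
    regions = parcel_approach["Custom"]["regions"]
    return all(set(hemispheres) == {"lh", "rh"} for hemispheres in regions.values())


print(is_lateralized(custom_approach))
# Look up the node names that belong to the left-hemisphere visual region.
nodes = custom_approach["Custom"]["nodes"]
print([nodes[i] for i in custom_approach["Custom"]["regions"]["Vis"]["lh"]])
```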
Methods
get_bold(bids_dir, task[, session, runs, ...]) -- Retrieve Preprocessed BOLD Data from BIDS Datasets.
timeseries_to_pickle(output_dir[, filename]) -- Save the Extracted Subject Timeseries.
visualize_bold(subj_id, run[, roi_indx, ...]) -- Plot the Extracted Subject Timeseries.
- get_bold(bids_dir, task, session=None, runs=None, condition=None, condition_tr_shift=0, tr=None, slice_time_ref=0.0, run_subjects=None, exclude_subjects=None, exclude_niftis=None, pipeline_name=None, n_cores=None, parallel_log_config=None, verbose=True, flush=False, progress_bar=False)
Retrieve Preprocessed BOLD Data from BIDS Datasets.
This function uses pybids for querying and requires the BOLD data directory (specified in bids_dir) to be BIDS-compliant, including a "dataset_description.json" file. It assumes the dataset contains a derivatives folder with BOLD data preprocessed using a standard pipeline, specifically fMRIPrep. The pipeline directory must also include a "dataset_description.json" file for proper querying.
The timeseries data of all subjects are appended to a single dictionary, self.subject_timeseries. Additional information regarding the structure of this dictionary can be found in the "Note" section.
This pipeline is most optimized for BOLD data preprocessed by fMRIPrep. Refer to neurocaps' BIDS Structure and Entities Documentation for additional information on the expected directory structure and file naming scheme (entities) needed for querying.
- Parameters:
bids_dir (os.PathLike) -- Path to a BIDS-compliant directory. A "dataset_description.json" file must be located in this directory or an error will be raised.
task (str) -- Name of the task to extract timeseries data from (e.g. "rest", "n-back").
session (int, str, or None, default=None) -- The session ID to extract timeseries data from. Only a single session can be extracted at a time and an error will be raised if more than one session is detected during querying. The value can be an integer (e.g. session=2) or a string (e.g. session="001").
runs (int, str, list[int], list[str], or None, default=None) -- List of run numbers to extract timeseries data from. Extracts all runs if unspecified. For instance, to extract only "run-0" and "run-1", use runs=[0, 1]. For non-integer run IDs, use strings: runs=["000", "001"].
condition (str or None, default=None) -- Isolates the timeseries data corresponding to a specific condition (listed in the "trial_type" column of the "events.tsv" file), only after the timeseries has been extracted and subjected to nuisance regression. Only a single condition can be extracted at a time.
condition_tr_shift (int, default=0) -- Number of TR units to offset both the start and end scan indices of a condition. This parameter only applies when a condition is specified. For more details about how this offset affects the calculation of task conditions, see the "Extraction of Task Conditions" section below.
Added in version 0.20.0.
tr (int, float, or None, default=None) -- Repetition time (TR), in seconds, for the specified task. If not provided, the TR will be automatically extracted from the first BOLD metadata file found for the task, searching first in the pipeline directory, then in the bids_dir if not found.
slice_time_ref (int or float, default=0.0) -- The reference slice expressed as a fraction of the tr that is subtracted from the condition onset times to adjust for slice time correction when condition is not None (onset - slice_time_ref * tr). Values can range from 0 to 1.
Added in version 0.21.0.
run_subjects (list[str] or None, default=None) -- List of subject IDs to process (e.g. run_subjects=["01", "02"]). Processes all subjects if None.
exclude_subjects (list[str] or None, default=None) -- List of subject IDs to exclude (e.g. exclude_subjects=["01", "02"]).
exclude_niftis (list[str] or None, default=None) -- List of the specific preprocessed NIfTI files to exclude, preventing their timeseries data from being extracted. Used if there are specific runs across different participants that need to be excluded.
pipeline_name (str or None, default=None) -- The name of the pipeline folder in the derivatives folder containing the preprocessed data. Used if multiple pipeline folders exist in the derivatives folder or the pipeline folder is nested (e.g. "fmriprep/fmriprep-20.0.0").
n_cores (int or None, default=None) -- The number of cores to use for multiprocessing with Joblib. The "loky" backend is used.
parallel_log_config (dict[str, Union[multiprocessing.Manager.Queue, int]]) -- Passes a user-defined managed queue and logging level to the internal timeseries extraction function when parallel processing (n_cores) is used. This parameter must be a dictionary, and the available keys are:
"queue": The instance of multiprocessing.Manager.Queue to pass to QueueHandler. If not specified, all logs will output to sys.stdout.
"level": The logging level (e.g. logging.INFO, logging.WARNING). If not specified, the default level is logging.INFO.
Refer to the neurocaps' Logging Documentation for a detailed example of setting up this parameter.
verbose (bool, default=True) -- If True, logs detailed subject-specific information including: subjects skipped due to missing required files, current subject being processed for timeseries extraction, confounds identified for nuisance regression in addition to requested confounds that are missing for a subject, and additional warnings encountered during the timeseries extraction process.
flush (bool, default=False) -- If True, flushes the logged subject-specific information produced during the timeseries extraction process.
progress_bar (bool, default=False) -- If True, displays a progress bar.
Added in version 0.21.5.
- Returns:
self
- Raises:
BIDSQueryError -- Subject IDs were not found during querying.
Note
Subject Timeseries Dictionary: This method stores the extracted timeseries of all subjects in self.subject_timeseries. The structure is a dictionary mapping subject IDs to their run IDs and their associated timeseries (TRs x ROIs) as a NumPy array:

    subject_timeseries = {
        "101": {
            "run-0": np.array([timeseries]),  # Shape: TRs x ROIs
            "run-1": np.array([timeseries]),  # Shape: TRs x ROIs
            "run-2": np.array([timeseries]),  # Shape: TRs x ROIs
        },
        "102": {
            "run-0": np.array([timeseries]),  # Shape: TRs x ROIs
            "run-1": np.array([timeseries]),  # Shape: TRs x ROIs
        }
    }

NIfTI Files Without "run-" Entity: By default, "run-0" will be used as a placeholder if run IDs are not specified in the NIfTI file.
Parcellation & Nuisance Regression: For timeseries extraction, nuisance regression, and spatial dimensionality reduction using a parcellation, Nilearn's NiftiLabelsMasker is used. If requested, dummy scans are removed from the NIfTI images and confound dataset prior to timeseries extraction. For volumes exceeding a specified framewise displacement (FD) threshold, if the "use_sample_mask" key in the fd_threshold dictionary is set to True, then a boolean sample mask is generated (where False indicates the high motion volumes) and passed to the sample_mask parameter in Nilearn's NiftiLabelsMasker. If the "use_sample_mask" key is False or not specified in the fd_threshold dictionary, then censoring is done after nuisance regression, which is the default behavior.
Extraction of Task Conditions: The formula used for computing the scan indices corresponding to a specific condition:

    adjusted_onset = onset - slice_time_ref * tr
    adjusted_onset = adjusted_onset if adjusted_onset >= 0 else 0
    start_scan = int(adjusted_onset / tr) + condition_tr_shift
    end_scan = math.ceil((adjusted_onset + duration) / tr) + condition_tr_shift
    scans.extend(list(range(start_scan, end_scan)))
    scans = sorted(list(set(scans)))

When partial scans are computed, int is used to round down for the beginning scan index and math.ceil is used to round up for the ending scan index. Negative adjusted onsets are set to 0 to avoid unintentional negative indexing. For simplicity, note that when slice_time_ref and condition_tr_shift are 0, the formula simplifies to:

    start_scan = int(onset / tr)
    end_scan = math.ceil((onset + duration) / tr)
    scans.extend(list(range(start_scan, end_scan)))
    scans = sorted(list(set(scans)))

Filtering a specific condition from the timeseries is done after nuisance regression, and the indices are used to extract the TRs corresponding to the condition from the timeseries. Additionally, if the "use_sample_mask" key in the fd_threshold dictionary is set to True, then the truncated 2D timeseries is temporarily padded to ensure the correct rows corresponding to the condition are obtained.
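The condition formula can be wrapped in a small function to see how slice_time_ref and condition_tr_shift change the selected scans. The function name condition_scans and the event values are illustrative; the arithmetic follows the formula given in this note.

```python
import math


def condition_scans(events, tr, slice_time_ref=0.0, condition_tr_shift=0):
    """Return sorted, unique scan indices for (onset, duration) pairs in seconds."""
    scans = []
    for onset, duration in events:
        # Shift the onset by the slice-timing reference and clamp at zero.
        adjusted_onset = max(onset - slice_time_ref * tr, 0)
        start_scan = int(adjusted_onset / tr) + condition_tr_shift
        end_scan = math.ceil((adjusted_onset + duration) / tr) + condition_tr_shift
        scans.extend(range(start_scan, end_scan))
    return sorted(set(scans))


events = [(5.0, 6.0), (10.0, 4.0)]  # made-up onsets and durations
print(condition_scans(events, tr=2.0))                        # [2, 3, 4, 5, 6]
print(condition_scans(events, tr=2.0, condition_tr_shift=1))  # [3, 4, 5, 6, 7]
print(condition_scans(events, tr=2.0, slice_time_ref=0.5))    # [2, 3, 4, 5, 6]
```

Overlapping events contribute each scan index only once because the indices are deduplicated before sorting.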
- timeseries_to_pickle(output_dir, filename=None)
Save the Extracted Subject Timeseries.
Saves the extracted timeseries stored in the self.subject_timeseries dictionary (obtained from running self.get_bold) as a pickle file. This allows for data persistence and easy conversion back into dictionary form for later use.
- Parameters:
output_dir (os.PathLike) -- Directory to save the self.subject_timeseries dictionary as a pickle file. The directory will be created if it does not exist.
filename (str or None, default=None) -- Name of the file with or without the "pkl" extension.
- Returns:
self
- visualize_bold(subj_id, run, roi_indx=None, region=None, show_figs=True, output_dir=None, filename=None, **kwargs)
Plot the Extracted Subject Timeseries.
Uses self.subject_timeseries to visualize the extracted BOLD timeseries data of Regions of Interest (ROIs) or regions for a specific subject and run.
- Parameters:
subj_id (str or int) -- The ID of the subject.
run (int or str) -- The run ID of the subject to plot.
roi_indx (int, str, list[int], list[str], or None, default=None) -- The indices of the parcellation nodes to plot. See "nodes" in self.parcel_approach for valid nodes.
region (str or None, default=None) -- The region of the parcellation to plot. If not None, all nodes in the specified region will be averaged then plotted. See "regions" in self.parcel_approach for valid regions.
show_figs (bool, default=True) -- Display figures.
output_dir (os.PathLike or None, default=None) -- Directory to save the plot as a png image. The directory will be created if it does not exist. If None, the plot will not be saved.
filename (str or None, default=None) -- Name of the file without the extension.
**kwargs -- Keyword arguments used when saving figures. Valid keywords include:
dpi: int, default=300 -- Dots per inch for the figure.
figsize: tuple, default=(11, 5) -- Size of the figure in inches.
bbox_inches: str or None, default="tight" -- Alters the size of the whitespace in the saved image.
- Returns:
self
Note
Parcellation Approach: The "nodes" and "regions" sub-keys are required in parcel_approach.
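When region is given, the nodes belonging to that region are averaged before plotting. A minimal sketch of that averaging step is shown below, assuming a plain arithmetic mean across the region's ROI columns at each TR; region_average is a hypothetical helper, and nested lists stand in for the TRs x ROIs NumPy array.

```python
from statistics import fmean


def region_average(timeseries, node_indices):
    """Average the selected ROI columns at each TR (rows are TRs, columns are ROIs)."""
    return [fmean(row[i] for i in node_indices) for row in timeseries]


# Two TRs by three ROIs; ROIs 0 and 1 are assumed to form one region.
timeseries = [[1.0, 3.0, 10.0], [2.0, 4.0, 20.0]]
print(region_average(timeseries, [0, 1]))  # [2.0, 3.0]
```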