Glossary

You’ll come across a variety of terms throughout the ALS Knowledge Portal (ALS-KP). Refer back to this page whenever you’re unsure what a term means or how two terms differ.

For more insights, visit our discussion forum, where you can browse existing topics or post a new question for the community.

Access Request

An electronic application submitted via Synapse by a user seeking permission to access and use controlled-access data. An Access Request may be submitted by a single user on behalf of several collaborating Synapse users from the same institution.

Annotations

Annotations are labels or tags that you add to data (such as a project, file, folder, table, or view). They provide extra information using a standardized set of terms so that the data can be described consistently, organized more clearly, and discovered more easily in searches or filters.

If you are a data contributor uploading data into Synapse, you can find more information on how to assign and edit annotations here: Annotating Data With Metadata.

For users, annotations are what make it possible to systematically search for and find specific data of interest. If you haven’t already, learn how to filter and find data on the ALS-KP here: Navigating the Portal.

Controlled-Access Data

The data in ALSKP is hosted by diverse data contributors on several platforms. Access to data in ALSKP follows a tiered system to safeguard participant privacy and data confidentiality. While some data is openly available, sensitive data (particularly individual-level human omics data, and data derivatives) requires controlled access protection. Individuals wishing to access controlled-access data must obtain approval. You can find more information here: Data Access.

Controlled Value

A pre-formatted value that must be used exactly as defined. Example: True instead of yes; female instead of woman.
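
As a rough illustration, a controlled value check could look like the Python sketch below. The attribute names (`sex`, `isPostMortem`) and the sets of allowed values are hypothetical and are not taken from the actual portal data model; only the True/yes and female/woman contrast comes from the definition above.

```python
# Hypothetical controlled vocabularies; the real data model may differ.
CONTROLLED_VALUES = {
    "sex": {"female", "male"},
    "isPostMortem": {"True", "False"},
}


def is_controlled_value(attribute: str, value: str) -> bool:
    """Return True only if value exactly matches an allowed value."""
    allowed = CONTROLLED_VALUES.get(attribute)
    if allowed is None:
        # Unknown attribute: this sketch has nothing to validate against.
        return False
    return value in allowed


print(is_controlled_value("sex", "female"))        # True
print(is_controlled_value("sex", "woman"))         # False
print(is_controlled_value("isPostMortem", "yes"))  # False
```

The comparison is deliberately exact and case-sensitive, since a controlled value must be used as defined rather than as a near match.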

Data

In our context, this refers to the data generated across studies. Find data on the ALSKP here: Explore.

Data Subtype

Data Subtype (also referred to as dataSubtype) is a file annotation that indicates if data in the file are raw, processed, or normalized, or if the file contains metadata.

General Research Use (GRU)

General Research Use (GRU) means that the data can be used for broad research purposes, without limitations such as a restriction to disease-specific or institution-specific research. GRU is the broadest standard NIH consent group option for controlled-access data.

Individual ID

An individual ID is the identifier for a specific individual (human subject or single animal).

Individual-Level Data

Individual-level data refers to any file that has values for an individual, as opposed to aggregate data, which are data combined from several individuals.

Intended Data Use Statement (IDU)

An Intended Data Use (IDU) statement is a description submitted with an Access Request that explains why you want to use controlled-access data, what you plan to do with it, and how you will do it. The Data Access Committee (DAC) reviews this information to decide whether access should be granted. Your IDU should answer three key questions: What do you want to do? Why are you doing it? How will you do it?

File Schema

The JSON schema associated with the Synapse File Entity.

Governance

Due to the open-access nature of the platform, ALSKP operates under comprehensive governance policies that define the rights and responsibilities of portal users. Learn more about Synapse Governance, and access our terms of service, privacy policy, and more from the Trust Center.

Grant

A grant is represented by a contract number assigned to an NIH-funded project.

HIPAA-Limited Data Set (LDS)

Data that excludes all PHI (as defined by HIPAA) except for at least one of the following:

dates such as admission, discharge, service, DOB, DOD
city, state, five-digit or longer ZIP code
ages in years, months, days, or hours

Note: A HIPAA-Limited Data Set should always be categorized as Controlled-Access Data.

Manifest

A list of files and their metadata. There are several different types of manifests used throughout Synapse:

Upload manifest: This is a .tsv file used to upload metadata. A template is provided here: Uploading a File Programmatically.

Download manifest: This is used when downloading data programmatically. A template is provided here: Downloading Data Programmatically. You can learn more here: Synapse Python Client.

File Schema Driven Manifest: This is based on the new File Schema.

Portals Manifest: This is currently provided when exporting data.
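
To make the idea of "a list of files and their metadata" concrete, here is a minimal sketch that parses a small tab-separated manifest using only the Python standard library. The column names and file paths are hypothetical and do not reproduce the official upload template; this also does not use the Synapse Python Client.

```python
import csv
import io

# Hypothetical manifest content; column names and paths are illustrative only.
MANIFEST_TSV = (
    "path\tdataSubtype\tindividualID\n"
    "data/sample_01.fastq.gz\traw\tIND-001\n"
    "data/sample_02.fastq.gz\traw\tIND-002\n"
)


def read_manifest(text: str) -> list:
    """Parse tab-separated manifest text into one dict per file row."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [dict(row) for row in reader]


rows = read_manifest(MANIFEST_TSV)
for row in rows:
    # Each row pairs a file with the metadata that describes it.
    print(row["path"], row["dataSubtype"], row["individualID"])
```

Each row describes one file, and every column after the file path carries one piece of that file's metadata.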

Metadata

Metadata is structured data that describes other data. It is used to facilitate curation and supports the discovery and interrogation of data without disclosing individual-level records. Examples of metadata include study design, sample characteristics, data types, assay details, and links to associated publications.

Metadata can also include unstructured information, such as study descriptions. Open-access participant-level phenotypic data is not considered metadata.

National Institutes of Health (NIH)

The NIH, a part of the United States Department of Health and Human Services, is the nation’s medical research agency. NIH’s mission is to seek fundamental knowledge about the nature and behavior of living systems and the application of that knowledge to enhance health, lengthen life, and reduce illness and disability. The NIH is made up of 27 Institutes and Centers.

Open Data/Open Science

Open data represents transparent and accessible knowledge that is shared and developed through collaborative networks, based on the principles of open science. The goal of open science is to make scientific research (including publications, data, samples, and software) and its dissemination accessible to all levels of an inquiring society, whether amateur or professional.

The driving idea behind open science and open data is that scientific research can and should be accessible to anyone. This system benefits all parties involved: researchers gain wider recognition and appreciation for their work; research participants get to witness the value of their contributions; scientists and other professionals access quality data to aid their research and work; and the general public gains helpful information and knowledge from trusted sources. This is truly a win-win. Collective consciousness is a global good!

Program

In our context, a program represents a group of scientists working together towards a common research goal. More information may be found here: Programs.

Results

Results are data analyses generated using biological and computational tools, supported by additional information on the portal, such as metadata and data provenance.

From a reusability perspective, data is more useful to future users than results. Both results and data can be shared, but data matters more for reproducibility and reuse.

We consider data to be raw or partially processed information, depending on the type of experiment. Results are generally post-analysis information or manuscript figures. For example, if you are sharing gene expression information, the raw data would be the compressed fastq.gz files, while differential expression analyses and volcano plots would be considered results. This distinction is well defined for many types of data, but it may be less clear for assays that we encounter less often. Results might also be acceptable for assays that do not lend themselves to re-analysis, such as western blotting. We can work with you to help figure this out.

Schema

A metadata schema overlaps with the concept of a data model and adds further rules and standardization to it. It governs how metadata is managed through constraints such as whether an attribute is required or optional and which values are valid for it.
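
The two kinds of constraint mentioned here, optionality and valid values, can be sketched in a few lines of Python. The schema below is hypothetical: the attribute names and the layout of the rules are assumptions made for illustration and do not reproduce the portal's actual JSON schema.

```python
# Hypothetical schema; attribute names and rule layout are illustrative only.
SCHEMA = {
    "dataSubtype": {
        "required": True,
        "valid_values": {"raw", "processed", "normalized", "metadata"},
    },
    # Optional attribute with no restriction on its value.
    "specimenID": {"required": False, "valid_values": None},
}


def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for name, rule in SCHEMA.items():
        if name not in record:
            if rule["required"]:
                problems.append(f"missing required attribute: {name}")
            continue
        valid = rule["valid_values"]
        if valid is not None and record[name] not in valid:
            problems.append(f"invalid value for {name}: {record[name]!r}")
    return problems


print(validate({"dataSubtype": "raw"}))     # []
print(validate({"specimenID": "SP-1"}))     # ['missing required attribute: dataSubtype']
print(validate({"dataSubtype": "cooked"}))  # ["invalid value for dataSubtype: 'cooked'"]
```

Returning a list of problems rather than stopping at the first one lets a contributor fix every issue in a record in a single pass.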

Sensitive Data

Data that must be protected from unauthorized access to safeguard the privacy or security of an individual or organization. This includes human data at risk of re-identification.

Note: “De-identified” data (maintained in a way that does not allow association with a specific person) is not considered sensitive.

Specimen ID

A specimen ID is the identifier for a sample from a specific individual – for example, a brain sample from a specific region or a blood sample.