Quickstart Guide
Note
The tools listed here are under development and may be subject to change.
About
gen3 tracker, or g3t, is a command line tool for the ACED-IDP platform. It provides a set of utilities for users to upload data to and download data from the platform. The following tutorial will walk you through the steps for two different use cases:
- Uploading files for a new project to the platform
- Downloading an existing project from the platform
Each step will outline the command to execute followed by a brief description of the command's functionality.
Requirements
Please ensure you have completed the following setup from the Requirements page:
- Installed gen3-client
- Configured a gen3-client profile with credentials
- Installed gen3-tracker
To confirm all dependencies are set up as expected, run the g3t ping command. You should get a message like this:
msg: 'Configuration OK: Connected using profile:aced'
endpoint: https://aced-idp.org
username: someone@example.com
along with the set of projects you have been provided access to.
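Before running the connection check, it can help to confirm that the command-line tools are discoverable at all. The following is a minimal sketch, assuming the executables are named gen3-client and g3t (taken from the tool names in this guide, not verified against an installation). It only looks for the programs on PATH and does not check credentials or connectivity.

```python
import shutil

# Executable names are assumptions based on the tool names used in this guide.
REQUIRED_TOOLS = ("gen3-client", "g3t")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

An empty list from find_missing_tools() means both programs were found; any names it returns point back to the Requirements page.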
General Usage
g3t is built on git, so many commands behave similarly to git with some key differences. These differences will be outlined for each step in the submission process.
1. Upload Data to a Newly Approved Project
The first use case we will cover is how to add data to a new project on the ACED-IDP.
Note
The following examples will use the aced program with a project called myproject and an aced g3t profile.
Check Project Permissions
To start, check what projects you have access to using the command
Check that you have permission to edit aced-myproject. This is what allows you to push data up to the platform. If you do not have the correct permissions, please contact a system administrator.
Specify a gen3 Profile
For most g3t commands, you need to specify the gen3-client profile you want to use. This ensures that you are uploading projects to the right platform with the right credentials. There are two ways to set your profile:
To set a profile using an environment variable, export G3T_PROFILE with the profile name, for example export G3T_PROFILE=aced.
To pass the profile as a flag instead, add --profile to the command, for example when calling the ping command.
For the rest of the tutorial, we will assume you have exported a G3T_PROFILE environment variable so we don't have to use the --profile flag each time.
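When g3t is driven from a script, the two ways of choosing a profile can be sketched as follows. The helper names are hypothetical, and placing --profile directly after g3t is an assumption of this sketch; the guide only states that the flag exists.

```python
import os


def g3t_command(*args, profile=None):
    """Build the argument list for a g3t call.

    With `profile` set, the --profile flag is passed explicitly. Otherwise
    the call relies on G3T_PROFILE having been exported in the environment.
    The position of --profile (right after `g3t`) is an assumption.
    """
    command = ["g3t"]
    if profile is not None:
        command += ["--profile", profile]
    return command + list(args)


def exported_profile(environ=os.environ):
    """Return the profile exported as G3T_PROFILE, or None if it is unset."""
    return environ.get("G3T_PROFILE")
```

For example, g3t_command("ping", profile="aced") yields the explicit-flag form, while g3t_command("ping") leaves profile selection to the exported variable.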
Initialize a new project
To initialize your new project locally, you can use g3t init.
- Similar to git init, this command creates a new project in the current directory
- Within the project, there are a couple of important directories:
    - MANIFEST/: stores file metadata entries
    - META/: stores metadata converted into the FHIR standard
    - .g3t/: hidden, stores and manages g3t state for the project
- The project ID is aced-myproject, made from the program name aced and project name myproject. Specifically,
    - Program name: is predefined by the institution, defining what remote data buckets and endpoints you have access to
    - Project name: must be unique within the server, be alphanumeric, and contain no spaces or hyphens
- For more information, see creating a project
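The naming rules above can be checked locally before initializing a project. This is a minimal sketch with hypothetical helper names: it applies the alphanumeric rule stated for the project name, assumes the program name follows the same rule, and restricts the check to ASCII letters and digits. Uniqueness within the server cannot be verified this way.

```python
import re

# "<program>-<project>", each part made of ASCII letters and digits only.
# Applying the same rule to the program name is an assumption of this sketch.
_PROJECT_ID = re.compile(r"[A-Za-z0-9]+-[A-Za-z0-9]+")


def is_valid_project_id(project_id):
    """Return True if project_id has the form <program>-<project>."""
    return _PROJECT_ID.fullmatch(project_id) is not None


def make_project_id(program, project):
    """Join a program and project name, rejecting names that break the rules."""
    project_id = f"{program}-{project}"
    if not is_valid_project_id(project_id):
        raise ValueError(f"invalid project ID: {project_id!r}")
    return project_id
```

Because the project name may not contain hyphens, the ID splits unambiguously on its single hyphen into program and project.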
Add files to the manifest
Once your project is initialized, you can add files to the project's manifest. For example, let's say you have tsv files in a folder/ directory within your current repository. Each of the tsv files is associated with a particular subject, say patient_1 and patient_2. To add them, use g3t add with the patient flag for each file.
- Each g3t add call creates a metadata entry for the specified data file, automatically calculating metadata like the file's md5sum, type, date modified, size, and path.
    - Just as a ship's manifest is an inventory of its cargo, the MANIFEST/ directory is an inventory of each file's metadata
    - Each metadata entry is stored as a .dvc file in the MANIFEST/ directory, where the .dvc file path mirrors the original file path
    - Example: folder/file.tsv creates a MANIFEST/folder/file.tsv.dvc entry
- Using the patient flag is one way to associate a file with a particular subject, in this case associating each file with a specified patient identifier.
- g3t add varies from git add, as the .dvc file is what gets staged rather than the potentially large data file
- Multiple files can be added at the same time by wrapping a wildcard string in quotes, for example, g3t add "*.csv".
- For more information on usage, such as adding entries for remote files or how to associate files with a sample, see adding files
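To make the manifest idea concrete, the sketch below computes the kinds of values the guide lists (md5sum, type, date modified, size, path) and derives the mirrored entry path. It is an illustration only: the function names and dictionary keys are hypothetical, and it does not reproduce the actual contents of a .dvc file or the way g3t computes them.

```python
import hashlib
import mimetypes
from datetime import datetime, timezone
from pathlib import Path, PurePosixPath


def manifest_entry_path(data_path):
    """Mirror a relative data file path under MANIFEST/ with .dvc appended."""
    return str(PurePosixPath("MANIFEST") / f"{data_path}.dvc")


def describe_file(data_path):
    """Collect basic metadata for a data file (keys are illustrative)."""
    path = Path(data_path)
    stat = path.stat()
    md5 = hashlib.md5()
    with path.open("rb") as handle:
        # Hash in 1 MiB chunks so large data files are never loaded whole.
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            md5.update(chunk)
    mime_type, _ = mimetypes.guess_type(path.name)
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "path": path.as_posix(),
        "md5": md5.hexdigest(),
        "size": stat.st_size,
        "mime": mime_type,
        "modified": modified.isoformat(),
    }
```

Only this small description is tracked, which is why staging stays cheap even when the data file itself is large.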
Create metadata
Now that your files have been staged with metadata entries, you can create FHIR-compliant metadata using the g3t meta init command.
- Using the file metadata entries created by the g3t add command, g3t meta init creates FHIR-compliant metadata files in the META/ directory, where each file corresponds to a FHIR resource. At a minimum, the directory will contain:
File | Contents |
---|---|
ResearchStudy.ndjson | Description of the project |
DocumentReference.ndjson | File information |
- Additional metadata files for patient, specimen, and other entities will be generated based on options provided to the add command.
File | Contents |
---|---|
Patient.ndjson | Patient information |
ResearchSubject.ndjson | Enrollment information |
Specimen.ndjson | Sample information |
- meta init is a good example of where g3t differs from git! While you might go from git add straight to git commit in a git workflow, we have to do g3t add > g3t meta init > g3t commit to track both the files and each file's metadata in g3t.
- meta init focuses on creating metadata specific to the files you added. For your particular use case, you may also want to supply your own FHIR data; see adding FHIR metadata
Check that the metadata is valid
To ensure that the FHIR data has been properly formatted, you can call g3t meta validate.
- The system will print summary counts and informative messages if the metadata is invalid.
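For a quick local look at what META/ contains, independent of g3t, the files can be read as line-delimited JSON and tallied by resource type. This is a minimal sketch with a hypothetical function name; it only checks that each line parses as a JSON object and is not a substitute for g3t meta validate, whose checks are not reproduced here.

```python
import json
from collections import Counter
from pathlib import Path


def count_resources(meta_dir="META"):
    """Count resources per resourceType across the .ndjson files in meta_dir."""
    counts = Counter()
    for ndjson_file in sorted(Path(meta_dir).glob("*.ndjson")):
        with ndjson_file.open(encoding="utf-8") as handle:
            for line_number, line in enumerate(handle, start=1):
                if not line.strip():
                    continue  # tolerate blank lines
                where = f"{ndjson_file}:{line_number}"
                try:
                    resource = json.loads(line)
                except json.JSONDecodeError as error:
                    raise ValueError(f"{where}: invalid JSON ({error})") from error
                if not isinstance(resource, dict):
                    raise ValueError(f"{where}: expected a JSON object")
                counts[resource.get("resourceType", "<missing resourceType>")] += 1
    return counts
```

Each line of an .ndjson file holds one resource, so the tallies should line up with the number of files and subjects you added.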
Check that the expected files are queued for upload
You can double-check that all of your files have been staged with g3t status.
Commit files
With all checks complete, you can commit the metadata we created using g3t commit.
- Like git, this command bundles the staged files into a single set of changes.
- The -m flag adds a commit message to the changes
- If the commit is successful, you will see a summary of the changes logged
- As a reminder, the files that are committed to git are the FHIR metadata in META/ and the .dvc entries in MANIFEST/, not the data files themselves
- See publishing a project for more info
Push to ACED-IDP
To submit the files and metadata to the data platform, we can use g3t push.
- This command launches a job to upload project data to the specified data platform.
- Specifically, it:
    - Checks that all files are committed before pushing
    - Checks that the META/ metadata is valid
    - Indexes the data files using the file metadata in the MANIFEST/ directory
    - Uploads the FHIR metadata in the META/ directory into our databases
- A push will fail if no new files are being submitted. If you need to update existing files in the manifest or update the FHIR metadata, use the --overwrite option to force an upload.
- A job is successful if you get a green success message.
- For other publishing options and specialized use cases, see publishing a project
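The upload steps covered so far can be collected into one ordered sequence. The sketch below is an assumption-laden outline rather than a supported script: the helper names are hypothetical, and the --patient spelling and argument order are guesses based on the guide's mention of a patient flag. It defaults to a dry run that only prints each call.

```python
import shlex
import subprocess


def upload_steps(files_by_patient, message):
    """Return the g3t calls for an upload, in the order this guide uses them."""
    steps = [
        ["g3t", "add", path, "--patient", patient]  # flag spelling is assumed
        for path, patient in files_by_patient.items()
    ]
    steps += [
        ["g3t", "meta", "init"],
        ["g3t", "meta", "validate"],
        ["g3t", "status"],
        ["g3t", "commit", "-m", message],
        ["g3t", "push"],
    ]
    return steps


def run_steps(steps, dry_run=True):
    """Print each call; execute it only when dry_run is False."""
    for step in steps:
        print("+", shlex.join(step))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(step, check=True)
```

Stopping at the first failure mirrors the guide's ordering: there is little point in committing or pushing if validation has already reported a problem.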
View the Data on the Platform
Congratulations, you have submitted data to the platform! To check that your data was uploaded, log in and navigate to the Exploration page on aced-idp.org.
2. Download Data from a Project on ACED-IDP
Sometimes you might want the most recent version of a data project that has already been published to the platform. To download the metadata for an existing project, use the g3t clone command.
- The clone command will download the metadata associated with the project into a new directory
- Specifically, it downloads the metadata .dvc entries in MANIFEST/ and the FHIR-compliant metadata in META/
To retrieve the actual data files described by the manifest, as opposed to just the file metadata, use the pull command.
- The pull command will retrieve the actual data files associated with the metadata.
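The two download steps can be outlined in the same style. This sketch rests on several assumptions the guide does not confirm: that clone takes the project ID as its argument, that the new directory is named after the project ID, and that pull is run from inside that directory. The function name is hypothetical.

```python
def download_steps(project_id):
    """Return (command, working_directory) pairs for clone followed by pull.

    Assumed, not confirmed by the guide: `g3t clone <project_id>` creates a
    directory named <project_id>, and `g3t pull` is run inside it.
    """
    return [
        (["g3t", "clone", project_id], "."),
        (["g3t", "pull"], project_id),
    ]
```

Each pair could be passed to subprocess.run(command, cwd=working_directory, check=True) once the assumptions have been checked against your installation.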
To download only a subset of files, refer to the downloads page. For more information on other commands or use cases, see the Use Cases & Workflows section.