Quickstart Guide¶

Note

The tools listed here are under development and may be subject to change.

About¶

gen3 tracker, or g3t, is a command line tool for the ACED-IDP platform. It provides a set of utilities for users to upload data to and download data from the platform. The following tutorial will walk you through the steps for two different use cases:

Uploading files for a new project to the platform
Downloading an existing project from the platform

Each step will outline the command to execute followed by a brief description of the command's functionality.

Requirements¶

Please ensure you have completed the following setup from the Requirements page:

Installed gen3-client
Configured a gen3-client profile with credentials
Installed gen3-tracker

To confirm all dependencies are set up as expected, run

g3t --profile <your_g3t_profile_name> ping

You should get a message like this

msg: 'Configuration OK: Connected using profile:aced' endpoint: https://aced-idp.org username: someone@example.com

along with the set of projects you have been provided access to.

General Usage¶

g3t [OPTIONS] COMMAND [ARGS]...

g3t is built on git, so many commands behave similarly to git with some key differences. These differences will be outlined for each step in the submission process.

1. Upload Data to a Newly Approved Project¶

The first use case we will cover is how to add data to a new project on the ACED-IDP.

Note

The following examples will use the aced program with a project called myproject and an aced g3t profile.

Check Project Permissions¶

To start, check what projects you have access to using the command

g3t projects ls

Check that you have permission to edit aced-myproject. This is what allows you to push data up to the platform. If you do not have the correct permissions, please contact a system administrator.

Specify a gen3 Profile¶

For most g3t commands, you need to specify the gen3-client profile you want to use. This ensures that you are uploading projects to the right platform with the right credentials. There are two ways to set your profile...

To set a profile using an environmental variable:

export G3T_PROFILE=aced

To pass the profile as a flag to the ping command for example:

g3t --profile aced ping

For the rest of the tutorial, we will assume you have exported a G3T_PROFILE environment variable so we don't have to use the --profile flag each time.

Initialize a new project¶

To initialize your new project locally, you can use g3t init

mkdir aced-myproject
cd aced-myproject
g3t init aced-myproject

Similar to git init, this command creates a new project in the current directory
Within the project, there are a couple important directories...
- MANIFEST/: stores file metadata entries
- META/: stores metadata converted into the FHIR standard
- .g3t/: hidden, stores and manages g3t state for the project
The project ID is aced-myproject made from the program name aced and project name myproject. Specifically,
- Program name: is predefined by the institution, defining what remote data buckets and endpoints you have access to
- Project name: must be unique within the server, be alphanumeric, and contain no spaces or hyphens
For more information, see creating a project

Add files to the manifest¶

Once your project is initialized, you can add files to the project's manifest. For example, let's say you have tsv files in a folder/ directory within your current repository. Each of the tsv files are associated with a particular subject, say patient_1 and patient_2. To add them using g3t add,

g3t add folder/file.tsv --patient patient_1
g3t add folder/file2.tsv --patient patient_2

Each g3t add above creates a metadata entry for the specified data file, automatically calculating metadata like the file's md5sum, type, date modified, size, and path.
- Just as a ship's manifest is an inventory of its cargo, the MANIFEST/ directory is an inventory for each file's metadata
- Each metadata entry is stored as a .dvc file in the MANIFEST directory, where the dvc file path mirrors the original file path
- Example: folder/file.tsv creates a MANIFEST/folder/file.tsv.dvc entry
Using the patient flag is one way to associate a file with a particular subject, in this case associating each file with a specified patient identifier.
g3t add varies from git add, as the .dvc file is what gets staged rather than the potentially large data file
Multiple files can be added at the same time by wrapping a wildcard string in quotes, for example, g3t add "*.csv".
For more information on usage, such as adding entries for remote files or how to associate files with a sample, see adding files

Create metadata¶

Now that your files have been staged with metadata entries, you can create FHIR-compliant metadata using the g3t meta init command

g3t meta init

Using the file metadata entries created by the g3t add command, g3t meta init creates FHIR-compliant metadata files in the META/ directory, where each file corresponds to a FHIR resource. At a minimum, the directory will contain:

File	Contents
ResearchStudy.ndjson	Description of the project
DocumentReference.ndjson	File information

Additional metadata files for patient, specimen, and other entities will be generated based on options provided to the add command.

File	Contents
Patient.ndjson	Patient information
ResearchSubject.ndjson	Enrollment information
Specimen.ndjson	Sample information

meta init is a good example of where g3t differs from git! While you might go from git add straight to git commit in a git workflow, we have to do g3t add > g3t meta init > g3t commit to track both the files and each file's metadata in g3t.
meta init focuses on creating metadata specific to the files you added. For your particular use case, you may also want to supply your own FHIR data, see adding FHIR metadata

Check that the metadata is valid¶

To ensure that the FHIR data has been properly formatted, you can call g3t meta validate.

g3t meta validate

The system will print summary counts and informative messages if the metadata is invalid.

Check that the expected files are queued for upload¶

You can double-check that all of your files have been staged with g3t status

g3t status

Commit files¶

With all checks complete, you can commit the metadata we created using g3t commit.

g3t commit -m "adding tsv metadata"

Like git, this command bundles the staged files into a single set of changes.
The -m flag adds a commit message to the changes
If the commit is successful, you will see a summary of the changes logged
As a reminder, the files that are committed to git are the FHIR metadata in META/ and the .dvc entries in MANIFEST/, not the data files themselves
See publishing a project for more info

Push to ACED-IDP¶

To submit the files and metadata to the data platform, we can use g3t push

g3t push

This command launches a job to upload project data to the specified data platform.
Specifically, it...
1. Checks that all files are committed before pushing
2. Checks that the META/ metadata is valid
3. Indexes the data files using the file metadata in the MANIFEST/ directory
4. Uploads the FHIR metadata in the META/ directory into our databases
A push will fail if no new files are being submitted. If you need to update existing files in the manifest or update the FHIR metadata, use the --overwrite option to force an upload.
A job is successful if you get a green success message.
For other publishing options and specialized use cases, see publishing a project

View the Data on the Platform¶

Congratulations, you have submitted data to the platform! To check that your data was uploaded, login and navigate to the Exploration page on aced-idp.org!

2. Download Data from a Project on ACED-IDP¶

Sometimes you might want the most recent version of a data project that has already been published to the platform. To download the metadata for an existing project, use the g3t clone command.

g3t clone aced-myproject

The clone command will download the metadata associated with the project into a new directory
Specifically, it downloads the metadata .dvc entries in MANIFEST/ and the FHIR-compliant metadata in META/

To retrieve the actual data files described by manifest as opposed to just the file metadata, use the pull command.

mkdir aced-myproject
cd aced-myproject
g3t init aced-myproject
g3t pull

The pull command will retrieve the actual data files associated with the metadata.

To download only a subset of files, refer to the downloads page. For more information on other commands or use cases, see the Use Cases & Workflows section.