SRA submission

At this point, you are ready to submit the raw reads that are associated with your BioProject. Sequences can be submitted exactly as received from the sequencing center. Alternatively, processed reads (i.e. those that have been subjected to quality trimming, contaminant removal, etc.) may be submitted. If processed reads are submitted, it is important to include information about all the processing procedures applied.

 

6.        Go to https://submit.ncbi.nlm.nih.gov/subs/sra/ and sign in to NCBI with your user account.

Choose New submission.

 

During this step, the submitter will provide information about the sequencing project, including organism information, funding sources, etc. The submission process progresses through five fillable forms (presented as tabs at the top of the page). In order, these are:

 

        a.   Submitter: Provide information about the person submitting the data and the submitting organization (typically, this will be the submitter's organizational affiliation). An email address from the submitting organization's domain is required. If desired, a shared submission group can be created, allowing multiple authors to access and contribute to the submission.

 

        b.   General info: Here the user will enter the BioProject ID created earlier, indicate whether BioSample IDs have been created, and choose a release date. If Release immediately following processing is selected, the raw data will become publicly accessible right away. If the user wishes to delay release, a future release date must be selected.

 

        c.  SRA metadata: Here the submitter must provide a metadata file containing information about the sequencing procedures used. A template file can be downloaded from the SRA site as a tab-delimited TXT file or as an Excel file. The Excel file is easier to work with and provides helpful details. In either case, the user must save the edited template (the sheet called "SRA_Data" in the Excel file) as a new tab-delimited file. To save the SRA_Data worksheet as a tab-delimited file, use "Save As" "Tab Delimited Text (.txt)."

 

Gcitizenii_SRA_metadata_acc.xlsx

Gcitizenii_SRA_metadata_acc.txt
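
Before uploading, it can be worth confirming that the exported file really is tab-delimited, since a file saved in the wrong format (e.g. comma-separated) looks similar in a text editor. The following is a minimal sketch, not an NCBI tool: the function name check_tab_delimited is hypothetical, and the check only compares the number of tab-separated fields on each line with the header.

```python
import csv
import io


def check_tab_delimited(text):
    """Return a list of problems found in tab-delimited text (empty if none)."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    if not rows:
        return ["file is empty"]

    problems = []
    header = rows[0]
    if len(header) < 2:
        # A one-column header usually means the file was not saved with tabs.
        problems.append("header has fewer than two tab-separated columns")

    for line_no, row in enumerate(rows[1:], start=2):
        if not row:
            continue  # ignore completely blank lines
        if len(row) != len(header):
            problems.append(
                f"line {line_no}: {len(row)} fields, expected {len(header)}"
            )
    return problems
```

To use it on the saved template, read the file first, for example with open("Gcitizenii_SRA_metadata_acc.txt", newline="", encoding="utf-8").read(), and pass the resulting string to the function. An empty list means no problems were detected by this particular check.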


The metadata template has 17 fields. The following 13 fields are mandatory for all data types:

        i.   bioproject_accession: The BioProject ID associated with the raw reads.

        ii.   biosample_accession: The BioSample ID(s) associated with the raw reads.

        iii.   library_ID: A user-defined unique identifier. Each sequencing library must have its own unique ID.

        iv.   title: A short, publicly viewable description of the data. NCBI recommends the format "<methodology> of <organism>: <sample info>" (e.g. "RNA-Seq of Drosophila melanogaster: adult female antennae").

        v.   library_strategy: The user must choose from a provided set of options. For most transcriptome studies, the user should choose "RNA-Seq".

        vi.   library_source: The nucleic acid type that was used to prepare the library. For most transcriptome studies, the user will choose "transcriptomic," but "metatranscriptomic" and "single cell transcriptomic" are also available choices.

        vii.   library_selection: The method of selection or enrichment used in preparing the sequencing library. For RNA-Seq studies that use polyA selection to enrich for messenger RNA (mRNA), the user should choose "PolyA." For other RNA-Seq methods, such as Total RNA, choose "cDNA." More specialized options are available as appropriate (e.g. "cDNA_oligo_dT").

        viii.   library_layout: Specify whether paired-end or single-end sequencing was done.

        ix.   platform: The sequencing platform used (Illumina, PacBio, etc.).

        x.   instrument_model: The specific model of the sequencing instrument.

        xi.   design_description: A short methods section describing how the libraries were prepared. Users are encouraged to provide all relevant details, including, e.g., specific tissues extracted or whether sequencing represents pooled individuals. If processed reads are being submitted, describe the filtering or other steps carried out.

        xii.   filetype: The format of the raw reads file (typically FASTQ).

        xiii.   filename: The exact file name, including extension. This must match the name of the file(s) you upload in the next step. In the case of paired read files (R1 and R2), the submitter will enter filenames in the columns filename and filename2.
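
The thirteen mandatory fields listed above can be checked programmatically before the file is uploaded. The sketch below is illustrative only and assumes the metadata has already been saved as tab-delimited text with the field names as column headers; validate_metadata is a hypothetical helper, and it checks only that each mandatory column exists, that no mandatory cell is empty, and that every library_ID is unique. It does not check values against the option lists offered by the SRA.

```python
import csv
import io

# The 13 mandatory fields named in the list above.
MANDATORY_FIELDS = [
    "bioproject_accession", "biosample_accession", "library_ID", "title",
    "library_strategy", "library_source", "library_selection",
    "library_layout", "platform", "instrument_model", "design_description",
    "filetype", "filename",
]


def validate_metadata(text):
    """Return a list of problems in tab-delimited SRA metadata text."""
    problems = []
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    columns = reader.fieldnames or []

    missing = [field for field in MANDATORY_FIELDS if field not in columns]
    if missing:
        problems.append("missing columns: " + ", ".join(missing))

    seen_ids = set()
    for row in reader:
        line_no = reader.line_num  # line of the row just read
        for field in MANDATORY_FIELDS:
            if field in columns and not (row.get(field) or "").strip():
                problems.append(f"line {line_no}: empty {field}")

        # Each sequencing library must have its own unique ID.
        lib_id = (row.get("library_ID") or "").strip()
        if lib_id:
            if lib_id in seen_ids:
                problems.append(f"line {line_no}: duplicate library_ID {lib_id}")
            seen_ids.add(lib_id)
    return problems
```

An empty list means that these basic checks passed. The SRA submission portal performs its own validation after upload, so this is only a convenience for catching obvious omissions early.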

Upload the newly created file and click Continue.
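
Because the file names in the metadata must match the uploaded files exactly, a quick comparison between the metadata and the folder of reads can prevent a failed submission. This is a rough sketch under the assumption that all file-name columns start with "filename" (as with filename and filename2 above); the helper names are hypothetical.

```python
import csv
import io


def filenames_in_metadata(text):
    """Collect every non-empty value from columns whose name starts with 'filename'."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    names = []
    for row in reader:
        for column, value in row.items():
            # column is None for surplus cells on over-long rows; skip those.
            if column and column.startswith("filename") and value and value.strip():
                names.append(value.strip())
    return names


def compare_with_folder(text, folder_files):
    """Return (listed but not in folder, in folder but not listed), both sorted."""
    listed = set(filenames_in_metadata(text))
    present = set(folder_files)
    return sorted(listed - present), sorted(present - listed)
```

In practice, folder_files could be the result of os.listdir("SRA_reads"), where SRA_reads is the local folder holding the reads. Both returned lists should be empty before proceeding to the Files tab.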

 

        d.      Files: Here the user uploads the raw reads. There are two main ways to do this: (i) Web-based direct upload of files stored on the user's local computer. Files smaller than 2 GB can be uploaded directly; larger files require the Aspera connect plugin. (ii) Command line-based upload via FTP or the Aspera connect command line. The command line options are preferred, especially for larger data sets. Each option is described in more detail below:

·         Direct upload: On the "Files" page, choose I will upload all the files now via HTTP/Aspera, then use the Browse button to select files stored locally. After all the files are transferred, click on Continue to move to the final tab.

·         Aspera plugin: If the user has installed the Aspera connect plugin, choosing I will upload all the files now via HTTP/Aspera will automatically launch an Aspera dialogue and let the user select a locally stored file. Multiple files can be selected at once. After selecting the files, the user must confirm the transfer by allowing Aspera to connect to the NCBI web page. After all the files are transferred, click on "Continue" to move to the final tab.
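
To decide in advance which files can go through the plain web upload and which will need Aspera, the files can be sorted by size. The sketch below is only a convenience and rests on an assumption: the 2GB limit is interpreted as 2 * 1024**3 bytes, since it is not stated here whether binary or decimal gigabytes are meant. The function names are hypothetical.

```python
import os

# Assumption: "2GB" is read as 2 * 1024**3 bytes (binary gigabytes).
WEB_UPLOAD_LIMIT_BYTES = 2 * 1024**3


def split_by_upload_limit(sizes, limit=WEB_UPLOAD_LIMIT_BYTES):
    """Split a {name: size_in_bytes} mapping into (below limit, at or above limit)."""
    small = sorted(name for name, size in sizes.items() if size < limit)
    large = sorted(name for name, size in sizes.items() if size >= limit)
    return small, large


def folder_sizes(folder):
    """Map each regular file in folder to its size in bytes."""
    with os.scandir(folder) as entries:
        return {entry.name: entry.stat().st_size for entry in entries if entry.is_file()}
```

For example, split_by_upload_limit(folder_sizes("SRA_reads")) returns the files that fit under the limit first and the files that need Aspera second.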

·         Command line FTP: The command ftp is standard in Unix and Linux environments. On the "Files" page, choose I have all files preloaded for this submission, then click on FTP upload instructions.

This will create a temporary NCBI user directory and display all the information required for logging in. The information is presented as a numbered list (1 through 7) and provides the values that will be entered as shown below. Keep this webpage open (or copy all the information).

Now the user should open a session on their local server and proceed as shown below. If there are difficulties in having the password accepted, try copying it from the SRA page into a text editor, then copying it from the text editor to the terminal.


Commands used:

cd SRA_reads
ftp ftp-private.ncbi.nlm.nih.gov
cd uploads/goodcitizen@mail.com_GQUXdxyX
put GS7H_ACAGTGAT-_BC6GFHANXX_L008_001.R1.fastq
put GS7H_ACAGTGAT-_BC6GFHANXX_L008_001.R2.fastq
exit


After all the files are transferred, return to the SRA submission webpage. Click on Select preload folder to see and select the folder that was just created.

Click on Use selected folder and Continue to move to the final tab.
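
When many files need to be transferred, the interactive FTP session above can be scripted. The following is a minimal sketch using Python's standard ftplib module, not an NCBI-provided tool. The host, user name, password and upload folder must all be taken from the FTP upload instructions page; the function names are hypothetical, and the sketch assumes a plain (unencrypted) FTP login, as in the session shown above.

```python
import os
from ftplib import FTP


def upload_reads(ftp, remote_dir, local_dir, filenames):
    """Upload each named file from local_dir into remote_dir over an open FTP session."""
    ftp.cwd(remote_dir)  # same as "cd uploads/..." in the interactive session
    for name in filenames:
        with open(os.path.join(local_dir, name), "rb") as handle:
            # Same as "put <name>" in the interactive session.
            ftp.storbinary(f"STOR {name}", handle)


def connect_and_upload(host, user, password, remote_dir, local_dir, filenames):
    """Log in with the credentials shown on the SRA page and upload the files."""
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        upload_reads(ftp, remote_dir, local_dir, filenames)
```

A call would look like connect_and_upload("ftp-private.ncbi.nlm.nih.gov", user, password, "uploads/goodcitizen@mail.com_GQUXdxyX", "SRA_reads", [...]), with user and password copied from the instructions page and the list holding the FASTQ file names.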

 

·         Aspera command line: If the user has installed the Aspera connect software on the server where the FASTQ files are stored, a single command can be used to transfer all the FASTQ files at once. On the "Files" page, choose I have all files preloaded for this submission, then click on Aspera command line upload instructions.

Click on the link Get the key file to download a file called "aspera.openssh." This file must be transferred to the server where the FASTQ reads are stored, which can be done with the command scp from the terminal or using free FTP software such as FileZilla (available at https://filezilla-project.org/).

Make sure that all the FASTQ files for upload are in a single folder that contains nothing else, then execute the following:


Commands used:

ascp -i /Home/GoodCitizen/aspera.openssh -QT -l100m -k1 \
-d /Home/GoodCitizen/SRA_reads/ \
subasp@upload.ncbi.nlm.nih.gov:uploads/goodcitizen@mail.com_qLWrURRuftp


After all the files are transferred, return to the SRA submission webpage. Click on Select preload folder to see and select the folder that was just created.

Click on Use selected folder and Continue to move to the final tab.
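
Since the Aspera command sends everything in the folder, it can help to confirm that the folder holds only FASTQ files and to assemble the command from its parts. The sketch below is hedged accordingly: the helper names are hypothetical, the list of accepted file extensions is an assumption, and the ascp options are simply those used in the command above.

```python
import subprocess


def non_fastq_entries(names, suffixes=(".fastq", ".fastq.gz", ".fq", ".fq.gz")):
    """Return the names that do not end in an accepted FASTQ suffix (assumed list)."""
    return sorted(name for name in names if not name.lower().endswith(suffixes))


def build_ascp_command(key_file, reads_dir, destination):
    """Assemble the ascp call shown above as an argument list."""
    return ["ascp", "-i", key_file, "-QT", "-l100m", "-k1", "-d", reads_dir, destination]


def run_upload(key_file, reads_dir, destination):
    """Run ascp; raises CalledProcessError if the transfer exits with an error."""
    subprocess.run(build_ascp_command(key_file, reads_dir, destination), check=True)
```

Here non_fastq_entries(os.listdir(reads_dir)) should return an empty list before run_upload is called. The destination is the string given in the Aspera command line upload instructions, e.g. subasp@upload.ncbi.nlm.nih.gov:uploads/goodcitizen@mail.com_qLWrURRuftp in the example above.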

 

        e.  Overview: Here the user can review all the provided information and check for errors. Once satisfied with all entries, the user should click Submit.

 

When the process is complete, the user will receive an email containing the final SRA ID in the form SRAxxxxxx.

7.  Submit