Protocol Steps

 

2.        Once the assembly file is properly prepared, go to https://submit.ncbi.nlm.nih.gov/ to begin the TSA submission. Sign in to NCBI with your user account, choose TSA and click on New Submission.

 

During this step, the submitter will provide information about the assembly, including the BioSample, BioProject, and SRA identifiers. The user must decide between submitting the assembly as a FASTA file or as an ASN.1 file. An ASN.1 file is mandatory if the submitter also plans to provide annotation; otherwise, either method can be chosen. If FASTA upload is selected, the user will manually provide some additional data; if ASN.1 is chosen, the required information is embedded in the ASN.1 file. The submission process progresses through either five (ASN.1) or six (FASTA) fillable forms, depending on the sequence format chosen. We first present the steps for FASTA files, followed by the steps for ASN.1 files.

 

3.        For FASTA files:

        a.   Submitter: As in earlier examples.

        b.   General info: Here the user will provide the BioProject identifier (PRJNAxxxxxx), the BioSample identifier (SAMNxxxxxx), the release date, the data type (EST or NGS), and the SRA accession(s) (SRRxxxxxx).

 

In addition, the user must provide "Assembly metadata", which includes:

        i.   Information about the assembly method: Provide the name of the assembly program used (e.g. Trinity, ABySS) and the version number (or the date of assembly, if the program version is not known);

        ii.  Assembly name (optional);

        iii.  Assembly coverage (optional);

        iv.  Description of Assembly method (required): This should be as detailed as possible and include any read processing steps, whether default program settings were used, and any other information that would be required to exactly reproduce the assembly process; and

        v.  Sequencing Technology: The platform used for sequencing (e.g. Illumina HiSeq, PacBio).

 

        c.  File: Here the user must choose between submitting the assembly as a FASTA file or as an ASN.1 file. Choose File type FASTA and click Continue.

 

        d.   Sequence: Click on Browse to select an assembly.fsa file stored on the user's local machine. An Aspera connect window will open to display the progress of the upload. Once the upload is complete, the message "Please wait! Processing the data" is displayed while an initial TSA validation check is conducted. If errors are displayed, click on report.txt for more information. If no report.txt link is shown, simply copy all the errors from the webpage and save them in a file called TSA_report.txt.

The report contains a list of all the problem contigs (see TSA_report.txt for an example). Use the following steps to remove any problematic contigs:

 

Create a list of all the contig IDs in the assembly:


Commands used:

grep "^>" Gcitizenii_Spec7_assembly.fsa | cut -d " " -f1 | sed 's/>//g' > all_ids


Get the contaminant contig IDs from TSA_report.txt:


Commands used:

cut -d " " -f5 TSA_report.txt | sed 's/,//g' > contaminant_ids


Sort each ID list:


Commands used:

sort all_ids -o all_ids
sort contaminant_ids -o contaminant_ids


Create a list of the good (non-contaminant) contig IDs:


Commands used:

comm -23 all_ids contaminant_ids > good_ids
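
Because the contaminant IDs are parsed from a fixed column of the report, it may be worth confirming that every extracted ID actually exists in the assembly before filtering. The following is a minimal sketch, assuming both ID lists have already been sorted as above; the function name ids_missing_from is hypothetical and not part of the original protocol. Any ID it prints was not found in all_ids and probably reflects a parsing problem in TSA_report.txt.

```shell
# Hypothetical helper (not part of the original protocol): print IDs that
# appear in the second sorted list but not in the first sorted list.
ids_missing_from() {
    reference_ids=$1   # e.g. all_ids (sorted)
    query_ids=$2       # e.g. contaminant_ids (sorted)
    # comm -13 hides lines unique to file 1 and lines common to both,
    # leaving only the lines unique to file 2.
    comm -13 "$reference_ids" "$query_ids"
}

# Example call with the file names used in this protocol:
# ids_missing_from all_ids contaminant_ids
```

No output means that every contaminant ID matched a contig ID in the assembly.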


Use this Perl one-liner to make a new assembly file containing only the good contigs. Be sure to save the filtered assembly with a new name:


Commands used:

perl -ne 'if(/^>(\S+)/){$c=$i{$1}}$c?print:chomp;$i{$_}=1 if @ARGV' \
good_ids Gcitizenii_Spec7_assembly.fsa \
> Gcitizenii_Spec7_filtered_assembly.fsa
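
Before uploading, a quick count can confirm that the filtering behaved as expected: the number of sequences in the filtered file should equal the number of IDs in good_ids. The sketch below is an optional check, not part of the original protocol; the function name check_filtered_count is hypothetical.

```shell
# Hypothetical helper (not part of the original protocol): compare the
# number of IDs in a list with the number of FASTA headers in a file.
check_filtered_count() {
    ids_file=$1     # e.g. good_ids
    fasta_file=$2   # e.g. Gcitizenii_Spec7_filtered_assembly.fsa
    # Count non-empty lines in the ID list and header lines in the FASTA.
    n_ids=$(grep -c . "$ids_file" || true)
    n_seqs=$(grep -c '^>' "$fasta_file" || true)
    echo "IDs in list: $n_ids; sequences in FASTA: $n_seqs"
    # Return success only when the two counts agree.
    [ "$n_ids" -eq "$n_seqs" ]
}

# Example call with the file names used in this protocol:
# check_filtered_count good_ids Gcitizenii_Spec7_filtered_assembly.fsa
```

A mismatch suggests that some IDs in good_ids did not match any header, or that the ID list contains duplicates.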


Upload the filtered assembly.

        e.   References: Provide the name(s) of the "Sequence authors", i.e. the people responsible for generating the raw reads upon which the assembly is based, and information about publications (unpublished, in-press, or published) that include the assembly.

 

        f.   Overview: Here the user can look over all the provided information and decide if changes are needed. After the user clicks Submit, the submission will undergo further assessment at NCBI and a complete VecScreen analysis will occur. This process can take 12 or more hours. If there are no problems, NCBI will send a confirmation email with a TSA accession number in the format GAAxxxxxx.

 

4.        For ASN.1 files:

Before submitting an ASN.1 file, some additional preparations are required. The following files must be generated for each assembly that will be submitted:

        a.  Create a "GenBank Submission Template" (SBT file): Go to https://submit.ncbi.nlm.nih.gov/genbank/template/submission/ and complete the required fields. The same BioProject identifier can be used for as many assemblies as are part of the project, but each BioSample identifier should have its own SBT file. After filling all fields, click Create Template to download the SBT file. Save the file with a name that reflects which assembly it is for.

        b.  Create a "Structured Comment - Non Genome Submissions" (CMT file): Go to https://submit.ncbi.nlm.nih.gov/structcomment/nongenomes/ and complete the required fields. The Assembly name field should be specific to a single assembly. Click Download to download the CMT file. Save the file with a name that reflects which assembly it is for.

        c.  Place the following files in a single folder (either on the user's local machine or on a server): The correctly formatted assembly.fsa file, as described in the beginning of section 3; the .sbt and .cmt files just created; and an appropriate version of the tbl2asn executable (e.g. linux64.tbl2asn, mac.tbl2asn, etc.).

        d.   From within the folder, run the tbl2asn command as follows:


Commands used:

linux64.tbl2asn -t Gcitizenii.sbt -i Gcitizenii_Spec7_assembly.fsa -a s -V v -w Gcitizenii.cmt


 

The resulting output files will be (i) assembly.val, a validation file that reports errors (if this file is empty, no errors were detected); and (ii) assembly.sqn, which is the ASN.1 file for TSA submission.
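
The empty-file check described above can be scripted. This is a minimal sketch, assuming the validation file is named after the input FASTA (for example Gcitizenii_Spec7_assembly.val); the function name val_is_clean is hypothetical, and the actual output file name should be confirmed in the working folder.

```shell
# Hypothetical helper (not part of the original protocol): report whether
# a tbl2asn validation file contains any messages.
val_is_clean() {
    val_file=$1
    if [ ! -e "$val_file" ]; then
        echo "Validation file not found: $val_file" >&2
        return 2
    fi
    # -s is true when the file exists and has a size greater than zero.
    if [ -s "$val_file" ]; then
        echo "Validation messages found in $val_file:"
        cat "$val_file"
        return 1
    fi
    echo "No validation messages in $val_file"
}

# Example call (the file name is an assumption based on the input FASTA):
# val_is_clean Gcitizenii_Spec7_assembly.val
```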

 

        e.  Once the assembly.sqn file is prepared, go to https://submit.ncbi.nlm.nih.gov/ to begin the TSA submission.

        f.  Submitter: As in earlier examples.

        g.  General info: As in earlier examples.

        h.  File: Choose File type ASN and click Continue.

        i.  Sequence: Click on Browse to select an assembly.sqn file stored on the user's local machine. An Aspera connect window will open to display the progress of the upload. Once the upload is complete, the message "Please wait! Processing the data" is displayed as an initial TSA validation check is conducted. If no errors are displayed, click Continue.

If there are errors, see the previous section "For FASTA files: d" for instructions on removing problematic contigs. Be sure to save the filtered assembly with a new name.

        j.  Overview: Here the user can look over all the provided information and decide if changes are needed.

5.         Submission: After the user clicks Submit, the submission will undergo further assessment at NCBI and a complete VecScreen analysis will occur. This process can take 12 or more hours. If there are no problems, NCBI will send a confirmation email with a TSA accession number in the format GAAxxxxxx.

If the assembly fails to pass the more thorough quality checks that occur post-submission, the submitter will receive an email stating that the submission failed.

An included link directs the user to the submission portal. Click on the Fix link.

This will lead you to a downloadable Contamination.txt file that describes the type of contaminant identified and lists the corresponding sequence identifiers.

See Gcitizenii_sqn_contamination.txt for an example.

 

Sequences that represent exogenous contamination are listed under "Exclude," and these should be entirely removed from the assembly. Sequences with strong matches to primers or adaptors are listed under "Trim" and we suggest that these also be removed. If there are duplicated sequences, remove them.

 

6.        See the previous section "For FASTA files: d" for instructions on removing problematic contigs. Be sure to save the filtered assembly with a new name.

 

Once the assembly has been filtered, return to the Sequence tab and upload the filtered assembly.

 

If all is well, you will receive a confirmation email.