Submission ID | S00003 |
Submission Date | 2018-04-16 00:00:00 |
Submission Status | complete |
Submitter | Linnea Thörnqvist |
Submitter Address | Department of Immunotechnology, Lund University, Medicon Village (building 406), S-223 81 Lund, Sweden |
Species | Homo sapiens |
Ethnicity | UN |
Name | Institution | ORCID ID |
---|---|---|
Mats Ohlin | Department of Immunotechnology, Lund University, Lund, Sweden | 0000-0002-5105-1938 |
Christian Busse | Division of B Cell Immunology, German Cancer Research Center (DKFZ), Heidelberg, Germany | |
Repository | NCBI SRA |
Accession Number | PRJNA349143 |
Project/Study Title | Dynamics of the human antibody repertoire after influenza vaccination |
Dataset URL | https://www.ncbi.nlm.nih.gov/bioproject/PRJNA349143 |
MiAIRR Compliant? | No |
MiAIRR URL | |
Sequencing Platform | Illumina |
Read Length | 325 (read 1) + 300 (read 2) |
Primers Overlapping? | No |
PubMed ID | Title | Authors |
---|---|---|
24639495 | High-resolution antibody dynamics of vaccine-induced immune responses. | Laserson U, Vigneault F, Gadala-Maria D, Yaari G, Uduman M, Vander Heiden JA, Kelton W, Taek Jung S, Liu Y, Laserson J, Chari R, Lee JH, Bachelet I, Hickey B, Lieberman-Aiden E, Hanczaruk B, Simen BB, Egholm M, Koller D, Georgiou G, Kleinstein SH, Church GM |
30172112 | Critical steps for computational inference of the 3'-end of novel alleles of immunoglobulin heavy chain variable genes - illustrated by an allele of IGHV3-7. | Thörnqvist L, Ohlin M |
Only the IgM-transcriptome was analyzed
Primer Name | Primer Sequence |
---|---|
Human-IGHM | GAATTCTCACAGGAGACGAGG |
Human-IGHD | TGTCTGCACCCTGATATGATGG |
Human-IGHA | GGGTGCTGYMGAGGCTCAG |
Human-IGHE | TTGCAGCAGCGGGTCAAGG |
Human-IGHG | CCAGGGGGAAGACSGATG |
Human-IGK | GACAGATGGTGCAGCCACAG |
Human-IGL | AGGGYGGGAACAGAGTGAC |
Primer Name | Primer Sequence |
---|---|
TS-shift0 | TACGGG |
TS-shift1 | ATACGGG |
TS-shift2 | TCTACGGG |
TS-shift3 | CGATACGGG |
TS-shift4 | GATCTACGGG |
Sample Preparation and Sequencing
“The blood samples collected in the influenza vaccination study by Laserson et al. (PMID:24639495) were re-sequenced using the Illumina MiSeq platform. RNA was reverse-transcribed into cDNA using a biotinylated oligo dT primer. An adaptor sequence was added to the 3’ end of all cDNA, which contains the Illumina P7 universal priming site and a 17-nucleotide unique molecular identifier (UMI). Products were purified using streptavidin-coated magnetic beads followed by a primary PCR reaction using a pool of primers targeting the IGHA, IGHD, IGHE, IGHG, IGHM, IGKC and IGLC regions, as well as a sample-indexed Illumina P7C7 primer. The immunoglobulin-specific primers contained tails corresponding to the Illumina P5 sequence. PCR products were then purified using AMPure XP beads. A secondary PCR was then performed to add the Illumina C5 clustering sequence to the end of the molecule containing the constant region. The number of secondary PCR cycles was tailored to each sample to avoid entering plateau phase, as judged by a prior quantitative PCR analysis. Final products were purified, quantified with Agilent Tapestation and pooled in equimolar proportions, followed by high-throughput paired-end sequencing on the Illumina MiSeq platform. For sequencing, the Illumina 600 cycle kit was used with the modifications that 325 cycles were used for read 1, 6 cycles for the index reads, 300 cycles for read 2 and a 10% PhiX spike-in to increase sequence diversity.” (As described by the original submitter of the sequencing data, PRJNA349143)
Data pre-processing Pipeline
The following pRESTO pipeline was used:
1. Filtering out all sequences of low quality (q<20) (FilterSeq.py quality)
2. Pairing of forward and reverse reads (PairSeq.py)
3. Assembly of forward and reverse reads into single sequences (AssemblePairs.py align)
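The three steps above can be sketched as a short Python helper that builds the corresponding pRESTO command lines without running them. This is only an illustration under stated assumptions: the record gives the tools, subcommands and quality threshold, but not the exact options used, so the `--coord illumina`, `--rc tail` and `--outname` arguments, the intermediate file names, and the function name `build_presto_commands` are all hypothetical.

```python
import shlex
from typing import List


def build_presto_commands(read1: str, read2: str, min_quality: int = 20) -> List[List[str]]:
    """Return argv lists for the three pRESTO steps, in the order listed above."""
    commands: List[List[str]] = []
    filtered: List[str] = []

    # Step 1: quality filtering of each read file (threshold 20, as stated in the record).
    for label, path in (("R1", read1), ("R2", read2)):
        commands.append(
            ["FilterSeq.py", "quality", "-s", path, "-q", str(min_quality), "--outname", label]
        )
        # Assumed output name, following pRESTO's usual "<outname>_quality-pass" pattern.
        filtered.append(f"{label}_quality-pass.fastq")

    # Step 2: pairing of forward and reverse reads.
    # "--coord illumina" is an assumption about the header format.
    commands.append(["PairSeq.py", "-1", filtered[0], "-2", filtered[1], "--coord", "illumina"])
    # Assumed output names of the pairing step.
    paired = [name.replace(".fastq", "_pair-pass.fastq") for name in filtered]

    # Step 3: assembly of paired reads into single sequences.
    # "--rc tail" and the output name are assumptions, not taken from the record.
    commands.append(
        ["AssemblePairs.py", "align", "-1", paired[0], "-2", paired[1],
         "--coord", "illumina", "--rc", "tail", "--outname", "assembled"]
    )
    return commands


if __name__ == "__main__":
    # Placeholder input file names; print the commands instead of executing them.
    for argv in build_presto_commands("sample_R1.fastq", "sample_R2.fastq"):
        print(shlex.join(argv))
```

The commands are printed rather than executed so that the options can be checked against the installed pRESTO version before anything is run.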
Inference
IgDiscover was run several times on the IgM-encoding sequences of a single donor, each time with a different starting database:
Database 1: IMGT database but with IGHV3-7*02 extended by two bases (GA), i.e. with a 3’-end that reads GCGAGAGA
Database 2: IMGT database but with a variant of IGHV3-7*02 with a 3’-end that reads GCGAGGGA
Database 3: IMGT database with IGHV3-7*02 extended by two bases (GA), i.e. with a 3’-end that reads GCGAGAGA, plus a variant of IGHV3-7*02 with a 3’-end that reads GCGAGGGA (i.e. two variants of IGHV3-7*02)
Database 4: IMGT database as it is, plus a variant of IGHV3-7*02 with a 3’-end that reads GCGAGG (i.e. two variants of IGHV3-7*02)
Database 5: IMGT database as it is (i.e. with IGHV3-7*02 ending in GCGAGA)
In all cases, IGHV3-7*01 is inferred with a 3’-end that reads GCGAGAGA.
For IGHV3-7*02, the inferred 3’-end depends on which starting database is used:
Database 1: IGHV3-7*02 with a 3’-end that is: GCGAGAGA (base 318: A)
Database 2: IGHV3-7*02 with a 3’-end that is: GCGAGGGA (base 318: G)
Database 3: IGHV3-7*02 with a 3’-end that is: GCGAGGGA (base 318: G)
Database 4: IGHV3-7*02 with a 3’-end that is: GCGAGG (base 318: G)
Database 5: IGHV3-7*02 with a 3’-end that is: GCGAGA (base 318: A)
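The relationship between starting database and inferred 3’-end can be tabulated with a small Python sketch. The sequences are copied from the lists in this record. Two points are interpretations rather than statements from the record: databases 1 and 2 are taken to contain only the modified IGHV3-7*02, and the sixth base of each listed 3’-end is taken to be base 318, which is consistent with the "(base 318: ...)" annotations given for every database. All names in the code are hypothetical.

```python
from typing import Dict, List

# 3'-ends of IGHV3-7*02 present in each starting database, as listed in this record.
STARTING_ENDS: Dict[int, List[str]] = {
    1: ["GCGAGAGA"],
    2: ["GCGAGGGA"],
    3: ["GCGAGAGA", "GCGAGGGA"],
    4: ["GCGAGA", "GCGAGG"],
    5: ["GCGAGA"],
}

# 3'-end of IGHV3-7*02 inferred from each starting database.
INFERRED_END: Dict[int, str] = {
    1: "GCGAGAGA",
    2: "GCGAGGGA",
    3: "GCGAGGGA",
    4: "GCGAGG",
    5: "GCGAGA",
}


def base_318(three_prime_end: str) -> str:
    """Return the sixth base of a listed 3'-end (assumed here to be base 318)."""
    return three_prime_end[5]


def summarize() -> List[str]:
    """Build one summary line per starting database."""
    rows = []
    for db in sorted(INFERRED_END):
        inferred = INFERRED_END[db]
        in_start = inferred in STARTING_ENDS[db]
        rows.append(
            f"Database {db}: inferred {inferred}, base 318 {base_318(inferred)}, "
            f"present in starting database: {in_start}"
        )
    return rows


if __name__ == "__main__":
    print("\n".join(summarize()))
```

In every run the inferred 3’-end is identical to one of the IGHV3-7*02 endings supplied in the starting database, both in length and in the base at position 318.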
Data presented herein are for inference using database 3.
The Figure and additional data tab (in Supplementary info) contains error histograms for IGHV3-47D*01_S3103 and IGHV3-7*02/IGHV3-7*02_A318G, together with figures of the nucleotide composition at positions 315-321 of sequences assigned to IGHV3-7*02/IGHV3-7*02_A318G. For database 3, a figure of the nucleotide composition for sequences assigned to IGHV3-7*01 is also attached.