Submission ID | S00003 |
Submission Date | 2018-04-16 00:00:00 |
Submission Status | complete |
Submitter | Linnea Thörnqvist |
Submitter Address | Department of Immunotechnology, Lund University, Medicon Village (building 406), S-223 81 Lund, Sweden |
Species | Homo sapiens |
Ethnicity | UN |
The inferred novel alleles from each genotype that are submitted for review. This table lists all inferences put forward by the submitter. Where IARC has affirmed a sequence based on an inference, the corresponding sequence record will be listed in the Published column. Inferences for which no published sequence is shown have not been affirmed.
Each genotype that has been inferred, along with the descriptive name of the inference tool and settings that were used.
Genotype Name | Subject ID | Locus | Sequence Type | Genotype Filename | Tool/Setting Name |
---|---|---|---|---|---|
Genotype - with Database 3 | IB | IGH | V | genotype_database_3.csv | IgDiscover with Database 3 |
Individuals who should be acknowledged as contributing to the inferences listed in this submission.
Name | Institution | ORCID ID |
---|---|---|
Mats Ohlin | Department of Immunotechnology, Lund University, Lund, Sweden | 0000-0002-5105-1938 |
Christian Busse | Division of B Cell Immunology, German Cancer Research Center (DKFZ), Heidelberg, Germany | |
Details of the repertoire on which the inferences are based. This corresponds, for example, to an NIH Project or an ENA study.
Repository | NCBI SRA |
Accession Number | PRJNA349143 |
Project/Study Title | Dynamics of the human antibody repertoire after influenza vaccination |
Dataset URL | https://www.ncbi.nlm.nih.gov/bioproject/PRJNA349143 |
MiAIRR Compliant? | No |
MiAIRR URL | |
Sequencing Platform | Illumina |
Read Length | 325 (read 1) + 300 (read 2) |
Primers Overlapping? | No |
Publications associated with this study.
PubMed ID | Title | Authors |
---|---|---|
24639495 | High-resolution antibody dynamics of vaccine-induced immune responses. | Laserson U, Vigneault F, Gadala-Maria D, Yaari G, Uduman M, Vander Heiden JA, Kelton W, Taek Jung S, Liu Y, Laserson J, Chari R, Lee JH, Bachelet I, Hickey B, Lieberman-Aiden E, Hanczaruk B, Simen BB, Egholm M, Koller D, Georgiou G, Kleinstein SH, Church GM |
30172112 | Critical steps for computational inference of the 3'-end of novel alleles of immunoglobulin heavy chain variable genes - illustrated by an allele of IGHV3-7. | Thörnqvist L, Ohlin M |
Sequences of the PCR primers used in the study.
Only the IgM transcriptome was analyzed.
Primer Name | Primer Sequence |
---|---|
Human-IGHM | GAATTCTCACAGGAGACGAGG |
Human-IGHD | TGTCTGCACCCTGATATGATGG |
Human-IGHA | GGGTGCTGYMGAGGCTCAG |
Human-IGHE | TTGCAGCAGCGGGTCAAGG |
Human-IGHG | CCAGGGGGAAGACSGATG |
Human-IGK | GACAGATGGTGCAGCCACAG |
Human-IGL | AGGGYGGGAACAGAGTGAC |
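Several of the constant-region primers above contain degenerate bases (Y, M, S). As an illustration only, the following minimal sketch expands a degenerate primer into the unambiguous sequences it encodes, using the standard IUPAC nucleotide codes. The function name `expand_primer` is hypothetical and is not part of the submission.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_primer(primer):
    """Return every unambiguous sequence encoded by a degenerate primer."""
    choices = (IUPAC[base] for base in primer.upper())
    return ["".join(bases) for bases in product(*choices)]


# Degenerate primers copied from the table above.
primers = {
    "Human-IGHA": "GGGTGCTGYMGAGGCTCAG",
    "Human-IGHG": "CCAGGGGGAAGACSGATG",
    "Human-IGL": "AGGGYGGGAACAGAGTGAC",
}

for name, sequence in primers.items():
    print(name, expand_primer(sequence))
```
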
Primer Name | Primer Sequence |
---|---|
TS-shift0 | TACGGG |
TS-shift1 | ATACGGG |
TS-shift2 | TCTACGGG |
TS-shift3 | CGATACGGG |
TS-shift4 | GATCTACGGG |
Details of the inference tools and settings used to infer novel alleles. Each combination of tool and setting is listed here, and provided with a descriptive name.
Tool/Settings Name | Tool Name | Tool Version |
---|---|---|
IgDiscover with Database 3 | IgDiscover | 0.9 |
Sample Preparation and Sequencing
“The blood samples collected in the influenza vaccination study by Laserson et al (PMID:24639495) were re-sequenced using the Illumina MiSeq platform. RNA was reverse-transcribed into cDNA using a biotinylated oligo dT primer. An adaptor sequence was added to the 3’ end of all cDNA, which contains the Illumina P7 universal priming site and a 17-nucleotide unique molecular identifier (UMI). Products were purified using streptavidin-coated magnetic beads followed by a primary PCR reaction using a pool of primers targeting the IGHA, IGHD, IGHE, IGHG, IGHM, IGKC and IGLC regions, as well as a sample-indexed Illumina P7C7 primer. The immunoglobulin-specific primers contained tails corresponding to the Illumina P5 sequence. PCR products were then purified using AMPure XP beads. A secondary PCR was then performed to add the Illumina C5 clustering sequence to the end of the molecule containing the constant region. The number of secondary PCR cycles was tailored to each sample to avoid entering plateau phase, as judged by a prior quantitative PCR analysis. Final products were purified, quantified with Agilent Tapestation and pooled in equimolar proportions, followed by high-throughput paired-end sequencing on the Illumina MiSeq platform. For sequencing, the Illumina 600 cycle kit was used with the modifications that 325 cycles was used for read 1, 6 cycles for the index reads, 300 cycles for read 2 and a 10%PhiX spike-in to increase sequence diversity.” (As described by the original submitter of the sequencing data, PRJNA349143)
Data pre-processing Pipeline
The following pRESTO pipeline was used:
1. Filtering out all sequences of low quality (q<20) (FilterSeq.py quality)
2. Pairing of forward and reverse reads. (PairSeq.py)
3. Assembly of forward and reverse reads into single sequences (AssemblePairs.py align)
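The three steps above can be sketched as pRESTO command lines. This is a hedged illustration rather than the submitter's actual invocation: the input file names, the `--coord illumina` and `--rc tail` options, and the intermediate file names (which assume pRESTO's default `_quality-pass` and `_pair-pass` output suffixes) are assumptions not stated in the submission. The sketch only builds and prints the commands; it does not run them.

```python
import shlex


def passed(path, tag):
    """Assumed pRESTO default output name: <stem>_<tag>-pass.fastq."""
    stem = path.rsplit(".", 1)[0]
    return f"{stem}_{tag}-pass.fastq"


def presto_commands(read1, read2, min_quality=20):
    """Build argument lists for the filter, pair and assemble steps."""
    r1_filtered = passed(read1, "quality")
    r2_filtered = passed(read2, "quality")
    r1_paired = passed(r1_filtered, "pair")
    r2_paired = passed(r2_filtered, "pair")
    return [
        # 1. Remove low-quality reads (q < 20).
        ["FilterSeq.py", "quality", "-s", read1, read2, "-q", str(min_quality)],
        # 2. Keep only reads whose mate also passed filtering.
        ["PairSeq.py", "-1", r1_filtered, "-2", r2_filtered, "--coord", "illumina"],
        # 3. Assemble each read pair into a single sequence.
        ["AssemblePairs.py", "align", "-1", r1_paired, "-2", r2_paired,
         "--coord", "illumina", "--rc", "tail"],
    ]


# Hypothetical input file names.
for command in presto_commands("sample_R1.fastq", "sample_R2.fastq"):
    print(shlex.join(command))
```

Building the argument lists separately from executing them makes it easy to inspect the commands first; each list could then be passed to `subprocess.run` on a machine where pRESTO is installed.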
Inference
IgDiscover was run multiple times on IgM-encoding sequences from a single donor, each time with a different starting database:
Database 1: IMGT database but with IGHV3-7*02 extended by two bases (GA), i.e. with a 3’-end that reads GCGAGAGA
Database 2: IMGT database but with a variant of IGHV3-7*02 with a 3’-end that reads GCGAGGGA
Database 3: IMGT database but with two variants of IGHV3-7*02: one extended by two bases (GA), i.e. with a 3’-end that reads GCGAGAGA, and one with a 3’-end that reads GCGAGGGA
Database 4: IMGT database as it is, plus a variant of IGHV3-7*02 with a 3’-end that reads GCGAGG (i.e. two variants of IGHV3-7*02)
Database 5: IMGT database as it is (i.e. with IGHV3-7*02 that ends as GCGAGA)
In all cases, IGHV3-7*01 is inferred with a 3’-end that reads GCGAGAGA.
For IGHV3-7*02, the following is inferred when each of the databases is used:
Database 1: IGHV3-7*02 with a 3’-end that is: GCGAGAGA (base 318: A)
Database 2: IGHV3-7*02 with a 3’-end that is: GCGAGGGA (base 318: G)
Database 3: IGHV3-7*02 with a 3’-end that is: GCGAGGGA (base 318: G)
Database 4: IGHV3-7*02 with a 3’-end that is: GCGAGG (base 318: G)
Database 5: IGHV3-7*02 with a 3’-end that is: GCGAGA (base 318: A)
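The relationship between the listed 3’-ends and base 318 can be checked with a short sketch. It assumes that each 3’-end string shown above starts at position 313 of the IMGT numbering; this offset is not stated in the submission, but under it the base at position 318 matches the value given for every database. The names `FIRST_POSITION`, `base_at` and `differing_positions` are hypothetical.

```python
# Assumption: the first base of each listed 3'-end is position 313.
FIRST_POSITION = 313

# Inferred IGHV3-7*02 3'-ends, copied from the list above.
inferred_ends = {
    1: "GCGAGAGA",
    2: "GCGAGGGA",
    3: "GCGAGGGA",
    4: "GCGAGG",
    5: "GCGAGA",
}


def base_at(end, position, first_position=FIRST_POSITION):
    """Return the base found at a numbered position of a 3'-end string."""
    return end[position - first_position]


def differing_positions(a, b, first_position=FIRST_POSITION):
    """List (position, base_in_a, base_in_b) wherever two 3'-ends differ."""
    return [
        (first_position + i, x, y)
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]


for database, end in inferred_ends.items():
    print(f"Database {database}: {end} (base 318: {base_at(end, 318)})")

# The two extended variants differ at a single position.
print(differing_positions("GCGAGAGA", "GCGAGGGA"))
```
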
Data presented herein are for inference using database 3.
The Figure and additional data tab (in Supplementary info) contains error histograms for IGHV3-47D*01_S3103 and IGHV3-7*02/IGHV3-7*02_A318G, as well as figures of the nucleotide composition at positions 315-321 of sequences assigned to IGHV3-7*02/IGHV3-7*02_A318G. For database 3, a figure of the nucleotide composition of sequences assigned to IGHV3-7*01 is also attached.