Submission Details

Submission ID: S00003
Submission Date: 2018-04-16
Submission Status: complete
Submitter: Linnea Thörnqvist
Submitter Address: Department of Immunotechnology, Lund University, Medicon Village (building 406), S-223 81 Lund, Sweden
Species: Homo sapiens
Ethnicity: UN

Inferred Sequences

The inferred novel alleles from each genotype that are submitted for review. This table lists all inferences put forward by the submitter. Where IARC has affirmed a sequence based on an inference, the corresponding sequence record will be listed in the Published column. Inferences for which no published sequence is shown have not been affirmed.

No Items

Genotypes

Each genotype that has been inferred, along with the descriptive name of the inference tool and settings that were used.

Genotype Name | Subject ID | Locus | Sequence Type | Genotype Filename | Tool/Setting Name
Genotype - with Database 3 | IB | IGH | V | genotype_database_3.csv | IgDiscover with Database 3

Acknowledgements

Individuals who should be acknowledged as contributing to the inferences listed in this submission.

Name | Institution | ORCID ID
Mats Ohlin | Department of Immunotechnology, Lund University, Lund, Sweden | 0000-0002-5105-1938
Christian Busse | Division of B Cell Immunology, German Cancer Research Center (DKFZ), Heidelberg, Germany |

Repertoire Details

Details of the repertoire on which the inferences are based. This corresponds, for example, to an NIH project or an ENA study.

Repository: NCBI SRA
Accession Number: PRJNA349143
Project/Study Title: Dynamics of the human antibody repertoire after influenza vaccination
Dataset URL: https://www.ncbi.nlm.nih.gov/bioproject/PRJNA349143
MiAIRR Compliant? No
MiAIRR URL:
Sequencing Platform: Illumina
Read Length: 325 nt (read 1) + 300 nt (read 2)
Primers Overlapping? No

Repertoire Publications

Publications associated with this study.

PubMed ID | Title | Authors
24639495 | High-resolution antibody dynamics of vaccine-induced immune responses. | Laserson U, Vigneault F, Gadala-Maria D, Yaari G, Uduman M, Vander Heiden JA, Kelton W, Taek Jung S, Liu Y, Laserson J, Chari R, Lee JH, Bachelet I, Hickey B, Lieberman-Aiden E, Hanczaruk B, Simen BB, Egholm M, Koller D, Georgiou G, Kleinstein SH, Church GM
30172112 | Critical steps for computational inference of the 3'-end of novel alleles of immunoglobulin heavy chain variable genes - illustrated by an allele of IGHV3-7. | Thörnqvist L, Ohlin M

Sequences of the PCR primers used in the study.

Constant region primers

Only the IgM transcriptome was analyzed.

Primer Name | Primer Sequence
Human-IGHM | GAATTCTCACAGGAGACGAGG
Human-IGHD | TGTCTGCACCCTGATATGATGG
Human-IGHA | GGGTGCTGYMGAGGCTCAG
Human-IGHE | TTGCAGCAGCGGGTCAAGG
Human-IGHG | CCAGGGGGAAGACSGATG
Human-IGK | GACAGATGGTGCAGCCACAG
Human-IGL | AGGGYGGGAACAGAGTGAC
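Three of the constant region primers contain IUPAC degenerate bases (Y = C/T, M = A/C, S = C/G), so each stands for a small pool of oligonucleotides. The sketch below is illustrative only and not part of the submitted analysis; it expands a degenerate primer into the concrete sequences it encodes. The function name `expand_degenerate` is hypothetical.

```python
from itertools import product

# Standard IUPAC nucleotide codes: each symbol maps to the bases it allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a (possibly degenerate) primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# Degenerate primers from the table above.
primers = {
    "Human-IGHA": "GGGTGCTGYMGAGGCTCAG",
    "Human-IGHG": "CCAGGGGGAAGACSGATG",
    "Human-IGL": "AGGGYGGGAACAGAGTGAC",
}

for name, seq in primers.items():
    variants = expand_degenerate(seq)
    print(name, len(variants), variants)
```

Human-IGHA (Y and M) expands to four sequences, while Human-IGHG and Human-IGL each expand to two; non-degenerate primers such as Human-IGHM return a single sequence.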

5'-template switching primers

Primer Name | Primer Sequence
TS-shift0 | TACGGG
TS-shift1 | ATACGGG
TS-shift2 | TCTACGGG
TS-shift3 | CGATACGGG
TS-shift4 | GATCTACGGG

Inference Tools and Settings

Details of the inference tools and settings used to infer novel alleles. Each combination of tool and setting is listed here, and provided with a descriptive name.

Tool/Settings Name | Tool Name | Tool Version
IgDiscover with Database 3 | IgDiscover | 0.9

Notes

Sample Preparation and Sequencing

“The blood samples collected in the influenza vaccination study by Laserson et al. (PMID:24639495) were re-sequenced using the Illumina MiSeq platform. RNA was reverse-transcribed into cDNA using a biotinylated oligo dT primer. An adaptor sequence was added to the 3’ end of all cDNA, which contains the Illumina P7 universal priming site and a 17-nucleotide unique molecular identifier (UMI). Products were purified using streptavidin-coated magnetic beads followed by a primary PCR reaction using a pool of primers targeting the IGHA, IGHD, IGHE, IGHG, IGHM, IGKC and IGLC regions, as well as a sample-indexed Illumina P7C7 primer. The immunoglobulin-specific primers contained tails corresponding to the Illumina P5 sequence. PCR products were then purified using AMPure XP beads. A secondary PCR was then performed to add the Illumina C5 clustering sequence to the end of the molecule containing the constant region. The number of secondary PCR cycles was tailored to each sample to avoid entering plateau phase, as judged by a prior quantitative PCR analysis. Final products were purified, quantified with Agilent Tapestation and pooled in equimolar proportions, followed by high-throughput paired-end sequencing on the Illumina MiSeq platform. For sequencing, the Illumina 600-cycle kit was used with the modifications that 325 cycles were used for read 1, 6 cycles for the index reads, 300 cycles for read 2 and a 10% PhiX spike-in to increase sequence diversity.” (As described by the original submitter of the sequencing data, PRJNA349143)

Data Pre-processing Pipeline

The following pRESTO pipeline was used:
1. Filtering out low-quality reads (mean Phred quality score below 20) (FilterSeq.py quality)
2. Pairing of forward and reverse reads (PairSeq.py)
3. Assembly of forward and reverse reads into single sequences (AssemblePairs.py align)
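The first two pipeline steps can be approximated in a few lines of plain Python. This is a minimal sketch, not pRESTO itself: it assumes Phred+33 quality encoding and represents each read file as a hypothetical `{read_id: (sequence, quality)}` mapping; the helper names are illustrative. Step 3 (alignment-based assembly of the read pair) is omitted.

```python
from statistics import mean

# Hypothetical helpers approximating FilterSeq.py quality and PairSeq.py;
# they do not reproduce pRESTO's file handling or header parsing.


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string (Phred+33 encoding assumed)."""
    return [ord(ch) - offset for ch in quality]


def passes_quality(quality: str, min_mean: float = 20) -> bool:
    """Step 1: keep a read only if its mean Phred score is at least min_mean."""
    return mean(phred_scores(quality)) >= min_mean


def filter_reads(reads: dict[str, tuple[str, str]], min_mean: float = 20) -> dict[str, tuple[str, str]]:
    """Apply the quality filter to a {read_id: (sequence, quality)} mapping."""
    return {rid: (seq, qual) for rid, (seq, qual) in reads.items()
            if passes_quality(qual, min_mean)}


def pair_reads(r1: dict[str, tuple[str, str]], r2: dict[str, tuple[str, str]]) -> list[tuple[str, str, str]]:
    """Step 2: keep only read IDs that survived filtering in both files."""
    shared = sorted(r1.keys() & r2.keys())
    return [(rid, r1[rid][0], r2[rid][0]) for rid in shared]


# Toy input: 'I' encodes Q40 and '#' encodes Q2.
r1_reads = {"readA": ("ACGT", "IIII"), "readB": ("ACGT", "####")}
r2_reads = {"readA": ("TTGC", "IIII"), "readB": ("TTGC", "IIII")}

pairs = pair_reads(filter_reads(r1_reads), filter_reads(r2_reads))
print(pairs)  # [('readA', 'ACGT', 'TTGC')]
```

Because readB fails the quality filter in read 1, its mate is dropped at the pairing step even though it passed in read 2.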

Inference

IgDiscover was run several times on IgM-encoding sequences from a single donor, each time with a different starting database:

Database 1: IMGT database, but with IGHV3-7*02 extended by two bases (GA), i.e. with a 3’-end that reads GCGAGAGA
Database 2: IMGT database, but with a variant of IGHV3-7*02 with a 3’-end that reads GCGAGGGA
Database 3: IMGT database, but with IGHV3-7*02 extended by two bases (GA) plus a variant of IGHV3-7*02 with a 3’-end that reads GCGAGGGA (i.e. two variants of IGHV3-7*02)
Database 4: unmodified IMGT database plus a variant of IGHV3-7*02 with a 3’-end that reads GCGAGG (i.e. two variants of IGHV3-7*02)
Database 5: unmodified IMGT database (i.e. with IGHV3-7*02 ending in GCGAGA)
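The five starting databases differ only in the IGHV3-7*02 3’-end, and each variant can be derived from the IMGT ending GCGAGA by a two-base extension and/or an A-to-G change at base 318. The sketch below encodes only those 3’-ends, not the full IMGT sequences. It assumes, as an inference from the base-318 annotations in this submission, that the first base of each listed ending sits at IMGT-gapped position 313; the names `extend`, `substitute` and `END_START` are hypothetical.

```python
IMGT_END = "GCGAGA"  # 3'-end of IGHV3-7*02 as in the unmodified IMGT database
END_START = 313      # assumption: IMGT-gapped position of the first base of each ending


def extend(end: str, extra: str) -> str:
    """Append extra bases to the 3'-end (e.g. the 'GA' extension)."""
    return end + extra


def substitute(end: str, position: int, base: str) -> str:
    """Replace the base at an IMGT-gapped position within the ending."""
    i = position - END_START
    return end[:i] + base + end[i + 1:]


extended = extend(IMGT_END, "GA")                 # GCGAGAGA
extended_a318g = substitute(extended, 318, "G")   # GCGAGGGA
truncated_a318g = substitute(IMGT_END, 318, "G")  # GCGAGG

# IGHV3-7*02 3'-end variants present in each starting database.
DATABASES = {
    1: [extended],
    2: [extended_a318g],
    3: [extended, extended_a318g],
    4: [IMGT_END, truncated_a318g],
    5: [IMGT_END],
}

for db, ends in DATABASES.items():
    print(f"Database {db}: {', '.join(ends)}")
```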

In all cases, IGHV3-7*01 is inferred with a 3’-end that reads GCGAGAGA.

For IGHV3-7*02, the following 3’-ends are inferred with each of the databases:

Database 1: IGHV3-7*02 with a 3’-end that is: GCGAGAGA (base 318: A)
Database 2: IGHV3-7*02 with a 3’-end that is: GCGAGGGA (base 318: G)
Database 3: IGHV3-7*02 with a 3’-end that is: GCGAGGGA (base 318: G)
Database 4: IGHV3-7*02 with a 3’-end that is: GCGAGG (base 318: G)
Database 5: IGHV3-7*02 with a 3’-end that is: GCGAGA (base 318: A)
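The results above can be cross-checked against the starting databases with a short script. This is a sketch, not part of the submitted analysis: it reads base 318 from each inferred ending (assuming, as in the annotations above, that each ending begins at IMGT-gapped position 313) and reports whether the inferred ending was already one of the IGHV3-7*02 variants supplied in that run's starting database. `base_at`, `STARTING` and `INFERRED` are hypothetical names.

```python
END_START = 313  # assumption: IMGT-gapped position of the first base of each ending

# IGHV3-7*02 3'-ends supplied in each starting database (from the database list).
STARTING = {
    1: {"GCGAGAGA"},
    2: {"GCGAGGGA"},
    3: {"GCGAGAGA", "GCGAGGGA"},
    4: {"GCGAGA", "GCGAGG"},
    5: {"GCGAGA"},
}

# IGHV3-7*02 3'-ends inferred with each database (from the results list).
INFERRED = {1: "GCGAGAGA", 2: "GCGAGGGA", 3: "GCGAGGGA", 4: "GCGAGG", 5: "GCGAGA"}


def base_at(end: str, position: int, start: int = END_START) -> str:
    """Return the base at an IMGT-gapped position within an ending."""
    return end[position - start]


for db, end in INFERRED.items():
    print(f"Database {db}: {end}, base 318 = {base_at(end, 318)}, "
          f"present in starting database: {end in STARTING[db]}")
```

With the values listed in this submission, the computed base 318 matches each annotation, and every inferred ending is one of the variants supplied in the corresponding starting database.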

The data presented here are from the inference using Database 3.

The "Figure and additional data" tab of the supplementary information file contains error histograms for IGHV3-47D*01_S3103 and IGHV3-7*02/IGHV3-7*02_A318G, together with figures of the nucleotide composition at positions 315-321 of sequences assigned to IGHV3-7*02/IGHV3-7*02_A318G. For Database 3, a figure of the nucleotide composition of sequences assigned to IGHV3-7*01 is also included.
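A per-position nucleotide composition like the one summarized in the supplementary figures can be tabulated as follows. This is a hedged sketch, not the procedure used to produce the attached figures: it assumes sequences are already in IMGT-gapped numbering (position p is string index p - 1), counts sequences that end before a position as missing ("-"), and uses hypothetical toy reads rather than data from this submission.

```python
from collections import Counter


def composition(seqs: list[str], start: int = 315, end: int = 321) -> dict[int, Counter]:
    """Count nucleotides at 1-based IMGT-gapped positions start..end.

    Sequences that stop before a position are counted as missing ("-").
    """
    table = {}
    for pos in range(start, end + 1):
        counts = Counter()
        for s in seqs:
            counts[s[pos - 1] if len(s) >= pos else "-"] += 1
        table[pos] = counts
    return table


# Hypothetical toy reads: "N" * 312 stands in for IMGT-gapped positions 1-312,
# followed by a 3'-end that starts at position 313.
prefix = "N" * 312
reads = [prefix + "GCGAGAGAT", prefix + "GCGAGGGAC", prefix + "GCGAGG"]

for pos, counts in composition(reads).items():
    total = sum(counts.values())
    print(pos, {base: round(n / total, 2) for base, n in sorted(counts.items())})
```

Counting truncated sequences explicitly matters here, because the variants under discussion differ in how far their 3’-ends extend.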

Supplementary Files
 iarc_submission_supplementary_info.xlsx