Submission Details


Submission ID	S00003
Submission Date	2018-04-16 00:00:00
Submission Status	complete
Submitter	Linnea Thörnqvist
Submitter Address	Department of Immunotechnology, Lund University, Medicon Village (building 406), S-223 81 Lund, Sweden
Species	Human
Ethnicity	UN

Inferred Sequences

The inferred novel alleles from each genotype that are submitted for review. This table lists all inferences put forward by the submitter. Where IARC has affirmed a sequence based on an inference, the corresponding sequence record will be listed in the Published column. Inferences for which no published sequence is shown have not been affirmed.

No Items

Genotypes

Each genotype that has been inferred, along with the descriptive name of the inference tool and settings that were used.

	Genotype Name	Subject ID	Locus	Sequence Type	Genotype Filename	Tool/Setting Name
	Genotype - with Database 3	IB	IGH	V	genotype_database_3.csv	IgDiscover with Database 3

Acknowledgements

Individuals who should be acknowledged as contributing to the inferences listed in this submission.

Name	Institution	ORCID ID
Mats Ohlin	Department of Immunotechnology, Lund University, Lund, Sweden	0000-0002-5105-1938
Christian Busse	Division of B Cell Immunology, German Cancer Research Center (DKFZ), Heidelberg, Germany

Repertoire Details

Details of the repertoire on which the inferences are based. This corresponds, for example, to an NIH Project or an ENA study.


Repository	NCBI SRA
Accession Number	PRJNA349143
Project/Study Title	Dynamics of the human antibody repertoire after influenza vaccination
Dataset URL	https://www.ncbi.nlm.nih.gov/bioproject/PRJNA349143
MiAIRR Compliant?	No
MiAIRR URL
Sequencing Platform	Illumina
Read Length	325 (read 1) + 300 (read 2)
Primers Overlapping?	No

Repertoire Publications

Publications associated with this study.

PubMed ID	Title	Authors
24639495	High-resolution antibody dynamics of vaccine-induced immune responses.	Laserson U, Vigneault F, Gadala-Maria D, Yaari G, Uduman M, Vander Heiden JA, Kelton W, Taek Jung S, Liu Y, Laserson J, Chari R, Lee JH, Bachelet I, Hickey B, Lieberman-Aiden E, Hanczaruk B, Simen BB, Egholm M, Koller D, Georgiou G, Kleinstein SH, Church GM
30172112	Critical steps for computational inference of the 3'-end of novel alleles of immunoglobulin heavy chain variable genes - illustrated by an allele of IGHV3-7.	Thörnqvist L, Ohlin M

Sequences of the PCR primers used in the study.

Constant region primers

Only the IgM transcriptome was analyzed.

Primer Name	Primer Sequence
Human-IGHM	GAATTCTCACAGGAGACGAGG
Human-IGHD	TGTCTGCACCCTGATATGATGG
Human-IGHA	GGGTGCTGYMGAGGCTCAG
Human-IGHE	TTGCAGCAGCGGGTCAAGG
Human-IGHG	CCAGGGGGAAGACSGATG
Human-IGK	GACAGATGGTGCAGCCACAG
Human-IGL	AGGGYGGGAACAGAGTGAC
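Three of the primers above contain IUPAC degenerate bases (Y, M and S), so each stands for more than one concrete sequence. The following is a minimal sketch of how such a primer can be expanded; the function name `expand` and the dictionary names are illustrative only and are not part of the submission.

```python
from itertools import product

# IUPAC nucleotide codes; only the codes that occur in the primer table are listed.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "CT",  # pyrimidine: C or T
    "M": "AC",  # amino: A or C
    "S": "CG",  # strong: C or G
}

# Degenerate constant region primers copied from the table above.
PRIMERS = {
    "Human-IGHA": "GGGTGCTGYMGAGGCTCAG",
    "Human-IGHG": "CCAGGGGGAAGACSGATG",
    "Human-IGL": "AGGGYGGGAACAGAGTGAC",
}


def expand(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer]
    return ["".join(bases) for bases in product(*choices)]


for name, seq in PRIMERS.items():
    print(name, expand(seq))
```

Under these codes, Human-IGHA expands to four sequences and Human-IGHG and Human-IGL to two each.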

5'-template switching primers

Primer Name	Primer Sequence
TS-shift0	TACGGG
TS-shift1	ATACGGG
TS-shift2	TCTACGGG
TS-shift3	CGATACGGG
TS-shift4	GATCTACGGG
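The five template switching primers share the core TACGGG (the TS-shift0 sequence) and differ only in a prefix of 0 to 4 bases, matching the number in each name. The sketch below checks that relationship; the helper name `shift_prefix` is hypothetical and used only for illustration.

```python
# Template switching primers copied from the table above.
TS_PRIMERS = {
    "TS-shift0": "TACGGG",
    "TS-shift1": "ATACGGG",
    "TS-shift2": "TCTACGGG",
    "TS-shift3": "CGATACGGG",
    "TS-shift4": "GATCTACGGG",
}

# The shared core is taken to be the unshifted primer.
CORE = TS_PRIMERS["TS-shift0"]


def shift_prefix(name):
    """Return the bases that precede the shared core in the named primer."""
    seq = TS_PRIMERS[name]
    if not seq.endswith(CORE):
        raise ValueError(f"{name} does not end with {CORE}")
    return seq[: len(seq) - len(CORE)]


for name in TS_PRIMERS:
    prefix = shift_prefix(name)
    print(name, repr(prefix), len(prefix))
```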

Inference Tools and Settings

Details of the inference tools and settings used to infer novel alleles. Each combination of tool and setting is listed here, and provided with a descriptive name.

	Tool/Settings Name	Tool Name	Tool Version
	IgDiscover with Database 3	IgDiscover	0.9

Notes

Sample Preparation and Sequencing

“The blood samples collected in the influenza vaccination study by Laserson et al (PMID:24639495) were re-sequenced using the Illumina MiSeq platform. RNA was reverse-transcribed into cDNA using a biotinylated oligo dT primer. An adaptor sequence was added to the 3’ end of all cDNA, which contains the Illumina P7 universal priming site and a 17-nucleotide unique molecular identifier (UMI). Products were purified using streptavidin-coated magnetic beads followed by a primary PCR reaction using a pool of primers targeting the IGHA, IGHD, IGHE, IGHG, IGHM, IGKC and IGLC regions, as well as a sample-indexed Illumina P7C7 primer. The immunoglobulin-specific primers contained tails corresponding to the Illumina P5 sequence. PCR products were then purified using AMPure XP beads. A secondary PCR was then performed to add the Illumina C5 clustering sequence to the end of the molecule containing the constant region. The number of secondary PCR cycles was tailored to each sample to avoid entering plateau phase, as judged by a prior quantitative PCR analysis. Final products were purified, quantified with Agilent Tapestation and pooled in equimolar proportions, followed by high-throughput paired-end sequencing on the Illumina MiSeq platform. For sequencing, the Illumina 600 cycle kit was used with the modifications that 325 cycles was used for read 1, 6 cycles for the index reads, 300 cycles for read 2 and a 10%PhiX spike-in to increase sequence diversity.” (As described by the original submitter of the sequencing data, PRJNA349143)

Data pre-processing Pipeline

The following pRESTO pipeline was used:
1. Filtering out all sequences of low quality (q<20) (FilterSeq.py quality)
2. Pairing of forward and reverse reads (PairSeq.py)
3. Assembly of forward and reverse reads into single sequences (AssemblePairs.py align)
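The three steps above might be invoked roughly as sketched below. The submission does not give the exact command lines, so everything beyond the tool names and the q<20 threshold is an assumption: the input file names, the `--coord` value, the `--rc tail` option, and the intermediate file names (which follow pRESTO's default `_quality-pass` / `_pair-pass` suffixes as I understand them). The function only builds the commands; it does not run pRESTO.

```python
import shlex


def presto_commands(r1, r2, min_quality=20, coord="illumina"):
    """Build pRESTO command lines for quality filtering, pairing and assembly.

    r1, r2 and coord are hypothetical inputs; adjust them to the actual data.
    """
    def stem(path):
        return path.rsplit(".", 1)[0]

    # Assumed default output names of FilterSeq.py and PairSeq.py.
    r1_q = f"{stem(r1)}_quality-pass.fastq"
    r2_q = f"{stem(r2)}_quality-pass.fastq"
    r1_p = f"{stem(r1_q)}_pair-pass.fastq"
    r2_p = f"{stem(r2_q)}_pair-pass.fastq"

    return [
        # 1. Remove low-quality reads (mean quality below min_quality).
        ["FilterSeq.py", "quality", "-s", r1, "-q", str(min_quality)],
        ["FilterSeq.py", "quality", "-s", r2, "-q", str(min_quality)],
        # 2. Re-synchronise the forward and reverse read files.
        ["PairSeq.py", "-1", r1_q, "-2", r2_q, "--coord", coord],
        # 3. Assemble each read pair into a single sequence.
        ["AssemblePairs.py", "align", "-1", r1_p, "-2", r2_p,
         "--coord", coord, "--rc", "tail"],
    ]


for cmd in presto_commands("sample_R1.fastq", "sample_R2.fastq"):
    print(shlex.join(cmd))
```

Because filtering is done before pairing, reads whose mate was discarded have to be dropped in step 2, which is why PairSeq.py sits between the other two steps.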

Inference

IgDiscover was run multiple times on IgM-encoding sequences from a single donor, each time with a different starting database:

Database 1: IMGT database but with IGHV3-7*02 extended by two bases (GA), i.e. with a 3’-end that reads GCGAGAGA
Database 2: IMGT database but with a variant of IGHV3-7*02 with a 3’-end that reads GCGAGGGA
Database 3: IMGT database but with two variants of IGHV3-7*02: one extended by two bases (GA), i.e. with a 3’-end that reads GCGAGAGA, and one with a 3’-end that reads GCGAGGGA
Database 4: IMGT database as it is, plus a variant of IGHV3-7*02 with a 3’-end that reads GCGAGG (i.e. two variants of IGHV3-7*02)
Database 5: IMGT database as it is (i.e. with IGHV3-7*02 ending in GCGAGA)

In all cases, IGHV3-7*01 is inferred with a 3’-end that reads GCGAGAGA.

For IGHV3-7*02, the following is inferred when each of the databases is used:

Database 1: IGHV3-7*02 with a 3’-end that is: GCGAGAGA (base 318: A)
Database 2: IGHV3-7*02 with a 3’-end that is: GCGAGGGA (base 318: G)
Database 3: IGHV3-7*02 with a 3’-end that is: GCGAGGGA (base 318: G)
Database 4: IGHV3-7*02 with a 3’-end that is: GCGAGG (base 318: G)
Database 5: IGHV3-7*02 with a 3’-end that is: GCGAGA (base 318: A)
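The "base 318" annotations in the list above are consistent with the quoted 3’-ends if the first base of each quoted end is taken to be position 313, so that base 318 is the sixth base. That starting position is an assumption derived from the listed values, not something stated in the submission. A minimal sketch of the check:

```python
# Assumption: the quoted 3'-ends start at position 313, so that the sixth
# base is position 318 (the position annotated in the list above).
END_START = 313

# Inferred 3'-ends of IGHV3-7*02 per starting database, copied from the list.
INFERRED_ENDS = {
    "Database 1": "GCGAGAGA",
    "Database 2": "GCGAGGGA",
    "Database 3": "GCGAGGGA",
    "Database 4": "GCGAGG",
    "Database 5": "GCGAGA",
}

# Base 318 as stated for each database in the list.
STATED_BASE_318 = {
    "Database 1": "A",
    "Database 2": "G",
    "Database 3": "G",
    "Database 4": "G",
    "Database 5": "A",
}


def base_at(end, position, start=END_START):
    """Return the base at a 1-based gene position within a quoted 3'-end."""
    return end[position - start]


for db, end in INFERRED_ENDS.items():
    observed = base_at(end, 318)
    print(db, end, observed, observed == STATED_BASE_318[db])
```

Read this way, G is inferred at position 318 whenever the starting database contains a G variant (Databases 2, 3 and 4), and A is inferred otherwise.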

Data presented herein are for inference using database 3.

The following are attached in the Figure and additional data tab (in Supplementary info): the error histogram for IGHV3-47D*01_S3103 and IGHV3-7*02/IGHV3-7*02_A318G, and figures of the nucleotide composition at positions 315-321 of sequences assigned to IGHV3-7*02/IGHV3-7*02_A318G. For database 3, a figure of the nucleotide composition for sequences assigned to IGHV3-7*01 is also attached.

	Attachment File Name
	iarc_submission_supplementary_info.xlsx