Welcome to HLA-HD!


What is HLA-HD?

HLA-HD (HLA typing from High-quality Dictionary) can accurately determine HLA alleles with 6-digit precision from NGS data (fastq format). RNA-Seq data can also be used.
HLA-HD is freely available for academic use and research purposes upon registration.

News

June 25, 2018 : Version 1.2.0 was released !!

Released versions

#Version 1.2.0.1 July 11, 2018
Fixed a bug in which hlahd wrote incorrect positions to read.txt for some genes (DRB6, DRB8, DRB9).

#Version 1.2.0 June 25, 2018
Modified the default dictionary to type HLA-DRB5 and added some genes to HLA_gene.split.txt
(HLA-DPA2, -T, -W, -Y were added to HLA_gene.split.3.32.0.txt for the current release; see Running).

#Version 1.1.0.1 November 15, 2017
Modified to support the IPD-IMGT/HLA reference data from release 3.30.0 onward.

#Version 1.1.0 October 02, 2017
The database update feature was implemented (see section Updating the HLA dictionary).

#Version 1.0.0 April 27, 2017

Download

Download request

Installation

HLA-HD requires bowtie2 to map NGS reads.
Please install bowtie2 on your computer and add its location to your PATH environment variable.
For example, if you are using bash, add the following line to your .bashrc:
export PATH=$PATH:/path_to_bowtie2

Uncompress the downloaded tar.gz file by
> tar -zxvf hlahd.version.tar.gz
Then, move to the uncompressed directory and type
> sh install.sh
The installation requires the g++ compiler from the GNU Compiler Collection to be installed on your computer.

After the installation, add the bin directory of the HLA-HD installation to your PATH:
export PATH=$PATH:/path_to_HLA-HD_install_directory/bin
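
To confirm that the PATH settings above took effect, a quick check along the following lines may help. This is a minimal sketch, not part of HLA-HD: check_tools is a hypothetical helper name, and the sketch assumes that hlahd.sh is located in the bin directory added above.

```shell
# Report whether each required tool can be found on PATH.
# Prints one line per tool and returns non-zero if any tool is missing.
# (check_tools is a hypothetical helper, not part of HLA-HD.)
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# bowtie2 and g++ are needed for installation; hlahd.sh is assumed to be in bin/.
check_tools bowtie2 g++ hlahd.sh || echo "Some tools are missing; check your PATH."
```

If a tool is reported as missing, open a new shell (so that .bashrc is re-read) and try again.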

Updating the HLA dictionary (after v.1.1.0)

You can update the HLA allele dictionary to the current release of the IPD-IMGT/HLA database with the command:
> sh update.dictionary.sh
Wget is required for the database update.

You can also use any other release by getting its hla.dat file from the IPD-IMGT/HLA GitHub site.
Put the hla.dat file in the parent directory of hlahd, delete the line containing the first wget command from update.dictionary.sh, and then run the script.
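
Removing the first wget line can also be done without editing the script by hand. The following is a minimal sketch under the assumption that the first wget command occupies a single line that starts with "wget"; drop_first_wget and update.dictionary.local.sh are hypothetical names, and the original script is left untouched.

```shell
# Copy a script to a new file, omitting the first line that starts with "wget".
# Assumption: the first wget command in update.dictionary.sh fits on one line.
# (drop_first_wget is a hypothetical helper, not part of HLA-HD.)
drop_first_wget() {
    awk '!dropped && /^[ \t]*wget[ \t]/ { dropped = 1; next } { print }' "$1" > "$2"
}

# Example (run inside the hlahd directory, with hla.dat already in the parent directory):
# drop_first_wget update.dictionary.sh update.dictionary.local.sh
# sh update.dictionary.local.sh
```

Check the generated copy before running it, since the layout of update.dictionary.sh may differ between versions.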

A newer release allows HLA-HD to call recently added rare alleles.
In contrast, an older release tends to yield more conservative results.

The default dictionary of the installation is created from release 3.15.0.

 

Running

Before running HLA-HD, check the open files limit on your computer by typing:
> ulimit -Sa
If the open files limit is less than 1024, please type:
> ulimit -n 1024
or change /etc/security/limits.conf according to your system environment.
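
The check above can be scripted so that the limit is raised only when needed. This is a minimal sketch; limit_ok is a hypothetical helper name, and the change applies only to the current shell session, so HLA-HD should be started from the same shell.

```shell
required=1024   # minimum open files limit suggested on this page

# Succeeds when the current limit ($1) satisfies the required minimum ($2).
# (limit_ok is a hypothetical helper, not part of HLA-HD.)
limit_ok() {
    [ "$1" = "unlimited" ] || [ "$1" -ge "$2" ]
}

current=$(ulimit -Sn)   # soft limit on open files
if limit_ok "$current" "$required"; then
    echo "open files limit is $current: OK"
else
    echo "open files limit is $current: trying to raise it to $required"
    ulimit -Sn "$required" || echo "could not raise the limit; see /etc/security/limits.conf"
fi
```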

If your input is a fastq.gz file, decompress it in advance.
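
One way to decompress the input while keeping the original .gz files is sketched below. decompress_fastq is a hypothetical helper name and the file names in the example are placeholders.

```shell
# Decompress each given .gz file next to the original, keeping the .gz file.
# (decompress_fastq is a hypothetical helper, not part of HLA-HD.)
decompress_fastq() {
    for gz in "$@"; do
        out=${gz%.gz}                       # strip the trailing .gz
        gunzip -c "$gz" > "$out" || return 1
        echo "wrote $out"
    done
}

# Example (placeholder file names):
# decompress_fastq data/sample_1.fastq.gz data/sample_2.fastq.gz
```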

You can run HLA-HD by typing the following command:
> hlahd.sh -t [thread_num] -m [minimum length of reads] -c [trimming rate] -f [path_to freq_data directory] fastq_1 fastq_2 gene_split_file path_to_dictionary_directory IDNAME[any name] output_directory

For example:
> hlahd.sh -t 2 -m 100 -c 0.95 -f freq_data/ data/sample_1.fastq data/sample_2.fastq HLA_gene.split.txt dictionary/ sampleID estimation
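
When several samples need to be typed, the example command can be generated once per sample. The following is a minimal sketch that only prints the commands (a dry run). It assumes that the fastq files follow the data/ID_1.fastq and data/ID_2.fastq naming of the example; print_hlahd_commands and the sample IDs are hypothetical.

```shell
# Print one hlahd.sh command per sample ID, using the options of the example above.
# Assumption: inputs are named data/<ID>_1.fastq and data/<ID>_2.fastq.
# (print_hlahd_commands is a hypothetical helper, not part of HLA-HD.)
print_hlahd_commands() {
    for id in "$@"; do
        echo "hlahd.sh -t 2 -m 100 -c 0.95 -f freq_data/ data/${id}_1.fastq data/${id}_2.fastq HLA_gene.split.txt dictionary/ $id estimation"
    done
}

# Dry run for two placeholder sample IDs.
print_hlahd_commands sampleA sampleB
```

After reviewing the printed commands, they can be executed by piping the output to sh.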

If you want to type HLA-DPA2, -T, -W, -Y, replace HLA_gene.split.txt with HLA_gene.split.3.32.0.txt and update the dictionary to the current release (available from v.1.2.0).

Options

-m : Reads shorter than this length are ignored. Default is 100.

-t : Number of cores used to execute the program.

-c : Trimming option. If no matching sequence is found in the dictionary, the read is trimmed until a match is found or the trimmed length reaches this ratio. Default is 1.0.

-f : Use allele frequency information. The default data are located in the installation directory (/hlahd.version/freq_data).
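
To choose a value for -m, it can help to know how many reads fall below a given length. The following is a minimal sketch, independent of HLA-HD itself, that assumes a fastq file with exactly four lines per record; count_short_reads is a hypothetical helper name.

```shell
# Count reads shorter than a minimum length in a 4-line-per-record fastq file.
# $1: fastq file, $2: minimum read length (the value considered for -m).
# (count_short_reads is a hypothetical helper, not part of HLA-HD.)
count_short_reads() {
    awk -v min="$2" '
        NR % 4 == 2 { total++; if (length($0) < min) short++ }
        END { printf "%d of %d reads shorter than %d\n", short, total, min }
    ' "$1"
}

# Example (placeholder file name):
# count_short_reads data/sample_1.fastq 100
```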

Tips

Usage of multiple fastq files
HLA-HD cannot take multiple fastq files per read direction, so merge them in advance:
>cat sample.1_1.fastq sample.2_1.fastq > sample_1.fastq
>cat sample.1_2.fastq sample.2_2.fastq > sample_2.fastq
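
After merging, both files should contain the same number of reads. A quick sanity check could look like the following minimal sketch, which assumes four lines per fastq record; count_reads and check_pairs are hypothetical helper names.

```shell
# Number of reads in a 4-line-per-record fastq file.
count_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# Compare the read counts of the two merged files.
# Returns non-zero when the counts differ.
# (count_reads and check_pairs are hypothetical helpers, not part of HLA-HD.)
check_pairs() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    if [ "$n1" -eq "$n2" ]; then
        echo "OK: $n1 read pairs"
    else
        echo "MISMATCH: $n1 vs $n2 reads"
        return 1
    fi
}

# Example:
# check_pairs sample_1.fastq sample_2.fastq
```

A mismatch usually means that a file was left out of one of the two cat commands or that the files were concatenated in a different order.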

Using bam files mapped to the human genome
If you already have reads mapped to the human genome, you can create fastq files of the MHC region and the unmapped reads by using samtools and Picard tools as follows:
#Extract MHC region
:for GRCh38.p12
>samtools view -h -b sample.hgmap.sorted.bam chr6:28,510,120-33,480,577 > sample.mhc.bam
:for GRCh37
>samtools view -h -b sample.hgmap.sorted.bam chr6:28,477,797-33,448,354 > sample.mhc.bam
#Extract unmapped reads
>samtools view -b -f 4 sample.hgmap.sorted.bam > sample.unmap.bam
#Merge bam files
>samtools merge -f sample.merge.bam sample.unmap.bam sample.mhc.bam
#Create fastq
>java -jar picard.jar SamToFastq I=sample.merge.bam F=sample.hlatmp.1.fastq F2=sample.hlatmp.2.fastq
#Change fastq ID
>cat sample.hlatmp.1.fastq |awk '{if(NR%4 == 1){O=$0;gsub("/1"," 1",O);print O}else{print $0}}' > sample.hla.1.fastq
>cat sample.hlatmp.2.fastq |awk '{if(NR%4 == 1){O=$0;gsub("/2"," 2",O);print O}else{print $0}}' > sample.hla.2.fastq
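
The two awk commands above rewrite the read IDs so that the mate number is separated by a space instead of "/". The following minimal sketch shows a variant of that step, not the original command: it rewrites only a trailing "/1" or "/2" on header lines, which avoids touching a "/1" that appears elsewhere in an ID. rename_fastq_ids is a hypothetical helper name, and four lines per fastq record are assumed.

```shell
# Rewrite a trailing "/<mate>" on fastq header lines to " <mate>".
# Reads fastq from standard input; $1 is the mate number (1 or 2).
# (rename_fastq_ids is a hypothetical helper, not part of HLA-HD.)
rename_fastq_ids() {
    awk -v mate="$1" '
        NR % 4 == 1 { sub("/" mate "$", " " mate) }   # header lines only
        { print }
    '
}

# Example:
# rename_fastq_ids 1 < sample.hlatmp.1.fastq > sample.hla.1.fastq
# rename_fastq_ids 2 < sample.hlatmp.2.fastq > sample.hla.2.fastq
```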

Filtering of reads (March 6, 2019)
For WES or WGS data, bowtie2 may in rare cases abort because it requires vast computational resources. To avoid this problem, you can filter reads in advance as follows:
#Get full resolution (8-digit) hla sequence information
>wget ftp://ftp.ebi.ac.uk/pub/databases/ipd/imgt/hla/hla_gen.fasta
#Create bowtie2 index
>bowtie2-build hla_gen.fasta hla_gen
#Map fastq to hla sequence
>bowtie2 -x hla_gen -1 sample_1.fastq -2 sample_2.fastq -S sample.hlamap.sam
or
>bowtie2 -p number_of_cores -x hla_gen -1 sample_1.fastq -2 sample_2.fastq -S sample.hlamap.sam
#Extract mapped reads
>samtools view -h -F 4 sample.hlamap.sam > sample.mapped.sam
#Convert mapped sam to fastq
>java -jar picard.jar SamToFastq I=sample.mapped.sam F=sample.hlatmp.1.fastq F2=sample.hlatmp.2.fastq
#Change fastq ID
>cat sample.hlatmp.1.fastq |awk '{if(NR%4 == 1){O=$0;gsub("/1"," 1",O);print O}else{print $0}}' > sample.hla.1.fastq
>cat sample.hlatmp.2.fastq |awk '{if(NR%4 == 1){O=$0;gsub("/2"," 2",O);print O}else{print $0}}' > sample.hla.2.fastq
After the filtering, use sample.hla.1.fastq and sample.hla.2.fastq as new hlahd input.

Reference

Kawaguchi, S. et al. “Comprehensive HLA Typing from a Current Allele Database Using Next-Generation Sequencing Data”, Methods Mol Biol., 2018;1802:225-233, doi: 10.1007/978-1-4939-8546-3_16.

Kawaguchi, S. et al. “HLA-HD: An accurate HLA typing algorithm for next-generation sequencing data”, Hum Mutat., 2017;38(7):788-797, doi: 10.1002/humu.23230.

Contact:

Shuji Kawaguchi: shuji@genome.med.kyoto-u.ac.jp