Details of FitHiChIP outputs

Summary HTML file

Successful execution of the FitHiChIP pipeline generates an HTML file, Summary_results_FitHiChIP.html, within the output directory OutDir specified in the configuration file. This file lists the output files containing significant interactions from the FitHiChIP pipeline, according to the given input parameters.

Details of output files

Here we describe all the files and folders existing within the specified output directory OutDir.

Users should also consult the page Setting up configuration file to cross-check the configuration parameters referred to below:

  1. Parameters.txt: file listing the parameters used.

  2. HiCPro_Matrix_BinSize"BINSIZE":

    Where "BINSIZE" is specified in the input parameter.

    This directory contains the interaction matrix files (named Matrix_abs.bed and Matrix.matrix) generated from the input interaction files. These files are generated only when the matrix and interval files are not themselves provided as input.

    • L_"l"_U"u": folder name according to the distance thresholds "l" and "u" specified in the configuration file.

      • *.cis.interactions.DistThr.bed: File containing CIS interactions within the distance range from "l" to "u".

  3. NormFeatures:

    Folder containing various statistical features computed for individual genomic intervals / bins (with respect to the specified "BINSIZE").

    • User should specifically look at the file *.AllBin_CompleteFeat.bed, which has the following format:

      • Columns 1 to 3: chromosome bin.

      • Column 4: Coverage of this bin (number of reads mapped onto it)

      • Column 5: if 1, this bin is a peak-bin (overlapping with the reference peaks, as provided in the configuration file); otherwise (0) non-peak bin.

      • Column 6: Bias value of a bin; either coverage or ICE bias, depending on the input options.

      • Columns 7 and 8: unused columns.

  4. FitHiChIP_"INTTYPE"_b"BINSIZE"_L"l"_U"u"

    Folder according to the specified foreground (type of interactions to be reported), bin size, and the distance thresholds.

    • INTTYPE is a string with one of the following values:

      • Peak2Peak (when the parameter IntType = 1)

      • Peak2NonPeak (IntType = 2)

      • Peak2ALL (IntType = 3)

      • ALL2ALL (IntType = 4)

    • Within this folder, a directory structure of the following name (pattern) is created:

      P2PBckgr_[UseP2PBackgrnd]/[BiasType]

    Where:

    • [UseP2PBackgrnd]: Value of the parameter UseP2PBackgrnd - 0 (loose background) or 1 (stringent background).

    • [BiasType]: A string, either "Coverage_Bias" or "ICE_Bias", according to the parameter BiasType.

    • Within this directory, the following files/folders are important:

      • FitHiC_BiasCorr: Directory containing the significant interactions:

        • Bin_Info.log: Equal occupancy bins employed in FitHiChIP

        • configuration.txt: list of configuration parameters for calling significant interactions from FitHiChIP

        • *PREFIX*.interactions_FitHiC.bed: Lists all input interactions along with their significance (p-value and FDR) computed using FitHiChIP.

          • Value of PREFIX is provided in the configuration parameter.

          Note

          Important: For differential analysis of HiChIP loops (page Differential analysis of HiChIP loops), this file (containing all interactions along with FitHiChIP significance values) is to be provided as input.

        • *PREFIX*.interactions_FitHiC_Q"qval".bed:

          Set of significant interactions generated by FitHiChIP, according to the q-value threshold (QVALUE) specified in the input parameters.

          For example, if QVALUE=0.01 (default), the file name becomes *PREFIX*.interactions_FitHiC_Q0.01.bed.

          Note

          1. Important: This file is the set of significant interactions generated by FitHiChIP, without any merge filtering.

          2. For peak-to-all foreground (default output mode), if UseP2PBackgrnd = 0, these loops correspond to the FitHiChIP(L) output.

          3. If UseP2PBackgrnd = 1, these loops correspond to the FitHiChIP(S) output.

          Note

          If the user requires loops filtered by an FDR threshold different from the parameter QVALUE (say the custom threshold is 0.05 while QVALUE is 0.01), the following awk script generates the significant interactions from the file containing all interactions:

          awk '$NF<0.05' *PREFIX*.interactions_FitHiC.bed > FitHiChIP_out_Q0.05.bed

        • Merge_Nearby_Interactions:

          Directory containing the output of the merge filtering operation, created if the parameter MergeInt = 1.

          • *PREFIX*.interactions_FitHiC_Q"qval"_MergeNearContacts.bed:

            FitHiChIP significant loops returned after the merge filtering operation.

            Note

            1. For peak-to-all foreground (default output mode), if UseP2PBackgrnd = 0, this set of loops corresponds to the FitHiChIP(L+M) output.

            2. If UseP2PBackgrnd = 1, this set of loops corresponds to the FitHiChIP(S+M) output.
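The directory and file naming patterns described above can be assembled programmatically. The following is a minimal sketch, not part of FitHiChIP itself: the helper names fithichip_result_dir and loop_files are hypothetical, the bias folder is passed directly as its string ("Coverage_Bias" or "ICE_Bias"), and the q-value is passed as a string so that it appears in the file name exactly as written (for example "0.01").

```python
from pathlib import Path

# IntType values and the folder-name strings listed in this section.
INT_TYPE_NAMES = {1: "Peak2Peak", 2: "Peak2NonPeak", 3: "Peak2ALL", 4: "ALL2ALL"}


def fithichip_result_dir(out_dir, int_type, bin_size, low, upp, use_p2p_bg, bias_dir):
    """Hypothetical helper: rebuild the FitHiC_BiasCorr path from the naming pattern.

    bias_dir is the folder string itself: "Coverage_Bias" or "ICE_Bias".
    """
    if bias_dir not in ("Coverage_Bias", "ICE_Bias"):
        raise ValueError(f"unexpected bias folder: {bias_dir}")
    top = f"FitHiChIP_{INT_TYPE_NAMES[int_type]}_b{bin_size}_L{low}_U{upp}"
    return Path(out_dir) / top / f"P2PBckgr_{use_p2p_bg}" / bias_dir / "FitHiC_BiasCorr"


def loop_files(result_dir, prefix, qval):
    """Hypothetical helper: expected locations of the three main loop files."""
    base = f"{prefix}.interactions_FitHiC"
    merge_dir = Path(result_dir) / "Merge_Nearby_Interactions"
    return {
        "all": Path(result_dir) / f"{base}.bed",
        "significant": Path(result_dir) / f"{base}_Q{qval}.bed",
        "merged": merge_dir / f"{base}_Q{qval}_MergeNearContacts.bed",
    }
```

For instance, calling fithichip_result_dir("OutDir", 3, 5000, 20000, 2000000, 0, "Coverage_Bias") (the bin size and distance thresholds here are illustrative values only) yields the folder holding the peak-to-all loops with loose background, and loop_files(..., "PREFIX", "0.01") then gives the expected file locations within it.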

Details of FitHiChIP significant interactions

As mentioned above, the file *PREFIX*.interactions_FitHiC.bed contains all the loops and their significance values.

The file *PREFIX*.interactions_FitHiC_Q"qval".bed contains the significant interactions with respect to the specified FDR threshold qval.

Both of these files contain the following fields:

  1. Fields 1 to 6: interacting bins for a given interaction.

  2. Field 7: contact count.

  3. Fields 8 to 13: various properties of the first interacting bin. These include the HiChIP coverage, whether the bin overlaps with a ChIP-seq peak (isPeak), bias (coverage or ICE depending on the bias parameter), mappability, GC content, and number of restriction sites.

    Note

    Currently, the values of mappability, GC content, and number of restriction sites are set as 0. We did not remove the fields though, to keep the same output format.

  4. Fields 14 to 19: respective properties for the second interacting bin.

  5. Field 20 (p): spline fit probability.

  6. Field 21 (exp_cc_Bias): expected contact count of the current interaction, after applying bias regression.

  7. Fields 22 and 23: probability computed from the "exp_cc_Bias" value, and binomial distribution based refined probability value.

  8. Field 24: final FitHiChIP p-value of the current interaction.

  9. Field 25: final FitHiChIP q-value of the current interaction.
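Because the q-value is the last field of each row, re-thresholding the file of all interactions only requires reading that field. The sketch below is one possible Python equivalent of filtering on the last column; the function name filter_by_qvalue is hypothetical, and the handling of a header row (a first line whose last field is not numeric is kept) is an assumption of this sketch rather than documented behaviour.

```python
def filter_by_qvalue(lines, threshold):
    """Keep rows whose last field (the q-value, field 25) is below `threshold`.

    A first line whose last field is not numeric is treated as a header and
    kept; that a header may be present is an assumption of this sketch.
    """
    kept = []
    for i, line in enumerate(lines):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        try:
            qval = float(fields[-1])
        except ValueError:
            if i == 0:
                kept.append(line)  # keep the header row
            continue
        if qval < threshold:
            kept.append(line)
    return kept
```

To apply it to a file, read the lines of *PREFIX*.interactions_FitHiC.bed with a standard open() call, pass them to filter_by_qvalue together with the custom threshold (for example 0.05), and write the returned lines to a new file.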

Details of merge filtering output

As mentioned above, the file *PREFIX*.interactions_FitHiC_Q"qval"_MergeNearContacts.bed within the directory Merge_Nearby_Interactions contains the significant loops returned after the merge filtering operation.

This file contains the following fields:

  1. First 6 fields are the interacting bins corresponding to the retained set of interactions (after merge filtering).

  2. Next 3 fields denote the contact count, p-value and q-value of the retained interactions.

  3. bin1_low and bin1_high denote the lower and upper coordinates of the first set of interacting bins (x axis of the proposed connected component model). For example, if the bin pairs (x-1, y-1), (x, y) and (x+1, y+1) are provided as inputs, bin1_low becomes (x-1) * BINSIZE while bin1_high becomes (x+1) * BINSIZE.

  4. bin2_low and bin2_high denote the lower and upper coordinates of the second set of interacting bins. Considering the previous example, bin2_low = (y-1) * BINSIZE and bin2_high = (y+1) * BINSIZE.

  5. sumCC denotes the sum of contact counts of all the significant interactions applied to the connected component model, with respect to the current retained interaction. Considering the above example, sumCC would be equal to the sum of the contact counts of the interactions (x-1, y-1), (x, y) and (x+1, y+1).

  6. StrongConn lies between 0 and 1, and denotes the fraction of all possible interactions within the connected component which are significant. Considering the above example, the connected component enclosing the bins (x-1), x, and (x+1) at one end, and the bins (y-1), y, and (y+1) at the other end, can have at most 9 interactions, out of which 3 are significant. So, StrongConn is 3/9 = 0.33 in this case. A higher value of StrongConn indicates a more densely connected region.
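The arithmetic of the worked example above can be reproduced in a few lines. This is only a sketch of how the reported fields relate to one another, not FitHiChIP's merge filtering algorithm: the function name component_summary is hypothetical, bins are given as integer indices, and "all possible interactions" is taken to be the number of bins spanned at one end times the number spanned at the other, as in the example.

```python
def component_summary(sig_loops, bin_size):
    """Summarise one connected component of significant loops.

    sig_loops maps (bin1_index, bin2_index) -> contact count.
    """
    xs = [x for x, _ in sig_loops]
    ys = [y for _, y in sig_loops]
    n_x = max(xs) - min(xs) + 1  # bins spanned at the first end
    n_y = max(ys) - min(ys) + 1  # bins spanned at the second end
    return {
        "bin1_low": min(xs) * bin_size,
        "bin1_high": max(xs) * bin_size,
        "bin2_low": min(ys) * bin_size,
        "bin2_high": max(ys) * bin_size,
        "sumCC": sum(sig_loops.values()),
        # fraction of the n_x * n_y possible bin pairs that are significant
        "StrongConn": len(sig_loops) / (n_x * n_y),
    }
```

With three significant loops (x-1, y-1), (x, y) and (x+1, y+1), three bins are spanned at each end, so StrongConn evaluates to 3 / 9, matching the value stated above.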

Visualizing significant interactions in epigenome browsers

FitHiChIP produces multiple output files compatible with various epigenome browsers.

*PREFIX*.interactions_FitHiC_Q"qval"_WashU.bed.gz

File compatible with the WashU epigenome browser.

To visualize these interactions as a browser track, perform the following steps:

  1. Open https://epigenomegateway.wustl.edu/browser/ in a browser.

  2. Select reference genome. For the test data, human (hg19) is the reference genome.

  3. Select Tracks -> upload Local Track

  4. Choose track file type: longrange

  5. Choose track file: select both *_WashU.bed.gz and *_WashU.bed.gz.tbi files together (2 files).

  6. Close the track upload box (click the (X) sign in the right corner).

  7. Now the track is displayed in the browser window. Right click on the track name, and select DISPLAY MODE as ARC
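Since step 5 requires the track file and its .tbi index to be selected together, it can help to confirm that both exist before uploading. The sketch below is a small convenience check and not part of FitHiChIP; the function name washu_track_pairs is hypothetical.

```python
from pathlib import Path


def washu_track_pairs(directory):
    """Return (track, index) pairs for *_WashU.bed.gz files that have a .tbi index."""
    pairs = []
    for track in sorted(Path(directory).glob("*_WashU.bed.gz")):
        index = track.with_name(track.name + ".tbi")
        if index.exists():
            pairs.append((track, index))
    return pairs
```

A track file missing from the returned list has no index next to it and would need to be indexed before it can be uploaded as described above.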

*PREFIX*.interactions_FitHiC_Q"qval"_UCSC.bed

FitHiChIP output file compatible with the UCSC genome browser. The file is in bigBed format (more precisely, bigInteract format).

*PREFIX*.interactions_FitHiC_Q"qval"_IGV.bed

FitHiChIP output file compatible with the IGV genome browser.

Visualizing significant interactions (after merge filtering) in epigenome browsers

*PREFIX*.interactions_FitHiC_Q"qval"_MergeNearContacts_WashU.bed.gz

File compatible with the WashU epigenome browser.

*PREFIX*.interactions_FitHiC_Q"qval"_MergeNearContacts_UCSC.bed

File compatible with the UCSC genome browser.

*PREFIX*.interactions_FitHiC_Q"qval"_MergeNearContacts_IGV.bed

File compatible with the IGV genome browser.