This directory contains BED files of merged Transcription Start Sites (TSSs) generated within the GENCODE Capture Long-Seq (CLS) project using PacBio HCGM reads, split by known/novel status with respect to GENCODE 20 (human) and GENCODE M3 (mouse).

These files were produced by comparing the *All_Cap1_all_TSSs.clusters.bed files listed in ../ (parent directory) to GENCODE TSSs using the following commands (angle-bracketed names are placeholders for the input and output files):

$ cat <CLS TSS file> | sortbed | bedtools closest -s -t first -D b -a stdin -b <GENCODE TSS file> | awk '$7!="."' | awk '$NF>=-50 && $NF<=50' | cut -f1-6 > <output prefix>.known.bed

$ cat <CLS TSS file> | sortbed | bedtools closest -s -t first -D b -a stdin -b <GENCODE TSS file> | awk '$7!="."' | awk '$NF<-50 || $NF>50' | cut -f1-6 > <output prefix>.novel.bed

All files correspond to genome assemblies hg38 (human) and mm10 (mouse).

# File naming scheme:

<species>All_Cap1_<tissue>_TSSs.clusters.<gencode_status>.bed

where:

species:
    "mm": mouse
    "hs": human

tissue:
    "all": all TSSs merged across all tissues.

gencode_status:
    "known": CLS TSSs <= 50 bases away from a GENCODE TSS on the same genomic strand
    "novel": CLS TSSs > 50 bases away from a GENCODE TSS on the same genomic strand

# BED file format (BED6):

There is one merged TSS per BED record.

column 1: chromosome
column 2: chromosome start of merged TSS
column 3: chromosome end of merged TSS
column 4: comma-separated list of read identifiers contributing to the TSS
column 5: number of reads contributing to the TSS
column 6: genomic strand of the TSS
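
The BED6 layout described above can be sanity-checked with a short awk filter. The sketch below is only an illustration and not part of the CLS pipeline: the function name count_inconsistent_tss_records is hypothetical, the input is assumed to be tab-separated, and column 5 is assumed to equal exactly the number of comma-separated identifiers in column 4. It prints how many records break either the six-column layout or that read-count assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the CLS pipeline).
# Reads BED6 records from the given files, or from stdin if none are given,
# and prints the number of records that either do not have six
# tab-separated columns or whose column 5 differs from the number of
# comma-separated read identifiers in column 4.
count_inconsistent_tss_records() {
    awk -F'\t' '
        NF != 6 { bad++; next }                 # not a BED6 record
        split($4, ids, ",") != $5 { bad++ }     # read count does not match ID list
        END { print bad + 0 }                   # "+ 0" prints 0 when nothing failed
    ' "$@"
}

# Toy records with made-up coordinates and read names, for illustration only.
# The second record lists one read but claims two, so this prints 1.
printf 'chr1\t100\t105\treadA,readB,readC\t3\t+\nchr2\t500\t501\treadD\t2\t-\n' |
    count_inconsistent_tss_records
```

To check one of the files in this directory, pass its path as an argument instead of piping records in; a result of 0 means every record is consistent under the assumptions stated above.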