LEHA-CVQAD is a novel, large-scale dataset providing a diverse collection of 6000+ compressed video streams generated with 186 modern codecs and encoding presets. Featuring a variety of real-world content and backed by crowdsourced subjective quality scores from Subjectify.us, it offers a reliable foundation for evaluating and advancing video-quality assessment methods.
LEHA-CVQAD is a subjective dataset with:
To gather source videos, we collected high-quality, openly licensed, mostly Full HD clips from Vimeo, media.xiph, and YouTubeUGC. We then clustered the 25,562 collected videos by their spatial information (SI) and temporal information (TI) and sampled 60 of them. The sampled videos were first transcoded to a uniform YUV 4:2:0 format. Each reference video was then encoded with a suite of modern codecs (AVC/H.264, HEVC/H.265, AV1, VVC/H.266, VP9, etc.) using multiple presets and bitrate levels, yielding a broad spectrum of compressed streams for subjective quality evaluation.
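The SI/TI clustering and sampling step can be sketched as follows. This is a minimal, stdlib-only illustration, not the LEHA-CVQAD pipeline: it assumes each clip is available as a short list of luma frames (2-D lists of pixel values) and computes SI and TI in the spirit of ITU-T P.910 (the maximum over time of the spatial standard deviation of the Sobel-filtered frame and of the frame difference, respectively). The choice of plain k-means, the "one clip nearest to each centroid" rule, and all function names (`si_ti`, `kmeans`, `sample_by_clusters`) are hypothetical.

```python
import math
import random


def sobel_magnitude(frame):
    """Sobel gradient magnitudes for the interior pixels of a 2-D luma frame."""
    h, w = len(frame), len(frame[0])
    out = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (frame[y - 1][x + 1] + 2 * frame[y][x + 1] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y][x - 1] - frame[y + 1][x - 1])
            gy = (frame[y + 1][x - 1] + 2 * frame[y + 1][x] + frame[y + 1][x + 1]
                  - frame[y - 1][x - 1] - 2 * frame[y - 1][x] - frame[y - 1][x + 1])
            out.append(math.hypot(gx, gy))
    return out


def std(values):
    """Population standard deviation."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def si_ti(frames):
    """SI/TI in the style of ITU-T P.910: max over time of a per-frame spatial std."""
    si = max(std(sobel_magnitude(f)) for f in frames)
    if len(frames) < 2:
        return si, 0.0  # TI needs at least two frames
    ti = max(
        std([c - p for row_p, row_c in zip(prev, cur) for p, c in zip(row_p, row_c)])
        for prev, cur in zip(frames, frames[1:])
    )
    return si, ti


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on tuples; empty clusters keep their old center."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: math.dist(p, centers[c]))
            groups[j].append(p)
        new_centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups


def sample_by_clusters(features, k, seed=0):
    """features: {video_id: (si, ti)}. Returns one distinct id per cluster."""
    ids = list(features)
    centers, _ = kmeans([features[i] for i in ids], k, seed=seed)
    chosen = []
    for c in centers:
        best = min((i for i in ids if i not in chosen),
                   key=lambda i: math.dist(features[i], c))
        chosen.append(best)
    return chosen
```

On the real collection, `sample_by_clusters(features, 60)` would return 60 clip IDs. Because this sketch clusters raw SI/TI values, rescaling both features to comparable ranges first is usually advisable.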
To obtain reliable subjective scores for each video in LEHA-CVQAD, we conducted two consecutive subjective studies. In the first, we performed pairwise comparisons among the encoded versions of each reference video and then applied the Elo and Bradley-Terry models. Scores obtained this way appear to be highly accurate, but they do not take the content of the video into account, which could potentially distort the resulting scores. For the second study, we sampled three videos from each group and collected their MOS values. We then used MAP estimation to merge these two types of subjective scores, projecting them onto a single scale. More details can be found in the paper.
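The two pairwise rating models named in the first study can be illustrated with a short stdlib-only sketch. It assumes comparisons are stored as `(winner, loser)` index pairs for Elo and as a win-count matrix for Bradley-Terry; `k=32` and an initial rating of 1500 are conventional Elo defaults, not values taken from the LEHA-CVQAD pipeline. The Bradley-Terry fit uses the standard minorization-maximization (MM) iteration. The function names are hypothetical, and the MAP merge with MOS values is not reproduced here.

```python
import math


def elo_ratings(matches, n_items, k=32.0, initial=1500.0):
    """Sequential Elo updates; matches is an iterable of (winner, loser) pairs."""
    r = [initial] * n_items
    for w, l in matches:
        # Expected score of the winner under the logistic Elo model.
        expected_w = 1.0 / (1.0 + 10 ** ((r[l] - r[w]) / 400.0))
        delta = k * (1.0 - expected_w)
        r[w] += delta
        r[l] -= delta  # zero-sum update
    return r


def bradley_terry(wins, iters=1000, tol=1e-10):
    """Fit Bradley-Terry strengths with the MM iteration.

    wins[i][j] = number of times item i was preferred over item j.
    Returns zero-mean log-strengths. Assumes every item won at least once
    and the comparison graph is connected.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            w_i = sum(wins[i])
            denom = sum((wins[i][j] + wins[j][i]) / (p[i] + p[j])
                        for j in range(n) if j != i)
            new_p.append(w_i / denom)
        total = sum(new_p)
        new_p = [x * n / total for x in new_p]  # fix the arbitrary scale
        converged = max(abs(a - b) for a, b in zip(new_p, p)) < tol
        p = new_p
        if converged:
            break
    logs = [math.log(x) for x in p]
    mean = sum(logs) / n
    return [x - mean for x in logs]
```

Elo depends on the order of comparisons, while Bradley-Terry fits all comparisons jointly, which is one reason to compute both on the same pairwise data.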
BibTeX code will be added here.