# Video Super-Resolution

## Summary

- Number of checkpoints: 27
- Number of configs: 29
- Number of papers: 6
- ALGORITHM: 7

## RealBasicVSR (CVPR’2022)

Task: Video Super-Resolution

### Abstract
The diversity and complexity of degradations in real-world video super-resolution (VSR) pose non-trivial challenges in inference and training. First, while long-term propagation leads to improved performance in cases of mild degradations, severe in-the-wild degradations could be exaggerated through propagation, impairing output quality. To balance the tradeoff between detail synthesis and artifact suppression, we found an image pre-cleaning stage indispensable to reduce noises and artifacts prior to propagation. Equipped with a carefully designed cleaning module, our RealBasicVSR outperforms existing methods in both quality and efficiency. Second, real-world VSR models are often trained with diverse degradations to improve generalizability, requiring increased batch size to produce a stable gradient. Inevitably, the increased computational burden results in various problems, including 1) speed-performance tradeoff and 2) batch-length tradeoff. To alleviate the first tradeoff, we propose a stochastic degradation scheme that reduces up to 40% of training time without sacrificing performance. We then analyze different training settings and suggest that employing longer sequences rather than larger batches during training allows more effective uses of temporal information, leading to more stable performance during inference. To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences containing rich textures and patterns. Our dataset can serve as a common ground for benchmarking. Code, models, and the dataset will be made publicly available.
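The pre-cleaning stage is the key architectural addition: each frame is repeatedly denoised until the update becomes negligible, and only then handed to the recurrent super-resolution network. Below is a minimal PyTorch sketch of that idea; the module layout, iteration cap, and threshold are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class CleaningModule(nn.Module):
    """Illustrative image pre-cleaning stage: a small residual CNN that
    suppresses noise and artifacts in each frame before propagation."""

    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, frames):  # frames: (n, t, 3, h, w)
        n, t, c, h, w = frames.shape
        residual = self.body(frames.view(n * t, c, h, w)).view(n, t, c, h, w)
        return frames + residual  # cleaned frames, same shape

def preclean(frames, cleaner, max_iters=3, threshold=1e-3):
    """Apply cleaning repeatedly until the mean update is small, then the
    cleaned sequence would be passed to the recurrent VSR network.
    max_iters/threshold are assumed values for illustration."""
    for _ in range(max_iters):
        cleaned = cleaner(frames)
        if torch.mean(torch.abs(cleaned - frames)) < threshold:
            return cleaned
        frames = cleaned
    return frames
```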

### Results and models

Evaluated on the Y channel. The code for computing NRQM, NIQE, and PI can be found here. The official MATLAB code is used to compute BRISQUE.
Model | Dataset | NRQM (Y) | NIQE (Y) | PI (Y) | BRISQUE (Y) | Training Resources | Download |
---|---|---|---|---|---|---|---|
realbasicvsr_c64b20_1x30x8_lr5e-5_150k_reds | REDS | 6.0477 | 3.7662 | 3.8593 | 29.030 | 8 (Tesla V100-SXM2-32GB) | model/log |
realbasicvsr_wogan-c64b20-2x30x8_8xb2-lr1e-4-300k_reds | REDS | - | - | - | - | 8 (Tesla V100-SXM2-32GB) | model/log |
### Quick Start

**Train**

You can use the following commands to train a model with CPU or single/multiple GPUs.

```shell
# cpu train
CUDA_VISIBLE_DEVICES=-1 python tools/train.py configs/real_basicvsr/realbasicvsr_c64b20-1x30x8_8xb1-lr5e-5-150k_reds.py

# single-gpu train
python tools/train.py configs/real_basicvsr/realbasicvsr_c64b20-1x30x8_8xb1-lr5e-5-150k_reds.py

# multi-gpu train
./tools/dist_train.sh configs/real_basicvsr/realbasicvsr_c64b20-1x30x8_8xb1-lr5e-5-150k_reds.py 8
```

For more details, you can refer to the Train a model part of train_test.md.
**Test**

You can use the following commands to test a model with CPU or single/multiple GPUs.

```shell
# cpu test
CUDA_VISIBLE_DEVICES=-1 python tools/test.py configs/real_basicvsr/realbasicvsr_c64b20-1x30x8_8xb1-lr5e-5-150k_reds.py https://download.openmmlab.com/mmediting/restorers/real_basicvsr/realbasicvsr_c64b20_1x30x8_lr5e-5_150k_reds_20211104-52f77c2c.pth

# single-gpu test
python tools/test.py configs/real_basicvsr/realbasicvsr_c64b20-1x30x8_8xb1-lr5e-5-150k_reds.py https://download.openmmlab.com/mmediting/restorers/real_basicvsr/realbasicvsr_c64b20_1x30x8_lr5e-5_150k_reds_20211104-52f77c2c.pth

# multi-gpu test
./tools/dist_test.sh configs/real_basicvsr/realbasicvsr_c64b20-1x30x8_8xb1-lr5e-5-150k_reds.py https://download.openmmlab.com/mmediting/restorers/real_basicvsr/realbasicvsr_c64b20_1x30x8_lr5e-5_150k_reds_20211104-52f77c2c.pth 8
```

For more details, you can refer to the Test a pre-trained model part of train_test.md.
### Citation

```bibtex
@InProceedings{chan2022investigating,
  author    = {Chan, Kelvin C.K. and Zhou, Shangchen and Xu, Xiangyu and Loy, Chen Change},
  title     = {RealBasicVSR: Investigating Tradeoffs in Real-World Video Super-Resolution},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year      = {2022}
}
```
## BasicVSR++ (CVPR’2022)

Task: Video Super-Resolution

### Abstract
A recurrent structure is a popular framework choice for the task of video super-resolution. The state-of-the-art method BasicVSR adopts bidirectional propagation with feature alignment to effectively exploit information from the entire input video. In this study, we redesign BasicVSR by proposing second-order grid propagation and flow-guided deformable alignment. We show that by empowering the recurrent framework with the enhanced propagation and alignment, one can exploit spatiotemporal information across misaligned video frames more effectively. The new components lead to an improved performance under a similar computational constraint. In particular, our model BasicVSR++ surpasses BasicVSR by 0.82 dB in PSNR with similar number of parameters. In addition to video super-resolution, BasicVSR++ generalizes well to other video restoration tasks such as compressed video enhancement. In NTIRE 2021, BasicVSR++ obtains three champions and one runner-up in the Video Super-Resolution and Compressed Video Enhancement Challenges. Codes and models will be released to MMagic.
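Second-order grid propagation can be pictured as a recurrence whose step conditions on the hidden states of the two previous time steps, not just one. Here is a toy PyTorch sketch of that recurrence; in BasicVSR++ proper, the two hidden states are first warped to the current frame with flow-guided deformable alignment, which is omitted here, and all layer shapes are illustrative.

```python
import torch
import torch.nn as nn

class SecondOrderPropagation(nn.Module):
    """Toy second-order recurrence: each step fuses the current frame's
    features with the hidden states from t-1 and t-2."""

    def __init__(self, channels=64):
        super().__init__()
        self.fuse = nn.Conv2d(3 * channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, feats):  # feats: list of t tensors, each (n, c, h, w)
        zeros = torch.zeros_like(feats[0])
        h_prev1, h_prev2, out = zeros, zeros, []
        for x in feats:
            # BasicVSR++ aligns h_prev1/h_prev2 to the current frame with
            # flow-guided deformable convolution before this fusion step.
            h = self.relu(self.fuse(torch.cat([x, h_prev1, h_prev2], dim=1)))
            out.append(h)
            h_prev1, h_prev2 = h, h_prev1
        return out
```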

### Results and models
The pretrained weights of SPyNet can be found here.
Model | Dataset | PSNR (RGB) | SSIM (RGB) | PSNR (Y) | SSIM (Y) | Training Resources | Download |
---|---|---|---|---|---|---|---|
basicvsr_plusplus_c64n7_8x1_600k_reds4 | REDS4 (BIx4) | 32.3855 | 0.9069 | - | - | 8 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_8x1_600k_reds4 | UDM10 (BDx4) | - | - | 34.6868 | 0.9417 | 8 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_8x1_600k_reds4 | Vid4 (BIx4) | - | - | 27.7674 | 0.8444 | 8 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_8x1_600k_reds4 | Vid4 (BDx4) | - | - | 24.6209 | 0.7540 | 8 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_8x1_600k_reds4 | Vimeo-90K-T (BIx4) | - | - | 36.4445 | 0.9411 | 8 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_8x1_600k_reds4 | Vimeo-90K-T (BDx4) | - | - | 34.0372 | 0.9244 | 8 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bi | REDS4 (BIx4) | 31.0126 | 0.8804 | - | - | 4 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bi | UDM10 (BDx4) | - | - | 33.1211 | 0.9270 | 4 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bi | Vid4 (BIx4) | - | - | 27.7882 | 0.8401 | 4 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bi | Vid4 (BDx4) | - | - | 23.6086 | 0.7033 | 4 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bi | Vimeo-90K-T (BIx4) | - | - | 37.7864 | 0.9500 | 4 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bi | Vimeo-90K-T (BDx4) | - | - | 33.8972 | 0.9195 | 4 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bd | REDS4 (BIx4) | 29.2041 | 0.8528 | - | - | 4 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bd | UDM10 (BDx4) | - | - | 40.7216 | 0.9722 | 4 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bd | Vid4 (BIx4) | - | - | 26.4377 | 0.8074 | 4 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bd | Vid4 (BDx4) | - | - | 29.0400 | 0.8753 | 4 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bd | Vimeo-90K-T (BIx4) | - | - | 34.7248 | 0.9351 | 4 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bd | Vimeo-90K-T (BDx4) | - | - | 38.2054 | 0.9550 | 4 (Tesla V100-PCIE-32GB) | model | log |
**NTIRE 2021 checkpoints**

Note that the following models are fine-tuned from smaller models. The training schemes of these models will be released when MMagic reaches 5k stars. We provide the pre-trained models here.
Model | Dataset | Download |
---|---|---|
basicvsr-pp_c128n25_600k_ntire-vsr | NTIRE 2021 Video Super-Resolution - Track 1 | model |
basicvsr-pp_c128n25_600k_ntire-decompress-track1 | NTIRE 2021 Quality Enhancement of Compressed Video - Track 1 | model |
basicvsr-pp_c128n25_600k_ntire-decompress-track2 | NTIRE 2021 Quality Enhancement of Compressed Video - Track 2 | model |
basicvsr-pp_c128n25_600k_ntire-decompress-track3 | NTIRE 2021 Quality Enhancement of Compressed Video - Track 3 | model |
### Quick Start

**Train**

You can use the following commands to train a model with CPU or single/multiple GPUs.

```shell
# cpu train
CUDA_VISIBLE_DEVICES=-1 python tools/train.py configs/basicvsr_pp/basicvsr-pp_c64n7_8xb1-600k_reds4.py

# single-gpu train
python tools/train.py configs/basicvsr_pp/basicvsr-pp_c64n7_8xb1-600k_reds4.py

# multi-gpu train
./tools/dist_train.sh configs/basicvsr_pp/basicvsr-pp_c64n7_8xb1-600k_reds4.py 8
```

For more details, you can refer to the Train a model part of train_test.md.
**Test**

You can use the following commands to test a model with CPU or single/multiple GPUs.

```shell
# cpu test
CUDA_VISIBLE_DEVICES=-1 python tools/test.py configs/basicvsr_pp/basicvsr-pp_c64n7_8xb1-600k_reds4.py https://download.openmmlab.com/mmediting/restorers/basicvsr_plusplus/basicvsr_plusplus_c64n7_8x1_600k_reds4_20210217-db622b2f.pth

# single-gpu test
python tools/test.py configs/basicvsr_pp/basicvsr-pp_c64n7_8xb1-600k_reds4.py https://download.openmmlab.com/mmediting/restorers/basicvsr_plusplus/basicvsr_plusplus_c64n7_8x1_600k_reds4_20210217-db622b2f.pth

# multi-gpu test
./tools/dist_test.sh configs/basicvsr_pp/basicvsr-pp_c64n7_8xb1-600k_reds4.py https://download.openmmlab.com/mmediting/restorers/basicvsr_plusplus/basicvsr_plusplus_c64n7_8x1_600k_reds4_20210217-db622b2f.pth 8
```

For more details, you can refer to the Test a pre-trained model part of train_test.md.
### Citation

```bibtex
@InProceedings{chan2022basicvsrplusplus,
  author    = {Chan, Kelvin C.K. and Zhou, Shangchen and Xu, Xiangyu and Loy, Chen Change},
  title     = {BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year      = {2022}
}
```
## BasicVSR (CVPR’2021)

Task: Video Super-Resolution

### Abstract
Video super-resolution (VSR) approaches tend to have more components than the image counterparts as they need to exploit the additional temporal dimension. Complex designs are not uncommon. In this study, we wish to untangle the knots and reconsider some most essential components for VSR guided by four basic functionalities, i.e., Propagation, Alignment, Aggregation, and Upsampling. By reusing some existing components added with minimal redesigns, we show a succinct pipeline, BasicVSR, that achieves appealing improvements in terms of speed and restoration quality in comparison to many state-of-the-art algorithms. We conduct systematic analysis to explain how such gain can be obtained and discuss the pitfalls. We further show the extensibility of BasicVSR by presenting an information-refill mechanism and a coupled propagation scheme to facilitate information aggregation. The BasicVSR and its extension, IconVSR, can serve as strong baselines for future VSR approaches.
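At the core of BasicVSR is plain bidirectional recurrent propagation: features flow once backward and once forward through the sequence, and the two directions are fused per frame before upsampling. A toy PyTorch sketch of this pipeline, with the flow-based alignment from the paper omitted and layer shapes chosen for illustration:

```python
import torch
import torch.nn as nn

class BidirectionalPropagation(nn.Module):
    """Toy BasicVSR-style pipeline: backward pass, forward pass, then
    per-frame fusion of the two propagation directions."""

    def __init__(self, channels=64):
        super().__init__()
        self.backward_step = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.forward_step = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, feats):  # feats: list of t tensors, each (n, c, h, w)
        t = len(feats)
        h = torch.zeros_like(feats[0])
        backward = [None] * t
        for i in range(t - 1, -1, -1):  # backward propagation
            h = torch.relu(self.backward_step(torch.cat([feats[i], h], 1)))
            backward[i] = h
        h, out = torch.zeros_like(feats[0]), []
        for i in range(t):  # forward propagation + fusion per frame
            h = torch.relu(self.forward_step(torch.cat([feats[i], h], 1)))
            out.append(self.fuse(torch.cat([backward[i], h], 1)))
        return out  # upsampling would follow in the full model
```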

### Results and models

Evaluated on RGB channels for REDS4 and Y channel for others. The metrics are PSNR / SSIM.
The pretrained weights of SPyNet can be found here.
Model | Dataset | PSNR (RGB) | SSIM (RGB) | PSNR (Y) | SSIM (Y) | Training Resources | Download |
---|---|---|---|---|---|---|---|
basicvsr_reds4 | REDS4 (BIx4) | 31.4170 | 0.8909 | - | - | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_reds4 | UDM10 (BDx4) | - | - | 33.4478 | 0.9306 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_reds4 | Vimeo-90K-T (BIx4) | - | - | 36.2848 | 0.9395 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_reds4 | Vimeo-90K-T (BDx4) | - | - | 34.4700 | 0.9286 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_reds4 | Vid4 (BIx4) | - | - | 27.2694 | 0.8318 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_reds4 | Vid4 (BDx4) | - | - | 24.4541 | 0.7455 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bi | REDS4 (BIx4) | 30.3128 | 0.8660 | - | - | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bi | UDM10 (BDx4) | - | - | 34.5554 | 0.9451 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bi | Vimeo-90K-T (BIx4) | - | - | 37.2026 | 0.9434 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bi | Vimeo-90K-T (BDx4) | - | - | 34.8097 | 0.9316 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bi | Vid4 (BIx4) | - | - | 27.2755 | 0.8248 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bi | Vid4 (BDx4) | - | - | 25.0517 | 0.7636 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bd | REDS4 (BIx4) | 29.0376 | 0.8481 | - | - | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bd | UDM10 (BDx4) | - | - | 39.9953 | 0.9695 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bd | Vimeo-90K-T (BIx4) | - | - | 34.6427 | 0.9335 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bd | Vimeo-90K-T (BDx4) | - | - | 37.5501 | 0.9499 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bd | Vid4 (BIx4) | - | - | 26.2708 | 0.8022 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bd | Vid4 (BDx4) | - | - | 27.9791 | 0.8556 | 2 (Tesla V100-PCIE-32GB) | model | log |
### Quick Start

**Train**

You can use the following commands to train a model with CPU or single/multiple GPUs.

```shell
# cpu train
CUDA_VISIBLE_DEVICES=-1 python tools/train.py configs/basicvsr/basicvsr_2xb4_reds4.py

# single-gpu train
python tools/train.py configs/basicvsr/basicvsr_2xb4_reds4.py

# multi-gpu train
./tools/dist_train.sh configs/basicvsr/basicvsr_2xb4_reds4.py 8
```

For more details, you can refer to the Train a model part of train_test.md.
**Test**

You can use the following commands to test a model with CPU or single/multiple GPUs.

```shell
# cpu test
CUDA_VISIBLE_DEVICES=-1 python tools/test.py configs/basicvsr/basicvsr_2xb4_reds4.py https://download.openmmlab.com/mmediting/restorers/basicvsr/basicvsr_reds4_20120409-0e599677.pth

# single-gpu test
python tools/test.py configs/basicvsr/basicvsr_2xb4_reds4.py https://download.openmmlab.com/mmediting/restorers/basicvsr/basicvsr_reds4_20120409-0e599677.pth

# multi-gpu test
./tools/dist_test.sh configs/basicvsr/basicvsr_2xb4_reds4.py https://download.openmmlab.com/mmediting/restorers/basicvsr/basicvsr_reds4_20120409-0e599677.pth 8
```

For more details, you can refer to the Test a pre-trained model part of train_test.md.
### Citation

```bibtex
@InProceedings{chan2021basicvsr,
  author    = {Chan, Kelvin C.K. and Wang, Xintao and Yu, Ke and Dong, Chao and Loy, Chen Change},
  title     = {BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year      = {2021}
}
```
## IconVSR (CVPR’2021)

Task: Video Super-Resolution

### Abstract
Video super-resolution (VSR) approaches tend to have more components than the image counterparts as they need to exploit the additional temporal dimension. Complex designs are not uncommon. In this study, we wish to untangle the knots and reconsider some most essential components for VSR guided by four basic functionalities, i.e., Propagation, Alignment, Aggregation, and Upsampling. By reusing some existing components added with minimal redesigns, we show a succinct pipeline, BasicVSR, that achieves appealing improvements in terms of speed and restoration quality in comparison to many state-of-the-art algorithms. We conduct systematic analysis to explain how such gain can be obtained and discuss the pitfalls. We further show the extensibility of BasicVSR by presenting an information-refill mechanism and a coupled propagation scheme to facilitate information aggregation. The BasicVSR and its extension, IconVSR, can serve as strong baselines for future VSR approaches.
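IconVSR extends BasicVSR with the information-refill mechanism mentioned above: at sparsely chosen keyframes, features from an auxiliary extractor (EDVR-M in the paper) are fused into the propagated features to counter error accumulation. A minimal PyTorch sketch of just the fusion step, with illustrative layer shapes:

```python
import torch
import torch.nn as nn

class InformationRefill(nn.Module):
    """Toy information-refill: at keyframes, fuse auxiliary features into the
    propagated features; at other frames, pass the propagation through."""

    def __init__(self, channels=64):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, propagated, refill, is_keyframe):
        # propagated, refill: (n, c, h, w); refill comes from the auxiliary
        # feature extractor (EDVR-M in the paper) at keyframes only.
        if is_keyframe:
            return torch.relu(self.fuse(torch.cat([propagated, refill], 1)))
        return propagated
```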

### Results and models

Evaluated on RGB channels for REDS4 and Y channel for others. The metrics are PSNR / SSIM.
The pretrained weights of the IconVSR components can be found here: SPyNet, EDVR-M for REDS, and EDVR-M for Vimeo-90K.
Model | Dataset | PSNR (RGB) | SSIM (RGB) | PSNR (Y) | SSIM (Y) | Training Resources | Download |
---|---|---|---|---|---|---|---|
iconvsr_reds4 | REDS4 (BIx4) | 31.6926 | 0.8951 | - | - | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_reds4 | UDM10 (BDx4) | - | - | 35.3377 | 0.9471 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_reds4 | Vid4 (BIx4) | - | - | 27.4809 | 0.8354 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_reds4 | Vid4 (BDx4) | - | - | 25.2110 | 0.7732 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_reds4 | Vimeo-90K-T (BIx4) | - | - | 36.4983 | 0.9416 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_reds4 | Vimeo-90K-T (BDx4) | - | - | 34.4299 | 0.9287 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bi | REDS4 (BIx4) | 30.3452 | 0.8659 | - | - | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bi | UDM10 (BDx4) | - | - | 34.2595 | 0.9398 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bi | Vid4 (BIx4) | - | - | 27.4238 | 0.8297 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bi | Vid4 (BDx4) | - | - | 24.6666 | 0.7491 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bi | Vimeo-90K-T (BIx4) | - | - | 37.3729 | 0.9467 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bi | Vimeo-90K-T (BDx4) | - | - | 34.5548 | 0.9295 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bd | REDS4 (BIx4) | 29.0150 | 0.8465 | - | - | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bd | UDM10 (BDx4) | - | - | 40.0640 | 0.9697 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bd | Vid4 (BIx4) | - | - | 26.3109 | 0.8028 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bd | Vid4 (BDx4) | - | - | 28.2464 | 0.8612 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bd | Vimeo-90K-T (BIx4) | - | - | 34.6780 | 0.9339 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bd | Vimeo-90K-T (BDx4) | - | - | 37.7573 | 0.9517 | 2 (Tesla V100-PCIE-32GB) | model | log |
### Quick Start

**Train**

You can use the following commands to train a model with CPU or single/multiple GPUs.

```shell
# cpu train
CUDA_VISIBLE_DEVICES=-1 python tools/train.py configs/iconvsr/iconvsr_2xb4_reds4.py

# single-gpu train
python tools/train.py configs/iconvsr/iconvsr_2xb4_reds4.py

# multi-gpu train
./tools/dist_train.sh configs/iconvsr/iconvsr_2xb4_reds4.py 8
```

For more details, you can refer to the Train a model part of train_test.md.
**Test**

You can use the following commands to test a model with CPU or single/multiple GPUs.

```shell
# cpu test
CUDA_VISIBLE_DEVICES=-1 python tools/test.py configs/iconvsr/iconvsr_2xb4_reds4.py https://download.openmmlab.com/mmediting/restorers/iconvsr/iconvsr_reds4_20210413-9e09d621.pth

# single-gpu test
python tools/test.py configs/iconvsr/iconvsr_2xb4_reds4.py https://download.openmmlab.com/mmediting/restorers/iconvsr/iconvsr_reds4_20210413-9e09d621.pth

# multi-gpu test
./tools/dist_test.sh configs/iconvsr/iconvsr_2xb4_reds4.py https://download.openmmlab.com/mmediting/restorers/iconvsr/iconvsr_reds4_20210413-9e09d621.pth 8
```

For more details, you can refer to the Test a pre-trained model part of train_test.md.
### Citation

```bibtex
@InProceedings{chan2021basicvsr,
  author    = {Chan, Kelvin C.K. and Wang, Xintao and Yu, Ke and Dong, Chao and Loy, Chen Change},
  title     = {BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year      = {2021}
}
```
## TDAN (CVPR’2020)

Task: Video Super-Resolution

### Abstract
Video super-resolution (VSR) aims to restore a photo-realistic high-resolution (HR) video frame from both its corresponding low-resolution (LR) frame (reference frame) and multiple neighboring frames (supporting frames). Due to varying motion of cameras or objects, the reference frame and each supporting frame are not aligned. Therefore, temporal alignment is a challenging yet important problem for VSR. Previous VSR methods usually utilize optical flow between the reference frame and each supporting frame to warp the supporting frame for temporal alignment. Therefore, the performance of these image-level warping-based models will highly depend on the prediction accuracy of optical flow, and inaccurate optical flow will lead to artifacts in the warped supporting frames, which also will be propagated into the reconstructed HR video frame. To overcome the limitation, in this paper, we propose a temporal deformable alignment network (TDAN) to adaptively align the reference frame and each supporting frame at the feature level without computing optical flow. The TDAN uses features from both the reference frame and each supporting frame to dynamically predict offsets of sampling convolution kernels. By using the corresponding kernels, TDAN transforms supporting frames to align with the reference frame. To predict the HR video frame, a reconstruction network taking aligned frames and the reference frame is utilized. Experimental results demonstrate the effectiveness of the proposed TDAN-based VSR model.
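The alignment idea can be sketched with torchvision's deformable convolution: offsets are predicted from the reference and supporting features together, and the supporting features are resampled accordingly. The layer shapes below are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformableAlignment(nn.Module):
    """Toy TDAN-style alignment: predict per-position sampling offsets from
    the concatenated reference/supporting features, then apply a deformable
    convolution so the supporting features align with the reference."""

    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        # 2 offset values (x, y) per kernel sampling location
        self.offset_pred = nn.Conv2d(2 * channels,
                                     2 * kernel_size * kernel_size,
                                     3, padding=1)
        self.weight = nn.Parameter(
            torch.randn(channels, channels, kernel_size, kernel_size) * 0.01)

    def forward(self, ref_feat, sup_feat):  # both (n, c, h, w)
        offsets = self.offset_pred(torch.cat([ref_feat, sup_feat], dim=1))
        return deform_conv2d(sup_feat, offsets, self.weight, padding=1)

align = DeformableAlignment()
ref, sup = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
aligned = align(ref, sup)  # (1, 64, 32, 32): supporting features, aligned
```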

### Results and models

Evaluated on the Y channel; 8 pixels on each border are cropped before evaluation (a sketch of this protocol follows the table). The metrics are PSNR / SSIM.
Model | Dataset | PSNR (Y) | SSIM (Y) | Training Resources | Download |
---|---|---|---|---|---|
tdan_x4_1xb16-lr1e-4-400k_vimeo90k-bi | - | - | - | 8 (Tesla V100-SXM2-32GB) | - |
tdan_x4_1xb16-lr1e-4-400k_vimeo90k-bd | - | - | - | 8 (Tesla V100-SXM2-32GB) | - |
tdan_x4ft_1xb16-lr5e-5-400k_vimeo90k-bi | Vid4 (BIx4) | 26.49 | 0.792 | 8 (Tesla V100-SXM2-32GB) | model | log |
tdan_x4ft_1xb16-lr5e-5-400k_vimeo90k-bi | SPMCS-30 (BIx4) | 30.42 | 0.856 | 8 (Tesla V100-SXM2-32GB) | model | log |
tdan_x4ft_1xb16-lr5e-5-400k_vimeo90k-bi | Vid4 (BDx4) | 25.93 | 0.772 | 8 (Tesla V100-SXM2-32GB) | model | log |
tdan_x4ft_1xb16-lr5e-5-400k_vimeo90k-bi | SPMCS-30 (BDx4) | 29.69 | 0.842 | 8 (Tesla V100-SXM2-32GB) | model | log |
tdan_x4ft_1xb16-lr5e-5-800k_vimeo90k-bd | Vid4 (BIx4) | 25.80 | 0.784 | 8 (Tesla V100-SXM2-32GB) | model | log |
tdan_x4ft_1xb16-lr5e-5-800k_vimeo90k-bd | SPMCS-30 (BIx4) | 29.56 | 0.851 | 8 (Tesla V100-SXM2-32GB) | model | log |
tdan_x4ft_1xb16-lr5e-5-800k_vimeo90k-bd | Vid4 (BDx4) | 26.87 | 0.815 | 8 (Tesla V100-SXM2-32GB) | model | log |
tdan_x4ft_1xb16-lr5e-5-800k_vimeo90k-bd | SPMCS-30 (BDx4) | 30.77 | 0.868 | 8 (Tesla V100-SXM2-32GB) | model | log |
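A minimal numpy sketch of this evaluation protocol, assuming float RGB input in [0, 1] and the common BT.601 luma conversion (the exact constants used by the codebase may differ):

```python
import numpy as np

def to_y_channel(img):
    """RGB float image in [0, 1], shape (h, w, 3) -> BT.601 luma in [0, 255]."""
    return img @ np.array([65.481, 128.553, 24.966]) + 16.0

def psnr_y(gt, pred, crop_border=8):
    """PSNR on the Y channel, cropping `crop_border` pixels on each side,
    mirroring the protocol described above (illustrative only)."""
    gt_y = to_y_channel(gt)[crop_border:-crop_border, crop_border:-crop_border]
    pr_y = to_y_channel(pred)[crop_border:-crop_border, crop_border:-crop_border]
    mse = np.mean((gt_y - pr_y) ** 2)
    return float('inf') if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)
```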
### Quick Start

**Train**

You can use the following commands to train a model with CPU or single/multiple GPUs.

TDAN is trained in two stages.

Stage 1: Train with a larger learning rate (1e-4).

```shell
# cpu train
CUDA_VISIBLE_DEVICES=-1 python tools/train.py configs/tdan/tdan_x4_1xb16-lr1e-4-400k_vimeo90k-bi.py

# single-gpu train
python tools/train.py configs/tdan/tdan_x4_1xb16-lr1e-4-400k_vimeo90k-bi.py

# multi-gpu train
./tools/dist_train.sh configs/tdan/tdan_x4_1xb16-lr1e-4-400k_vimeo90k-bi.py 8
```
Stage 2: Fine-tune with a smaller learning rate (5e-5), initializing from the stage-1 weights (see the config sketch after the commands).

```shell
# cpu train
CUDA_VISIBLE_DEVICES=-1 python tools/train.py configs/tdan/tdan_x4ft_1xb16-lr5e-5-400k_vimeo90k-bi.py

# single-gpu train
python tools/train.py configs/tdan/tdan_x4ft_1xb16-lr5e-5-400k_vimeo90k-bi.py

# multi-gpu train
./tools/dist_train.sh configs/tdan/tdan_x4ft_1xb16-lr5e-5-400k_vimeo90k-bi.py 8
```
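In MMEngine-style configs, this two-stage setup is typically expressed with `_base_` inheritance and `load_from`. The sketch below is hypothetical: the field values and checkpoint path are assumptions for illustration, not the released stage-2 config.

```python
# Hypothetical stage-2 config sketch: inherit the stage-1 config, lower the
# learning rate, and initialize from the stage-1 checkpoint.
_base_ = './tdan_x4_1xb16-lr1e-4-400k_vimeo90k-bi.py'

optim_wrapper = dict(optimizer=dict(lr=5e-5))  # fine-tune with a smaller lr
load_from = 'work_dirs/tdan_stage1/iter_400000.pth'  # stage-1 weights (example path)
```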
For more details, you can refer to the Train a model part of train_test.md.
**Test**

You can use the following commands to test a model with CPU or single/multiple GPUs.

```shell
# cpu test
CUDA_VISIBLE_DEVICES=-1 python tools/test.py configs/tdan/tdan_x4ft_1xb16-lr5e-5-400k_vimeo90k-bi.py https://download.openmmlab.com/mmediting/restorers/tdan/tdan_vimeo90k_bix4_20210528-739979d9.pth

# single-gpu test
python tools/test.py configs/tdan/tdan_x4ft_1xb16-lr5e-5-400k_vimeo90k-bi.py https://download.openmmlab.com/mmediting/restorers/tdan/tdan_vimeo90k_bix4_20210528-739979d9.pth

# multi-gpu test
./tools/dist_test.sh configs/tdan/tdan_x4ft_1xb16-lr5e-5-400k_vimeo90k-bi.py https://download.openmmlab.com/mmediting/restorers/tdan/tdan_vimeo90k_bix4_20210528-739979d9.pth 8
```

For more details, you can refer to the Test a pre-trained model part of train_test.md.
### Citation

```bibtex
@InProceedings{tian2020tdan,
  title     = {TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution},
  author    = {Tian, Yapeng and Zhang, Yulun and Fu, Yun and Xu, Chenliang},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year      = {2020}
}
```
## TOFlow (IJCV’2019)

Task: Video Interpolation, Video Super-Resolution

### Abstract
Many video enhancement algorithms rely on optical flow to register frames in a video sequence. Precise flow estimation is however intractable; and optical flow itself is often a sub-optimal representation for particular video processing tasks. In this paper, we propose task-oriented flow (TOFlow), a motion representation learned in a self-supervised, task-specific manner. We design a neural network with a trainable motion estimation component and a video processing component, and train them jointly to learn the task-oriented flow. For evaluation, we build Vimeo-90K, a large-scale, high-quality video dataset for low-level video processing. TOFlow outperforms traditional optical flow on standard benchmarks as well as our Vimeo-90K dataset in three video processing tasks: frame interpolation, video denoising/deblocking, and video super-resolution.
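The registration step that TOFlow learns in a task-oriented way is ordinary flow warping; the difference is that the flow estimator is trained jointly with the video-processing network rather than fixed. A self-contained PyTorch sketch of the warping operation (flow in pixel units is an assumption):

```python
import torch
import torch.nn.functional as F

def flow_warp(frame, flow):
    """Warp `frame` (n, c, h, w) toward a reference frame using optical flow
    (n, 2, h, w), given as (dx, dy) displacements in pixels."""
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    grid = torch.stack((xs, ys), dim=0).float().to(frame)  # (2, h, w)
    coords = grid.unsqueeze(0) + flow                      # absolute coords
    # normalize coordinates to [-1, 1] as required by grid_sample
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid_norm = torch.stack((coords_x, coords_y), dim=3)   # (n, h, w, 2)
    return F.grid_sample(frame, grid_norm, align_corners=True)
```

In TOFlow, the gradient of the downstream task loss flows back through this warp into the motion-estimation network, which is what makes the learned flow "task-oriented".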

### Results and models

Evaluated on Vimeo90k-triplet (RGB channels). The metrics are PSNR / SSIM.
Model | Dataset | Task | Pretrained SPyNet | PSNR | SSIM | Training Resources | Download |
---|---|---|---|---|---|---|---|
tof_vfi_spynet_chair_nobn_1xb1_vimeo90k | Vimeo90k-triplet | Video Interpolation | spynet_chairs_final | 33.3294 | 0.9465 | 1 (Tesla PG503-216) | model | log |
tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k | Vimeo90k-triplet | Video Interpolation | spynet_chairs_final | 33.3339 | 0.9466 | 1 (Tesla PG503-216) | model | log |
tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k | Vimeo90k-triplet | Video Interpolation | spynet_chairs_final | 33.3170 | 0.9464 | 1 (Tesla PG503-216) | model | log |
tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k | Vimeo90k-triplet | Video Interpolation | spynet_chairs_final | 33.3237 | 0.9465 | 1 (Tesla PG503-216) | model | log |
tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k | Vimeo90k-triplet | Video Interpolation | spynet_chairs_final | 33.3426 | 0.9467 | 1 (Tesla PG503-216) | model | log |
Note: These pretrained SPyNets don't contain a BN layer since `batch_size=1`, which is consistent with https://github.com/Coldog2333/pytoflow.
Evaluated on Vid4 (RGB channels). The metrics are PSNR / SSIM.

Model | Dataset | Task | PSNR / SSIM (Vid4) | Training Resources | Download |
---|---|---|---|---|---|
tof_x4_vimeo90k_official | Vimeo-90K | Video Super-Resolution | 24.4377 / 0.7433 | - | model |
### Quick Start

**Train**

You can use the following commands to train a model with CPU or single/multiple GPUs.

Currently, TOF only supports training for the video interpolation task.

```shell
# cpu train
CUDA_VISIBLE_DEVICES=-1 python tools/train.py configs/tof/tof_spynet-chair-wobn_1xb1_vimeo90k-triplet.py

# single-gpu train
python tools/train.py configs/tof/tof_spynet-chair-wobn_1xb1_vimeo90k-triplet.py

# multi-gpu train
./tools/dist_train.sh configs/tof/tof_spynet-chair-wobn_1xb1_vimeo90k-triplet.py 8
```

For more details, you can refer to the Train a model part of train_test.md.
**Test**

You can use the following commands to test a model with CPU or single/multiple GPUs.

TOF supports two tasks for testing.

Task 1: Video Interpolation

```shell
# cpu test
CUDA_VISIBLE_DEVICES=-1 python tools/test.py configs/tof/tof_spynet-chair-wobn_1xb1_vimeo90k-triplet.py https://download.openmmlab.com/mmediting/video_interpolators/toflow/pretrained_spynet_chair_20220321-4d82e91b.pth

# single-gpu test
python tools/test.py configs/tof/tof_spynet-chair-wobn_1xb1_vimeo90k-triplet.py https://download.openmmlab.com/mmediting/video_interpolators/toflow/pretrained_spynet_chair_20220321-4d82e91b.pth

# multi-gpu test
./tools/dist_test.sh configs/tof/tof_spynet-chair-wobn_1xb1_vimeo90k-triplet.py https://download.openmmlab.com/mmediting/video_interpolators/toflow/pretrained_spynet_chair_20220321-4d82e91b.pth 8
```

Task 2: Video Super-Resolution

```shell
# cpu test
CUDA_VISIBLE_DEVICES=-1 python tools/test.py configs/tof/tof_x4_official_vimeo90k.py https://download.openmmlab.com/mmediting/restorers/tof/tof_x4_vimeo90k_official-a569ff50.pth

# single-gpu test
python tools/test.py configs/tof/tof_x4_official_vimeo90k.py https://download.openmmlab.com/mmediting/restorers/tof/tof_x4_vimeo90k_official-a569ff50.pth

# multi-gpu test
./tools/dist_test.sh configs/tof/tof_x4_official_vimeo90k.py https://download.openmmlab.com/mmediting/restorers/tof/tof_x4_vimeo90k_official-a569ff50.pth 8
```

For more details, you can refer to the Test a pre-trained model part of train_test.md.
### Citation

```bibtex
@article{xue2019video,
  title     = {Video Enhancement with Task-Oriented Flow},
  author    = {Xue, Tianfan and Chen, Baian and Wu, Jiajun and Wei, Donglai and Freeman, William T},
  journal   = {International Journal of Computer Vision},
  volume    = {127},
  number    = {8},
  pages     = {1106--1125},
  year      = {2019},
  publisher = {Springer}
}
```
## EDVR (CVPRW’2019)

Task: Video Super-Resolution

### Abstract
Video restoration tasks, including super-resolution, deblurring, etc, are drawing increasing attention in the computer vision community. A challenging benchmark named REDS is released in the NTIRE19 Challenge. This new benchmark challenges existing methods from two aspects: (1) how to align multiple frames given large motions, and (2) how to effectively fuse different frames with diverse motion and blur. In this work, we propose a novel Video Restoration framework with Enhanced Deformable networks, termed EDVR, to address these challenges. First, to handle large motions, we devise a Pyramid, Cascading and Deformable (PCD) alignment module, in which frame alignment is done at the feature level using deformable convolutions in a coarse-to-fine manner. Second, we propose a Temporal and Spatial Attention (TSA) fusion module, in which attention is applied both temporally and spatially, so as to emphasize important features for subsequent restoration. Thanks to these modules, our EDVR wins the champions and outperforms the second place by a large margin in all four tracks in the NTIRE19 video restoration and enhancement challenges. EDVR also demonstrates superior performance to state-of-the-art published methods on video super-resolution and deblurring.
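The temporal half of TSA fusion can be sketched in a few lines: each (already aligned) neighboring frame is gated by the similarity of its embedding to the reference frame's embedding before the frames are fused. An illustrative PyTorch sketch, with the spatial-attention half and the PCD alignment module omitted and layer shapes assumed:

```python
import torch
import torch.nn as nn

class TemporalAttentionFusion(nn.Module):
    """Toy temporal attention: weight each aligned frame by its embedding
    similarity to the reference (center) frame, then fuse all frames."""

    def __init__(self, channels=64, num_frames=5):
        super().__init__()
        self.embed_ref = nn.Conv2d(channels, channels, 3, padding=1)
        self.embed_nbr = nn.Conv2d(channels, channels, 3, padding=1)
        self.fuse = nn.Conv2d(num_frames * channels, channels, 1)

    def forward(self, aligned):  # aligned: (n, t, c, h, w), center = reference
        n, t, c, h, w = aligned.shape
        ref = self.embed_ref(aligned[:, t // 2])
        nbr = self.embed_nbr(aligned.view(n * t, c, h, w)).view(n, t, c, h, w)
        # per-frame, per-pixel similarity map -> sigmoid gate in [0, 1]
        sim = torch.sigmoid((nbr * ref.unsqueeze(1)).sum(2, keepdim=True))
        weighted = (aligned * sim).view(n, t * c, h, w)
        return self.fuse(weighted)
```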

### Results and models

Evaluated on RGB channels. The metrics are PSNR and SSIM.
Model | Dataset | PSNR | SSIM | Training Resources | Download |
---|---|---|---|---|---|
edvrm_wotsa_x4_8x4_600k_reds | REDS | 30.3430 | 0.8664 | 8 | model | log |
edvrm_x4_8x4_600k_reds | REDS | 30.4194 | 0.8684 | 8 | model | log |
edvrl_wotsa_c128b40_8x8_lr2e-4_600k_reds4 | REDS | 31.0010 | 0.8784 | 8 (Tesla V100-PCIE-32GB) | model | log |
edvrl_c128b40_8x8_lr2e-4_600k_reds4 | REDS | 31.0467 | 0.8793 | 8 (Tesla V100-PCIE-32GB) | model | log |
### Quick Start

**Train**

You can use the following commands to train a model with CPU or single/multiple GPUs.

```shell
# cpu train
CUDA_VISIBLE_DEVICES=-1 python tools/train.py configs/edvr/edvrm_8xb4-600k_reds.py

# single-gpu train
python tools/train.py configs/edvr/edvrm_8xb4-600k_reds.py

# multi-gpu train
./tools/dist_train.sh configs/edvr/edvrm_8xb4-600k_reds.py 8
```

For more details, you can refer to the Train a model part of train_test.md.
**Test**

You can use the following commands to test a model with CPU or single/multiple GPUs.

```shell
# cpu test
CUDA_VISIBLE_DEVICES=-1 python tools/test.py configs/edvr/edvrm_8xb4-600k_reds.py https://download.openmmlab.com/mmediting/restorers/edvr/edvrm_x4_8x4_600k_reds_20210625-e29b71b5.pth

# single-gpu test
python tools/test.py configs/edvr/edvrm_8xb4-600k_reds.py https://download.openmmlab.com/mmediting/restorers/edvr/edvrm_x4_8x4_600k_reds_20210625-e29b71b5.pth

# multi-gpu test
./tools/dist_test.sh configs/edvr/edvrm_8xb4-600k_reds.py https://download.openmmlab.com/mmediting/restorers/edvr/edvrm_x4_8x4_600k_reds_20210625-e29b71b5.pth 8
```

For more details, you can refer to the Test a pre-trained model part of train_test.md.
### Citation

```bibtex
@InProceedings{wang2019edvr,
  author    = {Wang, Xintao and Chan, Kelvin C.K. and Yu, Ke and Dong, Chao and Loy, Chen Change},
  title     = {EDVR: Video Restoration with Enhanced Deformable Convolutional Networks},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  month     = {June},
  year      = {2019}
}
```