# Video Super-Resolution

## Summary

- Number of checkpoints: 27
- Number of configs: 29
- Number of papers: 6
- ALGORITHM: 7

## RealBasicVSR (CVPR’2022)

Task: Video Super-Resolution

### Abstract
The diversity and complexity of degradations in real-world video super-resolution (VSR) pose non-trivial challenges in inference and training. First, while long-term propagation leads to improved performance in cases of mild degradations, severe in-the-wild degradations could be exaggerated through propagation, impairing output quality. To balance the tradeoff between detail synthesis and artifact suppression, we found an image pre-cleaning stage indispensable to reduce noises and artifacts prior to propagation. Equipped with a carefully designed cleaning module, our RealBasicVSR outperforms existing methods in both quality and efficiency. Second, real-world VSR models are often trained with diverse degradations to improve generalizability, requiring increased batch size to produce a stable gradient. Inevitably, the increased computational burden results in various problems, including 1) speed-performance tradeoff and 2) batch-length tradeoff. To alleviate the first tradeoff, we propose a stochastic degradation scheme that reduces up to 40% of training time without sacrificing performance. We then analyze different training settings and suggest that employing longer sequences rather than larger batches during training allows more effective uses of temporal information, leading to more stable performance during inference. To facilitate fair comparisons, we propose the new VideoLQ dataset, which contains a large variety of real-world low-quality video sequences containing rich textures and patterns. Our dataset can serve as a common ground for benchmarking. Code, models, and the dataset will be made publicly available.
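The pre-cleaning stage is the key architectural addition: each frame is repeatedly denoised until the update becomes negligible, and only then handed to the recurrent super-resolution network. Below is a minimal PyTorch sketch of that idea; the module layout, iteration cap, and threshold are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class CleaningModule(nn.Module):
    """Illustrative image pre-cleaning stage: a small residual CNN that
    suppresses noise and artifacts in each frame before propagation."""

    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1),
        )

    def forward(self, frames):  # frames: (n, t, 3, h, w)
        n, t, c, h, w = frames.shape
        residual = self.body(frames.view(n * t, c, h, w)).view(n, t, c, h, w)
        return frames + residual  # cleaned frames, same shape

def preclean(frames, cleaner, max_iters=3, threshold=1e-3):
    """Apply cleaning repeatedly until the mean update is small, then the
    cleaned sequence would be passed to the recurrent VSR network.
    max_iters/threshold are assumed values for illustration."""
    for _ in range(max_iters):
        cleaned = cleaner(frames)
        if torch.mean(torch.abs(cleaned - frames)) < threshold:
            return cleaned
        frames = cleaned
    return frames
```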

### Results and models

Evaluated on the Y channel. The code for computing NRQM, NIQE, and PI can be found here. The official MATLAB code is used to compute BRISQUE.
Model | Dataset | NRQM (Y) | NIQE (Y) | PI (Y) | BRISQUE (Y) | Training Resources | Download |
---|---|---|---|---|---|---|---|
realbasicvsr_c64b20_1x30x8_lr5e-5_150k_reds | REDS | 6.0477 | 3.7662 | 3.8593 | 29.030 | 8 (Tesla V100-SXM2-32GB) | model/log |
realbasicvsr_wogan-c64b20-2x30x8_8xb2-lr1e-4-300k_reds | REDS | - | - | - | - | 8 (Tesla V100-SXM2-32GB) | model/log |
### Quick Start

**Train**

You can use the following commands to train a model with CPU or single/multiple GPUs.

```shell
# cpu train
CUDA_VISIBLE_DEVICES=-1 python tools/train.py configs/real_basicvsr/realbasicvsr_c64b20-1x30x8_8xb1-lr5e-5-150k_reds.py

# single-gpu train
python tools/train.py configs/real_basicvsr/realbasicvsr_c64b20-1x30x8_8xb1-lr5e-5-150k_reds.py

# multi-gpu train
./tools/dist_train.sh configs/real_basicvsr/realbasicvsr_c64b20-1x30x8_8xb1-lr5e-5-150k_reds.py 8
```

For more details, you can refer to the Train a model part of train_test.md.
**Test**

You can use the following commands to test a model with CPU or single/multiple GPUs.

```shell
# cpu test
CUDA_VISIBLE_DEVICES=-1 python tools/test.py configs/real_basicvsr/realbasicvsr_c64b20-1x30x8_8xb1-lr5e-5-150k_reds.py https://download.openmmlab.com/mmediting/restorers/real_basicvsr/realbasicvsr_c64b20_1x30x8_lr5e-5_150k_reds_20211104-52f77c2c.pth

# single-gpu test
python tools/test.py configs/real_basicvsr/realbasicvsr_c64b20-1x30x8_8xb1-lr5e-5-150k_reds.py https://download.openmmlab.com/mmediting/restorers/real_basicvsr/realbasicvsr_c64b20_1x30x8_lr5e-5_150k_reds_20211104-52f77c2c.pth

# multi-gpu test
./tools/dist_test.sh configs/real_basicvsr/realbasicvsr_c64b20-1x30x8_8xb1-lr5e-5-150k_reds.py https://download.openmmlab.com/mmediting/restorers/real_basicvsr/realbasicvsr_c64b20_1x30x8_lr5e-5_150k_reds_20211104-52f77c2c.pth 8
```

For more details, you can refer to the Test a pre-trained model part of train_test.md.
### Citation

```bibtex
@InProceedings{chan2022investigating,
  author    = {Chan, Kelvin C.K. and Zhou, Shangchen and Xu, Xiangyu and Loy, Chen Change},
  title     = {RealBasicVSR: Investigating Tradeoffs in Real-World Video Super-Resolution},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year      = {2022}
}
```
## BasicVSR++ (CVPR’2022)

Task: Video Super-Resolution

### Abstract
A recurrent structure is a popular framework choice for the task of video super-resolution. The state-of-the-art method BasicVSR adopts bidirectional propagation with feature alignment to effectively exploit information from the entire input video. In this study, we redesign BasicVSR by proposing second-order grid propagation and flow-guided deformable alignment. We show that by empowering the recurrent framework with the enhanced propagation and alignment, one can exploit spatiotemporal information across misaligned video frames more effectively. The new components lead to an improved performance under a similar computational constraint. In particular, our model BasicVSR++ surpasses BasicVSR by 0.82 dB in PSNR with similar number of parameters. In addition to video super-resolution, BasicVSR++ generalizes well to other video restoration tasks such as compressed video enhancement. In NTIRE 2021, BasicVSR++ obtains three champions and one runner-up in the Video Super-Resolution and Compressed Video Enhancement Challenges. Codes and models will be released to MMagic.
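Second-order grid propagation can be pictured as a recurrence whose step conditions on the hidden states of the two previous time steps, not just one. Here is a toy PyTorch sketch of that recurrence; in BasicVSR++ proper, the two hidden states are first warped to the current frame with flow-guided deformable alignment, which is omitted here, and all layer shapes are illustrative.

```python
import torch
import torch.nn as nn

class SecondOrderPropagation(nn.Module):
    """Toy second-order recurrence: each step fuses the current frame's
    features with the hidden states from t-1 and t-2."""

    def __init__(self, channels=64):
        super().__init__()
        self.fuse = nn.Conv2d(3 * channels, channels, 3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, feats):  # feats: list of t tensors, each (n, c, h, w)
        zeros = torch.zeros_like(feats[0])
        h_prev1, h_prev2, out = zeros, zeros, []
        for x in feats:
            # BasicVSR++ aligns h_prev1/h_prev2 to the current frame with
            # flow-guided deformable convolution before this fusion step.
            h = self.relu(self.fuse(torch.cat([x, h_prev1, h_prev2], dim=1)))
            out.append(h)
            h_prev1, h_prev2 = h, h_prev1
        return out
```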

### Results and models
The pretrained weights of SPyNet can be found here.
Model | Dataset | PSNR (RGB) | SSIM (RGB) | PSNR (Y) | SSIM (Y) | Training Resources | Download |
---|---|---|---|---|---|---|---|
basicvsr_plusplus_c64n7_8x1_600k_reds4 | REDS4 (BIx4) | 32.3855 | 0.9069 | - | - | 8 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_8x1_600k_reds4 | UDM10 (BDx4) | - | - | 34.6868 | 0.9417 | 8 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_8x1_600k_reds4 | Vid4 (BIx4) | - | - | 27.7674 | 0.8444 | 8 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_8x1_600k_reds4 | Vid4 (BDx4) | - | - | 24.6209 | 0.7540 | 8 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_8x1_600k_reds4 | Vimeo-90K-T (BIx4) | - | - | 36.4445 | 0.9411 | 8 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_8x1_600k_reds4 | Vimeo-90K-T (BDx4) | - | - | 34.0372 | 0.9244 | 8 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bi | REDS4 (BIx4) | 31.0126 | 0.8804 | - | - | 4 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bi | UDM10 (BDx4) | - | - | 33.1211 | 0.9270 | 4 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bi | Vid4 (BIx4) | - | - | 27.7882 | 0.8401 | 4 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bi | Vid4 (BDx4) | - | - | 23.6086 | 0.7033 | 4 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bi | Vimeo-90K-T (BIx4) | - | - | 37.7864 | 0.9500 | 4 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bi | Vimeo-90K-T (BDx4) | - | - | 33.8972 | 0.9195 | 4 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bd | REDS4 (BIx4) | 29.2041 | 0.8528 | - | - | 4 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bd | UDM10 (BDx4) | - | - | 40.7216 | 0.9722 | 4 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bd | Vid4 (BIx4) | - | - | 26.4377 | 0.8074 | 4 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bd | Vid4 (BDx4) | - | - | 29.0400 | 0.8753 | 4 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bd | Vimeo-90K-T (BIx4) | - | - | 34.7248 | 0.9351 | 4 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_plusplus_c64n7_4x2_300k_vimeo90k_bd | Vimeo-90K-T (BDx4) | - | - | 38.2054 | 0.9550 | 4 (Tesla V100-PCIE-32GB) | model | log |
**NTIRE 2021 checkpoints**

Note that the following models are fine-tuned from smaller models. The training schemes of these models will be released when MMagic reaches 5k stars. We provide the pre-trained models here.
Model | Dataset | Download |
---|---|---|
basicvsr-pp_c128n25_600k_ntire-vsr | NTIRE 2021 Video Super-Resolution - Track 1 | model |
basicvsr-pp_c128n25_600k_ntire-decompress-track1 | NTIRE 2021 Quality Enhancement of Compressed Video - Track 1 | model |
basicvsr-pp_c128n25_600k_ntire-decompress-track2 | NTIRE 2021 Quality Enhancement of Compressed Video - Track 2 | model |
basicvsr-pp_c128n25_600k_ntire-decompress-track3 | NTIRE 2021 Quality Enhancement of Compressed Video - Track 3 | model |
### Quick Start

**Train**

You can use the following commands to train a model with CPU or single/multiple GPUs.

```shell
# cpu train
CUDA_VISIBLE_DEVICES=-1 python tools/train.py configs/basicvsr_pp/basicvsr-pp_c64n7_8xb1-600k_reds4.py

# single-gpu train
python tools/train.py configs/basicvsr_pp/basicvsr-pp_c64n7_8xb1-600k_reds4.py

# multi-gpu train
./tools/dist_train.sh configs/basicvsr_pp/basicvsr-pp_c64n7_8xb1-600k_reds4.py 8
```

For more details, you can refer to the Train a model part of train_test.md.
**Test**

You can use the following commands to test a model with CPU or single/multiple GPUs.

```shell
# cpu test
CUDA_VISIBLE_DEVICES=-1 python tools/test.py configs/basicvsr_pp/basicvsr-pp_c64n7_8xb1-600k_reds4.py https://download.openmmlab.com/mmediting/restorers/basicvsr_plusplus/basicvsr_plusplus_c64n7_8x1_600k_reds4_20210217-db622b2f.pth

# single-gpu test
python tools/test.py configs/basicvsr_pp/basicvsr-pp_c64n7_8xb1-600k_reds4.py https://download.openmmlab.com/mmediting/restorers/basicvsr_plusplus/basicvsr_plusplus_c64n7_8x1_600k_reds4_20210217-db622b2f.pth

# multi-gpu test
./tools/dist_test.sh configs/basicvsr_pp/basicvsr-pp_c64n7_8xb1-600k_reds4.py https://download.openmmlab.com/mmediting/restorers/basicvsr_plusplus/basicvsr_plusplus_c64n7_8x1_600k_reds4_20210217-db622b2f.pth 8
```

For more details, you can refer to the Test a pre-trained model part of train_test.md.
### Citation

```bibtex
@InProceedings{chan2022basicvsrplusplus,
  author    = {Chan, Kelvin C.K. and Zhou, Shangchen and Xu, Xiangyu and Loy, Chen Change},
  title     = {BasicVSR++: Improving Video Super-Resolution with Enhanced Propagation and Alignment},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year      = {2022}
}
```
## BasicVSR (CVPR’2021)

Task: Video Super-Resolution

### Abstract
Video super-resolution (VSR) approaches tend to have more components than the image counterparts as they need to exploit the additional temporal dimension. Complex designs are not uncommon. In this study, we wish to untangle the knots and reconsider some most essential components for VSR guided by four basic functionalities, i.e., Propagation, Alignment, Aggregation, and Upsampling. By reusing some existing components added with minimal redesigns, we show a succinct pipeline, BasicVSR, that achieves appealing improvements in terms of speed and restoration quality in comparison to many state-of-the-art algorithms. We conduct systematic analysis to explain how such gain can be obtained and discuss the pitfalls. We further show the extensibility of BasicVSR by presenting an information-refill mechanism and a coupled propagation scheme to facilitate information aggregation. The BasicVSR and its extension, IconVSR, can serve as strong baselines for future VSR approaches.
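At the core of BasicVSR is plain bidirectional recurrent propagation: features flow once backward and once forward through the sequence, and the two directions are fused per frame before upsampling. A toy PyTorch sketch of this pipeline, with the flow-based alignment from the paper omitted and layer shapes chosen for illustration:

```python
import torch
import torch.nn as nn

class BidirectionalPropagation(nn.Module):
    """Toy BasicVSR-style pipeline: backward pass, forward pass, then
    per-frame fusion of the two propagation directions."""

    def __init__(self, channels=64):
        super().__init__()
        self.backward_step = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.forward_step = nn.Conv2d(2 * channels, channels, 3, padding=1)
        self.fuse = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, feats):  # feats: list of t tensors, each (n, c, h, w)
        t = len(feats)
        h = torch.zeros_like(feats[0])
        backward = [None] * t
        for i in range(t - 1, -1, -1):  # backward propagation
            h = torch.relu(self.backward_step(torch.cat([feats[i], h], 1)))
            backward[i] = h
        h, out = torch.zeros_like(feats[0]), []
        for i in range(t):  # forward propagation + fusion per frame
            h = torch.relu(self.forward_step(torch.cat([feats[i], h], 1)))
            out.append(self.fuse(torch.cat([backward[i], h], 1)))
        return out  # upsampling would follow in the full model
```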

### Results and models

Evaluated on RGB channels for REDS4 and Y channel for others. The metrics are PSNR / SSIM.
The pretrained weights of SPyNet can be found here.
Model | Dataset | PSNR (RGB) | SSIM (RGB) | PSNR (Y) | SSIM (Y) | Training Resources | Download |
---|---|---|---|---|---|---|---|
basicvsr_reds4 | REDS4 (BIx4) | 31.4170 | 0.8909 | - | - | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_reds4 | UDM10 (BDx4) | - | - | 33.4478 | 0.9306 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_reds4 | Vimeo-90K-T (BIx4) | - | - | 36.2848 | 0.9395 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_reds4 | Vimeo-90K-T (BDx4) | - | - | 34.4700 | 0.9286 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_reds4 | Vid4 (BIx4) | - | - | 27.2694 | 0.8318 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_reds4 | Vid4 (BDx4) | - | - | 24.4541 | 0.7455 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bi | REDS4 (BIx4) | 30.3128 | 0.8660 | - | - | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bi | UDM10 (BDx4) | - | - | 34.5554 | 0.9451 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bi | Vimeo-90K-T (BIx4) | - | - | 37.2026 | 0.9434 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bi | Vimeo-90K-T (BDx4) | - | - | 34.8097 | 0.9316 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bi | Vid4 (BIx4) | - | - | 27.2755 | 0.8248 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bi | Vid4 (BDx4) | - | - | 25.0517 | 0.7636 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bd | REDS4 (BIx4) | 29.0376 | 0.8481 | - | - | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bd | UDM10 (BDx4) | - | - | 39.9953 | 0.9695 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bd | Vimeo-90K-T (BIx4) | - | - | 34.6427 | 0.9335 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bd | Vimeo-90K-T (BDx4) | - | - | 37.5501 | 0.9499 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bd | Vid4 (BIx4) | - | - | 26.2708 | 0.8022 | 2 (Tesla V100-PCIE-32GB) | model | log |
basicvsr_vimeo90k_bd | Vid4 (BDx4) | - | - | 27.9791 | 0.8556 | 2 (Tesla V100-PCIE-32GB) | model | log |
### Quick Start

**Train**

You can use the following commands to train a model with CPU or single/multiple GPUs.

```shell
# cpu train
CUDA_VISIBLE_DEVICES=-1 python tools/train.py configs/basicvsr/basicvsr_2xb4_reds4.py

# single-gpu train
python tools/train.py configs/basicvsr/basicvsr_2xb4_reds4.py

# multi-gpu train
./tools/dist_train.sh configs/basicvsr/basicvsr_2xb4_reds4.py 8
```

For more details, you can refer to the Train a model part of train_test.md.
**Test**

You can use the following commands to test a model with CPU or single/multiple GPUs.

```shell
# cpu test
CUDA_VISIBLE_DEVICES=-1 python tools/test.py configs/basicvsr/basicvsr_2xb4_reds4.py https://download.openmmlab.com/mmediting/restorers/basicvsr/basicvsr_reds4_20120409-0e599677.pth

# single-gpu test
python tools/test.py configs/basicvsr/basicvsr_2xb4_reds4.py https://download.openmmlab.com/mmediting/restorers/basicvsr/basicvsr_reds4_20120409-0e599677.pth

# multi-gpu test
./tools/dist_test.sh configs/basicvsr/basicvsr_2xb4_reds4.py https://download.openmmlab.com/mmediting/restorers/basicvsr/basicvsr_reds4_20120409-0e599677.pth 8
```

For more details, you can refer to the Test a pre-trained model part of train_test.md.
### Citation

```bibtex
@InProceedings{chan2021basicvsr,
  author    = {Chan, Kelvin C.K. and Wang, Xintao and Yu, Ke and Dong, Chao and Loy, Chen Change},
  title     = {BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year      = {2021}
}
```
## IconVSR (CVPR’2021)

Task: Video Super-Resolution

### Abstract
Video super-resolution (VSR) approaches tend to have more components than the image counterparts as they need to exploit the additional temporal dimension. Complex designs are not uncommon. In this study, we wish to untangle the knots and reconsider some most essential components for VSR guided by four basic functionalities, i.e., Propagation, Alignment, Aggregation, and Upsampling. By reusing some existing components added with minimal redesigns, we show a succinct pipeline, BasicVSR, that achieves appealing improvements in terms of speed and restoration quality in comparison to many state-of-the-art algorithms. We conduct systematic analysis to explain how such gain can be obtained and discuss the pitfalls. We further show the extensibility of BasicVSR by presenting an information-refill mechanism and a coupled propagation scheme to facilitate information aggregation. The BasicVSR and its extension, IconVSR, can serve as strong baselines for future VSR approaches.
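IconVSR extends BasicVSR with the information-refill mechanism mentioned above: at sparsely chosen keyframes, features from an auxiliary extractor (EDVR-M in the paper) are fused into the propagated features to counter error accumulation. A minimal PyTorch sketch of just the fusion step, with illustrative layer shapes:

```python
import torch
import torch.nn as nn

class InformationRefill(nn.Module):
    """Toy information-refill: at keyframes, fuse auxiliary features into the
    propagated features; at other frames, pass the propagation through."""

    def __init__(self, channels=64):
        super().__init__()
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1)

    def forward(self, propagated, refill, is_keyframe):
        # propagated, refill: (n, c, h, w); refill comes from the auxiliary
        # feature extractor (EDVR-M in the paper) at keyframes only.
        if is_keyframe:
            return torch.relu(self.fuse(torch.cat([propagated, refill], 1)))
        return propagated
```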

### Results and models

Evaluated on RGB channels for REDS4 and Y channel for others. The metrics are PSNR / SSIM.
The pretrained weights of the IconVSR components can be found here: SPyNet, EDVR-M for REDS, and EDVR-M for Vimeo-90K.
Model | Dataset | PSNR (RGB) | SSIM (RGB) | PSNR (Y) | SSIM (Y) | Training Resources | Download |
---|---|---|---|---|---|---|---|
iconvsr_reds4 | REDS4 (BIx4) | 31.6926 | 0.8951 | - | - | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_reds4 | UDM10 (BDx4) | - | - | 35.3377 | 0.9471 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_reds4 | Vid4 (BIx4) | - | - | 27.4809 | 0.8354 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_reds4 | Vid4 (BDx4) | - | - | 25.2110 | 0.7732 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_reds4 | Vimeo-90K-T (BIx4) | - | - | 36.4983 | 0.9416 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_reds4 | Vimeo-90K-T (BDx4) | - | - | 34.4299 | 0.9287 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bi | REDS4 (BIx4) | 30.3452 | 0.8659 | - | - | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bi | UDM10 (BDx4) | - | - | 34.2595 | 0.9398 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bi | Vid4 (BIx4) | - | - | 27.4238 | 0.8297 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bi | Vid4 (BDx4) | - | - | 24.6666 | 0.7491 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bi | Vimeo-90K-T (BIx4) | - | - | 37.3729 | 0.9467 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bi | Vimeo-90K-T (BDx4) | - | - | 34.5548 | 0.9295 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bd | REDS4 (BIx4) | 29.0150 | 0.8465 | - | - | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bd | UDM10 (BDx4) | - | - | 40.0640 | 0.9697 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bd | Vid4 (BIx4) | - | - | 26.3109 | 0.8028 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bd | Vid4 (BDx4) | - | - | 28.2464 | 0.8612 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bd | Vimeo-90K-T (BIx4) | - | - | 34.6780 | 0.9339 | 2 (Tesla V100-PCIE-32GB) | model | log |
iconvsr_vimeo90k_bd | Vimeo-90K-T (BDx4) | - | - | 37.7573 | 0.9517 | 2 (Tesla V100-PCIE-32GB) | model | log |
### Quick Start

**Train**

You can use the following commands to train a model with CPU or single/multiple GPUs.

```shell
# cpu train
CUDA_VISIBLE_DEVICES=-1 python tools/train.py configs/iconvsr/iconvsr_2xb4_reds4.py

# single-gpu train
python tools/train.py configs/iconvsr/iconvsr_2xb4_reds4.py

# multi-gpu train
./tools/dist_train.sh configs/iconvsr/iconvsr_2xb4_reds4.py 8
```

For more details, you can refer to the Train a model part of train_test.md.
**Test**

You can use the following commands to test a model with CPU or single/multiple GPUs.

```shell
# cpu test
CUDA_VISIBLE_DEVICES=-1 python tools/test.py configs/iconvsr/iconvsr_2xb4_reds4.py https://download.openmmlab.com/mmediting/restorers/iconvsr/iconvsr_reds4_20210413-9e09d621.pth

# single-gpu test
python tools/test.py configs/iconvsr/iconvsr_2xb4_reds4.py https://download.openmmlab.com/mmediting/restorers/iconvsr/iconvsr_reds4_20210413-9e09d621.pth

# multi-gpu test
./tools/dist_test.sh configs/iconvsr/iconvsr_2xb4_reds4.py https://download.openmmlab.com/mmediting/restorers/iconvsr/iconvsr_reds4_20210413-9e09d621.pth 8
```

For more details, you can refer to the Test a pre-trained model part of train_test.md.
### Citation

```bibtex
@InProceedings{chan2021basicvsr,
  author    = {Chan, Kelvin C.K. and Wang, Xintao and Yu, Ke and Dong, Chao and Loy, Chen Change},
  title     = {BasicVSR: The Search for Essential Components in Video Super-Resolution and Beyond},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year      = {2021}
}
```
## TDAN (CVPR’2020)

Task: Video Super-Resolution

### Abstract
Video super-resolution (VSR) aims to restore a photo-realistic high-resolution (HR) video frame from both its corresponding low-resolution (LR) frame (reference frame) and multiple neighboring frames (supporting frames). Due to varying motion of cameras or objects, the reference frame and each supporting frame are not aligned. Therefore, temporal alignment is a challenging yet important problem for VSR. Previous VSR methods usually utilize optical flow between the reference frame and each supporting frame to warp the supporting frame for temporal alignment. Therefore, the performance of these image-level warping-based models will highly depend on the prediction accuracy of optical flow, and inaccurate optical flow will lead to artifacts in the warped supporting frames, which also will be propagated into the reconstructed HR video frame. To overcome the limitation, in this paper, we propose a temporal deformable alignment network (TDAN) to adaptively align the reference frame and each supporting frame at the feature level without computing optical flow. The TDAN uses features from both the reference frame and each supporting frame to dynamically predict offsets of sampling convolution kernels. By using the corresponding kernels, TDAN transforms supporting frames to align with the reference frame. To predict the HR video frame, a reconstruction network taking aligned frames and the reference frame is utilized. Experimental results demonstrate the effectiveness of the proposed TDAN-based VSR model.
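The alignment idea can be sketched with torchvision's deformable convolution: offsets are predicted from the reference and supporting features together, and the supporting features are resampled accordingly. The layer shapes below are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
from torchvision.ops import deform_conv2d

class DeformableAlignment(nn.Module):
    """Toy TDAN-style alignment: predict per-position sampling offsets from
    the concatenated reference/supporting features, then apply a deformable
    convolution so the supporting features align with the reference."""

    def __init__(self, channels=64, kernel_size=3):
        super().__init__()
        # 2 offset values (x, y) per kernel sampling location
        self.offset_pred = nn.Conv2d(2 * channels,
                                     2 * kernel_size * kernel_size,
                                     3, padding=1)
        self.weight = nn.Parameter(
            torch.randn(channels, channels, kernel_size, kernel_size) * 0.01)

    def forward(self, ref_feat, sup_feat):  # both (n, c, h, w)
        offsets = self.offset_pred(torch.cat([ref_feat, sup_feat], dim=1))
        return deform_conv2d(sup_feat, offsets, self.weight, padding=1)

align = DeformableAlignment()
ref, sup = torch.randn(1, 64, 32, 32), torch.randn(1, 64, 32, 32)
aligned = align(ref, sup)  # (1, 64, 32, 32): supporting features, aligned
```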

### Results and models

Evaluated on the Y channel; 8 pixels on each border are cropped before evaluation (a sketch of this protocol follows the table). The metrics are PSNR / SSIM.
Model | Dataset | PSNR (Y) | SSIM (Y) | Training Resources | Download |
---|---|---|---|---|---|
tdan_x4_1xb16-lr1e-4-400k_vimeo90k-bi | - | - | - | 8 (Tesla V100-SXM2-32GB) | - |
tdan_x4_1xb16-lr1e-4-400k_vimeo90k-bd | - | - | - | 8 (Tesla V100-SXM2-32GB) | - |
tdan_x4ft_1xb16-lr5e-5-400k_vimeo90k-bi | Vid4 (BIx4) | 26.49 | 0.792 | 8 (Tesla V100-SXM2-32GB) | model | log |
tdan_x4ft_1xb16-lr5e-5-400k_vimeo90k-bi | SPMCS-30 (BIx4) | 30.42 | 0.856 | 8 (Tesla V100-SXM2-32GB) | model | log |
tdan_x4ft_1xb16-lr5e-5-400k_vimeo90k-bi | Vid4 (BDx4) | 25.93 | 0.772 | 8 (Tesla V100-SXM2-32GB) | model | log |
tdan_x4ft_1xb16-lr5e-5-400k_vimeo90k-bi | SPMCS-30 (BDx4) | 29.69 | 0.842 | 8 (Tesla V100-SXM2-32GB) | model | log |
tdan_x4ft_1xb16-lr5e-5-800k_vimeo90k-bd | Vid4 (BIx4) | 25.80 | 0.784 | 8 (Tesla V100-SXM2-32GB) | model | log |
tdan_x4ft_1xb16-lr5e-5-800k_vimeo90k-bd | SPMCS-30 (BIx4) | 29.56 | 0.851 | 8 (Tesla V100-SXM2-32GB) | model | log |
tdan_x4ft_1xb16-lr5e-5-800k_vimeo90k-bd | Vid4 (BDx4) | 26.87 | 0.815 | 8 (Tesla V100-SXM2-32GB) | model | log |
tdan_x4ft_1xb16-lr5e-5-800k_vimeo90k-bd | SPMCS-30 (BDx4) | 30.77 | 0.868 | 8 (Tesla V100-SXM2-32GB) | model | log |
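A minimal numpy sketch of this evaluation protocol, assuming float RGB input in [0, 1] and the common BT.601 luma conversion (the exact constants used by the codebase may differ):

```python
import numpy as np

def to_y_channel(img):
    """RGB float image in [0, 1], shape (h, w, 3) -> BT.601 luma in [0, 255]."""
    return img @ np.array([65.481, 128.553, 24.966]) + 16.0

def psnr_y(gt, pred, crop_border=8):
    """PSNR on the Y channel, cropping `crop_border` pixels on each side,
    mirroring the protocol described above (illustrative only)."""
    gt_y = to_y_channel(gt)[crop_border:-crop_border, crop_border:-crop_border]
    pr_y = to_y_channel(pred)[crop_border:-crop_border, crop_border:-crop_border]
    mse = np.mean((gt_y - pr_y) ** 2)
    return float('inf') if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)
```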
### Quick Start

**Train**

You can use the following commands to train a model with CPU or single/multiple GPUs.

TDAN is trained in two stages.

Stage 1: Train with a larger learning rate (1e-4).

```shell
# cpu train
CUDA_VISIBLE_DEVICES=-1 python tools/train.py configs/tdan/tdan_x4_1xb16-lr1e-4-400k_vimeo90k-bi.py

# single-gpu train
python tools/train.py configs/tdan/tdan_x4_1xb16-lr1e-4-400k_vimeo90k-bi.py

# multi-gpu train
./tools/dist_train.sh configs/tdan/tdan_x4_1xb16-lr1e-4-400k_vimeo90k-bi.py 8
```
Stage 2: Fine-tune with a smaller learning rate (5e-5), initializing from the stage-1 weights (see the config sketch after the commands).

```shell
# cpu train
CUDA_VISIBLE_DEVICES=-1 python tools/train.py configs/tdan/tdan_x4ft_1xb16-lr5e-5-400k_vimeo90k-bi.py

# single-gpu train
python tools/train.py configs/tdan/tdan_x4ft_1xb16-lr5e-5-400k_vimeo90k-bi.py

# multi-gpu train
./tools/dist_train.sh configs/tdan/tdan_x4ft_1xb16-lr5e-5-400k_vimeo90k-bi.py 8
```
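In MMEngine-style configs, this two-stage setup is typically expressed with `_base_` inheritance and `load_from`. The sketch below is hypothetical: the field values and checkpoint path are assumptions for illustration, not the released stage-2 config.

```python
# Hypothetical stage-2 config sketch: inherit the stage-1 config, lower the
# learning rate, and initialize from the stage-1 checkpoint.
_base_ = './tdan_x4_1xb16-lr1e-4-400k_vimeo90k-bi.py'

optim_wrapper = dict(optimizer=dict(lr=5e-5))  # fine-tune with a smaller lr
load_from = 'work_dirs/tdan_stage1/iter_400000.pth'  # stage-1 weights (example path)
```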
For more details, you can refer to the Train a model part of train_test.md.
**Test**

You can use the following commands to test a model with CPU or single/multiple GPUs.

```shell
# cpu test
CUDA_VISIBLE_DEVICES=-1 python tools/test.py configs/tdan/tdan_x4ft_1xb16-lr5e-5-400k_vimeo90k-bi.py https://download.openmmlab.com/mmediting/restorers/tdan/tdan_vimeo90k_bix4_20210528-739979d9.pth

# single-gpu test
python tools/test.py configs/tdan/tdan_x4ft_1xb16-lr5e-5-400k_vimeo90k-bi.py https://download.openmmlab.com/mmediting/restorers/tdan/tdan_vimeo90k_bix4_20210528-739979d9.pth

# multi-gpu test
./tools/dist_test.sh configs/tdan/tdan_x4ft_1xb16-lr5e-5-400k_vimeo90k-bi.py https://download.openmmlab.com/mmediting/restorers/tdan/tdan_vimeo90k_bix4_20210528-739979d9.pth 8
```

For more details, you can refer to the Test a pre-trained model part of train_test.md.
### Citation

```bibtex
@InProceedings{tian2020tdan,
  title     = {TDAN: Temporally-Deformable Alignment Network for Video Super-Resolution},
  author    = {Tian, Yapeng and Zhang, Yulun and Fu, Yun and Xu, Chenliang},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
  year      = {2020}
}
```
## TOFlow (IJCV’2019)

Task: Video Interpolation, Video Super-Resolution

### Abstract
Many video enhancement algorithms rely on optical flow to register frames in a video sequence. Precise flow estimation is however intractable; and optical flow itself is often a sub-optimal representation for particular video processing tasks. In this paper, we propose task-oriented flow (TOFlow), a motion representation learned in a self-supervised, task-specific manner. We design a neural network with a trainable motion estimation component and a video processing component, and train them jointly to learn the task-oriented flow. For evaluation, we build Vimeo-90K, a large-scale, high-quality video dataset for low-level video processing. TOFlow outperforms traditional optical flow on standard benchmarks as well as our Vimeo-90K dataset in three video processing tasks: frame interpolation, video denoising/deblocking, and video super-resolution.
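The registration step that TOFlow learns in a task-oriented way is ordinary flow warping; the difference is that the flow estimator is trained jointly with the video-processing network rather than fixed. A self-contained PyTorch sketch of the warping operation (flow in pixel units is an assumption):

```python
import torch
import torch.nn.functional as F

def flow_warp(frame, flow):
    """Warp `frame` (n, c, h, w) toward a reference frame using optical flow
    (n, 2, h, w), given as (dx, dy) displacements in pixels."""
    n, _, h, w = frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
    grid = torch.stack((xs, ys), dim=0).float().to(frame)  # (2, h, w)
    coords = grid.unsqueeze(0) + flow                      # absolute coords
    # normalize coordinates to [-1, 1] as required by grid_sample
    coords_x = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
    coords_y = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
    grid_norm = torch.stack((coords_x, coords_y), dim=3)   # (n, h, w, 2)
    return F.grid_sample(frame, grid_norm, align_corners=True)
```

In TOFlow, the gradient of the downstream task loss flows back through this warp into the motion-estimation network, which is what makes the learned flow "task-oriented".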

### Results and models

Evaluated on Vimeo90k-triplet (RGB channels). The metrics are PSNR / SSIM.
Model | Dataset | Task | Pretrained SPyNet | PSNR | SSIM | Training Resources | Download |
---|---|---|---|---|---|---|---|
tof_vfi_spynet_chair_nobn_1xb1_vimeo90k | Vimeo90k-triplet | Video Interpolation | spynet_chairs_final | 33.3294 | 0.9465 | 1 (Tesla PG503-216) | model | log |
tof_vfi_spynet_kitti_nobn_1xb1_vimeo90k | Vimeo90k-triplet | Video Interpolation | spynet_chairs_final | 33.3339 | 0.9466 | 1 (Tesla PG503-216) | model | log |
tof_vfi_spynet_sintel_clean_nobn_1xb1_vimeo90k | Vimeo90k-triplet | Video Interpolation | spynet_chairs_final | 33.3170 | 0.9464 | 1 (Tesla PG503-216) | model | log |
tof_vfi_spynet_sintel_final_nobn_1xb1_vimeo90k | Vimeo90k-triplet | Video Interpolation | spynet_chairs_final | 33.3237 | 0.9465 | 1 (Tesla PG503-216) | model | log |
tof_vfi_spynet_pytoflow_nobn_1xb1_vimeo90k | Vimeo90k-triplet | Video Interpolation | spynet_chairs_final | 33.3426 | 0.9467 | 1 (Tesla PG503-216) | model | log |
Note: These pretrained SPyNets don't contain a BN layer since `batch_size=1`, which is consistent with https://github.com/Coldog2333/pytoflow.
Evaluated on Vid4 (RGB channels). The metrics are PSNR / SSIM.

Model | Dataset | Task | PSNR / SSIM (Vid4) | Training Resources | Download |
---|---|---|---|---|---|
tof_x4_vimeo90k_official | Vimeo-90K | Video Super-Resolution | 24.4377 / 0.7433 | - | model |
### Quick Start

**Train**

You can use the following commands to train a model with CPU or single/multiple GPUs.

Currently, TOF only supports training for the video interpolation task.

```shell
# cpu train
CUDA_VISIBLE_DEVICES=-1 python tools/train.py configs/tof/tof_spynet-chair-wobn_1xb1_vimeo90k-triplet.py

# single-gpu train
python tools/train.py configs/tof/tof_spynet-chair-wobn_1xb1_vimeo90k-triplet.py

# multi-gpu train
./tools/dist_train.sh configs/tof/tof_spynet-chair-wobn_1xb1_vimeo90k-triplet.py 8
```

For more details, you can refer to the Train a model part of train_test.md.
**Test**

You can use the following commands to test a model with CPU or single/multiple GPUs.

TOF supports two tasks for testing.

Task 1: Video Interpolation

```shell
# cpu test
CUDA_VISIBLE_DEVICES=-1 python tools/test.py configs/tof/tof_spynet-chair-wobn_1xb1_vimeo90k-triplet.py https://download.openmmlab.com/mmediting/video_interpolators/toflow/pretrained_spynet_chair_20220321-4d82e91b.pth

# single-gpu test
python tools/test.py configs/tof/tof_spynet-chair-wobn_1xb1_vimeo90k-triplet.py https://download.openmmlab.com/mmediting/video_interpolators/toflow/pretrained_spynet_chair_20220321-4d82e91b.pth

# multi-gpu test
./tools/dist_test.sh configs/tof/tof_spynet-chair-wobn_1xb1_vimeo90k-triplet.py https://download.openmmlab.com/mmediting/video_interpolators/toflow/pretrained_spynet_chair_20220321-4d82e91b.pth 8
```

Task 2: Video Super-Resolution

```shell
# cpu test
CUDA_VISIBLE_DEVICES=-1 python tools/test.py configs/tof/tof_x4_official_vimeo90k.py https://download.openmmlab.com/mmediting/restorers/tof/tof_x4_vimeo90k_official-a569ff50.pth

# single-gpu test
python tools/test.py configs/tof/tof_x4_official_vimeo90k.py https://download.openmmlab.com/mmediting/restorers/tof/tof_x4_vimeo90k_official-a569ff50.pth

# multi-gpu test
./tools/dist_test.sh configs/tof/tof_x4_official_vimeo90k.py https://download.openmmlab.com/mmediting/restorers/tof/tof_x4_vimeo90k_official-a569ff50.pth 8
```

For more details, you can refer to the Test a pre-trained model part of train_test.md.
### Citation

```bibtex
@article{xue2019video,
  title     = {Video Enhancement with Task-Oriented Flow},
  author    = {Xue, Tianfan and Chen, Baian and Wu, Jiajun and Wei, Donglai and Freeman, William T},
  journal   = {International Journal of Computer Vision},
  volume    = {127},
  number    = {8},
  pages     = {1106--1125},
  year      = {2019},
  publisher = {Springer}
}
```
## EDVR (CVPRW’2019)

Task: Video Super-Resolution

### Abstract
Video restoration tasks, including super-resolution, deblurring, etc, are drawing increasing attention in the computer vision community. A challenging benchmark named REDS is released in the NTIRE19 Challenge. This new benchmark challenges existing methods from two aspects: (1) how to align multiple frames given large motions, and (2) how to effectively fuse different frames with diverse motion and blur. In this work, we propose a novel Video Restoration framework with Enhanced Deformable networks, termed EDVR, to address these challenges. First, to handle large motions, we devise a Pyramid, Cascading and Deformable (PCD) alignment module, in which frame alignment is done at the feature level using deformable convolutions in a coarse-to-fine manner. Second, we propose a Temporal and Spatial Attention (TSA) fusion module, in which attention is applied both temporally and spatially, so as to emphasize important features for subsequent restoration. Thanks to these modules, our EDVR wins the champions and outperforms the second place by a large margin in all four tracks in the NTIRE19 video restoration and enhancement challenges. EDVR also demonstrates superior performance to state-of-the-art published methods on video super-resolution and deblurring.
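The temporal half of TSA fusion can be sketched in a few lines: each (already aligned) neighboring frame is gated by the similarity of its embedding to the reference frame's embedding before the frames are fused. An illustrative PyTorch sketch, with the spatial-attention half and the PCD alignment module omitted and layer shapes assumed:

```python
import torch
import torch.nn as nn

class TemporalAttentionFusion(nn.Module):
    """Toy temporal attention: weight each aligned frame by its embedding
    similarity to the reference (center) frame, then fuse all frames."""

    def __init__(self, channels=64, num_frames=5):
        super().__init__()
        self.embed_ref = nn.Conv2d(channels, channels, 3, padding=1)
        self.embed_nbr = nn.Conv2d(channels, channels, 3, padding=1)
        self.fuse = nn.Conv2d(num_frames * channels, channels, 1)

    def forward(self, aligned):  # aligned: (n, t, c, h, w), center = reference
        n, t, c, h, w = aligned.shape
        ref = self.embed_ref(aligned[:, t // 2])
        nbr = self.embed_nbr(aligned.view(n * t, c, h, w)).view(n, t, c, h, w)
        # per-frame, per-pixel similarity map -> sigmoid gate in [0, 1]
        sim = torch.sigmoid((nbr * ref.unsqueeze(1)).sum(2, keepdim=True))
        weighted = (aligned * sim).view(n, t * c, h, w)
        return self.fuse(weighted)
```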

### Results and models

Evaluated on RGB channels. The metrics are PSNR and SSIM.
Model | Dataset | PSNR | SSIM | Training Resources | Download |
---|---|---|---|---|---|
edvrm_wotsa_x4_8x4_600k_reds | REDS | 30.3430 | 0.8664 | 8 | model | log |
edvrm_x4_8x4_600k_reds | REDS | 30.4194 | 0.8684 | 8 | model | log |
edvrl_wotsa_c128b40_8x8_lr2e-4_600k_reds4 | REDS | 31.0010 | 0.8784 | 8 (Tesla V100-PCIE-32GB) | model | log |
edvrl_c128b40_8x8_lr2e-4_600k_reds4 | REDS | 31.0467 | 0.8793 | 8 (Tesla V100-PCIE-32GB) | model | log |
### Quick Start

**Train**

You can use the following commands to train a model with CPU or single/multiple GPUs.

```shell
# cpu train
CUDA_VISIBLE_DEVICES=-1 python tools/train.py configs/edvr/edvrm_8xb4-600k_reds.py

# single-gpu train
python tools/train.py configs/edvr/edvrm_8xb4-600k_reds.py

# multi-gpu train
./tools/dist_train.sh configs/edvr/edvrm_8xb4-600k_reds.py 8
```

For more details, you can refer to the Train a model part of train_test.md.
**Test**

You can use the following commands to test a model with CPU or single/multiple GPUs.

```shell
# cpu test
CUDA_VISIBLE_DEVICES=-1 python tools/test.py configs/edvr/edvrm_8xb4-600k_reds.py https://download.openmmlab.com/mmediting/restorers/edvr/edvrm_x4_8x4_600k_reds_20210625-e29b71b5.pth

# single-gpu test
python tools/test.py configs/edvr/edvrm_8xb4-600k_reds.py https://download.openmmlab.com/mmediting/restorers/edvr/edvrm_x4_8x4_600k_reds_20210625-e29b71b5.pth

# multi-gpu test
./tools/dist_test.sh configs/edvr/edvrm_8xb4-600k_reds.py https://download.openmmlab.com/mmediting/restorers/edvr/edvrm_x4_8x4_600k_reds_20210625-e29b71b5.pth 8
```

For more details, you can refer to the Test a pre-trained model part of train_test.md.
### Citation

```bibtex
@InProceedings{wang2019edvr,
  author    = {Wang, Xintao and Chan, Kelvin C.K. and Yu, Ke and Dong, Chao and Loy, Chen Change},
  title     = {EDVR: Video Restoration with Enhanced Deformable Convolutional Networks},
  booktitle = {The IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW)},
  month     = {June},
  year      = {2019}
}
```