
Video Prediction Benchmarks

We provide benchmark results of spatiotemporal predictive learning (STL) methods on various video prediction datasets. More STL methods will be supported in the future; issues and PRs are welcome! Currently, we only provide benchmark results; trained models and logs will be released soon (contact us if you need these files). You can download model files from Baidu Cloud (access code: tgr6).

Table of Contents

- Currently supported spatiotemporal prediction methods
- Currently supported MetaFormer models for SimVP

Moving MNIST Benchmarks

We provide benchmark results on the popular Moving MNIST dataset using the \(10\rightarrow 10\) frames prediction setting following PredRNN. Metrics (MSE, MAE, SSIM, PSNR) of the best models are reported over three trials. Parameters (M), FLOPs (G), and V100 inference speed (FPS) are also reported for all methods. All methods are trained with the Adam optimizer and a OneCycle scheduler on a single GPU.
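As a reference for how these metrics are defined, the following is a minimal sketch (not the repository's exact evaluation code; normalization conventions may differ) that computes MSE, MAE, PSNR, and SSIM for one predicted clip:

```python
# Minimal metric sketch for one predicted clip vs. ground truth, values in [0, 1].
# Illustrative only: the benchmark's exact per-frame normalization follows the codebase.
import numpy as np
from skimage.metrics import structural_similarity as ssim

def clip_metrics(pred, true):
    """pred, true: float arrays of shape (T, H, W) with values in [0, 1]."""
    mse = float(np.mean((pred - true) ** 2))
    mae = float(np.mean(np.abs(pred - true)))
    psnr = 10.0 * np.log10(1.0 / mse)      # data_range = 1.0
    # SSIM is computed per frame and averaged over the clip.
    ssim_val = float(np.mean([ssim(p, t, data_range=1.0) for p, t in zip(pred, true)]))
    return {"MSE": mse, "MAE": mae, "PSNR": psnr, "SSIM": ssim_val}

# Example: a 10-frame 64x64 prediction, as in the 10 -> 10 Moving MNIST setting.
pred = np.random.rand(10, 64, 64)
true = np.random.rand(10, 64, 64)
print(clip_metrics(pred, true))
```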

STL Benchmarks on MMNIST

For a fair comparison of different methods, we report the best results when models are trained to convergence. We provide config files in configs/mmnist.
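For orientation, a minimal PyTorch skeleton of the shared Adam + OneCycle recipe is sketched below; the model, learning rate, and loader are placeholders, and the actual architectures and hyperparameters are defined in the config files.

```python
# Sketch of the Adam + OneCycle training setup used across the tables.
# The model and data are stand-ins; real settings live in configs/mmnist.
import torch
import torch.nn as nn

model = nn.Conv3d(1, 1, kernel_size=3, padding=1)     # stand-in predictor
loader = [(torch.rand(16, 1, 10, 64, 64), torch.rand(16, 1, 10, 64, 64))
          for _ in range(10)]                          # fake 10 -> 10 batches

epochs = 200                                           # 200 or 2000 in the tables
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, epochs=epochs, steps_per_epoch=len(loader))
criterion = nn.MSELoss()

for epoch in range(epochs):
    for frames_in, frames_out in loader:
        optimizer.zero_grad()
        loss = criterion(model(frames_in), frames_out) # predict the next 10 frames
        loss.backward()
        optimizer.step()
        scheduler.step()                               # OneCycle steps every batch
```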

| Method | Setting | Params | FLOPs | FPS | MSE | MAE | SSIM | PSNR | Download |
|---|---|---|---|---|---|---|---|---|---|
| ConvLSTM-S | 200 epoch | 15.0M | 56.8G | 113 | 29.80 | 90.64 | 0.9288 | 22.10 | model \| log |
| ConvLSTM-L | 200 epoch | 33.8M | 127.0G | 50 | 27.78 | 86.14 | 0.9343 | 22.44 | model \| log |
| PredNet | 200 epoch | 12.5M | 8.6G | 659 | 161.38 | 201.16 | 0.7783 | 14.33 | model \| log |
| PhyDNet | 200 epoch | 3.1M | 15.3G | 182 | 28.19 | 78.64 | 0.9374 | 22.62 | model \| log |
| PredRNN | 200 epoch | 23.8M | 116.0G | 54 | 23.97 | 72.82 | 0.9462 | 23.28 | model \| log |
| PredRNN++ | 200 epoch | 38.6M | 171.7G | 38 | 22.06 | 69.58 | 0.9509 | 23.65 | model \| log |
| MIM | 200 epoch | 38.0M | 179.2G | 37 | 22.55 | 69.97 | 0.9498 | 23.56 | model \| log |
| MAU | 200 epoch | 4.5M | 17.8G | 201 | 26.86 | 78.22 | 0.9398 | 22.76 | model \| log |
| E3D-LSTM | 200 epoch | 51.0M | 298.9G | 18 | 35.97 | 78.28 | 0.9320 | 21.11 | model \| log |
| CrevNet | 200 epoch | 5.0M | 270.7G | 10 | 30.15 | 86.28 | 0.9350 | - | model \| log |
| PredRNN.V2 | 200 epoch | 23.9M | 116.6G | 52 | 24.13 | 73.73 | 0.9453 | 23.21 | model \| log |
| DMVFN | 200 epoch | 3.5M | 0.2G | 1145 | 123.67 | 179.96 | 0.8140 | 16.15 | model \| log |
| SimVP+IncepU | 200 epoch | 58.0M | 19.4G | 209 | 32.15 | 89.05 | 0.9268 | 21.84 | model \| log |
| SimVP+gSTA-S | 200 epoch | 46.8M | 16.5G | 282 | 26.69 | 77.19 | 0.9402 | 22.78 | model \| log |
| TAU | 200 epoch | 44.7M | 16.0G | 283 | 24.60 | 71.93 | 0.9454 | 23.19 | model \| log |
| ConvLSTM-S | 2000 epoch | 15.0M | 56.8G | 113 | 22.41 | 73.07 | 0.9480 | 23.54 | model \| log |
| PredNet | 2000 epoch | 12.5M | 8.6G | 659 | 31.85 | 90.01 | 0.9273 | 21.85 | model \| log |
| PhyDNet | 2000 epoch | 3.1M | 15.3G | 182 | 20.35 | 61.47 | 0.9559 | 24.21 | model \| log |
| PredRNN | 2000 epoch | 23.8M | 116.0G | 54 | 26.43 | 77.52 | 0.9411 | 22.90 | model \| log |
| PredRNN++ | 2000 epoch | 38.6M | 171.7G | 38 | 14.07 | 48.91 | 0.9698 | 26.37 | model \| log |
| MIM | 2000 epoch | 38.0M | 179.2G | 37 | 14.73 | 52.31 | 0.9678 | 25.99 | model \| log |
| MAU | 2000 epoch | 4.5M | 17.8G | 201 | 22.25 | 67.96 | 0.9511 | 23.68 | model \| log |
| E3D-LSTM | 2000 epoch | 51.0M | 298.9G | 18 | 24.07 | 77.49 | 0.9436 | 23.19 | model \| log |
| PredRNN.V2 | 2000 epoch | 23.9M | 116.6G | 52 | 17.26 | 57.22 | 0.9624 | 25.01 | model \| log |
| SimVP+IncepU | 2000 epoch | 58.0M | 19.4G | 209 | 21.15 | 64.15 | 0.9536 | 23.99 | model \| log |
| SimVP+gSTA-S | 2000 epoch | 46.8M | 16.5G | 282 | 15.05 | 49.80 | 0.9675 | 25.97 | model \| log |
| TAU | 2000 epoch | 44.7M | 16.0G | 283 | 15.69 | 51.46 | 0.9661 | 25.71 | model \| log |

Benchmark of MetaFormers Based on SimVP (MetaVP)

Since the hidden translator in SimVP can be replaced by any MetaFormer block that performs token mixing and channel mixing, we benchmark popular MetaFormer architectures on SimVP with 200-epoch and 2000-epoch training. We provide config files in configs/mmnist/simvp.
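The following schematic sketch illustrates this design (module names here are hypothetical, not the repository's actual classes): the spatial encoder and decoder stay fixed, while any block that maps (N, C, H, W) to (N, C, H, W) can serve as the translator.

```python
# Schematic MetaVP structure: encoder -> translator -> decoder, where the
# translator can be any MetaFormer-style block (token mixing + channel mixing).
# Class and argument names are illustrative, not OpenSTL's actual API.
import torch
import torch.nn as nn

class MetaVP(nn.Module):
    def __init__(self, translator: nn.Module, in_ch=1, hid_ch=64):
        super().__init__()
        self.encoder = nn.Conv2d(in_ch, hid_ch, 3, stride=2, padding=1)
        self.translator = translator     # e.g., gSTA, TAU, Swin, or ConvNeXt block
        self.decoder = nn.ConvTranspose2d(hid_ch, in_ch, 4, stride=2, padding=1)

    def forward(self, x):                # x: (B*T, C, H, W) stacked frames
        z = self.encoder(x)
        z = self.translator(z)           # token mixing + channel mixing
        return self.decoder(z)

# A depthwise conv + pointwise MLP stands in here for a generic MetaFormer block.
mixer = nn.Sequential(
    nn.Conv2d(64, 64, 7, padding=3, groups=64),               # token mixing
    nn.Conv2d(64, 256, 1), nn.GELU(), nn.Conv2d(256, 64, 1))  # channel mixing
model = MetaVP(mixer)
print(model(torch.rand(4, 1, 64, 64)).shape)  # torch.Size([4, 1, 64, 64])
```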

| MetaVP | Setting | Params | FLOPs | FPS | MSE | MAE | SSIM | PSNR | Download |
|---|---|---|---|---|---|---|---|---|---|
| IncepU (SimVPv1) | 200 epoch | 58.0M | 19.4G | 209 | 32.15 | 89.05 | 0.9268 | 21.84 | model \| log |
| gSTA (SimVPv2) | 200 epoch | 46.8M | 16.5G | 282 | 26.69 | 77.19 | 0.9402 | 22.78 | model \| log |
| ViT | 200 epoch | 46.1M | 16.9G | 290 | 35.15 | 95.87 | 0.9139 | 21.67 | model \| log |
| Swin Transformer | 200 epoch | 46.1M | 16.4G | 294 | 29.70 | 84.05 | 0.9331 | 22.22 | model \| log |
| Uniformer | 200 epoch | 44.8M | 16.5G | 296 | 30.38 | 85.87 | 0.9308 | 22.13 | model \| log |
| MLP-Mixer | 200 epoch | 38.2M | 14.7G | 334 | 29.52 | 83.36 | 0.9338 | 22.22 | model \| log |
| ConvMixer | 200 epoch | 3.9M | 5.5G | 658 | 32.09 | 88.93 | 0.9259 | 21.93 | model \| log |
| Poolformer | 200 epoch | 37.1M | 14.1G | 341 | 31.79 | 88.48 | 0.9271 | 22.03 | model \| log |
| ConvNeXt | 200 epoch | 37.3M | 14.1G | 344 | 26.94 | 77.23 | 0.9397 | 22.74 | model \| log |
| VAN | 200 epoch | 44.5M | 16.0G | 288 | 26.10 | 76.11 | 0.9417 | 22.89 | model \| log |
| HorNet | 200 epoch | 45.7M | 16.3G | 287 | 29.64 | 83.26 | 0.9331 | 22.26 | model \| log |
| MogaNet | 200 epoch | 46.8M | 16.5G | 255 | 25.57 | 75.19 | 0.9429 | 22.99 | model \| log |
| TAU | 200 epoch | 44.7M | 16.0G | 283 | 24.60 | 71.93 | 0.9454 | 23.19 | model \| log |
| IncepU (SimVPv1) | 2000 epoch | 58.0M | 19.4G | 209 | 21.15 | 64.15 | 0.9536 | 23.99 | model \| log |
| gSTA (SimVPv2) | 2000 epoch | 46.8M | 16.5G | 282 | 15.05 | 49.80 | 0.9675 | 25.97 | model \| log |
| ViT | 2000 epoch | 46.1M | 16.9G | 290 | 19.74 | 61.65 | 0.9539 | 24.59 | model \| log |
| Swin Transformer | 2000 epoch | 46.1M | 16.4G | 294 | 19.11 | 59.84 | 0.9584 | 24.53 | model \| log |
| Uniformer | 2000 epoch | 44.8M | 16.5G | 296 | 18.01 | 57.52 | 0.9609 | 24.92 | model \| log |
| MLP-Mixer | 2000 epoch | 38.2M | 14.7G | 334 | 18.85 | 59.86 | 0.9589 | 24.58 | model \| log |
| ConvMixer | 2000 epoch | 3.9M | 5.5G | 658 | 22.30 | 67.37 | 0.9507 | 23.73 | model \| log |
| Poolformer | 2000 epoch | 37.1M | 14.1G | 341 | 20.96 | 64.31 | 0.9539 | 24.15 | model \| log |
| ConvNeXt | 2000 epoch | 37.3M | 14.1G | 344 | 17.58 | 55.76 | 0.9617 | 25.06 | model \| log |
| VAN | 2000 epoch | 44.5M | 16.0G | 288 | 16.21 | 53.57 | 0.9646 | 25.49 | model \| log |
| HorNet | 2000 epoch | 45.7M | 16.3G | 287 | 17.40 | 55.70 | 0.9624 | 25.14 | model \| log |
| MogaNet | 2000 epoch | 46.8M | 16.5G | 255 | 15.67 | 51.84 | 0.9661 | 25.70 | model \| log |
| TAU | 2000 epoch | 44.7M | 16.0G | 283 | 15.69 | 51.46 | 0.9661 | 25.71 | model \| log |

(back to top)

Moving FMNIST Benchmarks

Similar to Moving MNIST, we also provide benchmark results on its Fashion-MNIST variant, i.e., the Moving FMNIST (MFMNIST) benchmark, using the \(10\rightarrow 10\) frames prediction setting following PredRNN. Metrics (MSE, MAE, SSIM, PSNR) of the best models are reported over three trials. Parameters (M), FLOPs (G), and V100 inference speed (FPS) are also reported for all methods. All methods are trained with the Adam optimizer and a OneCycle scheduler on a single GPU.

STL Benchmarks on MFMNIST

For a fair comparison of different methods, we report the best results when models are trained to convergence. We provide config files in configs/mfmnist.

| Method | Setting | Params | FLOPs | FPS | MSE | MAE | SSIM | PSNR | Download |
|---|---|---|---|---|---|---|---|---|---|
| ConvLSTM-S | 200 epoch | 15.0M | 56.8G | 113 | 28.87 | 113.20 | 0.8793 | 22.07 | model \| log |
| ConvLSTM-L | 200 epoch | 33.8M | 127.0G | 50 | 25.51 | 104.85 | 0.8928 | 22.67 | model \| log |
| PredNet | 200 epoch | 12.5M | 8.6G | 659 | 185.94 | 318.30 | 0.6713 | 14.83 | model \| log |
| PhyDNet | 200 epoch | 3.1M | 15.3G | 182 | 34.75 | 125.66 | 0.8567 | 22.03 | model \| log |
| PredRNN | 200 epoch | 23.8M | 116.0G | 54 | 22.01 | 91.74 | 0.9091 | 23.42 | model \| log |
| PredRNN++ | 200 epoch | 38.6M | 171.7G | 38 | 21.71 | 91.97 | 0.9097 | 23.45 | model \| log |
| MIM | 200 epoch | 38.0M | 179.2G | 37 | 23.09 | 96.37 | 0.9043 | 23.13 | model \| log |
| MAU | 200 epoch | 4.5M | 17.8G | 201 | 26.56 | 104.39 | 0.8916 | 22.51 | model \| log |
| E3D-LSTM | 200 epoch | 51.0M | 298.9G | 18 | 35.35 | 110.09 | 0.8722 | 21.27 | model \| log |
| PredRNN.V2 | 200 epoch | 23.9M | 116.6G | 52 | 24.13 | 97.46 | 0.9004 | 22.96 | model \| log |
| DMVFN | 200 epoch | 3.5M | 0.2G | 1145 | 118.32 | 220.02 | 0.7572 | 16.76 | model \| log |
| SimVP+IncepU | 200 epoch | 58.0M | 19.4G | 209 | 30.77 | 113.94 | 0.8740 | 21.81 | model \| log |
| SimVP+gSTA-S | 200 epoch | 46.8M | 16.5G | 282 | 25.86 | 101.22 | 0.8933 | 22.61 | model \| log |
| TAU | 200 epoch | 44.7M | 16.0G | 283 | 24.24 | 96.72 | 0.8995 | 22.87 | model \| log |

Benchmark of MetaFormers Based on SimVP (MetaVP)

Since the hidden translator in SimVP can be replaced by any MetaFormer block that performs token mixing and channel mixing, we benchmark popular MetaFormer architectures on SimVP with 200-epoch training. We provide config files in configs/mfmnist/simvp.

| MetaFormer | Setting | Params | FLOPs | FPS | MSE | MAE | SSIM | PSNR | Download |
|---|---|---|---|---|---|---|---|---|---|
| IncepU (SimVPv1) | 200 epoch | 58.0M | 19.4G | 209 | 30.77 | 113.94 | 0.8740 | 21.81 | model \| log |
| gSTA (SimVPv2) | 200 epoch | 46.8M | 16.5G | 282 | 25.86 | 101.22 | 0.8933 | 22.61 | model \| log |
| ViT | 200 epoch | 46.1M | 16.9G | 290 | 31.05 | 115.59 | 0.8712 | 21.83 | model \| log |
| Swin Transformer | 200 epoch | 46.1M | 16.4G | 294 | 28.66 | 108.93 | 0.8815 | 22.08 | model \| log |
| Uniformer | 200 epoch | 44.8M | 16.5G | 296 | 29.56 | 111.72 | 0.8779 | 21.97 | model \| log |
| MLP-Mixer | 200 epoch | 38.2M | 14.7G | 334 | 28.83 | 109.51 | 0.8803 | 22.01 | model \| log |
| ConvMixer | 200 epoch | 3.9M | 5.5G | 658 | 31.21 | 115.74 | 0.8709 | 21.71 | model \| log |
| Poolformer | 200 epoch | 37.1M | 14.1G | 341 | 30.02 | 113.07 | 0.8750 | 21.95 | model \| log |
| ConvNeXt | 200 epoch | 37.3M | 14.1G | 344 | 26.41 | 102.56 | 0.8908 | 22.49 | model \| log |
| VAN | 200 epoch | 44.5M | 16.0G | 288 | 31.39 | 116.28 | 0.8703 | 22.82 | model \| log |
| HorNet | 200 epoch | 45.7M | 16.3G | 287 | 29.19 | 110.17 | 0.8796 | 22.03 | model \| log |
| MogaNet | 200 epoch | 46.8M | 16.5G | 255 | 25.14 | 99.69 | 0.8960 | 22.73 | model \| log |
| TAU | 200 epoch | 44.7M | 16.0G | 283 | 24.24 | 96.72 | 0.8995 | 22.87 | model \| log |

(back to top)

Moving MNIST-CIFAR Benchmarks

Similar to Moving MNIST, we further design a harder variant with complex backgrounds from CIFAR-10, i.e., the MMNIST-CIFAR benchmark, using the \(10\rightarrow 10\) frames prediction setting following PredRNN. Metrics (MSE, MAE, SSIM, PSNR) of the best models are reported over three trials. Parameters (M), FLOPs (G), and V100 inference speed (FPS) are also reported for all methods. All methods are trained with the Adam optimizer and a OneCycle scheduler on a single GPU.

STL Benchmarks on MMNIST-CIFAR

For a fair comparison of different methods, we report the best results when models are trained to convergence. We provide config files in configs/mmnist_cifar.

| Method | Setting | Params | FLOPs | FPS | MSE | MAE | SSIM | PSNR | Download |
|---|---|---|---|---|---|---|---|---|---|
| ConvLSTM-S | 200 epoch | 15.5M | 58.8G | 113 | 73.31 | 338.56 | 0.9204 | 23.09 | model \| log |
| ConvLSTM-L | 200 epoch | 34.4M | 130.0G | 50 | 62.86 | 291.05 | 0.9337 | 23.83 | model \| log |
| PredNet | 200 epoch | 12.5M | 8.6G | 945 | 286.70 | 514.14 | 0.8139 | 17.49 | model \| log |
| PhyDNet | 200 epoch | 3.1M | 15.3G | 182 | 142.54 | 700.37 | 0.8276 | 19.92 | model \| log |
| PredRNN | 200 epoch | 23.8M | 116.0G | 54 | 50.09 | 225.04 | 0.9499 | 24.90 | model \| log |
| PredRNN++ | 200 epoch | 38.6M | 171.7G | 38 | 44.19 | 198.27 | 0.9567 | 25.60 | model \| log |
| MIM | 200 epoch | 38.8M | 183.0G | 37 | 48.63 | 213.44 | 0.9521 | 25.08 | model \| log |
| MAU | 200 epoch | 4.5M | 17.8G | 201 | 58.84 | 255.76 | 0.9408 | 24.19 | model \| log |
| E3D-LSTM | 200 epoch | 52.8M | 306.0G | 18 | 80.79 | 214.86 | 0.9314 | 22.89 | model \| log |
| PredRNN.V2 | 200 epoch | 23.9M | 116.6G | 52 | 57.27 | 252.29 | 0.9419 | 24.24 | model \| log |
| DMVFN | 200 epoch | 3.6M | 0.2G | 960 | 298.73 | 606.92 | 0.7765 | 17.07 | model \| log |
| SimVP+IncepU | 200 epoch | 58.0M | 19.4G | 209 | 59.83 | 214.54 | 0.9414 | 24.15 | model \| log |
| SimVP+gSTA-S | 200 epoch | 46.8M | 16.5G | 282 | 51.13 | 185.13 | 0.9512 | 24.93 | model \| log |
| TAU | 200 epoch | 44.7M | 16.0G | 275 | 48.17 | 177.35 | 0.9539 | 25.21 | model \| log |

Benchmark of MetaFormers Based on SimVP (MetaVP)

Since the hidden translator in SimVP can be replaced by any MetaFormer block that performs token mixing and channel mixing, we benchmark popular MetaFormer architectures on SimVP with 200-epoch training. We provide config files in configs/mmnist_cifar/simvp.

| MetaFormer | Setting | Params | FLOPs | FPS | MSE | MAE | SSIM | PSNR | Download |
|---|---|---|---|---|---|---|---|---|---|
| IncepU (SimVPv1) | 200 epoch | 58.0M | 19.4G | 209 | 59.83 | 214.54 | 0.9414 | 24.15 | model \| log |
| gSTA (SimVPv2) | 200 epoch | 46.8M | 16.5G | 282 | 51.13 | 185.13 | 0.9512 | 24.93 | model \| log |
| ViT | 200 epoch | 46.1M | 16.9G | 290 | 64.94 | 234.01 | 0.9354 | 23.90 | model \| log |
| Swin Transformer | 200 epoch | 46.1M | 16.4G | 294 | 57.11 | 207.45 | 0.9443 | 24.34 | model \| log |
| Uniformer | 200 epoch | 44.8M | 16.5G | 296 | 56.96 | 207.51 | 0.9442 | 24.38 | model \| log |
| MLP-Mixer | 200 epoch | 38.2M | 14.7G | 334 | 57.03 | 206.46 | 0.9446 | 24.34 | model \| log |
| ConvMixer | 200 epoch | 3.9M | 5.5G | 658 | 59.29 | 219.76 | 0.9403 | 24.17 | model \| log |
| Poolformer | 200 epoch | 37.1M | 14.1G | 341 | 60.98 | 219.50 | 0.9399 | 24.16 | model \| log |
| ConvNeXt | 200 epoch | 37.3M | 14.1G | 344 | 51.39 | 187.17 | 0.9503 | 24.89 | model \| log |
| VAN | 200 epoch | 44.5M | 16.0G | 288 | 59.59 | 221.32 | 0.9398 | 25.20 | model \| log |
| HorNet | 200 epoch | 45.7M | 16.3G | 287 | 55.79 | 202.73 | 0.9456 | 24.49 | model \| log |
| MogaNet | 200 epoch | 46.8M | 16.5G | 255 | 49.48 | 184.11 | 0.9521 | 25.07 | model \| log |
| TAU | 200 epoch | 44.7M | 16.0G | 275 | 48.17 | 177.35 | 0.9539 | 25.21 | model \| log |

(back to top)

KittiCaltech Benchmarks

We provide benchmark results on the KittiCaltech Pedestrian dataset using the \(10\rightarrow 1\) frame prediction setting following PredNet. Metrics (MSE, MAE, SSIM, PSNR, LPIPS) of the best models are reported over three trials. Parameters (M), FLOPs (G), and V100 inference speed (FPS) are also reported for all methods. The default setup trains for 100 epochs with the Adam optimizer and a OneCycle scheduler on a single GPU, while some computationally expensive methods (denoted by *) use 4 GPUs.
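LPIPS is a learned perceptual distance; a minimal evaluation sketch using the reference lpips package is shown below (assuming RGB frames scaled to [-1, 1]; the frame size is illustrative, not the dataset's exact resolution).

```python
# Minimal LPIPS evaluation sketch using the reference `lpips` package
# (pip install lpips). Inputs are RGB tensors in [-1, 1].
import torch
import lpips

loss_fn = lpips.LPIPS(net='alex')          # AlexNet backbone, as in the LPIPS paper

pred = torch.rand(1, 3, 128, 160) * 2 - 1  # predicted frame, scaled to [-1, 1]
true = torch.rand(1, 3, 128, 160) * 2 - 1  # ground-truth frame

with torch.no_grad():
    d = loss_fn(pred, true)                # lower = perceptually closer
print(float(d))
```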

STL Benchmarks on KittiCaltech

For a fair comparison of different methods, we report the best results when models are trained to convergence. We provide config files in configs/kitticaltech.

| Method | Setting | Params | FLOPs | FPS | MSE | MAE | SSIM | PSNR | LPIPS | Download |
|---|---|---|---|---|---|---|---|---|---|---|
| ConvLSTM-S | 100 epoch | 15.0M | 595.0G | 33 | 139.6 | 1583.3 | 0.9345 | 27.46 | 0.08575 | model \| log |
| E3D-LSTM* | 100 epoch | 54.9M | 1004G | 10 | 200.6 | 1946.2 | 0.9047 | 25.45 | 0.12602 | model \| log |
| PredNet | 100 epoch | 12.5M | 42.8G | 94 | 159.8 | 1568.9 | 0.9286 | 27.21 | 0.11289 | model \| log |
| PhyDNet | 100 epoch | 3.1M | 40.4G | 117 | 312.2 | 2754.8 | 0.8615 | 23.26 | 0.32194 | model \| log |
| MAU | 100 epoch | 24.3M | 172.0G | 16 | 177.8 | 1800.4 | 0.9176 | 26.14 | 0.09673 | model \| log |
| MIM | 100 epoch | 49.2M | 1858G | 39 | 125.1 | 1464.0 | 0.9409 | 28.10 | 0.06353 | model \| log |
| PredRNN | 100 epoch | 23.7M | 1216G | 17 | 130.4 | 1525.5 | 0.9374 | 27.81 | 0.07395 | model \| log |
| PredRNN++ | 100 epoch | 38.5M | 1803G | 12 | 125.5 | 1453.2 | 0.9433 | 28.02 | 0.13210 | model \| log |
| PredRNN.V2 | 100 epoch | 23.8M | 1223G | 52 | 147.8 | 1610.5 | 0.9330 | 27.12 | 0.08920 | model \| log |
| DMVFN | 100 epoch | 3.6M | 1.2G | 557 | 183.9 | 1531.1 | 0.9314 | 26.95 | 0.04942 | model \| log |
| SimVP+IncepU | 100 epoch | 8.6M | 60.6G | 57 | 160.2 | 1690.8 | 0.9338 | 26.81 | 0.06755 | model \| log |
| SimVP+gSTA-S | 100 epoch | 15.6M | 96.3G | 40 | 129.7 | 1507.7 | 0.9454 | 27.89 | 0.05736 | model \| log |
| TAU | 100 epoch | 44.7M | 80.0G | 55 | 131.1 | 1507.8 | 0.9456 | 27.83 | 0.05494 | model \| log |

Benchmark of MetaFormers Based on SimVP (MetaVP)

Since the hidden translator in SimVP can be replaced by any MetaFormer block that performs token mixing and channel mixing, we benchmark popular MetaFormer architectures on SimVP with 100-epoch training. We provide config files in configs/kitticaltech/simvp.

| MetaFormer | Setting | Params | FLOPs | FPS | MSE | MAE | SSIM | PSNR | LPIPS | Download |
|---|---|---|---|---|---|---|---|---|---|---|
| IncepU (SimVPv1) | 100 epoch | 8.6M | 60.6G | 57 | 160.2 | 1690.8 | 0.9338 | 26.81 | 0.06755 | model \| log |
| gSTA (SimVPv2) | 100 epoch | 15.6M | 96.3G | 40 | 129.7 | 1507.7 | 0.9454 | 27.89 | 0.05736 | model \| log |
| ViT* | 100 epoch | 12.7M | 155.0G | 25 | 146.4 | 1615.8 | 0.9379 | 27.43 | 0.06659 | model \| log |
| Swin Transformer | 100 epoch | 15.3M | 95.2G | 49 | 155.2 | 1588.9 | 0.9299 | 27.25 | 0.08113 | model \| log |
| Uniformer* | 100 epoch | 11.8M | 104.0G | 28 | 135.9 | 1534.2 | 0.9393 | 27.66 | 0.06867 | model \| log |
| MLP-Mixer | 100 epoch | 22.2M | 83.5G | 60 | 207.9 | 1835.9 | 0.9133 | 26.29 | 0.07750 | model \| log |
| ConvMixer | 100 epoch | 1.5M | 23.1G | 129 | 174.7 | 1854.3 | 0.9232 | 26.23 | 0.07758 | model \| log |
| Poolformer | 100 epoch | 12.4M | 79.8G | 51 | 153.4 | 1613.5 | 0.9334 | 27.38 | 0.07000 | model \| log |
| ConvNeXt | 100 epoch | 12.5M | 80.2G | 54 | 146.8 | 1630.0 | 0.9336 | 27.19 | 0.06987 | model \| log |
| VAN | 100 epoch | 14.9M | 92.5G | 41 | 127.5 | 1476.5 | 0.9462 | 27.98 | 0.05500 | model \| log |
| HorNet | 100 epoch | 15.3M | 94.4G | 43 | 152.8 | 1637.9 | 0.9365 | 27.09 | 0.06004 | model \| log |
| MogaNet | 100 epoch | 15.6M | 96.2G | 36 | 131.4 | 1512.1 | 0.9442 | 27.79 | 0.05394 | model \| log |
| TAU | 100 epoch | 44.7M | 80.0G | 55 | 131.1 | 1507.8 | 0.9456 | 27.83 | 0.05494 | model \| log |

(back to top)

KTH Benchmarks

We provide long-term prediction benchmark results on the KTH Action dataset using the \(10\rightarrow 20\) frames prediction setting. Metrics (MSE, MAE, SSIM, PSNR, LPIPS) of the best models are reported over three trials. Parameters (M), FLOPs (G), and V100 inference speed (FPS) are also reported for all methods. The default setup trains for 100 epochs with the Adam optimizer, a batch size of 16, and a OneCycle scheduler on a single GPU or 4 GPUs; we report the GPU setup used for each method (also shown in the config).

STL Benchmarks on KTH

For a fair comparison of different methods, we report the best results when models are trained to convergence. We provide config files in configs/kth. Note that 4xbs4 denotes DDP training on 4 GPUs with a batch size of 4 per GPU.
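As a sketch of what 4xbs4 means in practice, the snippet below shows a standard PyTorch DDP setup (placeholders, not the repository's launcher): four processes each hold a model replica and a per-GPU batch of 4, for an effective batch size of 16.

```python
# Sketch of the 4xbs4 setup with standard PyTorch DDP: 4 processes, each with a
# batch size of 4, giving an effective batch size of 16. Model and data are
# placeholders; OpenSTL's own entry point handles this via its configs.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset, DistributedSampler

def main():
    dist.init_process_group("nccl")                # one process per GPU
    rank = int(os.environ["LOCAL_RANK"])           # set by torchrun
    torch.cuda.set_device(rank)

    model = DDP(torch.nn.Linear(64, 64).cuda(rank), device_ids=[rank])
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    data = TensorDataset(torch.rand(256, 64), torch.rand(256, 64))
    sampler = DistributedSampler(data)             # shards data across ranks
    loader = DataLoader(data, batch_size=4, sampler=sampler)  # bs = 4 per GPU

    for x, y in loader:
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(x.cuda(rank)), y.cuda(rank))
        loss.backward()                            # DDP all-reduces gradients
        optimizer.step()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=4 this_script.py
```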

| Method | Setting | GPUs | Params | FLOPs | FPS | MSE | MAE | SSIM | PSNR | LPIPS | Download |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ConvLSTM-S | 100 epoch | 1xbs16 | 14.9M | 1368.0G | 16 | 47.65 | 445.5 | 0.8977 | 26.99 | 0.26686 | model \| log |
| E3D-LSTM | 100 epoch | 2xbs8 | 53.5M | 217.0G | 17 | 136.40 | 892.7 | 0.8153 | 21.78 | 0.48358 | model \| log |
| PredNet | 100 epoch | 1xbs16 | 12.5M | 3.4G | 399 | 152.11 | 783.1 | 0.8094 | 22.45 | 0.32159 | model \| log |
| PhyDNet | 100 epoch | 1xbs16 | 3.1M | 93.6G | 58 | 91.12 | 765.6 | 0.8322 | 23.41 | 0.50155 | model \| log |
| MAU | 100 epoch | 1xbs16 | 20.1M | 399.0G | 8 | 51.02 | 471.2 | 0.8945 | 26.73 | 0.25442 | model \| log |
| MIM | 100 epoch | 1xbs16 | 39.8M | 1099.0G | 17 | 40.73 | 380.8 | 0.9025 | 27.78 | 0.18808 | model \| log |
| PredRNN | 100 epoch | 1xbs16 | 23.6M | 2800.0G | 7 | 41.07 | 380.6 | 0.9097 | 27.95 | 0.21892 | model \| log |
| PredRNN++ | 100 epoch | 1xbs16 | 38.3M | 4162.0G | 5 | 39.84 | 370.4 | 0.9124 | 28.13 | 0.19871 | model \| log |
| PredRNN.V2 | 100 epoch | 1xbs16 | 23.6M | 2815.0G | 7 | 39.57 | 368.8 | 0.9099 | 28.01 | 0.21478 | model \| log |
| DMVFN | 100 epoch | 1xbs16 | 3.5M | 0.88G | 727 | 59.61 | 413.2 | 0.8976 | 26.65 | 0.12842 | model \| log |
| SimVP+IncepU | 100 epoch | 2xbs8 | 12.2M | 62.8G | 77 | 41.11 | 397.1 | 0.9065 | 27.46 | 0.26496 | model \| log |
| SimVP+gSTA-S | 100 epoch | 4xbs4 | 15.6M | 76.8G | 53 | 45.02 | 417.8 | 0.9049 | 27.04 | 0.25240 | model \| log |
| TAU | 100 epoch | 4xbs4 | 15.0M | 73.8G | 55 | 45.32 | 421.7 | 0.9086 | 27.10 | 0.22856 | model \| log |

Benchmark of MetaFormers Based on SimVP (MetaVP)

Since the hidden translator in SimVP can be replaced by any MetaFormer block that performs token mixing and channel mixing, we benchmark popular MetaFormer architectures on SimVP with 100-epoch training. We provide config files in configs/kth/simvp.

| MetaFormer | Setting | GPUs | Params | FLOPs | FPS | MSE | MAE | SSIM | PSNR | LPIPS | Download |
|---|---|---|---|---|---|---|---|---|---|---|---|
| IncepU (SimVPv1) | 100 epoch | 2xbs8 | 12.2M | 62.8G | 77 | 41.11 | 397.1 | 0.9065 | 27.46 | 0.26496 | model \| log |
| gSTA (SimVPv2) | 100 epoch | 2xbs8 | 15.6M | 76.8G | 53 | 45.02 | 417.8 | 0.9049 | 27.04 | 0.25240 | model \| log |
| ViT | 100 epoch | 2xbs8 | 12.7M | 112.0G | 28 | 56.57 | 459.3 | 0.8947 | 26.19 | 0.27494 | model \| log |
| Swin Transformer | 100 epoch | 2xbs8 | 15.3M | 75.9G | 65 | 45.72 | 405.7 | 0.9039 | 27.01 | 0.25178 | model \| log |
| Uniformer | 100 epoch | 2xbs8 | 11.8M | 78.3G | 43 | 44.71 | 404.6 | 0.9058 | 27.16 | 0.24174 | model \| log |
| MLP-Mixer | 100 epoch | 2xbs8 | 20.3M | 66.6G | 34 | 57.74 | 517.4 | 0.8886 | 25.72 | 0.28799 | model \| log |
| ConvMixer | 100 epoch | 2xbs8 | 1.5M | 18.3G | 175 | 47.31 | 446.1 | 0.8993 | 26.66 | 0.28149 | model \| log |
| Poolformer | 100 epoch | 2xbs8 | 12.4M | 63.6G | 67 | 45.44 | 400.9 | 0.9065 | 27.22 | 0.24763 | model \| log |
| ConvNeXt | 100 epoch | 2xbs8 | 12.5M | 63.9G | 72 | 45.48 | 428.3 | 0.9037 | 26.96 | 0.26253 | model \| log |
| VAN | 100 epoch | 2xbs8 | 14.9M | 73.8G | 55 | 45.05 | 409.1 | 0.9074 | 27.07 | 0.23116 | model \| log |
| HorNet | 100 epoch | 2xbs8 | 15.3M | 75.3G | 58 | 46.84 | 421.2 | 0.9005 | 26.80 | 0.26921 | model \| log |
| MogaNet | 100 epoch | 2xbs8 | 15.6M | 76.7G | 48 | 42.98 | 418.7 | 0.9065 | 27.16 | 0.25146 | model \| log |
| TAU | 100 epoch | 2xbs8 | 15.0M | 73.8G | 55 | 45.32 | 421.7 | 0.9086 | 27.10 | 0.22856 | model \| log |

(back to top)

Human 3.6M Benchmarks

We further provide high-resolution benchmark results on the Human3.6M dataset using the \(4\rightarrow 4\) frames prediction setting. Metrics (MSE, MAE, SSIM, PSNR, LPIPS) of the best models are reported over three trials. We use 256x256 resolution, similar to STRPM. Parameters (M), FLOPs (G), and V100 inference speed (FPS) are also reported for all methods. The default setup trains for 50 epochs with the Adam optimizer, a batch size of 16, and a Cosine scheduler (no warm-up) on a single GPU or 4 GPUs; we report the GPU setup used for each method (also shown in the config).
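For context on the FPS column, the following is one common way such throughput numbers are measured (a sketch with a placeholder model, not the exact script used here): warm up, then time CUDA-synchronized forward passes.

```python
# One common recipe for measuring inference FPS: warm up, then time
# CUDA-synchronized forward passes. Model and input are placeholders;
# reported numbers also depend on batch size, precision, and GPU (V100 here).
import time
import torch

model = torch.nn.Conv2d(3, 3, 3, padding=1).cuda().eval()
x = torch.rand(4, 3, 256, 256).cuda()    # 4 frames at 256x256, as in 4 -> 4

with torch.no_grad():
    for _ in range(10):                  # warm-up iterations
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    n_iters = 50
    for _ in range(n_iters):
        model(x)
    torch.cuda.synchronize()
    elapsed = time.time() - start

fps = n_iters * x.shape[0] / elapsed     # predicted frames per second
print(f"{fps:.1f} FPS")
```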

STL Benchmarks on Human 3.6M

For a fair comparison of different methods, we report the best results when models are trained to convergence. We provide config files in configs/human.

| Method | Setting | GPUs | Params | FLOPs | FPS | MSE | MAE | SSIM | PSNR | LPIPS | Download |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ConvLSTM-S | 50 epoch | 1xbs16 | 15.5M | 347.0G | 52 | 125.5 | 1566.7 | 0.9813 | 33.40 | 0.03557 | model \| log |
| E3D-LSTM | 50 epoch | 4xbs4 | 60.9M | 542.0G | 7 | 143.3 | 1442.5 | 0.9803 | 32.52 | 0.04133 | model \| log |
| PredNet | 50 epoch | 1xbs16 | 12.5M | 13.7G | 176 | 261.9 | 1625.3 | 0.9786 | 31.76 | 0.03264 | model \| log |
| PhyDNet | 50 epoch | 1xbs16 | 4.2M | 19.1G | 57 | 125.7 | 1614.7 | 0.9804 | 39.84 | 0.03709 | model \| log |
| MAU | 50 epoch | 1xbs16 | 20.2M | 105.0G | 6 | 127.3 | 1577.0 | 0.9812 | 33.33 | 0.03561 | model \| log |
| MIM | 50 epoch | 4xbs4 | 47.6M | 1051.0G | 17 | 112.1 | 1467.1 | 0.9829 | 33.97 | 0.03338 | model \| log |
| PredRNN | 50 epoch | 1xbs16 | 24.6M | 704.0G | 25 | 113.2 | 1458.3 | 0.9831 | 33.94 | 0.03245 | model \| log |
| PredRNN++ | 50 epoch | 1xbs16 | 39.3M | 1033.0G | 18 | 110.0 | 1452.2 | 0.9832 | 34.02 | 0.03196 | model \| log |
| PredRNN.V2 | 50 epoch | 1xbs16 | 24.6M | 708.0G | 24 | 114.9 | 1484.7 | 0.9827 | 33.84 | 0.03334 | model \| log |
| SimVP+IncepU | 50 epoch | 1xbs16 | 41.2M | 197.0G | 26 | 115.8 | 1511.5 | 0.9822 | 33.73 | 0.03467 | model \| log |
| SimVP+gSTA-S | 50 epoch | 1xbs16 | 11.3M | 74.6G | 52 | 108.4 | 1441.0 | 0.9834 | 34.08 | 0.03224 | model \| log |
| TAU | 50 epoch | 1xbs16 | 37.6M | 182.0G | 26 | 113.3 | 1390.7 | 0.9839 | 34.03 | 0.02783 | model \| log |

Benchmark of MetaFormers Based on SimVP (MetaVP)

Since the hidden translator in SimVP can be replaced by any MetaFormer block that performs token mixing and channel mixing, we benchmark popular MetaFormer architectures on SimVP with 50-epoch training. We provide config files in configs/human/simvp.

| MetaFormer | Setting | GPUs | Params | FLOPs | FPS | MSE | MAE | SSIM | PSNR | LPIPS | Download |
|---|---|---|---|---|---|---|---|---|---|---|---|
| IncepU (SimVPv1) | 50 epoch | 1xbs16 | 41.2M | 197.0G | 26 | 115.8 | 1511.5 | 0.9822 | 33.73 | 0.03467 | model \| log |
| gSTA (SimVPv2) | 50 epoch | 1xbs16 | 11.3M | 74.6G | 52 | 108.4 | 1441.0 | 0.9834 | 34.08 | 0.03224 | model \| log |
| ViT | 50 epoch | 4xbs4 | 28.3M | 239.0G | 17 | 136.3 | 1603.5 | 0.9796 | 33.10 | 0.03729 | model \| log |
| Swin Transformer | 50 epoch | 1xbs16 | 38.8M | 188.0G | 28 | 133.2 | 1599.7 | 0.9799 | 33.16 | 0.03766 | model \| log |
| Uniformer | 50 epoch | 4xbs4 | 27.7M | 211.0G | 14 | 116.3 | 1497.7 | 0.9824 | 33.76 | 0.03385 | model \| log |
| MLP-Mixer | 50 epoch | 1xbs16 | 47.0M | 164.0G | 34 | 125.7 | 1511.9 | 0.9819 | 33.49 | 0.03417 | model \| log |
| ConvMixer | 50 epoch | 1xbs16 | 3.1M | 39.4G | 84 | 115.8 | 1527.4 | 0.9822 | 33.67 | 0.03436 | model \| log |
| Poolformer | 50 epoch | 1xbs16 | 31.2M | 156.0G | 30 | 118.4 | 1484.1 | 0.9827 | 33.78 | 0.03313 | model \| log |
| ConvNeXt | 50 epoch | 1xbs16 | 31.4M | 157.0G | 33 | 113.4 | 1469.7 | 0.9828 | 33.86 | 0.03305 | model \| log |
| VAN | 50 epoch | 1xbs16 | 37.5M | 182.0G | 24 | 111.4 | 1454.5 | 0.9831 | 33.93 | 0.03335 | model \| log |
| HorNet | 50 epoch | 1xbs16 | 28.1M | 143.0G | 33 | 118.1 | 1481.1 | 0.9824 | 33.73 | 0.03333 | model \| log |
| MogaNet | 50 epoch | 1xbs16 | 8.6M | 63.6G | 56 | 109.1 | 1446.4 | 0.9834 | 34.05 | 0.03163 | model \| log |
| TAU | 50 epoch | 1xbs16 | 37.6M | 182.0G | 26 | 113.3 | 1390.7 | 0.9839 | 34.03 | 0.02783 | model \| log |

(back to top)
