Model Comparisons¶

The following table shows the different electricity consumption of popular NLP and Computer Vision models:

Model	GPU	Training Time (H)	Consumption (kWh)
BERT fintetune	4 V100	6	3.1
BERT pretrain	8 V100	36	37.3
6B Transf.	256 A100	192	13,812.4
Dense 121	1 P40	0.3	0.02
Dense 169	1 P40	0.3	0.03
Dense 201	1 P40	0.4	0.04
ViT Tiny	1 V100	19	1.7
ViT Small	1 V100	19	2.2
ViT Base	1 V100	21	4.7
ViT Large	4 V100	90	93.3
ViT Huge	4 V100	216	237.6

Electricity consumption of AI cloud instance

Impact of time of year and region¶

Carbon emissions that would be emitted from training BERT (language modeling on 8 V100s for 36 hours) in different locations:

Models emissions comparison

In this case study, time of year might not be relevant in most cases, but localisation can have a great impact on carbon emissions.

Here, and in the graph below, emissions equivalent are estimated using Microsoft Azure cloud tools. CodeCarbon has developed its own measuring tools. The result could be different.

Comparisons¶

Emissions for the 11 described models can be displayed as below:

Models emissions comparison

The black line represents the average emissions (across regions and time of year). The light blue represents the first and fourth quartiles. On the right side, equivalent sources of emissions are displayed as comparison points (source : US Environmental Protection Agency). NB : presented on a log scale

References¶

Measuring the Carbon intensity of AI in Cloud Instance

Another source comparing models carbon intensity: Energy and Policy Considerations for Deep Learning in NLP