Track Carbon Emissions with LLMs and Agents¶
Running large language models (LLMs) and AI agents locally with open-source models lets you build intelligent applications without relying on cloud APIs. CodeCarbon measures the carbon impact of running these local models and agents, helping you understand the environmental cost of inference and reasoning tasks.
Installation¶
pip install codecarbon transformers torch
Running Local Model Inference¶
Here's how to track the carbon emissions of running inference with a local language model:
# mktestdocs: skip
from transformers import AutoTokenizer, AutoModelForCausalLM
from codecarbon import EmissionsTracker
import torch
# Load a lightweight open-source model
model_name = "HuggingFaceTB/SmolLM2-135M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
# Move to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
# Track inference emissions
with EmissionsTracker() as tracker:
messages = [
{"role": "user", "content": "What are the benefits of renewable energy?"}
]
# Format input for the model
inputs = tokenizer.apply_chat_template(
messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(device)
# Generate response
outputs = model.generate(inputs, max_new_tokens=100, temperature=0.7)
response = tokenizer.decode(outputs[0])
print(f"Inference emissions: {tracker.final_emissions:.6f} kg CO2eq")
print(f"Model response: {response}")
What Gets Logged¶
When you run the example above, CodeCarbon creates an emissions.csv file in your working directory with columns including:
timestamp: when the measurement was takenduration: how long the inference tookemissions: CO2 in kgenergy_kwh: energy consumed in kilowatt-hourscpu_power: CPU power in wattsgpu_power: GPU power in watts (if the model is running on GPU)
Comparing Different Models¶
You can measure the carbon impact of different model sizes or model architectures:
# mktestdocs: skip
models = [
"HuggingFaceTB/SmolLM2-135M-Instruct", # 135M parameters
"HuggingFaceTB/SmolLM-360M-Instruct", # 360M parameters
]
prompt = "Explain machine learning in one sentence."
for model_name in models:
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)
with EmissionsTracker(save_file_path=f"emissions_{model_name.split('/')[-1]}.csv") as tracker:
inputs = tokenizer(prompt, return_tensors="pt").to(device)
outputs = model.generate(inputs["input_ids"], max_new_tokens=50)
print(f"{model_name}: {tracker.final_emissions:.6f} kg CO2eq")
Benefits of Local Models¶
- Privacy: No data leaves your machine
- Cost: No API charges or rate limits
- Control: Full visibility into model behavior and resource usage
- Sustainability: Run efficient open-source models aligned with your carbon budget
Finding Lightweight Models¶
Popular lightweight open-source models for local inference: - SmolLM2 family – 135M to 1.7B parameters, fast and efficient - Phi family – Compact models with strong performance - Mistral – Small but capable models - TinyLlama – 1.1B parameter model, ideal for edge devices
Smaller models consume less energy while still providing useful inference capabilities.
Comparing Model Efficiency¶
To understand the trade-offs between different models, batch sizes, and their carbon impact, see Comparing Model Efficiency. You can apply the same patterns to compare different open-source models running locally.
Next Steps¶
- Configure CodeCarbon to customize tracking behavior
- Send emissions data to the cloud to visualize inference emissions across multiple runs
- Explore other frameworks in HuggingFace Transformers or Diffusers