DeadSimple: PyTorch + CUDA on Your Laptop
Can it just f*cking work, please?
Getting PyTorch and CUDA working together is ANNOYING. Hopefully this gets you set up really quickly.
Will CUDA run on CPU/Integrated Graphics?
You can install the CUDA runtime, but PyTorch will fall back to the CPU if you don't have a dedicated NVIDIA graphics card. In Task Manager, you should see an NVIDIA device:
- I recommend making sure you're running the latest NVIDIA drivers (a quick check is sketched below).
- If you have an AMD card, you can’t use CUDA.
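If you'd rather check the driver side from Python than squint at Task Manager, here's a tiny sketch. It assumes nvidia-smi, which ships with the NVIDIA driver, is on your PATH.

# driver-check.py -- quick sanity check that the NVIDIA driver is installed
import shutil
import subprocess

if shutil.which("nvidia-smi") is None:
    print("nvidia-smi not found -- no NVIDIA driver on the PATH, so CUDA has no GPU to use.")
else:
    # nvidia-smi prints the driver version, GPU name, and current utilization
    subprocess.run(["nvidia-smi"], check=True)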
Don't Install CUDA Directly
So you can easily install CUDA onto your laptop, but if you're using it with PyTorch, then guess what:
“PyTorch doesn’t use the system’s CUDA library. When you install PyTorch using the precompiled binaries using either pip or conda it is shipped with a copy of the specified version of the CUDA library which is installed locally. In fact, you don't even need to install CUDA on your system to use PyTorch with CUDA support.” ~ SO post
Now, obviously, you can still install CUDA directly if you're going to use it outside of a Python dev env.
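If you're curious which CUDA version your PyTorch install was actually built against (the bundled one, not whatever toolkit might be on your system), a couple of standard torch attributes will tell you:

import torch

# The CUDA version this PyTorch binary was compiled against (shipped with the wheel);
# it can differ from any CUDA toolkit installed system-wide. None means a CPU-only build.
print(torch.version.cuda)
# The bundled cuDNN build number
print(torch.backends.cudnn.version())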
Install CUDA via PyTorch
There’s a great configurator at https://pytorch.org/get-started/locally/. It lets you choose your OS, package manager (i.e. conda or pip), and CUDA version.
This will output the command to install both PyTorch and CUDA, pulling a PyTorch build compiled against the CUDA version you picked. Here are the exact commands I needed to get a working PyTorch setup on Windows:
conda create -n cuda-pytorch-12.1
conda activate cuda-pytorch-12.1
conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia
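If you'd rather use pip than conda, the same configurator spits out an equivalent command. At the time of writing, for CUDA 12.1 it looks something like this (double-check the selector for your platform):

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121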
It takes a while to download all the packages, but once it's done, test it simply with:
conda activate cuda-pytorch-12.1
python
>>> import torch
>>> torch.cuda.is_available()
True
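If you want a slightly more convincing check than that one-liner, here's a short sketch that also prints the GPU's name and runs a small matrix multiply on it:

# cuda-smoke-test.py -- a fuller check than torch.cuda.is_available()
import torch

assert torch.cuda.is_available(), "CUDA is not available -- check your install"
print("GPU:", torch.cuda.get_device_name(0))

# A small matmul on the GPU confirms kernels actually launch
a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")
c = a @ b
torch.cuda.synchronize()
print("Matmul OK, result lives on", c.device)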
Or Just Pull a Docker Container
There are many Docker containers with PyTorch pre-configured. One that I like is from Chainguard, which has zero vulnerabilities.
docker pull cgr.dev/chainguard/pytorch-cuda12:latest
You can shell into it as follows. Note the --gpus flag, which binds your host GPUs to the container. As mentioned above, CUDA will ignore integrated graphics, so only your dedicated card will be used.
docker run --rm -it --gpus all cgr.dev/chainguard/pytorch-cuda12:latest
If you have multiple GPUs, you can specify one or more:
docker run --rm -it --gpus '"device=0"' cgr.dev/chainguard/pytorch-cuda12:latest
From within the container, you can run the same torch.cuda.is_available() check.
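If more than one GPU is visible (in the container or on the host), you can also pick a specific card from PyTorch itself; a minimal sketch:

import torch

# List every CUDA device PyTorch can see
print("Visible GPUs:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"  cuda:{i} ->", torch.cuda.get_device_name(i))

# Pin work to a specific card by indexing the device
device = torch.device("cuda:0")
x = torch.ones(3, 3, device=device)
print(x.device)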
If you want to run an example program, here’s the one I use.
# gpu-test.py
import torch


class IntenseModel(torch.nn.Module):
    def __init__(self):
        super(IntenseModel, self).__init__()
        # Increasing the layer size
        self.linear1 = torch.nn.Linear(1000, 2000)
        self.activation1 = torch.nn.ReLU()
        self.batchnorm1 = torch.nn.BatchNorm1d(2000)  # Adding batch normalization
        self.linear2 = torch.nn.Linear(2000, 1000)
        self.activation2 = torch.nn.ReLU()
        self.batchnorm2 = torch.nn.BatchNorm1d(1000)
        self.linear3 = torch.nn.Linear(1000, 500)
        self.activation3 = torch.nn.ReLU()
        self.linear4 = torch.nn.Linear(500, 10)
        self.softmax = torch.nn.Softmax(dim=1)  # Specify the dimension for Softmax

    def forward(self, x):
        x = self.linear1(x)
        x = self.activation1(x)
        x = self.batchnorm1(x)
        x = self.linear2(x)
        x = self.activation2(x)
        x = self.batchnorm2(x)
        x = self.linear3(x)
        x = self.activation3(x)
        x = self.linear4(x)
        x = self.softmax(x)
        return x


if not torch.cuda.is_available():
    raise RuntimeError("CUDA is not available. This script requires a GPU.")

device = torch.device("cuda")  # Directly set device to CUDA since we've checked its availability

intensemodel = IntenseModel().to(device)

print('The model:')
print(intensemodel)

print('\n\nModel params:')
for param in intensemodel.parameters():
    print(param)
Running this locally in conda or in the Docker container should result in spikes in your GPU utilization. To run it in the Docker container, either copy it in and run it from a shell, or do something like this:
docker run --rm -it -v /path/to/gpu-test.py:/gpu-test.py --gpus all cgr.dev/chainguard/pytorch-cuda12:latest -c "python /gpu-test.py"
Where /path/to/gpu-test.py is the path to the script on your local machine.
When I run this, it results in quick bursts of utilization, and running it repeatedly shows up as pulses in Task Manager.
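If you'd rather see sustained load than quick bursts, you can append something like this to the end of gpu-test.py. It just pushes random batches through the model defined above (the batch size and iteration count here are arbitrary):

# Optional: generate sustained GPU load with repeated forward passes
x = torch.randn(4096, 1000, device=device)  # 4096 random 1000-dim inputs
with torch.no_grad():
    for _ in range(200):
        y = intensemodel(x)
torch.cuda.synchronize()
print("Output shape:", y.shape)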
Bonus — Kill My Laptop!
Here’s a program that will chew through a ton of GPU memory and probably trigger an OOM error.
# gpu-ultra-test.py
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision.transforms as transforms
from torch.utils.data import DataLoader, TensorDataset


class UltraIntenseModel(nn.Module):
    def __init__(self):
        super(UltraIntenseModel, self).__init__()
        # Enhanced convolutional blocks with more layers and features
        self.features = nn.Sequential(
            nn.Conv2d(3, 128, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv2d(128, 256, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(512, 512, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2, 2),
            nn.Dropout(0.5)
        )
        # Classifier with large fully connected layers
        self.classifier = nn.Sequential(
            nn.Linear(512 * 56 * 56, 4096),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(4096, 1024),
            nn.ReLU(),
            nn.Linear(1024, 10),
            nn.Softmax(dim=1)
        )

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x


# Check if CUDA is available, and crash if not
if not torch.cuda.is_available():
    raise RuntimeError("CUDA is not available. This script requires a GPU.")

device = torch.device("cuda")  # Directly set device to CUDA since we've checked its availability

# Generate random large high-resolution data
input_tensor = torch.randn(64, 3, 224, 224)  # 64 images of 224x224 resolution
target_tensor = torch.randint(0, 10, (64,))

# Data loader
dataset = TensorDataset(input_tensor, target_tensor)
loader = DataLoader(dataset, batch_size=16, shuffle=True)

# Model
model = UltraIntenseModel().to(device)

# Optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Train the model
model.train()
for epoch in range(10):  # Run more epochs to see sustained GPU usage
    for data, target in loader:
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.cross_entropy(output, target)
        loss.backward()
        optimizer.step()
    print(f"Epoch {epoch+1}, Loss: {loss.item()}")

print("Training complete")
Useful for really hammering your GPU memory.
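If you want to watch the memory actually climb (or see how close you got before the OOM), torch exposes a few counters you can drop into the training loop:

# Report CUDA memory use for the current device (values in GB)
print(f"Allocated:      {torch.cuda.memory_allocated() / 1e9:.2f} GB")
print(f"Peak allocated: {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
print(f"Reserved:       {torch.cuda.memory_reserved() / 1e9:.2f} GB")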