The First Framework Choice
You’re about to start your first machine learning project, or maybe you’ve done a few tutorials and now you’re facing a real problem. PyTorch? TensorFlow? Scikit-learn?
The answer depends on what you’re building. Let me show you the practical differences with actual code.
Scenario 1: Predicting Customer Churn (Start Here)
You have customer data: usage patterns, support tickets, payment history. You need to predict who might cancel their subscription.
This is a classic use case for scikit-learn.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, classification_report
# Load your data
df = pd.read_csv('customer_data.csv')
# Prepare features and target
X = df[['usage_hours', 'support_tickets', 'account_age', 'last_payment_days']]
y = df['churned']
# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
# Train model
model = RandomForestClassifier(n_estimators=100, max_depth=10)
model.fit(X_train, y_train)
# Predict and evaluate
predictions = model.predict(X_test)
print(f"Accuracy: {accuracy_score(y_test, predictions):.2f}")
print(classification_report(y_test, predictions))
# See which features matter most
feature_importance = pd.DataFrame({
    'feature': X.columns,
    'importance': model.feature_importances_
}).sort_values('importance', ascending=False)
print(feature_importance)
Why this works: The problem is straightforward prediction from structured data. Random forests handle this well and you can explain the results to stakeholders by showing feature importance.
Installation is simple:
uv pip install scikit-learn pandas
No GPU needed. This runs on your laptop or a basic server.
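The built-in `feature_importances_` values are computed from the training data, so before showing them to stakeholders it can help to cross-check them on held-out data. The sketch below uses scikit-learn's `permutation_importance` for that. The data is synthetic and stands in for `customer_data.csv`; the rule that churn follows `support_tickets` is an assumption made purely so the example has a known answer.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic stand-in for customer_data.csv (assumption: churn is driven
# by support_tickets; the other columns are pure noise here)
rng = np.random.default_rng(42)
n = 600
X = pd.DataFrame({
    "usage_hours": rng.normal(20, 5, n),
    "support_tickets": rng.poisson(2, n),
    "account_age": rng.integers(1, 60, n),
    "last_payment_days": rng.integers(0, 30, n),
})
y = (X["support_tickets"] >= 3).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)
model.fit(X_train, y_train)

# Shuffle one column at a time on the held-out set and measure
# how much the score drops
result = permutation_importance(
    model, X_test, y_test, n_repeats=10, random_state=42
)
importance = pd.DataFrame({
    "feature": X.columns,
    "importance": result.importances_mean,
}).sort_values("importance", ascending=False)
print(importance)
```

If the two rankings disagree sharply on your real data, that is usually worth investigating before you present either one.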
Scenario 2: Building a Custom Image Classifier (Move to PyTorch)
Now you need to classify product images for quality control. You have thousands of labeled images and need to distinguish between “good” and “defective” products.
This is where PyTorch makes sense.
import torch
import torch.nn as nn
from torchvision import transforms, models
from PIL import Image
# 1. Define Transforms (Resize images to match model input)
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225])
])
# 2. Load pre-trained model (weights= API, torchvision 0.13 and later)
# We use 'DEFAULT' weights to get the best available pre-trained version
model = models.resnet18(weights='DEFAULT')
# 3. Modify the final layer for our 2 classes (Good vs Defective)
model.fc = nn.Linear(model.fc.in_features, 2)
# 4. Move to GPU if available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
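# 5. Inference Function
def predict(model, image_path):
    model.eval()  # Set to evaluation mode
    # Load image and force 3 channels (grayscale/RGBA files would break Normalize)
    image = Image.open(image_path).convert("RGB")
    image_tensor = transform(image).unsqueeze(0).to(device)
    with torch.no_grad():
        output = model(image_tensor)
        _, predicted = torch.max(output, 1)
    # Class index 0 is assumed to be "Good". The new final layer must be
    # fine-tuned on your labeled images before these predictions mean anything.
    return "Good" if predicted.item() == 0 else "Defective"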
Why PyTorch here: Image classification like this is a deep learning problem. PyTorch lets you start with pre-trained models (like ResNet) and adapt them to your specific use case. This is called transfer learning, and it’s much faster than training from scratch.
Installation:
uv pip install torch torchvision
# Note: uv picks a wheel that matches your OS and Python version. For a specific CUDA build, follow the install selector on pytorch.org
Common Mistakes I See Constantly
Mistake 1: Using PyTorch for Simple Problems
# Bad: Using PyTorch for simple regression
import torch
class LinearModel(torch.nn.Module):
    ...  # 50 lines of boilerplate code
# Good: Using scikit-learn for simple regression
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(X_train, y_train)
If your problem can be solved with traditional ML, use scikit-learn. You’ll ship faster and maintain it more easily.
Mistake 2: Not Using Pre-trained Models
Many developers try to train neural networks from scratch. This wastes time and GPU resources.
# Bad: Training from scratch with limited data
model = MyCustomCNN()
# Trains for days, gets mediocre results
# Good: Using transfer learning
model = models.resnet50(weights='DEFAULT')
# Freeze the pre-trained layers
for param in model.parameters():
    param.requires_grad = False
# Replace the final layer; the new layer is trainable by default
model.fc = nn.Linear(model.fc.in_features, num_classes)
# Typically trains in hours and gets better results
Pre-trained models learned from millions of images. You’re adapting that knowledge to your specific problem.
Mistake 3: Ignoring Data Preparation
The framework doesn’t matter if your data is messy.
# Always check your data first
print(df.isnull().sum()) # Missing values?
print(df.dtypes) # Correct data types?
print(df.describe()) # Distributions make sense?
# Handle missing data
df = df.fillna(df.mean(numeric_only=True)) # Or drop rows, depending on context
# Scale features for neural networks (fit on training data only to avoid leakage)
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
I’ve seen teams spend weeks debugging models when the real problem was unscaled features or missing value handling.
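One way to make missing-value handling and scaling harder to get wrong is to bundle them with the model in a scikit-learn `Pipeline`, so the same steps run at training and prediction time and are fitted on the training split only. The sketch below assumes synthetic data and a logistic regression model; the column names and the churn rule are made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumption: low usage means churn)
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "usage_hours": rng.normal(20, 5, n),
    "support_tickets": rng.poisson(2, n).astype(float),
})
y = (X["usage_hours"] < 18).astype(int)

# Knock out some values to mimic missing data
missing_rows = rng.choice(n, 40, replace=False)
X.loc[missing_rows, "support_tickets"] = np.nan

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Imputer and scaler learn their statistics from X_train only,
# then reuse them unchanged on X_test
pipeline = make_pipeline(
    SimpleImputer(strategy="mean"),
    StandardScaler(),
    LogisticRegression(),
)
pipeline.fit(X_train, y_train)
print(f"Test accuracy: {pipeline.score(X_test, y_test):.2f}")
```

Because the pipeline is a single object, saving it also saves the fitted imputer and scaler along with the model.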
Mistake 4: Not Saving Models Properly
You trained a model for hours. Then your script crashes. Did you save it?
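# Scikit-learn
import joblib
joblib.dump(model, 'model.pkl')
model = joblib.load('model.pkl') # Only load files you trust
# PyTorch
torch.save(model.state_dict(), 'model.pth')
# Rebuild the same architecture first, then load the saved weights into it
model.load_state_dict(torch.load('model.pth', map_location='cpu'))
model.eval() # Important: set to evaluation mode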
Save checkpoints during training. Save the final model. Save the preprocessing steps (scalers, encoders).
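A checkpoint that can resume training usually needs more than the model weights. The sketch below saves the optimizer state and the epoch number as well. The helper names `save_checkpoint` and `load_checkpoint`, the dictionary keys, and the tiny `nn.Linear` stand-in model are all assumptions for illustration.

```python
import os
import tempfile

import torch
from torch import nn


def save_checkpoint(path, model, optimizer, epoch):
    """Write everything needed to resume training to one file."""
    torch.save({
        "epoch": epoch,
        "model_state": model.state_dict(),
        "optimizer_state": optimizer.state_dict(),
    }, path)


def load_checkpoint(path, model, optimizer):
    """Restore model and optimizer in place; return the epoch to resume from."""
    checkpoint = torch.load(path, map_location="cpu")
    model.load_state_dict(checkpoint["model_state"])
    optimizer.load_state_dict(checkpoint["optimizer_state"])
    return checkpoint["epoch"] + 1


model = nn.Linear(4, 2)  # tiny stand-in for a real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

checkpoint_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pth")
for epoch in range(3):
    # ... one epoch of training would go here ...
    save_checkpoint(checkpoint_path, model, optimizer, epoch)

# After a crash: rebuild the same objects, then restore their state
resumed_model = nn.Linear(4, 2)
resumed_optimizer = torch.optim.SGD(resumed_model.parameters(), lr=0.1)
start_epoch = load_checkpoint(checkpoint_path, resumed_model, resumed_optimizer)
print(f"Resuming from epoch {start_epoch}")
```

Overwriting one file keeps the example short. In a longer run you might keep the last few checkpoints, or the one with the best validation score.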
When to Switch from Scikit-learn to PyTorch
Switch when:
- Your accuracy plateaus after trying different algorithms and tuning hyperparameters
- You’re working with images, text, or audio where deep learning excels
- You need to adapt recent research that has PyTorch implementations
- You have enough data: deep learning usually needs more labeled examples than traditional ML (as a rough guide, 1,000+)
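Before deciding that accuracy has plateaued, it helps to compare a few scikit-learn algorithms under the same cross-validation setup. The sketch below does that on synthetic data from `make_classification`; the three candidate models and their settings are illustrative choices, not a recommendation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for your own feature matrix and labels
X, y = make_classification(
    n_samples=500, n_features=10, n_informative=5, random_state=42
)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

# Score every candidate with the same 5-fold split strategy
scores = {}
for name, estimator in candidates.items():
    fold_scores = cross_val_score(estimator, X, y, cv=5, scoring="accuracy")
    scores[name] = fold_scores.mean()
    print(f"{name}: {fold_scores.mean():.3f} (+/- {fold_scores.std():.3f})")
```

If every candidate lands in the same narrow band even after tuning, that is a reasonable signal that the features, not the algorithm, are the limit.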
What About TensorFlow?
TensorFlow 2.x runs eagerly by default, so its syntax now feels similar to PyTorch. Choose it if your company already uses it or you’re heavily invested in Google Cloud Platform. For new projects, PyTorch generally has broader community support, more examples and cleaner debugging.
Your Learning Path
Weeks 1-2: Start with scikit-learn. Build a classifier, try different algorithms, learn train/test splitting and evaluation metrics.
Weeks 3-4: Master data preparation. Practice cleaning, preprocessing and feature engineering.
Weeks 5-8: Add PyTorch if needed. Use pre-trained models with transfer learning first, then build one custom model to understand fundamentals.
Deployment Considerations
Scikit-learn: Easiest. Serialize your model with joblib (or pickle) and load it in a Flask/FastAPI app. Only load model files from sources you trust.
PyTorch: Export to ONNX format, use BentoML or Triton Inference Server, or deploy to cloud services (AWS SageMaker, Google Vertex AI, Azure ML).
Start simple. Get something working locally before production optimization.
Final Advice
Don’t learn all three frameworks at once. Pick one based on your immediate problem:
- Structured data with clear patterns → Scikit-learn
- Images, complex text, or custom AI models → PyTorch
- Existing TensorFlow codebase → TensorFlow
Master one framework well before exploring others. The concepts transfer between them anyway.
And remember: the framework is just a tool. What matters is solving real problems with reliable systems that you can actually deploy and maintain.
Start simple, iterate quickly and add complexity only when you genuinely need it.

