Linear Regression

Can past returns predict direction?

Imagine drawing the best possible line through a cloud of scattered dots. The line captures the general direction — but the dots still scatter. That's what regression does with market returns.

Linear regression is one of the simplest machine learning models. The idea: take what happened over the last N days and use it to predict what might happen next. The model finds the best-fit relationship between past returns (features) and the direction of the next return.
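In symbols, one way to write this down is shown below. This is a sketch only: the notation is introduced here for illustration and does not come from the lesson, and it assumes an ordinary least-squares fit. Here r_{t-i} is the return i days before day t, y_t is the sign of the return on day t, and the beta coefficients are what the model learns.

```latex
\hat{y}_t = \beta_0 + \sum_{i=1}^{N} \beta_i \, r_{t-i},
\qquad
\text{predicted direction} = \operatorname{sign}(\hat{y}_t),
\qquad
(\beta_0, \dots, \beta_N) = \arg\min_{\beta} \sum_{t \in \text{train}} \left( y_t - \hat{y}_t \right)^2
```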

This is where the classic methods we explored earlier meet data-driven prediction. Same data, different question.

Instead of a fixed rule (“go long when the average crosses”), the model learns its own rule from the data. That sounds powerful. But it comes with a critical danger: overfitting.

How it works

- Take the last N days of returns (these are the “lags”, or features)
- The model finds the linear combination of these lags, a weighted sum plus a constant, that best explains the sign of the next return (typically by least squares)
- If the prediction is positive → the model would signal an upward move
- If the prediction is negative → it would signal a downward move
- The model is trained on a portion of the data and tested on the rest, to see if what it learned generalizes
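The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind this page: the function names are hypothetical, the fit is assumed to be ordinary least squares via NumPy, the 70/30 split is an arbitrary choice, and the returns are synthetic random numbers standing in for real data.

```python
import numpy as np

def make_lagged(returns, n_lags):
    """Return (X, y): each row of X holds the previous n_lags returns, y is the sign of the next one."""
    r = np.asarray(returns, dtype=float)
    # Column k-1 holds the return k days before the target day.
    X = np.column_stack([r[n_lags - k : len(r) - k] for k in range(1, n_lags + 1)])
    y = np.sign(r[n_lags:])
    return X, y

def fit_linear(X, y):
    """Least-squares fit of y on the lags plus an intercept."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_direction(coef, X):
    """+1 means the model would signal an upward move, -1 a downward move."""
    A = np.column_stack([np.ones(len(X)), X])
    return np.sign(A @ coef)

def directional_accuracy(pred, y):
    """Fraction of days where the predicted sign matches the realized sign."""
    return float(np.mean(pred == y))

# Synthetic placeholder returns (assumption: real data would be loaded here instead).
rng = np.random.default_rng(0)
returns = rng.normal(0.0, 0.01, size=1000)

X, y = make_lagged(returns, n_lags=5)

# Chronological split: train on the first 70%, test on the remaining 30%.
split = int(len(X) * 0.7)
coef = fit_linear(X[:split], y[:split])

acc_in = directional_accuracy(predict_direction(coef, X[:split]), y[:split])
acc_out = directional_accuracy(predict_direction(coef, X[split:]), y[split:])
print(f"in-sample accuracy:     {acc_in:.1%}")
print(f"out-of-sample accuracy: {acc_out:.1%}")
```

The split is chronological rather than shuffled, so the model is never tested on days that come before the ones it was trained on.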

Educational content only — not investment advice, recommendations, or a suggestion to act. Past performance is not indicative of future results. Your decisions are your own. Full disclaimer.

The danger: overfitting

Overfitting happens when a model memorizes the noise in the data instead of learning genuine patterns. It's one of the most important concepts in machine learning, and a common reason predictive models underperform on new data.

In-sample (training data)

The model has “seen” this data. Performance here is often impressive — but misleading. The model had the answers while studying.

Out-of-sample (test data)

The model has never seen this data. This is the real test. If performance drops significantly compared with the training data, the model was most likely memorizing noise, not learning patterns.

The key insight: When someone claims their model achieves 95% accuracy, the first question should always be: in-sample or out-of-sample? The train/test split slider below illustrates this effect.
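One way to see the in-sample versus out-of-sample gap is to fit the model on data that contains nothing to learn by construction. The sketch below assumes pure random "returns", a least-squares fit, and hypothetical function names; the 20 lags and the 50/50 split are arbitrary choices for illustration.

```python
import numpy as np

def lagged_sign_data(returns, n_lags):
    """Rows of X are the previous n_lags returns; y is the sign of the next return."""
    r = np.asarray(returns, dtype=float)
    X = np.column_stack([r[n_lags - k : len(r) - k] for k in range(1, n_lags + 1)])
    return X, np.sign(r[n_lags:])

def in_and_out_of_sample_accuracy(returns, n_lags, train_fraction):
    """Fit on the first train_fraction of rows, then score both portions."""
    X, y = lagged_sign_data(returns, n_lags)
    A = np.column_stack([np.ones(len(X)), X])  # intercept + lags
    split = int(len(A) * train_fraction)
    coef, *_ = np.linalg.lstsq(A[:split], y[:split], rcond=None)
    hits = np.sign(A @ coef) == y
    return float(hits[:split].mean()), float(hits[split:].mean())

# Pure noise: any pattern the model "finds" here is noise by construction.
rng = np.random.default_rng(42)
noise_returns = rng.normal(0.0, 0.01, size=400)

acc_in, acc_out = in_and_out_of_sample_accuracy(
    noise_returns, n_lags=20, train_fraction=0.5
)
print(f"in-sample:     {acc_in:.1%}")
print(f"out-of-sample: {acc_out:.1%}")
print(f"gap:           {acc_in - acc_out:+.1%}")
```

Because the data is random, an in-sample figure above 50% reflects fitted noise, and the out-of-sample figure is the honest one to quote.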

See it in action

Pick a ticker, adjust the number of lags and the train/test split to see how a linear regression model would have historically performed. Watch how in-sample and out-of-sample results diverge.

What to notice:

- The shaded area is the training period: the model typically looks great here
- Move the train/test split and watch how out-of-sample performance changes dramatically
- Increase lags: more features can improve in-sample results but often hurt out-of-sample (overfitting)
- Compare the two metric rows: the gap between in-sample and out-of-sample tells you how much the model is overfitting
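The effect of adding lags can also be reproduced offline. The sketch below is an illustration under stated assumptions, not the code behind the interactive chart: it uses synthetic random returns, a least-squares fit, a hypothetical helper named accuracy_by_lags, and a deliberately small training window of 100 days so that the largest model has more coefficients than training rows.

```python
import numpy as np

def accuracy_by_lags(returns, lag_counts, n_train):
    """Map each lag count to (in-sample accuracy, out-of-sample accuracy).

    All models are scored on the same target days so the numbers are comparable.
    """
    r = np.asarray(returns, dtype=float)
    max_lags = max(lag_counts)
    y = np.sign(r[max_lags:])  # same targets for every model
    results = {}
    for n_lags in lag_counts:
        # Column k-1 holds the return k days before each target day.
        X = np.column_stack(
            [r[max_lags - k : len(r) - k] for k in range(1, n_lags + 1)]
        )
        A = np.column_stack([np.ones(len(X)), X])  # intercept + lags
        coef, *_ = np.linalg.lstsq(A[:n_train], y[:n_train], rcond=None)
        hits = np.sign(A @ coef) == y
        results[n_lags] = (float(hits[:n_train].mean()), float(hits[n_train:].mean()))
    return results

# Synthetic noise standing in for real returns (assumption for illustration).
rng = np.random.default_rng(7)
returns = rng.normal(0.0, 0.01, size=600)

results = accuracy_by_lags(returns, lag_counts=(1, 5, 20, 100), n_train=100)
for n_lags, (acc_in, acc_out) in results.items():
    print(f"lags={n_lags:>3}  in-sample={acc_in:.1%}  out-of-sample={acc_out:.1%}")
```

With 100 lags and only 100 training days, the model has enough coefficients to reproduce the training signs essentially exactly, which is memorization in its purest form. The out-of-sample column shows whether any of that carried over.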

Your turn

Consider claims made by trading tools or services about prediction accuracy. Did they show in-sample or out-of-sample results?

The lesson isn't that models are useless. It's that understanding how they're tested is as important as the prediction itself. A model that overfits can be worse than no model at all, because it gives false confidence.

What you've learned

- Linear regression uses past returns (lags) as features to predict the direction of the next return; it learns its own rule from the data.
- Overfitting is the central danger: a model that memorizes noise will look great on training data but fail on new data.
- The train/test split is how you evaluate a model honestly, so always ask whether results are in-sample or out-of-sample.
- More features (lags) don’t always mean better predictions: complexity without signal is just memorized noise.

Want to test this?

Many experienced investors suggest practicing with a paper trading account at a reputable broker before risking real capital. Many brokers offer free simulated trading environments where you can test strategies with real market data and no financial risk.

Paper trading lets you build confidence, understand execution, and see how a strategy behaves in real time — without the emotional weight of real money on the line.

Important

Everything on this platform is educational and didactic in nature. We do not provide investment advice, financial advisory, or recommendations to buy or sell any financial instrument. Past performance is not indicative of future results. All strategies shown are historical simulations for learning purposes only. Always do your own research and consult a qualified financial advisor before making investment decisions.
