Blog

Sample Weights

How to use sample weights to address the fact that observations are not generated by independent and identically distributed (IID) processes.
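
A minimal sketch of one common weighting scheme, average uniqueness, assuming each label is computed over a known span of bars. The function name `average_uniqueness` and the `(start, end)` span format are illustrative assumptions, not the post's own code. A bar shared by c overlapping labels contributes 1/c to each of them, and each label's weight is the mean of those contributions over its span, so heavily overlapping (redundant) labels count for less.

```python
def average_uniqueness(spans, n_bars):
    """Average uniqueness of each label.

    spans: list of (start, end) bar indices, inclusive, one per label (assumed format).
    n_bars: total number of bars.
    """
    # Concurrency c_t: number of labels whose span includes bar t
    concurrency = [0] * n_bars
    for start, end in spans:
        for t in range(start, end + 1):
            concurrency[t] += 1

    # Each label's weight is the mean of 1 / c_t over the bars it spans
    weights = []
    for start, end in spans:
        uniqueness = [1.0 / concurrency[t] for t in range(start, end + 1)]
        weights.append(sum(uniqueness) / len(uniqueness))
    return weights


# Three labels: the first two overlap on bars 1 and 2, the third stands alone
spans = [(0, 2), (1, 3), (4, 5)]
w = average_uniqueness(spans, n_bars=6)
```

Here the first two labels each get weight 2/3, while the isolated third label keeps weight 1. Many scikit-learn estimators accept weights like these through the `sample_weight` argument of `fit`.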

S.Alireza Mousavizade

Sat Mar 05 2022

Summary

What's the distinction between structured and unstructured data? We will learn how to work with unstructured financial data and then transform it into a structured dataset that ML algorithms can use. In general, you should avoid consuming someone else's processed dataset, because you will most likely discover what someone else already knows or will figure out shortly. Ideally, your starting point is a collection of unstructured, raw data that you analyse in order to generate informative features.

Denoising

A procedure for reducing the noise and enhancing the signal contained in an empirical covariance matrix.
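
One well-known procedure of this kind is the constant residual eigenvalue method: eigenvalues of the correlation matrix that fall below the Marcenko-Pastur upper edge are treated as noise and replaced by their average. The sketch below is a simplification that assumes a noise variance of 1 (it does not fit the Marcenko-Pastur distribution to the spectrum) and works on a correlation matrix; `denoise_corr` is a hypothetical helper name.

```python
import numpy as np


def denoise_corr(corr, n_obs):
    """Constant residual eigenvalue denoising of a correlation matrix.

    Simplification (assumption): the noise variance is taken as 1 instead of
    being fitted to the eigenvalue spectrum.
    """
    n_vars = corr.shape[0]
    q = n_vars / n_obs                   # N / T ratio in the Marcenko-Pastur law
    lambda_plus = (1 + np.sqrt(q)) ** 2  # upper edge of the noise spectrum
    eigvals, eigvecs = np.linalg.eigh(corr)
    eigvals = eigvals.copy()
    noise = eigvals < lambda_plus
    if noise.any():
        # Replace noise eigenvalues by their mean, which preserves the trace
        eigvals[noise] = eigvals[noise].mean()
    rebuilt = eigvecs @ np.diag(eigvals) @ eigvecs.T
    # Rescale so the result is again a correlation matrix (unit diagonal)
    d = np.sqrt(np.diag(rebuilt))
    return rebuilt / np.outer(d, d)


# Pure-noise example: 500 observations of 50 independent series
rng = np.random.default_rng(0)
returns = rng.standard_normal((500, 50))
corr = np.corrcoef(returns, rowvar=False)
denoised = denoise_corr(corr, n_obs=returns.shape[0])
```

In practice the noise variance is usually estimated by fitting the Marcenko-Pastur density to the observed eigenvalues, and a covariance matrix can be handled by converting it to a correlation matrix, denoising, and scaling back by the volatilities.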

S.Alireza Mousavizade

Sat Mar 05 2022

Labeling

What are the main labeling techniques, and how do they differ?

Danial Nowroozi

Sat Mar 05 2022

Summary

We discussed how to build a matrix $X$ of financial features from an unstructured dataset. Unsupervised learning algorithms can learn patterns from that matrix $X$, for example, whether it contains hierarchical clusters. Supervised learning algorithms, on the other hand, require that the rows in $X$ be associated with an array of labels or values $y$, so that those labels or values can be predicted on unseen feature samples. In this section we discuss how to label financial data.
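
One widely used scheme for attaching a label to each row of $X$ is the triple-barrier method: an upper (profit-taking) barrier, a lower (stop-loss) barrier, and a vertical barrier (a maximum holding period), with the label set by whichever barrier is touched first. This sketch assumes simple returns measured from the event's entry price and labels a vertical-barrier touch by the sign of the return; `triple_barrier_label` is a hypothetical helper, and a common variant labels that case 0 instead.

```python
def triple_barrier_label(prices, t0, upper, lower, max_holding):
    """Label the event that starts at index t0.

    Returns +1 if the upper barrier (return >= upper) is touched first,
    -1 if the lower barrier (return <= -lower) is touched first, and otherwise
    the sign of the return at the vertical barrier (0 if the price is unchanged).
    """
    entry = prices[t0]
    last = min(t0 + max_holding, len(prices) - 1)  # vertical barrier
    for t in range(t0 + 1, last + 1):
        ret = prices[t] / entry - 1.0
        if ret >= upper:
            return 1
        if ret <= -lower:
            return -1
    final_ret = prices[last] / entry - 1.0
    return (final_ret > 0) - (final_ret < 0)


prices = [100, 101, 103, 99, 98]
label_a = triple_barrier_label(prices, t0=0, upper=0.02, lower=0.02, max_holding=4)  # upper hit at index 2
label_b = triple_barrier_label(prices, t0=2, upper=0.02, lower=0.02, max_holding=2)  # lower hit at index 3
```

In practice the horizontal barriers are often set as a multiple of a rolling volatility estimate rather than fixed percentages.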

Structural Breaks

Structural breaks, such as the transition from one market regime to another, are one example of a confluence of factors that is of particular interest when deciding when to bet.
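
One family of structural-break tests is CUSUM tests. The sketch below computes a two-sided version of the Chu-Stinchcombe-White CUSUM statistic on a level series $y$, assuming $y$ behaves like a random walk when there is no break: $S_{n,t} = (y_t - y_n)/(\hat{\sigma}_t\sqrt{t-n})$, where $\hat{\sigma}_t^2$ is the mean squared first difference up to $t$, and the statistic is the largest $|S_{n,t}|$ over reference points $n < t$. The helper name `csw_cusum` is hypothetical, and critical values are not computed here.

```python
import math


def csw_cusum(y, t):
    """Two-sided Chu-Stinchcombe-White CUSUM statistic at index t (0-based).

    Returns max over n < t of |y_t - y_n| / (sigma_t * sqrt(t - n)), where
    sigma_t**2 is the mean squared first difference of y up to index t.
    """
    diffs = [y[i] - y[i - 1] for i in range(1, t + 1)]
    sigma = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return max(abs(y[t] - y[n]) / (sigma * math.sqrt(t - n)) for n in range(t))


y_calm = [0, 1] * 5          # oscillating series with no break
y_break = [0, 1] * 5 + [20]  # the same series followed by a jump
stat_calm = csw_cusum(y_calm, t=9)
stat_break = csw_cusum(y_break, t=10)
```

The calm series gives a statistic of 1.0, while the jump pushes it above 3; a value that is large relative to the test's critical value is evidence of a break.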

S.Alireza Mousavizade

Sat Mar 05 2022

Entropy Features

When markets are not perfect, prices are formed with partial information, and because some agents know more than others, they can exploit that informational asymmetry.
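
A simple way to turn a price series into an entropy feature is to encode returns as symbols and estimate the entropy of the resulting message. The sketch assumes a binary encoding (1 for a positive return, 0 otherwise) and uses the plug-in (maximum-likelihood) estimator over overlapping words of length `w`, reported in bits per symbol; both function names are illustrative.

```python
import math
from collections import Counter


def encode_returns(returns):
    """Binary encoding: '1' for a positive return, '0' otherwise."""
    return "".join("1" if r > 0 else "0" for r in returns)


def plugin_entropy(message, w=1):
    """Plug-in entropy estimate in bits per symbol, from the empirical
    frequencies of overlapping words of length w."""
    words = [message[i:i + w] for i in range(len(message) - w + 1)]
    counts = Counter(words)
    n = len(words)
    h = -sum(c / n * math.log2(c / n) for c in counts.values())
    return h / w


msg = encode_returns([0.01, -0.02, 0.0, 0.03])  # "1001"
```

A constant message has zero entropy. A perfectly alternating message has the maximal 1 bit per symbol at `w=1`, yet it is completely predictable, which longer words start to reveal: at `w=2` the estimate for "01010101" drops to about 0.49 bits per symbol.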

S.Alireza Mousavizade

Sat Mar 05 2022

Portfolio Construction

In practice, mean-variance optimal solutions tend to be concentrated and unstable. How can we deal with the instability caused by the noise contained in the covariance matrix?
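
The instability can be seen in a toy two-asset case. The sketch assumes unconstrained (long-short) minimum-variance weights, $w = C^{-1}\mathbf{1}/(\mathbf{1}'C^{-1}\mathbf{1})$, and measures how far the weights move when one asset's volatility is bumped from 20% to 21%, under zero and under 0.9 correlation. All function names are illustrative.

```python
import numpy as np


def min_variance_weights(cov):
    """Unconstrained minimum-variance weights: C^{-1} 1 / (1' C^{-1} 1)."""
    ones = np.ones(cov.shape[0])
    x = np.linalg.solve(cov, ones)
    return x / x.sum()


def two_asset_cov(vol1, vol2, rho):
    """Covariance matrix of two assets with volatilities vol1, vol2 and correlation rho."""
    return np.array([[vol1 ** 2, rho * vol1 * vol2],
                     [rho * vol1 * vol2, vol2 ** 2]])


def weight_shift(rho, vol=0.20, bumped_vol=0.21):
    """Largest change in any weight after a small bump to the second asset's volatility."""
    w_a = min_variance_weights(two_asset_cov(vol, vol, rho))
    w_b = min_variance_weights(two_asset_cov(vol, bumped_vol, rho))
    return float(np.abs(w_b - w_a).max())


shift_low = weight_shift(0.0)   # roughly 0.02
shift_high = weight_shift(0.9)  # roughly 0.24
```

The same one-point change in volatility moves the weights about ten times more when the assets are highly correlated, while the condition number of the covariance matrix rises from 1 to 19. Noise in the estimated covariance matrix is therefore amplified most exactly where assets are most alike.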

S.Alireza Mousavizade

Sat Mar 05 2022

Financial Bars

What are the different types of bars, and how do they differ? What is the purpose of information-driven bars?
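
As a point of comparison with time bars, here is a minimal sketch of dollar bars, which close a bar each time a fixed dollar value has traded rather than at fixed clock intervals. Dollar bars are not themselves information-driven (imbalance and runs bars are); the `(price, volume)` tick format, the threshold, and the choice to drop the final incomplete bar are assumptions.

```python
def dollar_bars(ticks, threshold):
    """Group (price, volume) ticks into bars that close once the traded dollar
    value reaches `threshold`. Returns (open, high, low, close, dollar_value)
    tuples; a trailing incomplete bar is discarded."""
    bars = []
    open_ = high = low = None
    dollars = 0.0
    for price, volume in ticks:
        if open_ is None:  # first tick of a new bar
            open_ = high = low = price
        high = max(high, price)
        low = min(low, price)
        dollars += price * volume
        if dollars >= threshold:
            bars.append((open_, high, low, price, dollars))
            open_ = high = low = None
            dollars = 0.0
    return bars


ticks = [(10, 50), (11, 50), (12, 100), (10, 30), (9, 40)]
bars = dollar_bars(ticks, threshold=1000)
```

Because each bar represents roughly the same amount of traded value, more bars are sampled when activity is high, which tends to give better-behaved returns than sampling by the clock.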

S.Alireza Mousavizade

Sat Mar 05 2022

Backtesting Dangers

A common misunderstanding is to think of backtesting as a research tool. Backtesting while researching is like drinking and driving.

S.Alireza Mousavizade

Sat Mar 05 2022

Cross Validation

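Cross-validation (CV) is yet another instance where standard ML techniques fail when applied to financial problems. Because financial observations are not IID, information leaks between training and testing folds, so overfitting will take place and CV will not be able to detect it.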
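
One remedy is to purge from the training set every observation whose label overlaps the test period, and to add an embargo after the test period. The sketch below builds the training indices for a single fold, assuming each label is computed over a known `(start, end)` span of bars and that the test fold is contiguous in time; `purged_train_indices` is a hypothetical helper, not a library API.

```python
def purged_train_indices(spans, test_idx, embargo):
    """Training indices for one fold of a purged k-fold split with an embargo.

    spans: list of (start, end) bar indices over which each label is computed.
    test_idx: indices of the test observations (assumed contiguous in time).
    embargo: number of bars after the test period that training labels may not touch.
    """
    test = set(test_idx)
    test_start = min(spans[i][0] for i in test)
    test_end = max(spans[i][1] for i in test)
    blocked_end = test_end + embargo  # test period extended by the embargo
    train = []
    for i, (start, end) in enumerate(spans):
        if i in test:
            continue
        # Purge any label whose span overlaps [test_start, blocked_end]
        if start <= blocked_end and end >= test_start:
            continue
        train.append(i)
    return train


# Ten labels, each computed over three consecutive bars; labels 4 and 5 form the test fold
spans = [(i, i + 2) for i in range(10)]
train = purged_train_indices(spans, test_idx=[4, 5], embargo=1)
```

Labels 2 and 3 are purged because their spans reach into the test period, labels 6 and 7 because they overlap it from the other side, and label 8 only because of the one-bar embargo.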

Danial Nowroozi

Sat Mar 05 2022

Ensemble Learning

What makes ensemble methods effective, and how can we avoid the common errors that lead to their misuse in finance?
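
Part of the answer is a simple binomial argument: if each of $n$ classifiers over $k$ classes is right with probability $p$ and their errors are independent, the probability that more than $n/k$ of them are right tends to 1 as $n$ grows, provided $p > 1/k$. The sketch encodes that calculation (ties count as incorrect); `majority_vote_accuracy` is an illustrative name. The independence assumption is exactly what tends to fail with financial data, where overlapping observations make the classifiers' errors correlated.

```python
from math import comb


def majority_vote_accuracy(n, p, k=2):
    """Probability that more than n/k of n independent classifiers,
    each correct with probability p, are correct (k = number of classes)."""
    threshold = n // k
    p_at_most = sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(threshold + 1))
    return 1.0 - p_at_most


acc_single = majority_vote_accuracy(1, 0.6)   # a single classifier
acc_three = majority_vote_accuracy(3, 0.6)    # 1 - (0.4**3 + 3 * 0.6 * 0.4**2), i.e. 0.648 up to rounding
acc_many = majority_vote_accuracy(101, 0.6)
```

With 101 independent classifiers that are each only 60% accurate, the majority vote is right more than 95% of the time; with correlated classifiers the gain shrinks toward that of a single model.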

Danial Nowroozi

Sat Mar 05 2022

Microstructural Features

The level of detail contained in FIX messages provides researchers with the ability to understand how market participants conceal and reveal their intentions.
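
Trade prices are among the data carried in FIX messages, and one of the simplest features built from them is the tick rule, which infers the aggressor side of each trade from price changes alone. A minimal sketch: the sign assigned to the very first trade (+1) is an arbitrary assumption, and `tick_rule` and the imbalance measure below are illustrative names.

```python
def tick_rule(prices):
    """Infer the aggressor side of each trade: +1 (buy) on an uptick, -1 (sell) on a
    downtick, and the previous sign when the price is unchanged.
    The first trade has no prior price and is assigned +1 (arbitrary assumption)."""
    signs = []
    sign = 1
    for i, price in enumerate(prices):
        if i > 0:
            change = price - prices[i - 1]
            if change > 0:
                sign = 1
            elif change < 0:
                sign = -1
        signs.append(sign)
    return signs


trades = [10.0, 10.1, 10.1, 10.0, 10.0, 10.2]
signs = tick_rule(trades)
order_flow_imbalance = sum(signs) / len(signs)  # share of buys minus share of sells
```

Aggregating the signs over a bar gives a crude order-flow imbalance, which is one way such data can hint at participants' intentions.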

S.Alireza Mousavizade

Sat Mar 05 2022

Feature Importance

Why does repeating a backtest fail, and how can we prevent it? Why does running the same test over and over on the same data almost certainly lead to a false discovery?

S.Alireza Mousavizade

Sat Mar 05 2022

Summary

One of the most common errors in financial research is to take some data, run it through an ML algorithm, backtest the predictions, and repeat the process until a nice-looking backtest appears. Such pseudo-discoveries abound in academic journals, and even large hedge funds are prone to falling into this trap. It makes no difference if the backtest is an out-of-sample walk-forward: the fact that we keep repeating a test on the same data will almost certainly produce a false discovery. This methodological error is so well known among statisticians that the American Statistical Association warns against it in its ethical guidelines (American Statistical Association [2016], Discussion #4). It usually takes around 20 iterations to find a (false) investment strategy at the standard significance level (false positive rate) of 5%.
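
Under the simplifying assumption that each iteration is an independent test of a strategy with no true edge, the "around 20 iterations" figure is the mean of a geometric distribution with success probability 0.05. Real iterations on the same data are correlated, so this is only a back-of-the-envelope sketch; the function names are illustrative.

```python
def prob_false_discovery(alpha, trials):
    """Probability of at least one false positive across independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** trials


def expected_trials_to_false_discovery(alpha):
    """Mean number of independent trials until the first false positive (geometric distribution)."""
    return 1.0 / alpha


p20 = prob_false_discovery(0.05, 20)                    # about 0.64
n_expected = expected_trials_to_false_discovery(0.05)   # 20 trials
```

After 20 independent tries, the chance of having found at least one spurious "significant" strategy is already about 64%.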