I noticed that Big Mike was interested in generating random data a couple of years ago. There is a thread about it here on futures.io (formerly BMT), but nobody really seemed interested at the time. Anyway, I believe it's a very good idea to look at random data (OHLC bars) and see whether you can spot any patterns there. If you do, you are simply being fooled by randomness. The key is to find the difference between real data and random data. If something happens in real data that is not present in random data, it can be exploited to our advantage. Running backtests of automated systems against random data can also give some insight into system robustness.
When I first approached this problem I figured I needed a Monte Carlo generator that takes the possible outcomes and their probabilities as input (bins and weights in my code). Bins and weights can be generated from any instrument, and the Monte Carlo simulation will behave similarly (similar chances for moves of the same amplitude), but there should be no way to make predictions because the data is random. It is easy to generate simple data points (like close prices) using this method, but how do you generate open, high and low values?
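For readers who want to try the bins-and-weights idea without MATLAB, here is a minimal Python sketch. It is not the code from my scripts: the function names `bins_and_weights` and `random_closes`, the use of a plain histogram of close-to-close changes, and the synthetic input series are all assumptions made for illustration.

```python
import numpy as np

def bins_and_weights(closes, n_bins=50):
    """Histogram close-to-close changes into bin centers and probabilities."""
    changes = np.diff(np.asarray(closes, dtype=float))
    counts, edges = np.histogram(changes, bins=n_bins)
    centers = (edges[:-1] + edges[1:]) / 2.0   # one representative move per bin
    weights = counts / counts.sum()            # empirical probability of each bin
    return centers, weights

def random_closes(start, bins, weights, n, rng):
    """Draw n moves from the weighted bins and accumulate them into a price path."""
    steps = rng.choice(bins, size=n, p=weights)
    return start + np.cumsum(steps)

# Synthetic stand-in for real close prices (made up, for demonstration only).
rng = np.random.default_rng(0)
closes = 100 + np.cumsum(rng.normal(0, 1, 500))

bins, weights = bins_and_weights(closes, n_bins=20)
sim = random_closes(closes[-1], bins, weights, 250, rng)
```

Because every move is drawn independently, the simulated series keeps the distribution of move sizes but carries no serial structure, which is the whole point of the comparison.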
I figured I could make a rough simulation of open values by measuring gaps (open of day n minus close of day n-1) and building a Monte Carlo simulation from them. Then I could build a Monte Carlo simulation from the ranges (high minus low) and finally randomize the position of the close relative to the high and low. That gives me all the ingredients necessary to create pseudo-random data. However, data generated using this method looks very spiky and unrealistic.
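A rough Python sketch of this gap/range approach follows, under several assumptions that are not part of my scripts: gaps and ranges are resampled directly from the observed values (equivalent to one bin per observation with equal weights), the position of the open inside the range is drawn uniformly, and the position of the close is drawn uniformly as well. The function name `random_ohlc_from_gaps` and the synthetic input are hypothetical.

```python
import numpy as np

def random_ohlc_from_gaps(opens, highs, lows, closes, n_bars, rng):
    """Build random OHLC bars from resampled gaps and ranges."""
    opens, highs, lows, closes = (
        np.asarray(a, dtype=float) for a in (opens, highs, lows, closes)
    )
    gaps = opens[1:] - closes[:-1]     # open of bar n minus close of bar n-1
    ranges = highs - lows              # high minus low of each bar
    bars = np.empty((n_bars, 4))       # columns: open, high, low, close
    prev_close = closes[-1]
    for i in range(n_bars):
        o = prev_close + rng.choice(gaps)
        r = rng.choice(ranges)
        low = o - rng.uniform() * r    # assumption: open sits at a uniform position in the range
        high = low + r
        c = low + rng.uniform() * r    # close at a uniform position in the range
        bars[i] = (o, high, low, c)
        prev_close = c
    return bars

# Synthetic daily bars (made up, for demonstration only).
rng = np.random.default_rng(1)
c = 100 + np.cumsum(rng.normal(0, 1, 300))
o = np.concatenate(([100.0], c[:-1])) + rng.normal(0, 0.2, 300)
h = np.maximum(o, c) + rng.uniform(0, 1, 300)
l = np.minimum(o, c) - rng.uniform(0, 1, 300)

bars = random_ohlc_from_gaps(o, h, l, c, 100, rng)
```

Since each bar's range and close position are drawn independently of everything else, consecutive bars have no relationship to each other beyond the gap, which is probably why the result looks spiky.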
I realized there is a better way to generate the H, L and O data. I gather, for example, one-minute data (closes only) and generate bins and weights from it (these are analogous to ticks). Then I can create random custom bars, each consisting of N random one-minute samples. This lets me derive the high and low of each bar naturally from those one-minute samples. The open value is not very accurate if N is small. The bars also don't reflect actual volatility very well when N is small, because a 60-minute bar, for example, normally contains thousands of ticks rather than 60. If you want to generate tick bars (using tick data), this same method will give you perfectly accurate pseudo-random data.
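The N-samples-per-bar idea can be sketched in Python as below. This is an illustration under stated assumptions, not my MATLAB code: the one-minute changes are resampled directly instead of going through explicit bins and weights, the open of each bar is taken as its first sample, and the name `random_bars_from_minutes` and the synthetic input are hypothetical.

```python
import numpy as np

def random_bars_from_minutes(minute_closes, n_bars, n_per_bar, rng):
    """Aggregate randomly drawn one-minute moves into OHLC bars of n_per_bar samples."""
    minute_closes = np.asarray(minute_closes, dtype=float)
    changes = np.diff(minute_closes)
    # Draw every one-minute move independently, then build one continuous price path.
    steps = rng.choice(changes, size=n_bars * n_per_bar)
    path = minute_closes[-1] + np.cumsum(steps)
    path = path.reshape(n_bars, n_per_bar)     # one row per bar
    opens = path[:, 0]                         # assumption: first sample stands in for the open
    highs = path.max(axis=1)
    lows = path.min(axis=1)
    closes = path[:, -1]
    return np.column_stack([opens, highs, lows, closes])

# Synthetic one-minute closes (made up, for demonstration only).
rng = np.random.default_rng(2)
minute_closes = 100 + np.cumsum(rng.normal(0, 0.05, 5000))

bars = random_bars_from_minutes(minute_closes, n_bars=200, n_per_bar=60, rng=rng)
```

Here the high and low fall out of the path itself, so they are always consistent with the open and close. Using the first sample as the open is exactly where the inaccuracy for small N comes from.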
I have the required data generation scripts in MATLAB, but I can make portable versions if enough people are interested. Some of you here are probably using MATLAB and can make use of my code. I will also post some samples of OHLC data so you don't have to do any programming at all. Just import the ASCII data into your favorite charting program.
I will continue to research this data and will post here if I find anything interesting.
RandomBarsGenerator contains the necessary scripts to generate random bars and a readme.txt file with instructions on how to use them. If you have any questions, please let me know.