Looking forward to your new thread. Sooner rather than later, I hope, because I already have questions.
I've been trying to get a MySQL backend for recording tick data for a long time, and this may be exactly that. But I wanted to make it publicly accessible (futures.io (formerly BMT) Elite only), so maybe we can work together to make that happen. The general idea was a central tick repository that others can pull from.
What I have so far is a long way from something that I'd ever expose to the internet. Robustness has taken a backseat to getting something working fairly quickly, but the underpinnings are probably there. The design problems I can foresee for something that serves data to many clients are: 1. It spawns a new thread for each client, so it might not scale well beyond a hundred or so concurrent clients. 2. The overly simple protocol has no way to deal with faulty requests or anything of that sort.
I opted against a relational database and am using Berkeley DB (which MySQL once offered as a storage engine). BDB stores data as key/data pairs, which works well for ticks: a key of (symbol, time) and data of (price, bid, ask, volume), with duplicate keys allowed. For simplicity of implementation I have just one database right now, but it would be easy to split into multiple internal DBs per symbol, or per symbol and a subset of the time like year/month, and that would be completely transparent to the client.
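To make the key/data layout concrete, here is a minimal sketch of the (symbol, time) -> (price, bid, ask, volume) scheme with duplicate keys, simulated with a plain Python dict of lists rather than BDB itself. All names here are illustrative, not the actual implementation.

```python
# Toy stand-in for a Berkeley DB table keyed by (symbol, unix_second),
# where duplicate keys are allowed and each put appends another record.
from collections import defaultdict

class TickStore:
    def __init__(self):
        self._db = defaultdict(list)   # key -> list of duplicate records

    def put(self, symbol, ts, price, bid, ask, volume):
        # Duplicate keys allowed: same (symbol, ts) may hold many ticks.
        self._db[(symbol, ts)].append((price, bid, ask, volume))

    def get(self, symbol, ts):
        # Every tick stored under the exact (symbol, ts) key.
        return list(self._db[(symbol, ts)])

store = TickStore()
store.put("ES", 1700000000, 4500.25, 4500.00, 4500.25, 3)
store.put("ES", 1700000000, 4500.50, 4500.25, 4500.50, 1)  # same second, both kept
```

In real BDB the duplicate records under one key would be reached through a cursor; the dict-of-lists here only mimics that behavior.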
The socket interface is very simple: three requests, namely read, real-time write, and backfill. There is only one response from the server, containing data for a read request (1..N responses per request). Real-time and backfill writes are the same except that backfill first deletes all the data for that exact symbol and timestamp. A read request takes a symbol and a start timestamp; the server then streams all the data after that timestamp for the given symbol, in blocks that each contain every tick for one exact timestamp.
This design pushes some buffering responsibility onto a backfill writer, because it has to send all the ticks for a unique timestamp at once or it deletes its own previously sent data. Readers get N responses, each one holding all the data for a given timestamp.
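The post does not specify the wire format, so here is a hypothetical text framing for the three request types and the per-timestamp response blocks, just to illustrate the shape of the protocol. The keywords, field order, and newline framing are all assumptions.

```python
# Hypothetical framing: one line per request, and one response "block"
# per timestamp carrying every tick that shares that timestamp.
def encode_request(kind, symbol, ts):
    assert kind in ("READ", "WRITE", "BACKFILL")
    return f"{kind} {symbol} {ts}\n".encode()

def encode_block(symbol, ts, ticks):
    # Header line states how many tick lines follow for this timestamp.
    lines = [f"BLOCK {symbol} {ts} {len(ticks)}"]
    lines += [f"{price} {bid} {ask} {vol}" for price, bid, ask, vol in ticks]
    return ("\n".join(lines) + "\n").encode()

req = encode_request("READ", "ES", 1700000000)
blk = encode_block("ES", 1700000000, [(4500.25, 4500.0, 4500.25, 3)])
```

A binary format would be more efficient in practice, but the key property is the same either way: a block is the atomic unit, and it always covers one whole timestamp.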
Right now I'm working on shoehorning a client into a GomFileManager subclass so I can do some debugging and testing of the server. It takes a bit of kludging because the recordTick interface to that class only provides the price and tick type, so I have to rebuild fake ask and bid prices. I also need to understand the read-back use case a little better, and that might lead to changes in the read request/response interface.
They store the bid, ask, and last tick data, but store them separately, so the sequence of ticks is not preserved. I presume that is because NT can draw the charts from any of these values, and someone made the architectural decision to tie the DB to the application!
They could very easily fix this so that a single tick DB holds all three types of ticks with proper sequential timestamps, using a very simple filter, but they chose not to. So the tick data downloaded by NT is not useful for backfill.
HOWEVER, they also have the infrastructure for replay, which DOES capture and store tick data (both L1 and L2) in the right sequence. But that infrastructure can be used only for replay and cannot drive your indicators.
I presume they are simply overloaded and understaffed, and different teams did their own thing without regard for the big picture. NT is the only software I know of that maintains two separate, mutually incompatible DBs for tick data. And they still cannot offer chart backfill capability (OnMarketData) for historical values.
On your project: the biggest challenge I have is that you cannot update your database while you are collecting ticks. So you have to find a time window when ticks are not coming in to fill gaps in your gomi database.
There is a similar issue with your DB too: how will you guarantee that you do not create duplicate entries when you update the DB with historical data for the times your tick collector was not running? One way to approach this would be to define sectors (say, 5 minutes of tick data) and use that as the minimum unit that can be updated. When reloading historical data, your application would send data in chunks of 5 minutes (or multiples of 5 minutes), and the DB would treat that data as the golden copy, overwriting any existing data for those 5 minutes with the new data. Live tick data, on the other hand, would not check for duplicates and would simply insert.
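The sector idea above can be sketched in a few lines. This is only an illustration of the "golden copy" rule under the assumptions in the post: a 300-second sector size, a dict keyed by (symbol, ts), and backfill data grouped by timestamp; none of these names come from an actual implementation.

```python
# Backfill replaces whole 300-second sectors ("golden copy");
# live writes would just insert without any duplicate check.
SECTOR = 300  # seconds; the 5-minute chunk size suggested in the post

def sector_of(ts):
    return ts - ts % SECTOR

def apply_backfill(db, symbol, ticks_by_ts):
    # db: {(symbol, ts): [tick, ...]}, ticks_by_ts: {ts: [tick, ...]}
    sectors = {sector_of(ts) for ts in ticks_by_ts}
    # Drop everything already stored in the affected sectors...
    stale = [k for k in db if k[0] == symbol and sector_of(k[1]) in sectors]
    for k in stale:
        del db[k]
    # ...then write the backfill data as the new authoritative copy.
    for ts, ticks in ticks_by_ts.items():
        db[(symbol, ts)] = list(ticks)

db = {("ES", 10): ["live-tick"], ("ES", 400): ["later-live-tick"]}
apply_backfill(db, "ES", {20: ["backfill-tick"]})
# sector 0-299 is replaced by the golden copy; sector 300+ is untouched
```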
There will be no issue writing backfill and real-time data to the DB at any time. All updates and reads will be keyed by unique timestamp (seconds from the market, although I see gomi has an MS option). Berkeley DB supports transactions, so readers will have a consistent view of the data on a per-timestamp basis: they lock each block while reading it to prevent writes to that data, and writers block until readers release their locks.
I have two thoughts about concurrent real-time writers. Either the DB server only accepts real-time writes from the first connected client that requests real-time writing, and other requests are dumped as no-ops; or it always deletes the data for a key (symbol/timestamp) before writing the new data. Plan 2 would require the writer to buffer all the ticks for a timestamp before sending, so multiple writers would just overwrite each other's data until the last one's data persists. Not the most efficient, but workable. Whichever route I go, it will all be encapsulated in the client-side interface I'm exposing to GomFileManager.
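Plan 2 (delete-before-write) has a simple last-writer-wins shape, sketched below with a plain dict standing in for the DB. The function name and data layout are assumptions for illustration; in BDB this would be a transactional delete of the key followed by N puts.

```python
# "Plan 2": every writer buffers a full second of ticks and the server
# replaces the whole (symbol, ts) key on write, so concurrent writers
# overwrite each other and the last complete copy persists.
def write_second(db, symbol, ts, ticks):
    # Delete-then-insert collapses to one assignment in this toy model.
    db[(symbol, ts)] = list(ticks)

db = {}
write_second(db, "ES", 1700000000, [(4500.25, 3)])                 # writer A
write_second(db, "ES", 1700000000, [(4500.25, 3), (4500.50, 1)])   # writer B wins
```

The correctness of this scheme hinges on the buffering rule: a partial write for a second would wipe out another writer's complete data for that second, which is why each writer must send all ticks for a timestamp at once.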
The use case for this DB is frequent writes to the end, with rare burst reads of large amounts of data across all keys, and rare burst writes. The only place a real-time writer and a reader conflict is at the last timestamp, so lock blocking should stay quite minimal. Doing a simultaneous backfill while reading could have higher contention, but should still be quite reasonable since the number of locks that must be acquired to write or read is small.
The issue is that timestamps are not guaranteed to be unique. You cannot assume that two trade ticks with exactly the same timestamp are one and the same trade; they may correspond to different trades. Of course, if the data vendor supplies some unique tick identification mechanism, you should be able to avoid the conflict. Otherwise you may end up with duplicate copies of the same tick if you have a live writer and also a backfill application that dumps historical data asynchronously.
This of course depends on the timestamp resolution. With 1-second resolution you are virtually guaranteed to have non-unique timestamps; even at millisecond resolution, timestamps are not guaranteed to be unique across separate trades.
That is why I felt that backfills should overwrite live data and should also be executed with some minimum chunk size.
BDB accepts duplicate keys and duplicate key/data pairs just fine; you just use a construct they call a cursor to access all the data under a duplicate key. The conflict between real-time writers and backfill writers resolves itself once you force the backfill writer to provide all the tick data for each unique timestamp at once: you then delete and replace all the data for each timestamp with the backfilled data. It's theoretically possible to lose some ticks if you backfill up to the current real-time second, but I don't see that being possible in my initial implementation. Downloading backfill for the current partial second and getting it processed and to the server while a real-time writer is still writing that tick is a race the user can avoid by simply not backfilling up to right now; just go back a second.
I agree... if you look at how MultiCharts is dealing with their customers (setting up a site to ask for suggestions regarding their new DOM feature), you can tell they are going to eat NinjaTrader's lunch in the not-too-distant future, once their feature set catches up. NT has a suggestion board too; too bad they rarely listen or do anything their customers actually suggest. It's a lot of "my way or the highway" with them.
NT is only getting away with their ridiculously poor customer responsiveness and horribly delayed development timelines because there hasn't been a decently priced competitor for a while... hopefully MultiCharts can change that. I don't blame Big Mike for switching; as soon as they add some discretionary trading features, I'll be thinking about it myself.