forexfactory calendar downloader - Matlab, R project and Python | futures.io


forexfactory calendar downloader
Started by enjoyaol | Views / Replies: 4,988 / 4


#1 enjoyaol
Elite Member
Paris, France
 
Futures Experience: Intermediate
Platform: MT4, Amibroker, Custom
Favorite Futures: EUR/USD
 
Posts: 44 since Jan 2012
Thanks: 5 given, 23 received

forexfactory calendar downloader

hi,

Here is my Forex Factory calendar downloader. It creates a semicolon-separated CSV file of historical events from forexfactory.com.
It's written in Python and uses lxml, so it's a good start for those who have never done web scraping before. The code is quite clean, but it doesn't have any real error handling yet.

Also, it creates a 'raw' CSV view of what is available on the website: it doesn't fill in missing (NA) values and doesn't try to be smart about the data. I intend to add some 'smart' behaviour during the import into the SQL database.

Have fun.
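The post leaves the 'smart' handling for the SQL import. A minimal sketch of what that step could look like, assuming SQLite from Python 3's standard library: the table name `events`, its column names and the `to_number()`/`load_events()` helpers are hypothetical, and only plain numbers with an optional trailing % are converted (anything else, such as '227K', is stored as NULL).

```python
import csv
import io
import sqlite3

# Column order of the downloader's outfile.write("{};{};...") line.
COLUMNS = ("day", "time", "currency", "impact", "event",
           "actual", "forecast", "previous")
NUMERIC = ("actual", "forecast", "previous")


def to_number(raw):
    """'7.0%' -> 7.0, '-11.8%' -> -11.8, '46.6' -> 46.6, '' -> None.

    Anything else (e.g. '227K' or '<0.1%') also becomes None; extending the
    parsing to such formats is left open here.
    """
    text = raw.strip()
    if text.endswith("%"):
        text = text[:-1]
    try:
        return float(text)
    except ValueError:
        return None


def load_events(csv_text, conn):
    """Load raw ';'-separated downloader output into a hypothetical `events` table.

    Empty text cells become NULL and the value columns become REAL.
    Returns the number of rows inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS events ("
        "day TEXT, time TEXT, currency TEXT, impact TEXT, event TEXT, "
        "actual REAL, forecast REAL, previous REAL)"
    )
    rows = []
    # The script never quotes fields, so treat quote characters as plain text.
    reader = csv.reader(io.StringIO(csv_text), delimiter=";", quoting=csv.QUOTE_NONE)
    for fields in reader:
        if len(fields) != len(COLUMNS):
            continue  # blank or malformed line, e.g. an event name containing ';'
        record = dict(zip(COLUMNS, fields))
        for name in COLUMNS:
            if name in NUMERIC:
                record[name] = to_number(record[name])
            else:
                record[name] = record[name] or None  # '' -> NULL
        rows.append(tuple(record[name] for name in COLUMNS))
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_events("Jan 8;7:30pm;AUD;Medium Impact Expected;"
                "Building Approvals m/m;7.0%;4.6%;-11.8%\n", conn)
    print(conn.execute("SELECT event, actual, previous FROM events").fetchall())
    # [('Building Approvals m/m', 7.0, -11.8)]
```

Keeping the raw text columns as well as the parsed numbers would be a reasonable variation if you want to audit what the conversion threw away.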

https://www.dropbox.com/s/mmcjejumucq1mli/ff.py

 
Code
from __future__ import unicode_literals
import codecs
import pprint
import lxml.html
import mechanize
import cookielib

#some utils
pp = pprint.PrettyPrinter()


#########################
#variables
#########################
START_YEAR = 2008
END_YEAR = 2013
URL = r"http://www.forexfactory.com/calendar.php?month="
OUTFILE = r"events.csv"
#########################


#our month list for the URL
monthslist = ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]

#sets up the browser
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

#set correct timezone
br.open("http://www.forexfactory.com/timezone.php")
formindex = 0
for form in br.forms():
    if "timezone.php" in form.action:
        form["timezoneoffset"] = ["0"]
        break
    formindex += 1

br.select_form(nr=formindex)
br.submit()


def getData(html, outfile):
    """
    Gets data from one page of events
    """
    root = lxml.html.fromstring(html)
    lines = root.find_class("calendar_row")
    curWeekDay = None
    curMonthDay = None
    for event in lines:
        date = event.xpath("td[@class='date']")[0]

        #get the day of the month
        weekDay = date.xpath("span")
        monthDay = date.xpath("span/span")
        if len(weekDay) > 0:
            curWeekDay = weekDay[0].text
            curMonthDay = monthDay[0].text

        #get the time
        time = event.xpath("td[@class='time']")[0].text if (len(event.xpath("td[@class='time']")) > 0) else ""

        #get currency
        currency = event.xpath("td[@class='currency']")[0].text if len(event.xpath("td[@class='currency']")) else ""

        #get impact
        impact = event.xpath("td[@class='impact']/span/@title")[0]\
            if len(event.xpath("td[@class='impact']/span/@title")) else ""

        #get name of event
        nevent = event.xpath("td[@class='event']/span")[0].text if len(event.xpath("td[@class='event']/span")) > 0 else ""

        #get actual
        actual = event.xpath("td[@class='actual']")[0].text if len(event.xpath("td[@class='actual']")) else ""
        #retry if actual is in a span (can happen if they colorize it)
        if actual is None or len(actual.strip()) == 0:
            actual = event.xpath("td[@class='actual']/span")[0].text if len(event.xpath("td[@class='actual']/span")) else ""
        actual = actual.strip().replace("\n", " ") if actual is not None else ""

        #get forecast
        forecast = event.xpath("td[@class='forecast']")[0].text if len(event.xpath("td[@class='forecast']")) else ""
        #retry if forecast is in a span (can happen if they colorize it)
        if forecast is None or len(forecast.strip()) == 0:
            forecast = event.xpath("td[@class='forecast']/span")[0].text if len(event.xpath("td[@class='forecast']/span")) else ""
        forecast = forecast.strip().replace("\n", " ") if forecast is not None else ""

        #get previous
        previous = event.xpath("td[@class='previous']")[0].text if len(event.xpath("td[@class='previous']")) else ""
        #retry if previous is in a span (can happen if they colorize it)
        if previous is None or len(previous.strip()) == 0:
            previous = event.xpath("td[@class='previous']/span")[0].text if len(event.xpath("td[@class='previous']/span")) else ""

        previous = previous.strip().replace("\n", " ") if previous is not None else ""

        outfile.write("{};{};{};{};{};{};{};{}\n".format(curMonthDay, time, currency, impact, nevent, actual, forecast, previous))


year = START_YEAR
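# with unicode_literals every row is a unicode string; codecs.open encodes it as UTF-8
# (a plain open() file raises UnicodeEncodeError on non-ASCII event names in Python 2)
outfile = codecs.open(OUTFILE, "w", encoding="utf-8")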
while year <= END_YEAR:
    for month in monthslist:
        url = "{}{}.{}".format(URL, month, year)
        print("Getting {} {} from {}".format(month, year, url))
        br.open(url)
        html = br.response().read()
        getData(html, outfile)
    year += 1
outfile.close()
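The actual, forecast and previous blocks repeat the same "take the cell's text, retry its span if that is empty" logic. A hypothetical `cell_text()` helper could factor it out. This sketch (Python 3) uses the standard library's xml.etree.ElementTree so it runs without lxml; lxml.html elements support the same `find()` path syntax, so the helper should also accept the `event` rows in getData() (e.g. `actual = cell_text(event, "actual")`), but treat that as untested. The HTML snippet is made up.

```python
import xml.etree.ElementTree as ET


def cell_text(row, css_class):
    """Text of the row's <td class="css_class">, retrying its <span> child.

    Same logic as the actual/forecast/previous blocks: take the cell's own
    text, and if that is empty (the value was wrapped in a coloured span),
    take the span's text instead. Returns '' when nothing is found.
    """
    cell = row.find("td[@class='{}']".format(css_class))
    if cell is None:
        return ""
    text = cell.text
    if text is None or not text.strip():
        span = cell.find("span")
        text = span.text if span is not None else None
    return text.strip().replace("\n", " ") if text else ""


if __name__ == "__main__":
    # Hypothetical markup in the old (pre-"calendar__") layout.
    row = ET.fromstring(
        "<tr class='calendar_row'>"
        "<td class='actual'><span class='better'>7.0%</span></td>"
        "<td class='forecast'>4.6%</td>"
        "</tr>"
    )
    print(cell_text(row, "actual"), cell_text(row, "forecast"),
          repr(cell_text(row, "previous")))
    # 7.0% 4.6% ''
```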

#3 Fu510n
Elite Member
Cary, NC

Futures Experience: Advanced
Platform: MC, NT, Python, R
Broker/Data: FXCM, IB, Oanda, IQFeed
Favorite Futures: 6E, CL, EUR/USD

Posts: 94 since Oct 2009
Thanks: 720 given, 77 received

Updated


I tried running the original code, but it appears that Forex Factory changed its HTML output, which broke the XPath parsing, so I tweaked the code (below) and it seems to be working "better" now. The timezone adjustment seemed to skew the times, so I simply disabled it (the br.submit() call is commented out) for now.

Further tweaking may be required but thought I'd pass mine along,
-Guy
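The updated code switches from exact `@class='...'` tests to `contains(@class, '...')`, which keeps working when the site adds extra classes to a cell. The trade-off is that contains() is a substring test, so 'calendar__time' would also match a class such as 'calendar__timestamp'. A common XPath 1.0 idiom matches a whole class token instead; this sketch builds that predicate as a string (the helper names and the sample class lists are hypothetical).

```python
def class_predicate(name):
    """XPath 1.0 predicate that matches `name` as a whole class token.

    Padding the normalized class list with spaces on both sides means
    ' calendar__time ' cannot match inside 'calendar__timestamp'.
    """
    return ("contains(concat(' ', normalize-space(@class), ' '), ' {} ')"
            .format(name))


def has_class(class_attr, name):
    """Pure-Python equivalent of class_predicate(), useful for testing."""
    return name in (class_attr or "").split()


if __name__ == "__main__":
    # With lxml this would be used as, e.g.:
    #   event.xpath("td[{}]".format(class_predicate("calendar__time")))
    print(class_predicate("calendar__time"))
    print(has_class("calendar__cell calendar__time time", "calendar__time"))  # True
    print(has_class("calendar__cell calendar__timestamp", "calendar__time"))  # False
```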

 
Code
#!/usr/bin/env python

from __future__ import unicode_literals
import sys
# import codecs
import pprint
import lxml.html
import mechanize
import cookielib

# some utils
pp = pprint.PrettyPrinter()
debug = 0


#########################
# variables
#########################
START_YEAR = 2015
END_YEAR = 2016
URL = r"http://www.forexfactory.com/calendar.php?month="
OUTFILE = r"events.csv"
#########################


# our month list for the URL
monthslist = ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]

# sets up the browser
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

# set correct timezone
br.open("http://www.forexfactory.com/timezone.php")
formindex = 0
for form in br.forms():
    if "timezone.php" in form.action:
        form["timezoneoffset"] = ["0"]
        break
    formindex += 1

br.select_form(nr=formindex)
# br.submit()


def getData(html, outfile):
    """
    Gets data from one page of events
    """
    root = lxml.html.fromstring(html)
    lines = root.find_class("calendar__row calendar_row calendar__row--grey")
    # curWeekDay = None
    curMonthDay = None
    time = curTime = ""
    # pp.pprint(lines)
    for event in lines:
        # pp.pprint(event)
        if len(event.xpath("td[@class='calendar__cell calendar__date date']")) > 0:
            date = event.xpath("td[@class='calendar__cell calendar__date date']")[0]
        else:
            sys.exit("date cell not found in a calendar row; the page layout may have changed")

        # get the day of the month
        weekDay = date.xpath("span")
        monthDay = date.xpath("span/span")
        if len(weekDay) > 0:
            # curWeekDay = weekDay[0].text
            # print "curWeekDay=[" + curWeekDay + "]"
            curMonthDay = monthDay[0].text
            if debug:
                print "curMonthDay=[" + curMonthDay + "]"

        # get the time
        curTime = time
        time = event.xpath("td[contains(@class, 'calendar__time')]")[0].text if len(event.xpath("td[contains(@class, 'calendar__time')]")) else ""
        if not time:  # empty string or no text in the cell
            time = curTime
        if debug:
            print "time=[" + str(time) + "]"

        # get currency
        currency = event.xpath("td[contains(@class, 'calendar__currency')]")[0].text if len(event.xpath("td[contains(@class, 'calendar__currency')]")) else ""
        if debug:
            print "currency=[" + currency + "]"

        # get impact
        impact = event.xpath("td[contains(@class, 'calendar__impact')]/div/span/@title")[0] if len(event.xpath("td[contains(@class, 'calendar__impact')]/div/span/@title")) else ""
        if debug:
            print "impact=[" + impact + "]"

        # get name of event
        nevent = event.xpath("td[contains(@class, 'calendar__event')]/div/span")[0].text if len(event.xpath("td[contains(@class, 'calendar__event')]/div/span")) else ""
        if debug:
            print "nevent=[" + nevent + "]"

        # get actual
        actual = event.xpath("td[contains(@class, 'calendar__actual')]/span")[0].text if len(event.xpath("td[contains(@class, 'calendar__actual')]/span")) else ""

        # retry if actual is in a span (can happen if they colorize it)
        # if actual is None or len(actual.strip()) == 0:
        #     actual = event.xpath("td[@class='actual']/span")[0].text if len(event.xpath("td[@class='actual']/span")) else ""
        actual = actual.strip().replace("\n", " ") if actual is not None else ""
        if debug:
            print "actual=[" + actual + "]"

        # get forecast
        forecast = event.xpath("td[contains(@class, 'calendar__forecast')]")[0].text if len(event.xpath("td[contains(@class, 'calendar__forecast')]")) else ""
        # retry if forecast is in a span (can happen if they colorize it)
        # if forecast is None or len(forecast.strip()) == 0:
        #    forecast = event.xpath("td[@class='forecast']/span")[0].text if len(event.xpath("td[@class='forecast']/span")) else ""
        forecast = forecast.strip().replace("\n", " ") if forecast is not None else ""
        if debug:
            print "forecast=[" + forecast + "]"

        # get previous
        previous = event.xpath("td[contains(@class, 'calendar__previous')]")[0].text if len(event.xpath("td[contains(@class, 'calendar__previous')]")) else ""
        # retry if previous is in a span (can happen if they colorize it)
        if previous is None or len(previous.strip()) == 0:
            previous = event.xpath("td[contains(@class, 'calendar__previous')]/span")[0].text if len(event.xpath("td[contains(@class, 'calendar__previous')]/span")) else ""
        previous = previous.strip().replace("\n", " ") if previous is not None else ""
        if debug:
            print "previous=[" + previous + "]\n"

        outfile.write("{};{};{};{};{};{};{};{}\n".format(curMonthDay, time, currency, impact, nevent, actual, forecast, previous))


year = START_YEAR
outfile = open(OUTFILE, "w")
while year <= END_YEAR:
    for month in monthslist:
        url = "{}{}.{}".format(URL, month, year)
        print("Getting {} {} from {}".format(month, year, url))
        br.open(url)
        html = br.response().read()
        getData(html, outfile)
    year += 1
outfile.close()
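getData() carries the last seen date and time down to rows whose date or time cell is empty (curMonthDay, and time via curTime). The same idea as a standalone, testable function; `fill_forward()` is a hypothetical name, not part of the script, and it assumes, as the script does, that the page only fills those cells on the first row of a group.

```python
def fill_forward(values, blanks=("", None)):
    """Replace blank entries with the most recent non-blank value.

    Leading blanks (before any value has been seen) become ''.
    """
    filled = []
    last = ""
    for value in values:
        if value not in blanks:
            last = value
        filled.append(last)
    return filled


if __name__ == "__main__":
    print(fill_forward(["5:30pm", None, "7:30pm", "", ""]))
    # ['5:30pm', '5:30pm', '7:30pm', '7:30pm', '7:30pm']
```

Like the script, this does not reset at a day boundary, so a row that started a new day without a time of its own would inherit the previous day's time; if that matters, fill the time within each date group instead.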

#4 wintergasp
Trading Apprentice
London, UK

Futures Experience: Master
Platform: London hedge fund
Broker/Data: Goldman Sachs
Favorite Futures: Crude CL

Posts: 29 since Sep 2016
Thanks: 1 given, 29 received

Yes, they change their HTML often. I tried to parse many things from investing.com and Forex Factory in the past, and after a while it always breaks.
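Since the markup keeps changing, one defensive option is to check each page's parsed rows and stop with a clear message rather than silently writing an empty or blank CSV. A minimal sketch in Python 3, assuming getData() were changed to return its rows as dicts before writing them; the `LayoutChanged` exception and `check_page()` are hypothetical.

```python
class LayoutChanged(RuntimeError):
    """Raised when a calendar page no longer looks the way the parser expects."""


def check_page(rows, source):
    """Sanity-check the rows parsed from one page before writing them.

    `rows` is assumed to be a list of dicts with an 'event' key;
    `source` (e.g. the page URL) is only used in the error message.
    """
    if not rows:
        raise LayoutChanged(
            "no calendar rows found in {}; has the HTML changed?".format(source))
    if all(not row.get("event") for row in rows):
        raise LayoutChanged(
            "{} rows in {} but every event name is empty; the cell selectors "
            "probably no longer match".format(len(rows), source))
    return rows


if __name__ == "__main__":
    try:
        check_page([], "http://www.forexfactory.com/calendar.php?month=jan.2017")
    except LayoutChanged as exc:
        print("stopping:", exc)
```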

I ended up writing my own calendar-events method. When you look at it closely, there are maybe 10 public holidays in the US and 10 in the UK, and most macro events follow a "first Wednesday of the month" kind of schedule. For Fed speeches and other official talks, most Fed websites and the ECB website provide an XML feed of events that you can pull once a week or so.
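Rules like "first Wednesday of the month" are easy to compute with the standard library. A sketch in Python 3; `nth_weekday()` is a hypothetical helper, and which release follows which rule is not encoded here. US holidays such as Thanksgiving (fourth Thursday of November) and Memorial Day (last Monday of May) make convenient checks.

```python
import calendar
import datetime


def nth_weekday(year, month, weekday, n):
    """Date of the n-th `weekday` in a month (n=1 first, n=-1 last).

    `weekday` uses Python's numbering (calendar.MONDAY == 0 ... SUNDAY == 6).
    Raises ValueError for n == 0 and IndexError if the month has no such day
    (e.g. a 5th Friday that does not exist).
    """
    if n == 0:
        raise ValueError("n must be non-zero")
    # Weeks start on Monday, so index `weekday` in each week is that weekday;
    # days outside the month are reported as 0 and skipped.
    weeks = calendar.Calendar(firstweekday=calendar.MONDAY).monthdayscalendar(year, month)
    days = [week[weekday] for week in weeks if week[weekday] != 0]
    return datetime.date(year, month, days[n - 1] if n > 0 else days[n])


if __name__ == "__main__":
    print(nth_weekday(2017, 3, calendar.WEDNESDAY, 1))   # 2017-03-01
    print(nth_weekday(2016, 11, calendar.THURSDAY, 4))   # 2016-11-24 (US Thanksgiving)
    print(nth_weekday(2017, 5, calendar.MONDAY, -1))     # 2017-05-29 (US Memorial Day)
```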

#5 Fu510n
Elite Member
Cary, NC

Futures Experience: Advanced
Platform: MC, NT, Python, R
Broker/Data: FXCM, IB, Oanda, IQFeed
Favorite Futures: 6E, CL, EUR/USD

Posts: 94 since Oct 2009
Thanks: 720 given, 77 received

Re-updated with week/month options

# ffcal.py -?
ffcal.py <-h> <-f {filename}> <-w {this|next|mmmdd.yyyy}> <-m {this|next|mmm.yyyy}>

-h : display usage
-f {filename} : direct output to file instead of stdout (default)
-w {this|next|mmmdd.yyyy} : output specific week
-m {this|next|mmm.yyyy} : output specific month

If you don't specify a week or month, it walks through every month of the current year.

# ffcal.py -w this
Getting this from http://www.forexfactory.com/calendar.php?week=this   (written to stderr)
Jan 8;5:30pm;AUD;Low Impact Expected;AIG Construction Index;;;46.6
Jan 8;All Day;JPY;Non-Economic;Bank Holiday;;;
Jan 8;7:30pm;AUD;Medium Impact Expected;Building Approvals m/m;7.0%;4.6%;-11.8%
Jan 8;7:30pm;AUD;Low Impact Expected;ANZ Job Advertisements m/m;;;1.6%
...
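Because the output is plain `;`-separated text, it can be read back without any scraping. A minimal sketch (Python 3) that turns the lines into dicts and combines the date and time columns into a datetime; the field names are assumptions matching the order of the outfile.write() call, the year must be supplied because the output does not include one, and `%b`/`%p` assume an English/C locale.

```python
import csv
import datetime
import io

FIELDS = ("date", "time", "currency", "impact", "event",
          "actual", "forecast", "previous")


def parse_lines(text):
    """Turn ffcal.py's ';'-separated output into one dict per event.

    The script does not quote fields, so QUOTE_NONE keeps any quote characters
    literal; lines without exactly eight fields are skipped.
    """
    reader = csv.reader(io.StringIO(text), delimiter=";", quoting=csv.QUOTE_NONE)
    return [dict(zip(FIELDS, row)) for row in reader if len(row) == len(FIELDS)]


def event_datetime(row, year):
    """Combine 'Jan 8' and '5:30pm' into a datetime; None for 'All Day' etc.

    The result is naive: it is in whatever time zone the page was rendered in.
    """
    try:
        return datetime.datetime.strptime(
            "{} {} {}".format(row["date"], year, row["time"]), "%b %d %Y %I:%M%p")
    except ValueError:
        return None


if __name__ == "__main__":
    sample = ("Jan 8;5:30pm;AUD;Low Impact Expected;AIG Construction Index;;;46.6\n"
              "Jan 8;All Day;JPY;Non-Economic;Bank Holiday;;;\n")
    for row in parse_lines(sample):
        print(row["event"], event_datetime(row, 2017))
    # AIG Construction Index 2017-01-08 17:30:00
    # Bank Holiday None
```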

 
Code
#!/usr/bin/env python

from __future__ import unicode_literals
import sys
import datetime
import getopt
# import codecs
import pprint
import lxml.html
import mechanize
import cookielib

# some utils
pp = pprint.PrettyPrinter()
debug = 0


#########################
# variables
#########################
START_YEAR = datetime.datetime.now().year
END_YEAR = START_YEAR
WEEKURL = r"http://www.forexfactory.com/calendar.php?week="
MONTHURL = r"http://www.forexfactory.com/calendar.php?month="
#OUTFILE = r"events.csv"
USAGE = "ffcal.py <-h> <-f {filename}> <-w {this|next|mmmdd.yyyy}> <-m {this|next|mmm.yyyy}>\n"
#########################


# our month list for the URL
monthslist = ["jan", "feb", "mar", "apr", "may", "jun", "jul", "aug", "sep", "oct", "nov", "dec"]

# sets up the browser
br = mechanize.Browser()
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]

# set correct timezone
br.open("http://www.forexfactory.com/timezone.php")
formindex = 0
for form in br.forms():
    if "timezone.php" in form.action:
        form["timezoneoffset"] = ["0"]
        break
    formindex += 1

br.select_form(nr=formindex)
# br.submit()


def getData(html, outfile):
    """
    Gets data from one page of events
    """
    root = lxml.html.fromstring(html)
    #lines = root.find_class("calendar__row calendar_row calendar__row--grey")
    #if not lines:
    lines = root.find_class("calendar__row calendar_row")

    # curWeekDay = None
    curMonthDay = None
    time = curTime = ""
    # pp.pprint(lines)
    for event in lines:
        # pp.pprint(event)
        if len(event.xpath("td[@class='calendar__cell calendar__date date']")) > 0:
            date = event.xpath("td[@class='calendar__cell calendar__date date']")[0]
        else:
            sys.exit("date cell not found in a calendar row; the page layout may have changed")

        # get the day of the month
        weekDay = date.xpath("span")
        monthDay = date.xpath("span/span")
        if len(weekDay) > 0:
            # curWeekDay = weekDay[0].text
            # print "curWeekDay=[" + curWeekDay + "]"
            curMonthDay = monthDay[0].text
            if debug:
                print "curMonthDay=[" + curMonthDay + "]"

        # get the time
        curTime = time
        time = event.xpath("td[contains(@class, 'calendar__time')]")[0].text if len(event.xpath("td[contains(@class, 'calendar__time')]")) else ""
        if not time:  # empty string or no text in the cell
            time = curTime
        if debug:
            print "time=[" + str(time) + "]"

        # get currency
        currency = event.xpath("td[contains(@class, 'calendar__currency')]")[0].text if len(event.xpath("td[contains(@class, 'calendar__currency')]")) else ""
        if currency is None:
            continue  # skip rows whose currency cell has no text
        if debug:
            print "currency=[" + currency + "]"

        # get impact
        impact = event.xpath("td[contains(@class, 'calendar__impact')]/div/span/@title")[0] if len(event.xpath("td[contains(@class, 'calendar__impact')]/div/span/@title")) else ""
        if debug:
            print "impact=[" + impact + "]"

        # get name of event
        nevent = event.xpath("td[contains(@class, 'calendar__event')]/div/span")[0].text if len(event.xpath("td[contains(@class, 'calendar__event')]/div/span")) else ""
        if debug:
            print "nevent=[" + nevent + "]"

        # get actual
        actual = event.xpath("td[contains(@class, 'calendar__actual')]/span")[0].text if len(event.xpath("td[contains(@class, 'calendar__actual')]/span")) else ""

        # retry if actual is in a span (can happen if they colorize it)
        # if actual is None or len(actual.strip()) == 0:
        #     actual = event.xpath("td[@class='actual']/span")[0].text if len(event.xpath("td[@class='actual']/span")) else ""
        actual = actual.strip().replace("\n", " ") if actual is not None else ""
        if debug:
            print "actual=[" + actual + "]"

        # get forecast
        forecast = event.xpath("td[contains(@class, 'calendar__forecast')]")[0].text if len(event.xpath("td[contains(@class, 'calendar__forecast')]")) else ""
        # retry if forecast is in a span (can happen if they colorize it)
        # if forecast is None or len(forecast.strip()) == 0:
        #    forecast = event.xpath("td[@class='forecast']/span")[0].text if len(event.xpath("td[@class='forecast']/span")) else ""
        forecast = forecast.strip().replace("\n", " ") if forecast is not None else ""
        if debug:
            print "forecast=[" + forecast + "]"

        # get previous
        previous = event.xpath("td[contains(@class, 'calendar__previous')]")[0].text if len(event.xpath("td[contains(@class, 'calendar__previous')]")) else ""
        # retry if previous is in a span (can happen if they colorize it)
        if previous is None or len(previous.strip()) == 0:
            previous = event.xpath("td[contains(@class, 'calendar__previous')]/span")[0].text if len(event.xpath("td[contains(@class, 'calendar__previous')]/span")) else ""
        previous = previous.strip().replace("\n", " ") if previous is not None else ""
        if debug:
            print "previous=[" + previous + "]\n"

        outfile.write("{};{};{};{};{};{};{};{}\n".format(curMonthDay, time, currency, impact, nevent, actual, forecast, previous))


OUTFILE = ""

try:
    opts, args = getopt.getopt(sys.argv[1:], "f:hm:w:")
except getopt.GetoptError:
    sys.stderr.write(USAGE)
    sys.exit(2)

# Collect every option first and act afterwards, so that "-w this -f out.csv"
# honours -f no matter which order the options are given in.
period = None
for opt, arg in opts:
    if opt == "-h":
        sys.stderr.write(USAGE)
        sys.exit()
    elif opt == "-f":
        OUTFILE = arg
    elif opt in ("-w", "-m"):
        period = (opt, arg)  # the last -w/-m given wins

if period is not None:
    opt, arg = period
    outfile = open(OUTFILE, "w") if OUTFILE != "" else sys.stdout
    url = "{}{}".format(WEEKURL if opt == "-w" else MONTHURL, arg)
    sys.stderr.write("Getting {} from {}\n".format(arg, url))
    br.open(url)
    html = br.response().read()
    getData(html, outfile)
    if outfile is not sys.stdout:
        outfile.close()
    sys.exit()

year = START_YEAR
outfile = open(OUTFILE, "w") if OUTFILE != "" else sys.stdout
while year <= END_YEAR:
    for month in monthslist:
        url = "{}{}.{}".format(MONTHURL, month, year)
        sys.stderr.write("Getting {} {} from {}\n".format(month.title(), year, url))
        br.open(url)
        html = br.response().read()
        getData(html, outfile)
    year += 1
if outfile is not sys.stdout:
    outfile.close()
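The script parses its options with getopt. For comparison, a sketch of the same command line with argparse (Python 3), a swapped-in technique rather than the author's code: argparse reads the whole command line before the script acts on it, adds -h automatically, and can declare -w and -m mutually exclusive. `build_parser()` and `target_url()` are hypothetical names; the URLs are the script's WEEKURL/MONTHURL values.

```python
import argparse

WEEKURL = "http://www.forexfactory.com/calendar.php?week="
MONTHURL = "http://www.forexfactory.com/calendar.php?month="


def build_parser():
    """argparse take on ffcal.py's -f/-w/-m options (-h comes for free)."""
    parser = argparse.ArgumentParser(prog="ffcal.py")
    parser.add_argument("-f", dest="outfile", metavar="FILENAME",
                        help="write to FILENAME instead of stdout")
    period = parser.add_mutually_exclusive_group()
    period.add_argument("-w", dest="week", metavar="WEEK",
                        help="output one week: this, next or mmmdd.yyyy")
    period.add_argument("-m", dest="month", metavar="MONTH",
                        help="output one month: this, next or mmm.yyyy")
    return parser


def target_url(args):
    """URL for -w/-m, or None to fall back to walking the current year."""
    if args.week:
        return WEEKURL + args.week
    if args.month:
        return MONTHURL + args.month
    return None


if __name__ == "__main__":
    args = build_parser().parse_args(["-w", "this", "-f", "events.csv"])
    print(args.outfile, target_url(args))
    # events.csv http://www.forexfactory.com/calendar.php?week=this
```

Passing both -w and -m makes argparse print a usage error and exit with status 2, much like the getopt error path in the script.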


Page generated 2017-03-22.