Forecasting in R: Probability Bins for Time-Series Data

The time-series.R script below takes historical time-series data, walks through it measuring how much the value changed over every window as long as the forecast period, and uses those changes to estimate the probability of each outcome bin.

The input file is a CSV file with two columns (Date, Value), with dates in ISO-8601 format and in reverse chronological order. Like so:

2019-08-06,1.73                                                                
2019-08-05,1.75                                                                
2019-08-02,1.86
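
A quick way to confirm that a file matches this layout is sketched below. The helper name `check_input` is hypothetical and not part of the original script; it only assumes a header-less, two-column file as described above.

```r
# Hypothetical helper, not part of the original script: checks that a
# two-column, header-less CSV has ISO-8601 dates in descending order.
check_input <- function(path) {
  dat <- read.csv(path, header = FALSE, col.names = c("date", "value"),
                  strip.white = TRUE, stringsAsFactors = FALSE)
  dates <- as.Date(dat$date, format = "%Y-%m-%d")
  stopifnot(
    !anyNA(dates),                    # every date parsed as YYYY-MM-DD
    is.numeric(dat$value),            # values were read as numbers
    all(diff(as.numeric(dates)) < 0)  # strictly newest first
  )
  invisible(dat)
}

# Usage with the three sample rows written to a temporary file
tmp <- tempfile(fileext = ".csv")
writeLines(c("2019-08-06,1.73", "2019-08-05,1.75", "2019-08-02,1.86"), tmp)
sample_data <- check_input(tmp)
```

If any check fails, `stopifnot` raises an error naming the failed condition.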

Output is as follows:

0.466: Bin 1 - <1.7
0.328: Bin 2 - 1.7 to <=1.9
0.144: Bin 3 - 1.9+ to <2.1
0.045: Bin 4 - 2.1 to <=2.3
0.017: Bin 5 - 2.3+

Note: Patterns in the data set will skew results. A 20-year upward trend will shift probability toward the higher bins. A volatile 5-year period will produce more conservative predictions and may not capture recent trends or a recent change in the direction of movement.
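
The core idea can be sketched on a toy series before reading the full script. The numbers and the two-day horizon below are made up for illustration. The sketch uses `findInterval`, which treats every bin as closed on the left, so its boundary handling is simpler than the alternating `<` and `<=` rules the script uses.

```r
# Toy series, newest first (made-up numbers for illustration only)
values <- c(2.00, 1.95, 2.12, 1.80, 1.72, 1.91, 2.04, 1.63)
horizon <- 2                          # assumed trading days until the closing date
thresholds <- c(1.7, 1.9, 2.1, 2.3)   # same bin edges as the sample output

# Change observed over every historical window of `horizon` rows:
# the newer value minus the value `horizon` rows further back in time.
n <- length(values) - horizon
changes <- values[1:n] - values[(1 + horizon):(n + horizon)]

# Apply each historical change to the current value and see which bin it hits.
projected <- values[1] + changes
bin <- findInterval(projected, thresholds) + 1   # bin numbers 1 to 5
probs <- table(factor(bin, levels = 1:5)) / length(bin)
print(round(probs, 3))
```

Each historical change counts equally, which is why a long trend or a volatile stretch in the data carries straight through to the probabilities.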

R Script

# time-series.R 
# Original: December 4, 2018
# Last revised: December 4, 2018

#################################################
# Description: This script runs any sequence of
# historical time-series data to forecast which of
# five bins a value will fall in by a particular date.
# Assumes a csv file with two columns (Date, Value) 
# with dates in reverse chronological order and in
# ISO-8601 format. Like so:
#
# 2019-08-06,1.73                                                                
# 2019-08-05,1.75                                                                
# 2019-08-02,1.86

# Clear the workspace and free memory:
rm(list=ls())
gc()

#################################################
# Function
time_series <- function(time_path="./path/file.csv", 
                        closing_date="2020-01-01", trading_days=5, 
                        bin1=1.7, bin2=1.9, 
                        bin3=2.1, bin4=2.3) {

  #################################################
  # Libraries
  #
  # No additional libraries are needed; the script
  # uses only functions that ship with base R.

  # Determine how many days until end of question
  todays_date <- Sys.Date()
  closing_date <- as.Date(closing_date)
  remaining_weeks <- as.numeric(difftime(closing_date, todays_date, units = "weeks"))
  remaining_weeks <- round(remaining_weeks, digits=0)
  non_trading_days <- (7 - trading_days) * remaining_weeks
  day_difference <- as.numeric(difftime(closing_date, todays_date, units = "days"))
  remaining_days <- day_difference - non_trading_days 

  #################################################
  # Import & Parse
  # Point to time series data file and import it.
  time_import <- read.csv(time_path, header=FALSE) 
  colnames(time_import) <- c("date", "value")

  # Setting data types
  time_import$date <- as.Date(time_import$date)
  time_import$value <- as.vector(time_import$value)

  # Setting most recent value, assuming descending data
  current_value <- time_import[1,2]

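  # Get the length of time_import$value and shorten it by remaining_days
  time_rows <- length(time_import$value) - remaining_days
  # Stop early if the closing date has passed or the history
  # is shorter than the forecast period
  stopifnot(remaining_days >= 0, time_rows >= 1)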

  # Create an empty vector to hold the differences
  time_calc <- NULL

  # For each row, subtract the value found remaining_days
  # rows further down the file (that is, earlier in time).
  for (i in 1:time_rows) {
    time_calc[i] <- time_import$value[i] - time_import$value[i+remaining_days]
  }

  # Adjusted against current values to match time_calc
  adj_bin1 <- bin1 - current_value
  adj_bin2 <- bin2 - current_value
  adj_bin3 <- bin3 - current_value 
  adj_bin4 <- bin4 - current_value 

  # Determine the share of historical changes that falls in each question bin
  prob1 <- round(sum(time_calc<adj_bin1)/length(time_calc), digits = 3)
  prob2 <- round(sum(time_calc>=adj_bin1 & time_calc<=adj_bin2)/length(time_calc), digits = 3)
  prob3 <- round(sum(time_calc>adj_bin2 & time_calc<adj_bin3)/length(time_calc), digits = 3)
  prob4 <- round(sum(time_calc>=adj_bin3 & time_calc<=adj_bin4)/length(time_calc), digits = 3)
  prob5 <- round(sum(time_calc>adj_bin4)/length(time_calc), digits = 3)
  
  ###############################################
  # Print results
  return(cat(paste0(prob1, ": Bin 1 - ", "<", bin1, "\n",
                  prob2, ": Bin 2 - ", bin1, " to <=", bin2, "\n", 
                  prob3, ": Bin 3 - ", bin2, "+ to <", bin3, "\n", 
                  prob4, ": Bin 4 - ", bin3, " to <=", bin4, "\n", 
                  prob5, ": Bin 5 - ", bin4, "+", "\n")))
}
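
The script estimates the remaining trading days by rounding the remaining time to whole weeks and subtracting the non-trading days. As an alternative sketch, the weekdays can be counted directly. The helper name `count_weekdays` is hypothetical and not part of the original script; it assumes a Monday-to-Friday trading week and does not exclude holidays.

```r
# Hypothetical helper, not part of the original script: counts the
# Monday-Friday dates after `from`, up to and including `to`.
count_weekdays <- function(from, to) {
  from <- as.Date(from)
  to <- as.Date(to)
  if (to <= from) return(0L)
  days <- seq(from + 1, to, by = "day")
  # "%u" gives ISO weekday numbers: 1 = Monday ... 7 = Sunday
  sum(as.integer(format(days, "%u")) <= 5)
}

# Friday 2019-08-02 to Tuesday 2019-08-06 spans two trading days
count_weekdays("2019-08-02", "2019-08-06")
```

This matches the sample data above, where 2019-08-05 and 2019-08-06 are the two rows that follow 2019-08-02.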

Learn to Program With Common Lisp

Tim Ferriss has a currently popular blog post, “Ten Lessons I Learned While Teaching Myself to Code,” that I’ve seen mentioned in a few places. While it is largely good advice, there is one point that is wrong. It does matter what language you learn. Here are the ten lessons from his article:

  1. The online world is your friend, start there.
  2. Don’t stress over what language to pick.
  3. Code every day.
  4. Automate your life.
  5. Prepare for constant, grinding frustration.
  6. Build things. Build lots of things.
  7. “View Source”: Take other people’s code, pick it apart, and reuse it.
  8. Build things for you—code you need and want.
  9. Learn how to learn.
  10. Reach out to other coders.

The programming language you choose matters. If what you are doing is trivial, then yes, you can use any programming language. To quote from Paul Graham’s essay, “Revenge of the Nerds”:

“The disadvantage of believing that all programming languages are equivalent is that it’s not true. But the advantage is that it makes your life a lot simpler. And I think that’s the main reason the idea is so widespread. It is a comfortable idea…There are, of course, projects where the choice of programming language doesn’t matter much. As a rule, the more demanding the application, the more leverage you get from using a powerful language. But plenty of projects are not demanding at all. Most programming probably consists of writing little glue programs, and for little glue programs you can use any language that you’re already familiar with and that has good libraries for whatever you need to do. If you just need to feed data from one Windows app to another, sure, use Visual Basic.”

Tim Ferriss is writing trivial programs. So, for his use case, the choice of language is irrelevant. It might be for your use case as well.

But computer languages are not the same. They have different strengths and weaknesses. For example, in this blog post, a professional programmer discusses why Rust is not a good replacement for C. The short version: Rust is a young language that isn’t stable, and it lacks features older languages have, such as a specification.

Some languages are simply more powerful and mature, and give you more options. If you are going to go to the trouble of learning how to program, why not ground yourself in a language with more capabilities?

Your first choice of programming language is going to shape how you think about programming. It can take a long time to broaden your sense of the possible if you pick a language with limited features when you first start learning.

There are many good, powerful programming languages. Arguments can be made for any language you like. However, Paul Graham’s article mentioned above makes a good case that Lisp is a very powerful language. In Pascal Costanza’s Highly Opinionated Guide to Lisp, there’s an interesting observation:

“…Lisp is, in some sense, the mother of all [computer programming] languages…the mindset of Lisp asserts that expressive power is the single most important property of a programming language. Nothing should get in your way when you want to use that power. A programming language should not impose its view of the world on the programmer. It is the programmer who should be able to adapt the language to his/her needs, and not the other way around.”

Lisp is old. It’s stable. It is powerful. Textbooks teaching it date back to the early 1990s. But, they are still relevant and can be bought for almost nothing. So, why not learn Common Lisp? It’s a good question. The most common answer is that computer programming languages are subject to fads and new languages are more popular. But, given Lisp’s flexibility, it’s difficult to make the case that they are better.

Ok, suppose for a moment I’ve convinced you. Now, the question is: how do you go about learning Common Lisp? Last year, Steve Losh provided an answer. He put together a blog post called “A Road to Common Lisp,” that explains the Lisp language in detail and how he went about learning it. In short:

  1. Install SBCL or, if you’re using macOS and want a single GUI app you can download from the App Store, choose Clozure CL. My preference is for Emacs, SBCL, quicklisp and SLIME (tutorial), but you should use an editor that is comfortable to you and that can balance parentheses, highlight comments and strings, and autoindent Lisp code. Atom can be a good choice if you haven’t used a text editor before. But, it would be better if you learned either Emacs or Vim.
  2. Read Common Lisp: A Gentle Introduction to Symbolic Computation (<$2 on Alibris, online for free). Do all the exercises and grok in fullness. This book is aimed at intelligent beginners with no prior programming knowledge. Take your time and noodle with this text. It’s going to take time to develop a fluency, and it’s a different style of thinking.
  3. Then read Practical Common Lisp (<$25 on Alibris, online for free). Work through some of this book every day until you complete it. Type in the code if it is going to help you understand it. It doesn’t have exercises.
  4. Write some code for something easy, e.g., Project Euler.
  5. Then, flesh out your understanding further with Paradigms of Artificial Intelligence Programming (<$25 on Alibris).
  6. Now, you’re ready to program something serious and need a weighty reference text, i.e., Common Lisp Recipes (<$50 on Alibris).
  7. Finally, I suspect that Patterns of Software: Tales from the Software Community (<$10 on Alibris) is recommended because it’s a help with understanding how to work on software projects, but the purpose of the recommendation is kept unclear to prevent spoilers.

This strikes me as a better alternative than an $11,000 boot camp. Depending on how much you want to learn, you can get the tools you need to learn Common Lisp to the level you want for somewhere between $0 and $150. All the software is available at no cost, and it will run on any computer you already own. Then, if you want to branch out and learn Python, JavaScript, or some other language, you’ll have an excellent foundation. However, if you try learning Python or JavaScript and then try to learn Lisp or another more powerful language, you’ll find it to be a much more difficult task.

Good luck!