The Data Doodle

A blog about visualizing, analyzing, and managing data

BankerDoodle Intro

BankerDoodle is an application built on top of the DataDoodler platform. It inherits all the analytical goodness of DataDoodler but adds features specific to the banking industry. It is a closed, commercial product that will be the initial means of monetizing the DataDoodler platform. Any number of other applications, open or closed, could eventually be built on top of DataDoodler.

Skills Demo - A Few Examples

This document is a guided tour through some of my code. It provides some insight into my skill set, coding style, and overall approach to developing apps. Scroll through this document to find the following examples:

  • fdic-sdi-manager (Node-based ETL with lots of cool ES6 features)
  • Blooming Menu Directive (Angular directive, animation)
  • Address Verification Directive (Angular directive, promise-based DOM manipulation)

How can the data for a game be obtained and neatened?

I built this package for compatibility with chess.com, so contact me if you want me to adapt it for other sources. Everything required for this demo can be found in my chessDoodles workspace.

# clear workspace
rm(list=ls())

# load workspace
load("chessDoodles.RData")

A link to a chess.com game always ends with a game ID, so the ID of each game to assess can be used to construct the link to that game. Here are two examples:

linkIDs <- c(127609996, 131764454)

Links <- paste("chess.com/echess/game?id=",
linkIDs,
sep = "")
Links
## [1] "chess.com/echess/game?id=127609996"
## [2] "chess.com/echess/game?id=131764454"

These games can be viewed publicly, but scraping the pgn for each one requires a username and password. I think it's easiest to embed the credentials in the link, in the format "http://username:password@chess.com/…".

# store username and password
Username <- "thinkboolean"
Password <- "blogChess" # counting on you not to abuse this; feel free to contact me for details

# finish constructing links
Links <- paste("http://",
Username,
":",
Password,
"@",
Links,
sep = "")
Links
## [1] "http://thinkboolean:blogChess@chess.com/echess/game?id=127609996"
## [2] "http://thinkboolean:blogChess@chess.com/echess/game?id=131764454"
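Because the password above is written into the script, it travels with the script wherever it is shared or committed. A minimal sketch of the alternative, assuming the credentials live in environment variables with the hypothetical names CHESS_USERNAME and CHESS_PASSWORD (set, for example, in a ~/.Renviron file, which R reads at startup). buildChessLink() is a hypothetical helper, not part of the chessDoodles workspace.

```r
# Hypothetical helper (not part of the chessDoodles workspace): builds the
# authenticated game link from environment variables instead of hardcoded strings.
buildChessLink <- function(gameID,
                           userVar = "CHESS_USERNAME",
                           passVar = "CHESS_PASSWORD") {
  user <- Sys.getenv(userVar)  # returns "" when the variable is unset
  pass <- Sys.getenv(passVar)
  if (user == "" || pass == "") {
    stop("Set ", userVar, " and ", passVar, " before building links.")
  }
  # paste0() is vectorized, so a vector of game IDs yields a vector of links
  paste0("http://", user, ":", pass, "@chess.com/echess/game?id=", gameID)
}

# Usage, once the variables are set:
# Links <- buildChessLink(c(127609996, 131764454))
```

The links come out in the same format as above, but the script itself no longer contains the password.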

The rawToTidy function scrapes the title and pgn of a chess.com game link. Applying it to each link and stacking the results with rbind gives one row per game:

games <- rawToTidy(Links[1])
for(Link in Links[-1]){
games <- rbind(games, rawToTidy(Link))
}
games
##                                                   Title
## 1 thinkboolean vs risoarcher - Online Chess - Chess.com
## 2 risoarcher vs thinkboolean - Online Chess - Chess.com
## pgn
## 1 1.e4 e5 2.Qf3 Nc6 3.b3 Nf6 4.Nc3 d6 5.Bc4 Nd4 6.Qd3 Bg4 7.Nge2 Bxe2 8.Nxe2 Nxe2 9.Qxe2 Be7 10.Qf3 O-O 11.Bb2 c5 12.O-O Qd7 13.a4 a6 14.Rfe1 b5 15.axb5 axb5 16.Rxa8 Rxa8 17.h4 Qg4 18.Bd5 Qxf3 19.gxf3 Nxd5 20.exd5 Bxh4 21.Re4 Bg5 22.d3 Ra2 23.Bc3 Bf6 24.Re2 g5 25.Kg2 Kg7 26.Be1 h5 27.Kg3 Kg6 28.b4 c4 29.dxc4 bxc4 30.b5 Rb2 31.Re4 Rxb5 32.Rxc4 Rxd5 33.Bb4 Rd4 34.Rxd4 exd4 35.Bxd6 Kf5 36.Bb4 Be5 37.Kg2 Kf4 38.Bd2 Kf5 39.Kh3 f6 40.Kg2 Bf4 41.Bb4 Be5 42.Kf1 g4 43.fxg4 hxg4 44.Ke2 Ke4 45.Kd1
## 2 1.e4 e5 2.f4 exf4 3.d4 Qe7 4.Bxf4 Qxe4 5.Qe2 Qxe2 6.Bxe2 d6 7.Nf3 Nc6 8.a3 Nf6 9.Nc3 Bf5 10.O-O-O

This provides a data frame in which each row is a game. The tidyToPgn function splits one game (row n) into a move-by-move matrix, so looping over the rows collects the moves of every game:

# first game:
moves <- tidyToPgn(tidyDF = games, n = 1)

# remaining games (one per row of games; empty when there is only one game):
for(n in seq_len(nrow(games))[-1]){
  moves <- rbind(moves, tidyToPgn(tidyDF = games, n = n))
}
head(moves)
##      gameID  pgn      
## [1,] "beta1" "1.e4"
## [2,] "beta1" "1...e5"
## [3,] "beta1" "2.Qf3"
## [4,] "beta1" "2...Nc6"
## [5,] "beta1" "3.b3"
## [6,] "beta1" "3...Nf6"
tail(moves)
##        gameID  pgn       
## [103,] "beta2" "7...Nc6"
## [104,] "beta2" "8.a3"
## [105,] "beta2" "8...Nf6"
## [106,] "beta2" "9.Nc3"
## [107,] "beta2" "9...Bf5"
## [108,] "beta2" "10.O-O-O"

tidyToPgn avoids ambiguity between white and black moves by giving white moves a single dot (".") and black moves three dots ("...").
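The dot convention can be reproduced with a short base-R sketch, assuming the moves in a pgn string are separated by whitespace and that every white move carries its move number, as in the strings scraped above. splitPgnSketch() is a hypothetical stand-in, not the tidyToPgn() implementation: it returns a data frame rather than a matrix and drops a trailing result token such as "1-0" if one is present.

```r
# Hypothetical sketch, not the author's tidyToPgn(): splits one pgn string
# into one row per move, using "." for white and "..." for black.
splitPgnSketch <- function(pgn, gameID) {
  tokens <- strsplit(trimws(pgn), "\\s+")[[1]]
  # Drop a game-result token if present (assumption: standard PGN results)
  tokens <- tokens[!tokens %in% c("1-0", "0-1", "1/2-1/2", "*")]
  labels <- character(length(tokens))
  moveNumber <- 0L
  for (i in seq_along(tokens)) {
    tok <- tokens[i]
    if (grepl("^[0-9]+\\.", tok)) {
      # White move such as "12.Nf3": it already carries its move number
      moveNumber <- as.integer(sub("\\..*$", "", tok))
      labels[i] <- tok
    } else {
      # Black move such as "Nc6": prefix the current move number and "..."
      labels[i] <- paste0(moveNumber, "...", tok)
    }
  }
  data.frame(gameID = gameID, pgn = labels, stringsAsFactors = FALSE)
}

splitPgnSketch("1.e4 e5 2.Qf3 Nc6", gameID = "beta1")
```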

How can chess positions be tracked?

I keep track of chess positions using a data frame of 64 variables, one for each square of the chessboard. An example of an empty chessboard can be generated with the empty function:

load("chessDoodles.RData")

position <- empty(gameName = "example")
position
##               a8 a7 a6 a5 a4 a3 a2 a1 b8 b7 b6 b5 b4 b3 b2 b1 c8 c7 c6 c5
## example_empty NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
##               c4 c3 c2 c1 d8 d7 d6 d5 d4 d3 d2 d1 e8 e7 e6 e5 e4 e3 e2 e1
## example_empty NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
##               f8 f7 f6 f5 f4 f3 f2 f1 g8 g7 g6 g5 g4 g3 g2 g1 h8 h7 h6 h5
## example_empty NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
##               h4 h3 h2 h1
## example_empty NA NA NA NA

Note that columns are named after squares of the chessboard, and rows are named in gameName_move format. An example of a chessboard where the pieces have not been moved can be generated with the setup function (which internally calls the empty function):

position <- setup(gameName = "game1")
position[,c("d1","e1","d8","e8")]
##                      d1         e1          d8         e8
## game1_empty        <NA>       <NA>        <NA>       <NA>
## game1_zero  white Queen white King black Queen black King
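The layout produced by empty() and setup() can be approximated in a few lines of base R. emptySketch() and setupSketch() are hypothetical names, not the workspace functions; the sketch only assumes the column order shown in this post (a8 down to a1, then b8 down to b1, and so on) and the piece labels seen in the output ("white Queen", "black King", "white pawn").

```r
# Hypothetical sketches, not the workspace's empty() and setup().
emptySketch <- function(gameName) {
  # Column order a8, a7, ..., a1, b8, ..., h1
  squares <- paste0(rep(letters[1:8], each = 8), rep(8:1, times = 8))
  as.data.frame(matrix(NA_character_, nrow = 1, ncol = 64,
                       dimnames = list(paste0(gameName, "_empty"), squares)),
                stringsAsFactors = FALSE)
}

setupSketch <- function(gameName) {
  position <- emptySketch(gameName)          # like setup(), start from an empty board
  zero <- position
  rownames(zero) <- paste0(gameName, "_zero")
  backRank <- c("Rook", "Knight", "Bishop", "Queen", "King", "Bishop", "Knight", "Rook")
  files <- letters[1:8]
  zero[1, paste0(files, 1)] <- paste("white", backRank)
  zero[1, paste0(files, 2)] <- "white pawn"
  zero[1, paste0(files, 7)] <- "black pawn"
  zero[1, paste0(files, 8)] <- paste("black", backRank)
  rbind(position, zero)                      # rows: <game>_empty, <game>_zero
}

position <- setupSketch("game1")
position[, c("d1", "e1", "d8", "e8")]
```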

The position object is built row-by-row using the newPosition function.

newRow <- newPosition(new_pgn = "game1_1.e4", startPosition = nrow(position))
position <- rbind(position,newRow)
position[,c("e2","e4")]
##                     e2         e4
## game1_empty       <NA>       <NA>
## game1_zero  white pawn       <NA>
## game1_1.e4        <NA> white pawn
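The row-by-row idea behind newPosition() can be sketched as copying a snapshot and moving one piece. movePieceSketch() is hypothetical: it takes explicit from/to squares instead of parsing a pgn move, borrows the startPosition argument name from the call above, ignores castling, promotion, and en passant, and lets a capture simply overwrite the target square.

```r
# Hypothetical sketch of the "copy the latest snapshot, then move a piece" idea.
# Not the author's newPosition(): castling, promotion and en passant are ignored.
squares <- paste0(rep(letters[1:8], each = 8), rep(8:1, times = 8))
position <- as.data.frame(matrix(NA_character_, nrow = 1, ncol = 64,
                                 dimnames = list("sketch_zero", squares)),
                          stringsAsFactors = FALSE)
position[1, "e2"] <- "white pawn"   # only the piece this example needs

movePieceSketch <- function(position, from, to, rowName,
                            startPosition = nrow(position)) {
  newRow <- position[startPosition, , drop = FALSE]  # replicate the chosen row
  newRow[1, to] <- newRow[1, from]                    # a capture simply overwrites `to`
  newRow[1, from] <- NA                               # the origin square is now empty
  rownames(newRow) <- rowName
  rbind(position, newRow)
}

position <- movePieceSketch(position, "e2", "e4", "sketch_1.e4")
position[, c("e2", "e4")]
```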

This position object is referenced by all the position-oriented functions I have written.

Node App Configuration

Applications that are deployed to various physical and logical environments (AWS, Heroku, Dev, Prod, Local, Test, Debug, Language, TodayDate, etc.) need a way to adapt dynamically to their current environment. The path to a file on AWS is definitely different from the path to a file on a disconnected development box.

A note about supplying credentials in an application:

Credentials for accessing outside resources (databases, S3 buckets, deployment services, etc.) should not be included in any source file (config file or hardcoded in the app) that will be checked in to a source control system.

Config File

A smart module for handling config files is necessary. You can create your own, or just use one that has already been created and tested, like nodejs-config.
npm install nodejs-config

Environment Variables

Sometimes environment variables are the way to go, for example when the application needs credentials such as a username and password.

Introduction to ChessDoodle Functions

This workspace is being developed into a package geared toward helping the user explore and think about the chessboard.

# clear workspace
rm(list=ls())

# load workspace
load("chessDoodles.RData")

A set of regex patterns, all with "Pattern" in their names, interprets the move strings:

ls(pattern = "Pattern")
##  [1] "bishopPattern"    "blackPattern"     "capturePattern"
##  [4] "castlePattern"    "castleQPattern"   "checkmatePattern"
##  [7] "checkPattern"     "colorPatterns"    "kingPattern"
## [10] "knightPattern"    "pawnPattern"      "pgnPattern"
## [13] "piecePatterns"    "queenPattern"     "rookPattern"
## [16] "whitePattern"
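To give a feel for how such patterns can classify a move, here is a sketch with hypothetical patterns of its own; the workspace's *Pattern objects may well be defined differently. Each pattern is tested against a single move label in the "1.e4" / "1...e5" format produced by tidyToPgn().

```r
# Hypothetical patterns for illustration; not the workspace's *Pattern objects.
patternsSketch <- c(
  white     = "^[0-9]+\\.[^.]",    # "1.e4": move number, one dot, then the move
  black     = "^[0-9]+\\.{3}",     # "1...e5": move number followed by three dots
  pawn      = "^[0-9]+\\.+[a-h]",  # pawn moves start with a file letter
  knight    = "^[0-9]+\\.+N",      # knight moves start with "N"
  capture   = "x",
  check     = "\\+$",
  checkmate = "#$",
  castle    = "O-O"                # matches both "O-O" and "O-O-O"
)

# Returns a named logical vector: which patterns match the move label
classifyMoveSketch <- function(move) {
  vapply(patternsSketch, grepl, logical(1), x = move)
}

names(which(classifyMoveSketch("2...exf4")))  # "black" "pawn" "capture"
names(which(classifyMoveSketch("10.O-O-O")))  # "white" "castle"
```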

The pathPrior() function is named after the term a priori. For a given input piece and square of the chessboard, it returns the path the piece can take on an empty chessboard, prior to the circumstances that arise when a game is in play.

pathPrior(piece = "bishop", square = "e5")
##  [1] "a1" "b2" "b8" "c3" "c7" "d4" "d6" "f6" "f4" "g7" "g3" "h8" "h2"
pathPrior(piece = "knight", square = "e5")
## [1] "f7" "d7" "f3" "d3" "g6" "c6" "g4" "c4"
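The a priori paths for the knight and the bishop depend only on board geometry, so they can be sketched with simple file/rank arithmetic. knightPriorSketch() and bishopPriorSketch() are hypothetical illustrations, not pathPrior() itself. For e5 and b1 the knight sketch returns the same squares in the same order that pathPrior() shows in this post; the bishop sketch returns the same 13 squares for e5, in a different order.

```r
# Hypothetical sketches, not the author's pathPrior().
toSquare <- function(file, rank) paste0(letters[file], rank)

knightPriorSketch <- function(square) {
  f <- match(substr(square, 1, 1), letters[1:8])  # file as 1..8
  r <- as.integer(substr(square, 2, 2))           # rank as 1..8
  dFile <- c(1, -1, 1, -1, 2, -2, 2, -2)          # file offsets of the 8 jumps
  dRank <- c(2, 2, -2, -2, 1, 1, -1, -1)          # matching rank offsets
  nf <- f + dFile
  nr <- r + dRank
  onBoard <- nf >= 1 & nf <= 8 & nr >= 1 & nr <= 8
  toSquare(nf[onBoard], nr[onBoard])
}

bishopPriorSketch <- function(square) {
  f <- match(substr(square, 1, 1), letters[1:8])
  r <- as.integer(substr(square, 2, 2))
  out <- character(0)
  # Slide along each diagonal until the edge of the (empty) board
  for (d in list(c(1, 1), c(1, -1), c(-1, 1), c(-1, -1))) {
    nf <- f + d[1]
    nr <- r + d[2]
    while (nf >= 1 && nf <= 8 && nr >= 1 && nr <= 8) {
      out <- c(out, toSquare(nf, nr))
      nf <- nf + d[1]
      nr <- nr + d[2]
    }
  }
  out
}

knightPriorSketch("e5")
bishopPriorSketch("e5")
```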

The pawn is the only piece unable to move backwards, and its direction depends on its color.

pathPrior(piece = "black pawn", square = "e5")
## [1] "e4" "d4" "f4"
pathPrior(piece = "white pawn", square = "e5")
## [1] "e6" "d6" "f6"
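A sketch of the pawn case, with the direction chosen by color. pawnPriorSketch() is hypothetical, not pathPrior(); like the output above, it lists the square straight ahead followed by the two diagonal capture squares. It assumes the piece label contains "white" or "black" and ignores the initial two-square advance.

```r
# Hypothetical sketch, not the author's pathPrior(): a pawn moves one square
# "forward" (up for white, down for black) and captures diagonally forward.
pawnPriorSketch <- function(piece, square) {
  f <- match(substr(square, 1, 1), letters[1:8])
  r <- as.integer(substr(square, 2, 2))
  step <- if (grepl("white", piece)) 1L else if (grepl("black", piece)) -1L else NA
  if (is.na(step)) stop("specify the pawn's color, e.g. \"white pawn\"")
  nr <- r + step
  if (nr < 1 || nr > 8) return(character(0))
  nf <- c(f, f - 1, f + 1)            # straight ahead, then the two capture squares
  nf <- nf[nf >= 1 & nf <= 8]
  paste0(letters[nf], nr)
}

pawnPriorSketch("black pawn", "e5")
pawnPriorSketch("white pawn", "e5")
```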

It is the only piece whose color needs to be specified for pathPrior() to work effectively. The Post.() functions, conversely, are named after the term a posteriori: they trim each piece's options according to the circumstances of the entire board.

ls(pattern = "Post")
## [1] "bishopPost."   "kingPost."     "knightPost."   "mobilityPost."
## [5] "pathPost."     "pawnPost."     "piecePost."    "queenPost."
## [9] "rookPost."

They can only be used once a snapshot of the chessboard's position exists for them to analyze. The snapshots are stored as rows of the position data frame. Each variable in position is a square of the chessboard,

colnames(position)
##  [1] "a8" "a7" "a6" "a5" "a4" "a3" "a2" "a1" "b8" "b7" "b6" "b5" "b4" "b3"
## [15] "b2" "b1" "c8" "c7" "c6" "c5" "c4" "c3" "c2" "c1" "d8" "d7" "d6" "d5"
## [29] "d4" "d3" "d2" "d1" "e8" "e7" "e6" "e5" "e4" "e3" "e2" "e1" "f8" "f7"
## [43] "f6" "f5" "f4" "f3" "f2" "f1" "g8" "g7" "g6" "g5" "g4" "g3" "g2" "g1"
## [57] "h8" "h7" "h6" "h5" "h4" "h3" "h2" "h1"

and each observation is a snapshot of the game:

set.seed(123); 
position[sample(2:nrow(position),1),
sample(1:ncol(position),10)]
##            c6         g7         d7   g3         h8   a6         d2   g6
## 000_zero <NA> black pawn black pawn <NA> black Rook <NA> white pawn <NA>
##                  h7   h3
## 000_zero black pawn <NA>

The first row of position is the empty chessboard, with no pieces listed for any of the squares. The second row is the zero position (sampled above), with pieces set at their starting squares.

We can compare the mobility of a white knight on b1 before and after the pieces are set up:

pathPrior(piece = "knight", square = "b1")
## [1] "c3" "a3" "d2"
pathPost.(square = "b1", game_pgn = "000_zero")
## [1] "c3" "a3"

Because of the game_pgn input, pathPost.() knows that the piece on b1 is a white knight, and that it cannot move to d2 because that square is occupied by a white pawn.
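A minimal sketch of this a posteriori trimming for a knight, assuming a snapshot is a one-row data frame with one column per square, like a row of position. trimOwnPiecesSketch() is hypothetical: it only drops candidate squares held by a piece of the mover's own color, which is enough for knights and kings. Sliding pieces (bishops, rooks, queens) would also need their paths cut off at the first occupied square, and the real Post.() functions may account for more than this.

```r
# Hypothetical sketch, not the author's pathPost.(): drops candidate squares
# that are occupied by a piece of the same color as the piece on `square`.
trimOwnPiecesSketch <- function(snapshot, square, candidates) {
  color <- sub(" .*$", "", snapshot[[square]])   # e.g. "white Knight" -> "white"
  occupants <- unlist(snapshot[1, candidates])   # NA where a square is empty
  ownPiece <- !is.na(occupants) & startsWith(occupants, color)
  candidates[!ownPiece]
}

# Part of the zero position around the white knight on b1
snapshot <- data.frame(b1 = "white Knight", a3 = NA_character_,
                       c3 = NA_character_, d2 = "white pawn",
                       row.names = "000_zero", stringsAsFactors = FALSE)
trimOwnPiecesSketch(snapshot, "b1", c("c3", "a3", "d2"))  # "c3" "a3"
```

An enemy piece on a candidate square is kept, since the knight could capture it there.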

Let us begin by observing the game I am currently playing. We can modify the link into the format "http://username:password@chess.com/…" and pass it to the rawToTidy() function as follows:

LinkID <- 131764454
Username <- "thinkboolean"
Password <- "blogChess" # counting on you not to abuse this; feel free to contact me for details
# sep = "" keeps paste() from inserting spaces into the URL
Link <- paste("http://",
Username,
":",
Password,
"@chess.com/echess/game?id=",
LinkID,
sep = "")
print(Link)
## [1] "http://thinkboolean:blogChess@chess.com/echess/game?id=131764454"
games <- rawToTidy(Link)
print(games)
##                                                   Title
## 1 risoarcher vs thinkboolean - Online Chess - Chess.com
## pgn
## 1 1.e4 e5 2.f4 exf4 3.d4 Qe7 4.Bxf4 Qxe4 5.Qe2 Qxe2 6.Bxe2 d6 7.Nf3 Nc6 8.a3 Nf6 9.Nc3

Now we can expand this pgn from a single string to a data frame of moves using tidyToPgn().

moves <- as.data.frame(tidyToPgn(games,n=1, string = "xmpl"))
print(moves)
##    gameID      pgn
## 1   xmpl1     1.e4
## 2   xmpl1   1...e5
## 3   xmpl1     2.f4
## 4   xmpl1 2...exf4
## 5   xmpl1     3.d4
## 6   xmpl1  3...Qe7
## 7   xmpl1   4.Bxf4
## 8   xmpl1 4...Qxe4
## 9   xmpl1    5.Qe2
## 10  xmpl1 5...Qxe2
## 11  xmpl1   6.Bxe2
## 12  xmpl1   6...d6
## 13  xmpl1    7.Nf3
## 14  xmpl1  7...Nc6
## 15  xmpl1     8.a3
## 16  xmpl1  8...Nf6
## 17  xmpl1    9.Nc3

The first move is pawn to e4. Passing it to the newPosition() function replicates the last (or specified) row of the position frame, places the white pawn on e4, and removes it from e2.

position[,c("e2","e4")]
##                   e2   e4
## 000_empty       <NA> <NA>
## 000_zero  white pawn <NA>
newPosition(new_pgn = "1.e4", 
startPosition = nrow(position))[,
c("e2","e4")]
##      e2         e4
## 1.e4 NA white pawn

newPosition() is designed to build on the position object sequentially, one move at a time.

for(n in 1:nrow(moves)){
pgn <- paste("xmpl1",
as.vector(moves[n,"pgn"]),
sep="_")
if(!pgn %in% row.names(position)){
position<-rbind(position, newPosition(new_pgn = pgn))
}
}
rownames(position)
##  [1] "000_empty"      "000_zero"       "xmpl1_1.e4"     "xmpl1_1...e5"
##  [5] "xmpl1_2.f4"     "xmpl1_2...exf4" "xmpl1_3.d4"     "xmpl1_3...Qe7"
##  [9] "xmpl1_4.Bxf4"   "xmpl1_4...Qxe4" "xmpl1_5.Qe2"    "xmpl1_5...Qxe2"
## [13] "xmpl1_6.Bxe2"   "xmpl1_6...d6"   "xmpl1_7.Nf3"    "xmpl1_7...Nc6"
## [17] "xmpl1_8.a3"     "xmpl1_8...Nf6"  "xmpl1_9.Nc3"

These functions will provide the analytic muscle for future blogging analyses.

Learning Path Roadmap

I’m eager to have my kids help with building DataDoodler. I believe it is a great opportunity for them to learn valuable skills and be involved in building a tool that will help them in their schooling years and beyond. I needed some sort of matrix to guide me in what to teach them. The embedded chart below describes the skill path required to create the various projects in the DataDoodler platform. I am starting my kids in the 100-level courses. Of course, my oldest child, Josh, has a different contribution to make in the area of data science / R programming.
