Saturday, February 23, 2013

Simulating Population Growth in Cities Using R

R is great for anyone who wants to get started on learning Simulation. (Both Discrete Event or Agent-based, with stochastic elements in the process.)

This post is inspired by Matt Asher's "quick-and-dirty" R simulation work on Population Growth. Matt uses it to create aRt. I felt that his core idea provided a very good framework to introduce various concepts of simulation.

The Basic Idea behind this Simulation

We take an area (it could be a country, state, city) that is made up of units (cells) that can be settled (occupied).
We allow the initial 'pioneers' to settle first. (Seeding the cells)
We then establish some rules for future residents to settle. (Settling Rules)
Plot the area, and collect some stats to study the growth of the population under different starting conditions and different settling rules.

Pseudo-Code for the Simulation

Define parameters: Area Width, Height
Number of Pioneers P and future settlers S

for each pioneer p{ 
  settle p according to seeding rules in a valid cell in the area
}
for each settler S{
  Find an unoccupied Cell.
  Settle S in that cell, if the settling criteria permits it
}
Print Iterations Stats
Print out the Plot of the Area

A Few Changes from Matt's R code

  1. I switched from his matrix to a dataframe. I also switched from his image to ggplot.
  2. Delineated the initial parameters, seeding and settling clearly so that anyone who downloads the code can experiment
  3. Created a few different Seeding Functions for the pioneers (beyond random seeding)
  4. Matt allowed settling if the new cell was adjacent to an occupied cell. I played around with a few different population settling functions.  (rectilinear, diagonal etc).
  5. Added a function to collect a few statistics regarding population growth
  6. Ran the simulation a few 100 times (without plotting) to compare different growth and settling schemes

Color Coding the Population

  • Black cells : Unoccupied
  • Blue cells: Pioneers
  • Orange cells: Settlers who found a cell in 1-2 attempts
  • Yellow cells: Settlers who found a cell after more than 2 scouting attempts
The full R code I used for these R can be found here.

Here are some runs with a 30x30 area.

Here's what one run looked like. (Blue cells are the initial pioneers. The other colors represent the number of steps taken for them to find a home cell.)

Trying Out Different Seeding Schemes (for the pioneers)


As opposed to starting out with randomly strewn cells, what if we started out with a city downtown (a densely populated central rectangle) and let the settlers follow them?

# R-Rows C-cols in the center
seedAreaCenter <- function(r, c){
for(x in as.integer((areaW-c)/2): as.integer((areaW + c)/2)){
for(y in as.integer((areaH-r)/2): as.integer((areaH + r)/2)){
area_df[x,y] <<- 1
}
}
}
#Seed a central area and let it grow
seedAreaCenter(7,7)
view raw gistfile1.r hosted with ❤ by GitHub


I tried a rectangluar ring (annular cells for pioneers) starting point to see how the population grew inside and outside. So I used this seeding function:
# RING of R-Rows C-cols in the center of width w cells
seedCenterRing <- function(r, c, wide){
for(x in as.integer((areaW-c)/2): as.integer((areaW + c)/2)){
for(y in as.integer((areaH-r)/2): as.integer((areaH + r)/2)){
area_df[x,y] <<- 1
}
}
#scoop out the inner ring
for(x in as.integer((areaW-c)/2 + wide): as.integer((areaW + c)/2 - wide)){
for(y in as.integer((areaH-r)/2+wide): as.integer((areaH + r)/2- wide )){
area_df[x,y] <<- 0
}
}
}
view raw gistfile1.r hosted with ❤ by GitHub



One final seeding scheme I tried was to have two thick bands (columns of pioneer cells) and to let the population grow from there. Seeding function:
# w-cols to the R and L of center column
seedColumns <- function(c, w){
startLCol = as.integer( (areaW-c) / 2 )
startRCol = as.integer( (areaW+c) / 2 )
for(y in 1:areaH) {
clist = c((startLCol-w):startLCol, startRCol:(startRCol+w))
lapply(clist, function(x) area_df[x,y] <<- 1)
}
}
#seed two vertical columns of width w
seedColumns(12,2)
view raw gistfile1.r hosted with ❤ by GitHub

 

 

 

 

 

 

Different Population Growth Schemes

The default settling rule that Matt Asher uses is a very good one, one that I like. If a given cell is adjacent to any occupied cell (one of its 8 neighbors) then it becomes a valid cell to settle down in.

Just for experimentation, I tried a few other schemes.

1. Rectilinear Growth

Rule: You can only settle down in the 4 cells that are due N, S, East or West of an occupied cell.

# Are any of the 4 direct N E W S adjacent cells occupied?
isAnyOfNEWSCellsOccupied <- function(m,n) {
canOccupy <- FALSE
NEWSdir = c(2,4,5,7)
#traverse the vector of 4 elements
for(k in 1:4) {
xCheck = m + adjacells[NEWSdir[k],1]
yCheck = n + adjacells[NEWSdir[k],2]
if(!(outOfBounds(xCheck, yCheck))) {
if(area_df[xCheck, yCheck] > 0) {
print(paste("area df of ", xCheck, yCheck, area_df[xCheck, yCheck]))
print(paste("(",m,",",n,")" ,"NEWS neighbor of (", yCheck, "-", xCheck))
canOccupy <- TRUE
}
}
} #end of looping through k
return(canOccupy)
}
view raw gistfile1.r hosted with ❤ by GitHub
2. Diagonal Growth

Rule: A settler can occupy any cell that is diagonally adjacent to another occupied cell.

The code can be found here.

 

 

 

 

 

 

Collecting Stats

Now that we have all the functionality that we wanted, it is time to start collecting some statistics for each run ('replication'). There are many possibilities for what we could collect in a population simulation.
Here are a couple:
  • How many settlers found a 'home cell?'
  • How many scouting attempts did each settler make before they found a home?
store_iteration_stats<- function(kNumSettlers, found.home, max.look.around) {
#how many settlers found homes
print(c(found.home," settled out of ", kNumSettlers, (found.home/kNumSettlers)*100, "%" ) )
#avg # of steps per new settler
print(c("num of look arounds max'ed:", max.look.around))
}
view raw gistfile1.txt hosted with ❤ by GitHub

Making Multiple Replications

 


# Make multiple runs (Replication of simulation) and take the average of stats
st <- data.frame()
st_row<- vector()
for(i in 1:kNumReplications) {
area_df <- resetIteration()
seedAreaWithPioneers(numPioneers,seeding.opt)
simstats <- accommodateSettlers(kNumSettlers, settling.option)
found.home <- simstats[1]
max.look.around <- simstats[2]
#compute for this iterations
st_row <- store_iteration_stats(i, kNumSettlers, found.home, max.look.around)
st <- rbind(st,st_row)
}
view raw gistfile1.txt hosted with ❤ by GitHub
>st
  Iter FoundHome NumSettlers  Percent
1    1       243         350 69.42857
2    2       214         350 61.14286
3    3       213         350 60.85714
4    4       258         350 73.71429
5    5       235         350 67.14286

The full R code I used for these runs can be found here. Feel free to give it a spin and try out your own settling/growth schemes.
Reference: Matt Asher's post on his StatisticsBlog

No comments:

Post a Comment