Tuesday, June 08, 2010

read LAS data with R

Finding software to read LAS files can be difficult, support for 3D data is still hard to come by.

R's support for the "raw" data type provides a very neat way of reading data from binary files, such as LAS files. These files have a header containing metadata that is used to specify reading from the bulk data in the file. Essentially, you can read large sets of records at once into a raw matrix with readBin and then use the indexing tools to extract the relevant bytes for each element of the record.

This means that the bulk read is fast, and processing is also vectorized as the raw matrix can then be read in index pieces by readBin. I've done this in a simplistic way in the following functions.

  • "publicHeaderDescription" - function returns a list generated to hold information about the header 
  • "readLAS" - read file function

It works for some LAS files that I have and returns a matrix of record values. Certainly it works fast and efficiently enough for smallish files. I really don't have experience with the likely range of sizes of these files.  I've included "skip" and "nrows" arguments to allow for chunked/partial reading.

The remaining work I see involves:

  • wrap the "chunked" read so large files can be processed in parts
  •  return only specified attributes from each record (can be x, y, z, gpstime, intensity)
  • extract all components of the records, and generalize for LAS 1.1, 1.2, ... (there is minor handling of gpstime presence or absence, but nothing else)
  •  re-organize the header read and arrangement (what I've done was easy enough for me to understand, but it's a bit of a mess)
  • (no doubt many more general improvements I haven't thought of)

To use:

## sample LAS file from here: http://liblas.org/samples/
f <- "Lincoln.las"

## [1] 185.5660 Mb
  lasdata <- readLAS(f)
# user system elapsed

# 4.04 0.62 6.93
# [1] 9278073 4

#[1] "x" "y" "z" "intensity"

Optionally the list of header components (some extracted as actual data, some left in their "raw" form) can be returned instead of data using "returnHeaderOnly=TRUE".