LURE: csv

Utilities for reading CSV data

tim@menzies.us
August'17

  • Read comma separated data from a string or a file.
  • Kill spurious white space, comments
  • If lines end in ",", concat with next line.
  • If on row1, the column has a "?" in it, skip that column.
  • Convert cells to numbers or strings (as appropriate).
  • Pass the cells from each line to a function fn.

Sample usage:

local csv=require "csv"
csv(os.getenv("DataDir") .. "/data/weather.csv",
   print)

  • If the source is a string ending with txt or csv, we assume that this string names a file from which we should read the data.
  • Otherwise, we assume that the source is a string containing the data itself.

local the      = require "config"
local notsep   = "([^" .. the.sep .. "]+)" -- not cell separator
local dull     = "['\"\t\n\r]*"  -- white space, quotes
local padding  = "%s*(.-)%s*"    -- space around words
local comments = "#.*"           -- comments
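
To see what these patterns do, here is a small sketch that applies them to one raw line. It assumes the.sep is "," (the config module is not shown on this page), and the sample line is invented for illustration. Note that, because the padding pattern is not anchored, it appears to remove every run of white space, including white space inside a cell.

```lua
-- Sketch only: `sep` stands in for the.sep, assumed here to be ",".
local sep      = ","
local notsep   = "([^" .. sep .. "]+)"
local dull     = "['\"\t\n\r]*"
local padding  = "%s*(.-)%s*"
local comments = "#.*"

-- Hypothetical input line: padded cells, quotes, a tab, a trailing comment.
local raw = "  'sunny' , 85,\t false  # a remark"

-- Apply the clean-ups in the same order as the row filter in this file.
local clean = raw:gsub(padding, "%1")   -- drops all white space
                 :gsub(dull, "")        -- drops quotes
                 :gsub(comments, "")    -- drops "#" and everything after it

-- Split what is left into cells.
local cells = {}
for word in clean:gmatch(notsep) do
  cells[#cells + 1] = word
end

print(clean)    --> sunny,85,false
print(#cells)   --> 3
```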

Control functions

If we are reading from a string that ends in any of these extensions, then we will assume the string is a file name.

local files    = {txt=true, csv=true}

If a line ends with the cell separator (a comma, by default), join it to the next line.

local function incomplete(txt) -- must join line to next
  return string.sub(txt,-1) == the.sep  end
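
As a rough illustration of how this test lets one record span several physical lines, the sketch below joins lines until one does not end in the separator. The `sep` value and the sample lines are assumptions made for this example.

```lua
-- Sketch only: `sep` stands in for the.sep, assumed here to be ",".
local sep = ","

local function incomplete(txt)
  return string.sub(txt, -1) == sep
end

-- Hypothetical input: the first record is split over two physical lines.
local lines = { "outlook,temp,", "windy", "sunny,85,false" }

local joined, cache = {}, {}
for _, line in ipairs(lines) do
  cache[#cache + 1] = line
  if not incomplete(line) then          -- record is complete: emit it
    joined[#joined + 1] = table.concat(cache)
    cache = {}
  end
end

for _, row in ipairs(joined) do print(row) end
--> outlook,temp,windy
--> sunny,85,false
```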

If a column name includes "?", then we will need to skip that column.

local function ignored(txt) -- despite the name: true if this column is kept (no ignore marker)
  return string.find(txt,the.ignore) == nil end
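
A quick sketch of what this test returns for a few column names. The value of the.ignore is not shown on this page; the pattern "%?" (a literal question mark) is an assumption, as are the column names. Note that the result is true for columns that are kept and false for columns that are skipped.

```lua
-- Sketch only: `ignore` stands in for the.ignore; "%?" is an assumed value.
local ignore = "%?"

-- Same test as in the module: true when the name does NOT contain the marker.
local function ignored(txt)
  return string.find(txt, ignore) == nil
end

-- Hypothetical column names.
for _, name in ipairs { "outlook", "?temp", "windy" } do
  print(name, ignored(name))
end
--> outlook   true
--> ?temp     false
--> windy     true
```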

Column filtering

Reads, from row1, which columns are to be ignored (if any). Skips those ignored columns for all rows.

local function cellsWeAreNotIgnoring(txt,wme) 
  local out,col = {},0
  for word in string.gmatch(txt,notsep) do
    col = col + 1
    if wme.first    then 
      wme.use[col] = ignored(word) end
    if wme.use[col] then 

Convert strings to numbers (if needed)

      out[#out+1]  = tonumber(word) or word end end
  return out end
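
The column filter can be tried in isolation with a sketch like the following. It re-creates the function with fixed stand-ins for the.sep (",") and the.ignore ("%?"), both of which are assumptions, and feeds it an invented header and data row.

```lua
-- Sketch only: "," and "%?" stand in for the.sep and the.ignore (assumed values).
local notsep, ignore = "([^,]+)", "%?"

local function cells(txt, wme)
  local out, col = {}, 0
  for word in string.gmatch(txt, notsep) do
    col = col + 1
    if wme.first then                     -- the header row decides which columns to keep
      wme.use[col] = string.find(word, ignore) == nil
    end
    if wme.use[col] then
      out[#out + 1] = tonumber(word) or word   -- numbers where possible, else strings
    end
  end
  return out
end

-- Same shape as the state table built in the external interface.
local wme    = { first = true, use = {} }
local header = cells("outlook,?temp,humidity", wme)
wme.first    = false
local row    = cells("sunny,85,90", wme)

print(table.concat(header, " "))   --> outlook humidity
print(table.concat(row, " "))      --> sunny 90
print(type(row[2]))                --> number
```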

Row filtering

Clean up one line and, if anything is left, pass its cells to the calling function wme.fn.

local function withOneLine(txt,wme)   
  txt= txt:gsub(padding,"%1")
          :gsub(dull,"")
          :gsub(comments,"") 
  if #txt > 0 then 
    wme.fn( cellsWeAreNotIgnoring(txt,wme) ) end end

Iterator for each line.

Look at src: if it ends in a known file extension, read it as a file. Otherwise, read from that string.

local function withEachLine(src,wme)
  local cache={}
  local function line1(line)
    cache[#cache+1] = line
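    if not incomplete(line) then
       withOneLine(table.concat(cache), wme) -- returns nothing, so do not assign it
       cache= {}
       -- stay on row1 until a header row has filled wme.use
       -- (a leading blank or comment line must not count as row1)
       wme.first = next(wme.use) == nil end end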
  if files[string.sub(src,-3,-1)] then
    io.input(src) 
    for line in io.lines() do 
      line1(line) end
  else 
    for line in src:gmatch("[^\r\n]+") do
      line1(line) end end end
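
The two branches above can be explored with a small sketch. The helper names `looksLikeFile` and `linesOf` are hypothetical and are not part of the module; they only mirror the extension test and the line splitting used above.

```lua
local files = { txt = true, csv = true }

-- Hypothetical helper: true when the last three characters of src
-- are a known extension.
local function looksLikeFile(src)
  return files[string.sub(src, -3, -1)] == true
end

-- Hypothetical helper: split an inline string into its non-empty lines,
-- using the same pattern as the else-branch above.
local function linesOf(src)
  local out = {}
  for line in src:gmatch("[^\r\n]+") do out[#out + 1] = line end
  return out
end

print(looksLikeFile("data/weather.csv"))   --> true
print(looksLikeFile("a,b\n1,2"))           --> false
print(#linesOf("a,b\r\n\r\n1,2\n"))        --> 2
```

One consequence of this design is that an inline data string whose last three characters happen to be csv or txt would be treated as a file name.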

External Interface

Main loop. Function fn is called for each row (and that row is sent to that function as a list of cell values).

return function (src,fn)
  withEachLine(src, {fn=fn, first=true, use={}}) end

Legal

LURE, Copyright (c) 2017, Tim Menzies. All rights reserved. BSD 3-Clause License.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.