How to use CSV files

MiniM. Saving time.

MiniM Overview
Getting Started
Download
Documentation
Tools
Cache Tools
GT.M Tools
Knowledge Base
FAQ
Contacts

MiniM Knowledge Base | Sep 1, 2011

How to use CSV files

One of the most widely used data transfer file formats is a Comma-Separated Values file format. This article demonstrates how routine in MiniM can import and export data using this file format.

CSV file format is defined as ordinal text file, and store one data record per line. Line delimiter is standard text line terminator.

One line is one record and values of columns are delimited by comma or by semicolon. Next in this article is used comma as a delimiter. Values of columns, which contains reserved symbols (comma) are decorated by quotes ("); if value contains quotes, each quote must be doubled.

To import this file format we have two ways - use standard MUMPS language subset to parse string and use extended functions to simplify line parsing which was defined by regular rule. Most of simple way in many programming systems with regular expressions support is using regular expressions to parse lines.

To demonstrate import possibilities article use test routine, which read file line by line and for each line call subroutine to process line. In line processing functions is used subroutine which parses line by separate values of columns and displays this values.

Subroutine to read file line-by-line and call subroutine to process line:

import() ; k  d import^CSV() w
 n dev="|FILE|testdata.csv" o dev:("rte") u dev
 n oldeof=$v("proc",5,0),line
 f  r line q:$zeof  d
 . d showrecord(line)
 i $v("proc",5,oldeof)
 u $p c dev
 q

Subroutine to display line before and values after parsing:

showrecord(line)
 n oldio=$io u $p
 w "record:",!,"  "
 w line,!
 n record=$$parserecord(line)
 w "parsed fields:",!
 n i f i=1:1:$ll(record) w "  ",$lg(record,i),!
 u oldio
 q

Subroutine to parse line by separate values of columns and return record as a list of values:

parserecord(line)
 ; split record by comma-separated parts
 n resplit="(?:^|,)(\""(?:[^\""]+|\""\"")*\""|[^,]*)"
 n reremove="^\s*""|^\s*,\s*""|^\s*,\s*|^\s*,\s*""|""\s*$|""\s*,\s*$|\s*,\s*$"
 n parts=$zpcres(line,resplit,"g")
 n i f i=1:1:$ll(parts) d
 . ; remove leading and trailing quote symbols and comma if present
 . s $li(parts,i)=$zpcrer($lg(parts,i),reremove,"","g")
 . ; remove doubled quote symbols if present
 . s $li(parts,i)=$zpcrer($lg(parts,i),"""""","""","g")
 q parts

Here are used three regular espressions - resplit to divide line to separate values, reremove to remove decoration symbols and removing doubled quote symbols and replacing ones by single quote symbol.

On import CSV file from article's example this routine displays strings before and after parsing:

USER>k  d import^CSV() w
record:
  123  ,    "abc","a b c","a, b, c."  ,  "the ""Abc"" is a name"
parsed fields:
  123
  abc
  a b c
  a, b, c.
  the "Abc" is a name
record:
  "Stri,ng 1", "Stri""ng 2" , String 3,String4
parsed fields:
  Stri,ng 1
  Stri"ng 2
  String 3
  String4
record:
  789,"text","text with spaces","text,with,commas",456
parsed fields:
  789
  text
  text with spaces
  text,with,commas
  456

This utility can handle much complex formatting cases, including leading spaces and quotes.

For export data article's routine uses system function $ZQUOTE() feature to double quotes in string and add leading and trailing quotes for strings. If value is ordinal number, function does not change string and return as is, with dot symbol as a decimal separator.

In full standard, CSV standard use values delimiter in depending of system's locale - if locale use dot symbol as a decimal separator, values must be separated by comma and if locale use comma as decimal separator, values must be delimited by colon. This example use dot symbol as a decimal separator and so use comma as values separator. This case is most widely used in many database systems and other applications.

For data export routine contains subroutine to generate test data:

makedata(data)
 k data
 s data($i(data))=$lb(123,"abc","a b c","a, b, c.","the ""Abc"" is a name")
 s data($i(data))=$lb(789,"text","text with spaces","text,with,commas",456)
 q

And export function contains functions for line-by-line output and output of one line:

export() ; k  d export^CSV() w
 n dev="|FILE|testdata.csv" o dev:("wtn") u dev
 n data d makedata(.data)
 n n="" f  s n=$o(data(n)) q:n=""  d writerecord(data(n))
 u $p c dev
 q
writerecord(record)
 ; in MiniM function $zquote adds leading and trailing quotes
 ; for strings and doubles quotes of string
 n i f i=1:1:$ll(record)-1 w $zquote($lg(record,i)),","
 w $zquote($lg(record,$ll(record))),!
 q

This example is demonstration only and shows MiniM possibilities to use most widely used data exchange file format. Article apologies that the one record is passed as an ordinal list of values. Real production systems can contain more complex internal data representations.

Download csv.zip (zip, 1.1Kb)

Eugene Karataev
support@minimdb.com