MiniM Knowledge Base | Sep 1, 2011
How to use CSV files
One of the most widely used data transfer file formats is a Comma-Separated Values file format. This article demonstrates how routine in MiniM can import and export data using this file format.
CSV file format is defined as ordinal text file, and store one data record per line. Line delimiter is standard text line terminator.
One line is one record and values of columns are delimited by comma or by semicolon. Next in this article is used comma as a delimiter. Values of columns, which contains reserved symbols (comma) are decorated by quotes ("); if value contains quotes, each quote must be doubled.
To import this file format we have two ways - use standard MUMPS language subset to parse string and use extended functions to simplify line parsing which was defined by regular rule. Most of simple way in many programming systems with regular expressions support is using regular expressions to parse lines.
To demonstrate import possibilities article use test routine, which read file line by line and for each line call subroutine to process line. In line processing functions is used subroutine which parses line by separate values of columns and displays this values.
Subroutine to read file line-by-line and call subroutine to process line:
import() ; k d import^CSV() w n dev="|FILE|testdata.csv" o dev:("rte") u dev n oldeof=$v("proc",5,0),line f r line q:$zeof d . d showrecord(line) i $v("proc",5,oldeof) u $p c dev q
Subroutine to display line before and values after parsing:
showrecord(line) n oldio=$io u $p w "record:",!," " w line,! n record=$$parserecord(line) w "parsed fields:",! n i f i=1:1:$ll(record) w " ",$lg(record,i),! u oldio q
Subroutine to parse line by separate values of columns and return record as a list of values:
parserecord(line) ; split record by comma-separated parts n resplit="(?:^|,)(\""(?:[^\""]+|\""\"")*\""|[^,]*)" n reremove="^\s*""|^\s*,\s*""|^\s*,\s*|^\s*,\s*""|""\s*$|""\s*,\s*$|\s*,\s*$" n parts=$zpcres(line,resplit,"g") n i f i=1:1:$ll(parts) d . ; remove leading and trailing quote symbols and comma if present . s $li(parts,i)=$zpcrer($lg(parts,i),reremove,"","g") . ; remove doubled quote symbols if present . s $li(parts,i)=$zpcrer($lg(parts,i),"""""","""","g") q parts
Here are used three regular espressions - resplit to divide line to separate values, reremove to remove decoration symbols and removing doubled quote symbols and replacing ones by single quote symbol.
On import CSV file from article's example this routine displays strings before and after parsing:
USER>k d import^CSV() w record: 123 , "abc","a b c","a, b, c." , "the ""Abc"" is a name" parsed fields: 123 abc a b c a, b, c. the "Abc" is a name record: "Stri,ng 1", "Stri""ng 2" , String 3,String4 parsed fields: Stri,ng 1 Stri"ng 2 String 3 String4 record: 789,"text","text with spaces","text,with,commas",456 parsed fields: 789 text text with spaces text,with,commas 456
This utility can handle much complex formatting cases, including leading spaces and quotes.
For export data article's routine uses system function $ZQUOTE() feature to double quotes in string and add leading and trailing quotes for strings. If value is ordinal number, function does not change string and return as is, with dot symbol as a decimal separator.
In full standard, CSV standard use values delimiter in depending of system's locale - if locale use dot symbol as a decimal separator, values must be separated by comma and if locale use comma as decimal separator, values must be delimited by colon. This example use dot symbol as a decimal separator and so use comma as values separator. This case is most widely used in many database systems and other applications.
For data export routine contains subroutine to generate test data:
makedata(data) k data s data($i(data))=$lb(123,"abc","a b c","a, b, c.","the ""Abc"" is a name") s data($i(data))=$lb(789,"text","text with spaces","text,with,commas",456) q
And export function contains functions for line-by-line output and output of one line:
export() ; k d export^CSV() w n dev="|FILE|testdata.csv" o dev:("wtn") u dev n data d makedata(.data) n n="" f s n=$o(data(n)) q:n="" d writerecord(data(n)) u $p c dev q writerecord(record) ; in MiniM function $zquote adds leading and trailing quotes ; for strings and doubles quotes of string n i f i=1:1:$ll(record)-1 w $zquote($lg(record,i)),"," w $zquote($lg(record,$ll(record))),! q
This example is demonstration only and shows MiniM possibilities to use most widely used data exchange file format. Article apologies that the one record is passed as an ordinal list of values. Real production systems can contain more complex internal data representations.
Download csv.zip (zip, 1.1Kb)Eugene Karataev