Pages

Wednesday, May 23, 2012

CSV file Parser

using System;
using System.Data;
using System.IO; //not used by default
using System.Data.OleDb; //not used by default
namespace CSVParserExample
{
  class CSVParser
  {
    public static DataTable ParseCSV(string path)
    {
      if (!File.Exists(path))
        return null;
      string full = Path.GetFullPath(path);
      string file = Path.GetFileName(full);
      string dir = Path.GetDirectoryName(full);
      //create the "database" connection string
      string connString = "Provider=Microsoft.Jet.OLEDB.4.0;"
        + "Data Source=\"" + dir + "\\\";"
        + "Extended Properties=\"text;HDR=No;FMT=Delimited\"";
      //create the database query
      string query = "SELECT * FROM " + file;
      //create a DataTable to hold the query results
      DataTable dTable = new DataTable();
      //create an OleDbDataAdapter to execute the query
      OleDbDataAdapter dAdapter = new OleDbDataAdapter(query, connString);
      try
      {
        //fill the DataTable
        dAdapter.Fill(dTable);
      }
      catch (InvalidOperationException /*e*/)
      { }
      dAdapter.Dispose();
      return dTable;
    }
  }
}
The weird thing about all of this is that the CSV file gets treated as a database table. We need to create a connection string with the Jet OLEDB provider, and you set the Data Source to be the directory that contains the CSV file. Under extended properties, 'text' means that we are parsing a text file (as opposed to, say, an Excel file), the HDR property can be set to 'Yes' (the first row in the CSV files is header information) or 'No' (there is no header row), and setting the FMT property set to 'Delimited' essentially says that we will be working with a comma separated value file. You can also set FMT to 'FixedLength', for files where the fields are fixed length - but that wouldn't be a CSV file anymore, would it?
Now we are into normal OLEDB territory - we create a DataTable that we will be filling with results, and we create a OleDbDataAdapter to actually execute the query. Then (inside of a try block, because it can throw an exception) we fill the data table with the results of the query. Afterwords, we clean up after ourselves by disposing the OleDbDataAdapter, and we return the now filled data table.

No comments:

Post a Comment