Microsoft recently released an open source machine learning library called ML.NET. Unlike the Python world of scikit-learn, there is no dataframe in C#, so the data is described as an array of instances of a class specific to the data the learning pipeline has to handle: see Get started with ML.NET in 10 minutes. I was wondering whether there was a way to skip that part, even if it meant being a little slower. I ended up implementing something similar to a pandas dataframe in Python, which I called Scikit.ML.DataFrame. I modified the initial example:
    var iris = "iris.txt";

    // We read the text data and create a dataframe / dataview.
    var df = DataFrame.ReadCsv(iris, sep: '\t',
                               dtypes: new DataKind?[] { DataKind.R4 });

    // The dataframe becomes the first step of the learning pipeline.
    var importData = df.EPTextLoader(iris, sep: '\t', header: true);
    var learningPipeline = new GenericLearningPipeline();
    learningPipeline.Add(importData);
    learningPipeline.Add(new ColumnConcatenator("Features", "Sepal_length", "Sepal_width"));
    learningPipeline.Add(new StochasticDualCoordinateAscentClassifier());

    // We train the pipeline and predict on the same dataframe.
    var predictor = learningPipeline.Train();
    var predictions = predictor.Predict(df);

    // The predictions are converted back into a dataframe.
    var dfout = DataFrame.ReadView(predictions);

    // And access one value...
    var v = dfout.iloc[0, 7];
    Console.WriteLine("{0}: {1}", dfout.Schema.GetColumnName(7), v);