Introduction
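This article presents a class (PythonRunner) that lets you run Python scripts from a C# client. The scripts can produce textual output as well as images, which are converted to C# Image objects. This way, the PythonRunner class not only gives C# applications access to the world of data science, machine learning, and artificial intelligence, but also makes Python's extensive charting, plotting, and data visualization libraries (e.g. matplotlib and seaborn) available to C#.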
Background
General Considerations
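I have been a C# developer for more than a decade, and what can I say: over the years, I have fallen deeply in love with the language. It gives me great architectural flexibility, has huge community support, and comes with an abundance of third-party tools (both free and commercial, much of it open source) that support almost every conceivable use case. C# is a general-purpose language and the first choice for business application development.
Over the past few months, I started learning Python for data science, machine learning, artificial intelligence, and data visualization, mainly because I think this skill will boost my career as a freelance software developer. I soon realized that Python is great for the above tasks (far better than C#), but completely unsuitable for developing and maintaining large business applications. So the question of C# vs. Python (a widely discussed topic on the internet) completely misses the point. C# is for developers who build business-scale applications; Python is for data scientists who specialize in data science, machine learning, and data visualization. The two jobs don't have much in common.
The task is not for C# developers to additionally become data scientists or ML specialists (or the other way round). Being an expert in both fields would simply be too much. This is, IMHO, the main reason why components like Microsoft's ML.NET or the SciSharp STACK won't become widely used. The average C# developer won't become a data scientist, and data scientists won't learn C#. Why would they? They already have a great programming language that suits their needs very well, with a huge ecosystem of scientific third-party libraries.
With these considerations in mind, I started looking for an easier and more "natural" way to bring the Python world and the C# world together. Here is one possible solution...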
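The Sample App
Before we dive into the details, a short preliminary remark: I wrote this sample application for the sole purpose of demonstrating the C#/Python integration, using only some mildly sophisticated Python code. I didn't care much about whether the ML code itself is useful, so please be forgiving in this respect.
With that said, let me briefly describe the sample application. Basically, it: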
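gives you a list of stocks to choose from (6-30),
draws a summary line chart of the (normalized) monthly stock prices,
performs a so-called "k-means clustering analysis" based on their price movements and shows the results in a treeview.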
Let's walk through the three parts of the application step by step...
Stock Selection
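The data grid on the left side of the application window shows the list of available stocks to choose from. You need to select at least 6 items before any further action is possible (the maximum number of selected stocks is 30). You can filter the list using the controls at the top, and sort it by clicking a column header. The "Check Random Sample" button randomly selects 18 stocks from the list.
Adjusting Other Parameters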
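Besides the stock selection, you can also adjust other parameters of the analysis: the date range to analyze and the number-of-clusters meta-parameter for the k-means analysis. This number cannot be greater than the number of selected stocks.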
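The constraints described so far (between 6 and 30 selected stocks, and a cluster count no larger than the selection) could be checked in one place before any script is called. The following is only a sketch: `validate_selection` is a hypothetical helper that does not appear in the sample app, and the lower bound of one cluster is an assumption.

```python
MIN_STOCKS = 6   # minimum selection size stated in the article
MAX_STOCKS = 30  # maximum selection size stated in the article


def validate_selection(tickers, num_clusters):
    """Return a list of problems; an empty list means the input is usable.

    Hypothetical helper for illustration, not taken from the sample app.
    """
    problems = []
    if not MIN_STOCKS <= len(tickers) <= MAX_STOCKS:
        problems.append(
            f"select between {MIN_STOCKS} and {MAX_STOCKS} stocks, got {len(tickers)}")
    # Assumption: at least one cluster is required.
    if num_clusters < 1:
        problems.append("the number of clusters must be at least 1")
    if num_clusters > len(tickers):
        problems.append("the number of clusters cannot exceed the number of selected stocks")
    return problems
```

Returning a list of messages rather than raising on the first problem makes it easy to show all issues to the user at once.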
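Analysis Results
Once you have finished selecting stocks and adjusting parameters, you can press the "Analyze" button in the lower-right corner of the window. This will (asynchronously) call the Python scripts that perform the steps described above (drawing the chart and performing the k-means clustering analysis). On return, it processes and displays the scripts' output.
The middle part of the window is a chart with the prices of the selected stocks, normalized so that the price at the start date is set to zero and all later prices are expressed as the percentage change from that starting point. The image produced by the script is wrapped in a ZoomBox control to enhance accessibility and user experience.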
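The normalization described above can be written down in a few lines. This is a minimal sketch in plain Python, assuming a simple list of prices ordered by date; the function name and the sample numbers are made up, and the actual chart script may do this differently (for example with pandas).

```python
def normalize_to_percent_change(prices):
    """Scale a price series so the first value maps to 0 and later values
    are percentage changes relative to that first price."""
    if not prices:
        return []
    start = prices[0]
    if start == 0:
        raise ValueError("the starting price must be non-zero")
    return [(price / start - 1.0) * 100.0 for price in prices]


monthly_closes = [50.0, 55.0, 45.0, 60.0]  # made-up sample prices
normalized = normalize_to_percent_change(monthly_closes)
```

Because every series starts at zero, stocks with very different absolute prices can share one y-axis.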
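On the far right of the window, a tree displays the processed results of the cluster analysis. It groups (clusters) the stocks by their relative price movements (in other words: the more similarly two stocks move, the more likely they are to end up in the same cluster). The tree also serves as the color legend for the chart.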
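To make the idea of clustering by price movement concrete, here is a from-scratch sketch of k-means (Lloyd's algorithm) in plain Python. It is not the code used by the sample app, whose clustering script is not reproduced in this article; the function name, the seeding strategy (the first k points), and the sample movements are assumptions chosen to keep the example short and deterministic.

```python
def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm on equal-length numeric vectors.

    Returns one cluster label per point. The first k points seed the
    centroids, which keeps the sketch deterministic (an assumption made
    here; libraries typically use smarter seeding).
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, point in enumerate(points):
            distances = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centroids]
            labels[i] = distances.index(min(distances))
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up percentage movements of four stocks over three months.
movements = [[1.0, 2.0, 3.0], [-2.0, -1.0, -3.0], [1.1, 2.1, 2.9], [-2.2, -0.9, -3.1]]
labels = kmeans(movements, k=2)
```

Stocks one and three move almost identically, as do stocks two and four, so each pair ends up sharing a label.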
Points of Interest
Principal Structure of the Code
In general, the project consists of:
the C# files,
the Python scripts in a scripts subfolder (chart.py, kmeans.py, and common.py),
a SQLite database (stockdata.sqlite) that is accessed by both the C# code and the Python scripts.
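Other things to note:
On the C# side, the database is accessed using EF6 and the recipe from this Codeproject article.
Some of the WPF UI controls come from the Extended WPF Toolkit™.
Of course, a Python environment with all the required packages must be installed on the target system. The corresponding path is configured via the app.config file.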
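The C# part of the application uses WPF and follows the MVVM pattern. Corresponding to the threefold overall structure of the application's main window, there are three viewmodels (StockListViewModel, ChartViewModel, and TreeViewViewModel) that are orchestrated by a fourth one (MainViewModel).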
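C# Side
The PythonRunner Class
The central component for running Python scripts is the PythonRunner class. It is basically a wrapper around the Process class, specialized for Python. It supports textual output as well as image output, both synchronously and asynchronously. Here is the public interface of this class, together with the code comments that explain the details: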
/// <summary>
/// A specialized runner for python scripts. Supports textual output
/// as well as image output, both synchronously and asynchronously.
/// </summary>
/// <remarks>
/// You can think of <see cref="PythonRunner" /> instances as <see cref="Process" />
/// instances that were specialized for Python scripts.
/// </remarks>
/// <seealso cref="Process" />
public class PythonRunner {
    /// <summary>
    /// Instantiates a new <see cref="PythonRunner" /> instance.
    /// </summary>
    /// <param name="interpreter">
    /// Full path to the Python interpreter ('python.exe').
    /// </param>
    /// <param name="timeout">
    /// The script timeout in msec. Defaults to 10000 (10 sec).
    /// </param>
    /// <exception cref="ArgumentNullException">
    /// Argument <paramref name="interpreter" /> is null.
    /// </exception>
    /// <exception cref="FileNotFoundException">
    /// Argument <paramref name="interpreter" /> is an invalid path.
    /// </exception>
    /// <seealso cref="Interpreter" />
    /// <seealso cref="Timeout" />
    public PythonRunner(string interpreter, int timeout = 10000) { ... }

    /// <summary>
    /// Occurs when a python process is started.
    /// </summary>
    /// <seealso cref="PyRunnerStartedEventArgs" />
    public event EventHandler<PyRunnerStartedEventArgs> Started;

    /// <summary>
    /// Occurs when a python process has exited.
    /// </summary>
    /// <seealso cref="PyRunnerExitedEventArgs" />
    public event EventHandler<PyRunnerExitedEventArgs> Exited;

    /// <summary>
    /// The Python interpreter ('python.exe') that is used by this instance.
    /// </summary>
    public string Interpreter { get; }

    /// <summary>
    /// The timeout for the underlying <see cref="Process" /> component in msec.
    /// </summary>
    /// <remarks>
    /// See <see cref="Process.WaitForExit(int)" /> for details about this value.
    /// </remarks>
    public int Timeout { get; set; }

    /// <summary>
    /// Executes a Python script and returns the text that it prints to the console.
    /// </summary>
    /// <param name="script">Full path to the script to execute.</param>
    /// <param name="arguments">Arguments that were passed to the script.</param>
    /// <returns>The text output of the script.</returns>
    /// <exception cref="PythonRunnerException">
    /// Thrown if error text was outputted by the script (this normally happens
    /// if an exception was raised by the script). <br />
    /// -- or -- <br />
    /// An unexpected error happened during script execution. In this case, the
    /// <see cref="Exception.InnerException" /> property contains the original
    /// <see cref="Exception" />.
    /// </exception>
    /// <exception cref="ArgumentNullException">
    /// Argument <paramref name="script" /> is null.
    /// </exception>
    /// <exception cref="FileNotFoundException">
    /// Argument <paramref name="script" /> is an invalid path.
    /// </exception>
    /// <remarks>
    /// Output to the error stream can also come from warnings, that are frequently
    /// outputted by various python package components. These warnings would result
    /// in an exception, therefore they must be switched off within the script by
    /// including the following statement: <c>warnings.simplefilter("ignore")</c>.
    /// </remarks>
    public string Execute(string script, params object[] arguments) { ... }

    /// <summary>
    /// Runs the <see cref="Execute"/> method asynchronously.
    /// </summary>
    /// <returns>
    /// An awaitable task, with the text output of the script as
    /// <see cref="Task{TResult}.Result"/>.
    /// </returns>
    /// <seealso cref="Execute"/>
    public Task<string> ExecuteAsync(string script, params object[] arguments) { ... }

    /// <summary>
    /// Executes a Python script and returns the resulting image
    /// (mostly a chart that was produced
    /// by a Python package like e.g. <see href="https://matplotlib.org/">matplotlib</see> or
    /// <see href="https://seaborn.pydata.org/">seaborn</see>).
    /// </summary>
    /// <param name="script">Full path to the script to execute.</param>
    /// <param name="arguments">Arguments that were passed to the script.</param>
    /// <returns>The <see cref="Bitmap"/> that the script creates.</returns>
    /// <exception cref="PythonRunnerException">
    /// Thrown if error text was outputted by the script (this normally happens
    /// if an exception was raised by the script). <br/>
    /// -- or -- <br/>
    /// An unexpected error happened during script execution. In this case, the
    /// <see cref="Exception.InnerException"/> property contains the original
    /// <see cref="Exception"/>.
    /// </exception>
    /// <exception cref="ArgumentNullException">
    /// Argument <paramref name="script"/> is null.
    /// </exception>
    /// <exception cref="FileNotFoundException">
    /// Argument <paramref name="script"/> is an invalid path.
    /// </exception>
    /// <remarks>
    /// <para>
    /// In a 'normal' case, a Python script that creates a chart would show this chart
    /// with the help of Python's own backend, like this.
    /// <example>
    /// import matplotlib.pyplot as plt
    /// ...
    /// plt.show()
    /// </example>
    /// For the script to be used within the context of this <see cref="PythonRunner"/>,
    /// it should instead convert the image to a base64-encoded string and print this string
    /// to the console. The following code snippet shows a Python method (<c>print_figure</c>)
    /// that does this:
    /// <example>
    /// import io, sys, base64
    ///
    /// def print_figure(fig):
    ///     buf = io.BytesIO()
    ///     fig.savefig(buf, format='png')
    ///     print(base64.b64encode(buf.getbuffer()))
    ///
    /// import matplotlib.pyplot as plt
    /// ...
    /// print_figure(plt.gcf()) # the gcf() method retrieves the current figure
    /// </example>
    /// </para><para>
    /// Output to the error stream can also come from warnings, that are frequently
    /// outputted by various python package components. These warnings would result
    /// in an exception, therefore they must be switched off within the script by
    /// including the following statement: <c>warnings.simplefilter("ignore")</c>.
    /// </para>
    /// </remarks>
    public Bitmap GetImage(string script, params object[] arguments) { ... }

    /// <summary>
    /// Runs the <see cref="GetImage"/> method asynchronously.
    /// </summary>
    /// <returns>
    /// An awaitable task, with the <see cref="Bitmap"/> that the script
    /// creates as <see cref="Task{TResult}.Result"/>.
    /// </returns>
    /// <seealso cref="GetImage"/>
    public Task<Bitmap> GetImageAsync(string script, params object[] arguments) { ... }
}
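The contract documented above (start the interpreter with a script and its arguments, return whatever was printed to stdout, and treat any stderr output as an error) can be illustrated with Python's own subprocess module. This is not the author's C# implementation, only a sketch of the same idea; `run_script` and `ScriptError` are hypothetical names, and raising on timeout is a choice made for this example.

```python
import subprocess
import sys


class ScriptError(RuntimeError):
    """Raised when the child process writes anything to stderr."""


def run_script(interpreter, args, timeout_sec=10.0):
    """Run `interpreter` with `args` and return what it printed to stdout.

    Mirrors the behaviour described above: any stderr output is treated
    as a failure, and the call is bounded by a timeout.
    """
    completed = subprocess.run(
        [interpreter, *map(str, args)],
        capture_output=True,
        text=True,
        timeout=timeout_sec,  # raises subprocess.TimeoutExpired when exceeded
    )
    if completed.stderr.strip():
        raise ScriptError(completed.stderr.strip())
    return completed.stdout.strip()


# Usage: run an inline one-liner instead of a script file.
output = run_script(sys.executable, ["-c", "print('hello from the child')"])
```

Converting every argument with `str` plays the same role as the `params object[]` parameter in the C# interface: callers can pass numbers and strings alike.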
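Retrieving the Stock Data
As already mentioned, the sample application uses a SQLite database as its data store (which is also accessed by the Python side, see below). To this end, Entity Framework is used, together with the recipe from this Codeproject article. The stock data are then put into a ListCollectionView, which supports filtering and sorting: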
private void LoadStocks() {
    var ctx = new SQLiteDatabaseContext(_mainVm.DbPath);
    var itemList = ctx.Stocks.ToList().Select(s => new StockItem(s)).ToList();
    _stocks = new ObservableCollection<StockItem>(itemList);
    _collectionView = new ListCollectionView(_stocks);

    // Initially sort the list by stock names
    ICollectionView view = CollectionViewSource.GetDefaultView(_collectionView);
    view.SortDescriptions.Add(new SortDescription("Name", ListSortDirection.Ascending));
}
Getting Textual Output
Here, the PythonRunner is calling a script that produces textual output. The KMeansClusteringScript property points to the script to execute:
/// <summary>
/// Calls the python script to retrieve a textual list that
/// will subsequently be used for building the treeview.
/// </summary>
/// <returns>The text output of the script, or an empty string on failure.</returns>
private async Task<string> RunKMeans() {
    TreeViewText = Processing;
    Items.Clear();
    try {
        string output = await _mainVm.PythonRunner.ExecuteAsync(
            KMeansClusteringScript,
            _mainVm.DbPath,
            _mainVm.TickerList,
            _mainVm.NumClusters,
            _mainVm.StartDate.ToString("yyyy-MM-dd"),
            _mainVm.EndDate.ToString("yyyy-MM-dd"));
        return output;
    } catch (Exception e) {
        TreeViewText = e.ToString();
        return string.Empty;
    }
}
Here is some sample output produced by the script:
0 AYR 0,0,255
0 PCCWY 0,100,0
0 HSNGY 128,128,128
0 CRHKY 165,42,42
0 IBN 128,128,0
1 SRNN 199,21,133
...
4 PNBK 139,0,0
5 BOTJ 255,165,0
5 SPPJY 47,79,79
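The first column is the cluster number from the k-means analysis, the second column is the ticker symbol of the respective stock, and the third column gives the RGB values of the color that is used to draw this stock's line in the chart.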
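The three-column format just described is straightforward to parse. The sketch below does it in Python for illustration (in the sample app this happens on the C# side, whose parsing code is not shown here); `parse_cluster_lines` is a hypothetical name, and the sample text is a subset of the output above.

```python
from collections import defaultdict

SAMPLE_OUTPUT = """\
0 AYR 0,0,255
0 PCCWY 0,100,0
1 SRNN 199,21,133
5 BOTJ 255,165,0
"""


def parse_cluster_lines(text):
    """Group 'cluster ticker r,g,b' lines into {cluster: [(ticker, (r, g, b)), ...]}."""
    clusters = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        cluster, ticker, rgb = line.split()
        red, green, blue = (int(part) for part in rgb.split(","))
        clusters[int(cluster)].append((ticker, (red, green, blue)))
    return dict(clusters)


clusters = parse_cluster_lines(SAMPLE_OUTPUT)
```

A mapping from cluster number to its members is a natural shape for building tree nodes, with each member's color doubling as the chart legend entry.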
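Getting an Image
This method uses the viewmodel's PythonRunner instance to asynchronously call the required Python script (the path of which is stored in the DrawSummaryLineChartScript property) together with the required script arguments. As soon as the result is available, it is converted into a "WPF-friendly" form: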
/// <summary>
/// Calls the python script to draw the chart of the selected stocks.
/// </summary>
/// <returns>True on success.</returns>
internal async Task<bool> DrawChart() {
    SummaryChartText = Processing;
    SummaryChart = null;
    try {
        var bitmap = await _mainVm.PythonRunner.GetImageAsync(
            DrawSummaryLineChartScript,
            _mainVm.DbPath,
            _mainVm.TickerList,
            _mainVm.StartDate.ToString("yyyy-MM-dd"),
            _mainVm.EndDate.ToString("yyyy-MM-dd"));

        // Convert the GDI+ bitmap into a WPF BitmapSource.
        // Note: the handle returned by GetHbitmap() is not released here;
        // production code should free it with the GDI DeleteObject function.
        SummaryChart = Imaging.CreateBitmapSourceFromHBitmap(
            bitmap.GetHbitmap(),
            IntPtr.Zero,
            Int32Rect.Empty,
            BitmapSizeOptions.FromEmptyOptions());
        return true;
    } catch (Exception e) {
        SummaryChartText = e.ToString();
        return false;
    }
}
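Python Side
Suppressing Warnings
An important point to note is that the PythonRunner class throws an exception as soon as the called script writes to stderr. This is the case when the Python code raises an error for some reason, and in this case rethrowing the error is exactly what we want. However, the script may also write to stderr when some component issues a harmless warning, for example because something will be deprecated soon, something is initialized twice, or because of any other minor issue. In that case, we don't want to interrupt execution but simply ignore the warning. The statement in the snippet below does exactly that: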
import warnings
...
# Suppress all kinds of warnings (this would lead to an exception on the client side).
warnings.simplefilter("ignore")
...
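Parsing the Command-Line Arguments
As we have seen, the C# (client) side calls the scripts with a variable number of positional arguments. The arguments are passed to the script via the command line. This means that the script must "understand" these arguments and parse them accordingly. The command-line arguments given to a Python script are accessible via sys.argv, a list of strings. The snippet below, taken from the kmeans.py script, demonstrates how to do this: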
import sys
...
# parse command line arguments
db_path = sys.argv[1]
ticker_list = sys.argv[2]
clusters = int(sys.argv[3])
start_date = sys.argv[4]
end_date = sys.argv[5]
...
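Indexing sys.argv directly fails with a bare IndexError when an argument is missing. A slightly more defensive variant could check the argument count and convert the dates up front. This is a sketch, not the code from kmeans.py: `parse_arguments`, the usage text, and the sample values are assumptions, while the argument order follows the snippet above.

```python
from datetime import date


def parse_arguments(argv):
    """Parse the five positional arguments in the order used by kmeans.py.

    `argv` has the same layout as sys.argv: index 0 is the script name.
    """
    if len(argv) != 6:
        raise SystemExit(
            "usage: kmeans.py DB_PATH TICKER_LIST CLUSTERS START_DATE END_DATE")
    db_path, ticker_list, clusters, start, end = argv[1:]
    return {
        "db_path": db_path,
        "ticker_list": ticker_list,
        "clusters": int(clusters),
        # The C# side formats dates as yyyy-MM-dd, which fromisoformat accepts.
        "start_date": date.fromisoformat(start),
        "end_date": date.fromisoformat(end),
    }


args = parse_arguments(
    ["kmeans.py", "stockdata.sqlite", "'AYR','IBN'", "2", "2019-01-01", "2019-12-31"])
```

Keep in mind that a message passed to SystemExit is written to stderr, so a caller that watches stderr will see a wrong argument count as an error, which is the desired outcome here.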
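Retrieving the Stock Data
The Python scripts use the same SQLite database as the C# code. This is achieved by storing the path to the database as an application setting in app.config on the C# side and then passing it as an argument to the called Python script. Above, we have seen how this is done on the calling side and how the command-line arguments are parsed in the Python script. Here now is the Python helper function that builds a SQL statement from the arguments and loads the required data into a list of dataframes (using the sqlalchemy Python package):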
from sqlalchemy import create_engine
import pandas as pd

def load_stock_data(db, tickers, start_date, end_date):
    """
    Loads the stock data for the specified ticker symbols, and for the specified date range.
    :param db: Full path to database with stock data.
    :param tickers: A comma-separated string of quoted ticker symbols, e.g. "'AYR','IBN'".
    :param start_date: The start date.
    :param end_date: The end date.
    :return: A list of time-indexed dataframes, one for each ticker, ordered by date.
    """
    SQL = "SELECT * FROM Quotes WHERE TICKER IN ({}) AND Date >= '{}' AND Date <= '{}'"\
        .format(tickers, start_date, end_date)

    engine = create_engine('sqlite:///' + db)

    df_all = pd.read_sql(SQL, engine, index_col='Date', parse_dates='Date')
    df_all = df_all.round(2)

    result = []

    # Each ticker still carries its quotes, so it is a valid string literal in the query.
    for ticker in tickers.split(","):
        df_ticker = df_all.query("Ticker == " + ticker)
        result.append(df_ticker)

    return result
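The helper above builds its SQL by string formatting, which works here because the values come from the application itself. If the inputs were less trusted, bound parameters would be the safer route. The sketch below shows that alternative with the standard-library sqlite3 module instead of sqlalchemy and pandas; `load_quotes`, the `Close` column, and the sample rows are made up for the example, and only the Quotes, Ticker, and Date names come from the query above.

```python
import sqlite3


def load_quotes(connection, tickers, start_date, end_date):
    """Fetch rows for the given tickers and date range using bound parameters."""
    placeholders = ", ".join("?" for _ in tickers)  # one "?" per ticker
    sql = (
        f"SELECT Ticker, Date, Close FROM Quotes "
        f"WHERE Ticker IN ({placeholders}) AND Date >= ? AND Date <= ? "
        f"ORDER BY Ticker, Date"
    )
    return connection.execute(sql, [*tickers, start_date, end_date]).fetchall()


# Tiny in-memory stand-in for stockdata.sqlite (schema and rows are made up).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Quotes (Ticker TEXT, Date TEXT, Close REAL)")
conn.executemany(
    "INSERT INTO Quotes VALUES (?, ?, ?)",
    [("AYR", "2019-01-31", 10.0), ("AYR", "2019-02-28", 11.0),
     ("IBN", "2019-01-31", 20.0), ("XYZ", "2019-01-31", 30.0)],
)
rows = load_quotes(conn, ["AYR", "IBN"], "2019-01-01", "2019-01-31")
```

Only the placeholder string is formatted into the SQL text; the ticker symbols and dates themselves are handed to the database driver as data.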
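Text Output
For a Python script, producing textual output that can be consumed from the C# side just means: print to the console as usual. The calling PythonRunner class takes care of everything else. Here is the snippet from kmeans.py that produces the text shown above: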
# Create a DataFrame aligning labels and companies.
df = pd.DataFrame({'ticker': tickers}, index=labels)
df.sort_index(inplace=True)

# Make a real python list.
ticker_list = list(ticker_list.replace("'", "").split(','))

# Output the clusters together with the used colors
for cluster, row in df.iterrows():
    ticker = row['ticker']
    index = ticker_list.index(ticker)
    rgb = get_rgb(common.COLOR_MAP[index])
    print(cluster, ticker, rgb)
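The get_rgb helper and the contents of common.COLOR_MAP are not shown in this article. Purely as an illustration of how a color could be turned into the "r,g,b" text seen in the sample output, here is a sketch that assumes colors are given as #RRGGBB hex strings; `format_rgb` is a hypothetical name and the real helper may work differently.

```python
def format_rgb(hex_color):
    """Turn '#RRGGBB' into the 'r,g,b' text form seen in the sample output.

    Hypothetical stand-in; the article's own get_rgb is not shown.
    """
    value = hex_color.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected a #RRGGBB color, got {hex_color!r}")
    # Convert each two-digit hex pair to its decimal channel value.
    return ",".join(str(int(value[i:i + 2], 16)) for i in (0, 2, 4))


blue = format_rgb("#0000FF")
brown = format_rgb("#A52A2A")
```

Emitting the channels without spaces keeps each output line at exactly three whitespace-separated columns.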
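Image Output
Image output is not very different from text output: first, the script creates the desired figure as usual. Then, instead of calling the show() method to display the image using Python's own backend, it converts the image to a base64 string and prints this string to the console. You can use this helper function: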
import io, sys, base64

def print_figure(fig):
    """
    Converts a figure (as created e.g. with matplotlib or seaborn) to a png image and this
    png subsequently to a base64-string, then prints the resulting string to the console.
    """
    buf = io.BytesIO()
    fig.savefig(buf, format='png')
    print(base64.b64encode(buf.getbuffer()))
Then, in your main script, you call the helper function like this (the gcf() function simply gets the current figure):
import matplotlib.pyplot as plt
...
# do stuff
...
print_figure(plt.gcf())
Then, on the C# client side, this small helper class, which is used by PythonRunner, converts the string back into an image (a Bitmap, to be precise):
/// <summary>
/// Helper class for converting a base64 string (as printed by
/// python script) to a <see cref="Bitmap" /> image.
/// </summary>
internal static class PythonBase64ImageConverter {
    /// <summary>
    /// Converts a base64 string (as printed by python script) to a <see cref="Bitmap" /> image.
    /// </summary>
    public static Bitmap FromPythonBase64String(string pythonBase64String) {
        // Remove the first two chars and the last one.
        // First one is 'b' (python format sign), others are quote signs.
        string base64String = pythonBase64String.Substring(2, pythonBase64String.Length - 3);

        // Convert now raw base64 string to byte array.
        byte[] imageBytes = Convert.FromBase64String(base64String);

        // Read bytes as stream (the stream starts at position 0).
        var memoryStream = new MemoryStream(imageBytes);

        // Create bitmap from stream.
        return (Bitmap)Image.FromStream(memoryStream, true);
    }
}
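The reason for stripping two leading characters and one trailing character becomes clear when the round trip is replayed in Python: printing a bytes object yields its literal form, including the b prefix and the quotes. The sketch below mirrors the decoding step with the standard library; the function name is hypothetical and the payload is a stand-in for real PNG data.

```python
import base64


def decode_printed_bytes_literal(printed):
    """Recover the raw bytes from text such as "b'aGVsbG8='"."""
    text = printed.strip()
    if not (text.startswith("b'") and text.endswith("'")):
        raise ValueError("expected the printed form of a bytes object")
    # Drop the leading b' and the trailing ' before decoding.
    return base64.b64decode(text[2:-1])


payload = b"\x89PNG\r\n\x1a\n fake image bytes"   # stand-in for real PNG data
printed = str(base64.b64encode(payload))          # what print() would show
decoded = decode_printed_bytes_literal(printed)
```

A script could instead print `base64.b64encode(...).decode("ascii")`, which has no prefix or quotes, but then the stripping on the receiving side would have to be removed as well, so both sides must agree on one format.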