Author

Nicky Zhu


Article · Nicky Zhu · April 10 · 2m read

Excel in the IRIS World

#Caché #InterSystems IRIS #InterSystems IRIS for Health #Embedded Python #CSV #Python #Interoperability #Tools

Working with files is usually simple: open the file, read it, and process it. This approach works well until the file happens to be an Excel file.

A Common Assumption

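At first glance, an Excel file (.xlsx) looks like just another data file made of rows, columns, and values. So it seems natural to assume it can be read the same way as a .txt or .csv file. That is exactly where the problem starts.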

Why Excel Files Behave Differently

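The key difference is how the data is stored:

-> .txt / .csv - plain text, stored line by line.

-> .xlsx - a compressed, structured format (a ZIP package of XML files, not plain text).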

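An Excel file is not really a simple stream of readable lines. Internally, it is a packaged archive of structured data that standard file-read commands cannot interpret.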
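To see the packaging for yourself, a minimal sketch like the one below opens a file in binary mode, checks for the ZIP signature, and lists the parts stored inside. It uses only the Python standard library. `describe_xlsx` and `ZIP_SIGNATURE` are hypothetical names chosen for this illustration.

```python
import zipfile

# Magic bytes of a ZIP local file header; .xlsx packages begin with them
ZIP_SIGNATURE = b"PK\x03\x04"


def describe_xlsx(path):
    """Report whether a file starts like a ZIP archive and list the parts inside it."""
    with open(path, "rb") as f:
        starts_like_zip = f.read(4) == ZIP_SIGNATURE
    # A plain .txt/.csv file is not a ZIP archive, so it has no internal parts
    if not zipfile.is_zipfile(path):
        return starts_like_zip, []
    with zipfile.ZipFile(path) as zf:
        return starts_like_zip, zf.namelist()
```

For a real workbook, `describe_xlsx(r"[input-file-path]\Test.xlsx")` typically lists XML parts such as `[Content_Types].xml`, `xl/workbook.xml` and `xl/worksheets/sheet1.xml`. The cell data lives in those XML parts, not in readable lines.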

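What happens if you treat it as a text file?

Execution can fail with errors (for example, character-decoding errors)
The output is unreadable (raw archive bytes instead of cell values)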
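The mismatch is easy to reproduce. The sketch below reads a file "the text-file way" in Python. It passes `errors="replace"` so that undecodable bytes become U+FFFD replacement characters instead of raising an exception, which makes the garbage visible. `naive_text_read` is a hypothetical helper for this illustration only.

```python
def naive_text_read(path, max_chars=40):
    """Read the start of a file as if it were UTF-8 text.

    errors="replace" swaps undecodable bytes for U+FFFD instead of raising
    UnicodeDecodeError, so the result shows what a text reader actually "sees".
    """
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        return f.read(max_chars)
```

On an .xlsx file, the result begins with the ZIP signature characters `PK`, followed by control characters and replacement characters rather than any cell values.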

Key point --> this is not a limitation. It is a mismatch between the tool and the file format.

实用的处理方法

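Instead of relying only on text-based methods, there are better options:

Convert to CSV
Embedded Python (with libraries such as pandas or openpyxl)
IRIS interoperability tools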

If you have worked with Excel files in IRIS, or have other approaches that work well, please feel free to share them.

Here are a couple of examples.

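/// Example: read an Excel file and write its rows to a text file
ClassMethod ReadDataToFile() [ Language = python ]
{
    #; w ##class(test).ReadDataToFile()
    import pandas as pd

    source = r"[input-file-path]\Test.xlsx"
    outfile = r"[output-file-path]\Output.txt"

    # pandas parses the .xlsx package itself (this requires the openpyxl package).
    # By default it loads the first sheet and uses its first row as column headers.
    df = pd.read_excel(source)
    with open(outfile, "w", encoding="utf-8") as f:
        for index, row in df.iterrows():
            # Replace embedded line breaks so each Excel row stays on one text line
            cleaned_values = [str(x).replace("\n", " ").replace("\r", " ") for x in row.tolist()]
            # Write one tab-separated line per row
            f.write("\t".join(cleaned_values) + "\n")
}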


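/// Example: read an Excel file and store its rows in a global
ClassMethod ReadData() [ Language = python ]
{
    #; w ##class(test).ReadData()
    import os
    import pandas as pd
    import iris

    source = r"[input-file-path]\Test.xlsx"

    df = pd.read_excel(source)
    g = iris.gref("^zReadFile")
    # Embedded Python runs inside the IRIS process, so the OS process ID should match $JOB
    jobId = os.getpid()
    for index, row in df.iterrows():
        cleaned_values = [str(x).replace("\n", " ").replace("\r", " ") for x in row.tolist()]
        line = "\t".join(cleaned_values)
        # Store each row as ^zReadFile(jobId, rowIndex) = tab-separated values
        g[jobId, int(index)] = line
}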
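The row-cleaning step shared by both methods can also be pulled out into a plain Python function, which makes it testable outside IRIS. The sketch below goes beyond the original code in two ways. It also replaces tab characters, so they cannot shift columns. It also turns empty cells into empty strings: pandas reads empty cells as NaN, and `str()` would otherwise render them as `nan`. `to_tsv_line` is a hypothetical helper name.

```python
import math


def to_tsv_line(values):
    """Join one row of cell values into a single tab-separated line.

    CR, LF and tab characters inside a cell become spaces, so one Excel row
    maps to exactly one line with a stable column count. Float NaN (also
    NumPy's float64 NaN, a float subclass) becomes an empty string.
    """
    cleaned = []
    for x in values:
        if isinstance(x, float) and math.isnan(x):
            cleaned.append("")
        else:
            cleaned.append(str(x).replace("\r", " ").replace("\n", " ").replace("\t", " "))
    return "\t".join(cleaned)
```

Inside either class method, the list comprehension and join could then be replaced by `line = to_tsv_line(row.tolist())`. For example, `to_tsv_line(["a\r\nb", 3, float("nan"), "x\ty"])` returns `"a  b\t3\t\tx y"`.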

View the original post by @Vachan C Rannore
