Article · September 2, 2020 · about 7 min read

Integrity Check: Speeding it Up or Slowing it Down

While the integrity of Caché and InterSystems IRIS databases is completely protected from the consequences of system failure, physical storage devices do fail in ways that corrupt the data they store.  For that reason, many sites choose to run regular database integrity checks, particularly in coordination with backups to validate that a given backup could be relied upon in a disaster.  Integrity check may also be acutely needed by the system administrator in response to a disaster involving storage corruption.  Integrity check must read every block of the globals being checked (if not already in buffers), and in an order dictated by the global structure. This takes substantial time, but integrity check is capable of reading as fast as the storage subsystem can sustain.  In some situations, it is desirable to run it in that manner to get results as quickly as possible.  In other situations, integrity check needs to be more conservative to avoid consuming too much of the storage subsystem’s bandwidth. 

Plan of Attack

The following outline covers most situations.  The detailed discussion in the remainder of this article provides the information needed to act on any of these, or to derive other courses of action. 

  1. If using Linux and integrity check is slow, see the information below on enabling Asynchronous I/O. 
  2. If integrity check must complete as fast as possible - running in an isolated environment, or because results are needed urgently - use Multi-Process Integrity Check to check multiple globals or databases in parallel.  The number of processes times the number of concurrent asynchronous reads that each process will perform (8 by default, or 1 if using Linux with asynchronous I/O disabled) is the limit on the number of concurrent reads in flight.  Consider that the average may be half that and then compare to the capabilities of the storage subsystem.  For example, with storage striped across 20 drives and the default 8 concurrent reads per process, five or more processes may be needed to capture the full capacity of the storage subsystem (5*8/2=20).
  3. When balancing integrity check speed against its impact on production, first adjust the number of processes in the Multi-Process Integrity Check, then if needed, see the SetAsyncReadBuffers tunable.  See Isolating Integrity Check below for a longer-term solution (and for eliminating false positives).
  4. If already confined to a single process (e.g. there’s one extremely large global or other external constraints) and the speed of integrity check needs adjustment up or down, see the SetAsyncReadBuffers tunable below.

Multi-Process Integrity Check

The general solution to get an integrity check to complete faster (using system resources at a higher rate) is to divide the work among multiple parallel processes.  Some of the integrity check user interfaces and APIs do so, while others use a single process.  Assignment to processes is on a per-global basis, so checking a single global is always done by just one process (versions prior to Caché 2018.1 divided the work by database instead of by global).

The principal API for multi-process integrity check is CheckList^Integrity (see the documentation for details). It collects its results in a temporary global, which Display^Integrity then displays. The following is an example checking three databases using five processes. Omitting the database list parameter here checks all databases.

set dblist=$listbuild("/data/db1/","/data/db2/","/data/db3/")
set sc=$$CheckList^Integrity(,dblist,,,5)
do Display^Integrity()
kill ^IRIS.TempIntegrityOutput(+$job)

/* Note: evaluating 'sc' above isn't needed just to display the results, but...
   $system.Status.IsOK(sc) - ran successfully and found no errors
   $system.Status.GetErrorCodes(sc)=$$$ERRORCODE($$$IntegrityCheckErrors) // 267
                           - ran successfully, but found errors.
   Else - a problem may have prevented some portion from running; 'sc' may have 
          multiple error codes, one of which may be $$$IntegrityCheckErrors. */
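
A minimal sketch of acting on 'sc' in routine code, following the note above (assuming the $$$ macros are made available with an include of %occStatus):

#include %occStatus
    if $system.Status.IsOK(sc) {
        write "integrity check ran and found no errors",!
    } elseif $system.Status.GetErrorCodes(sc)=$$$ERRORCODE($$$IntegrityCheckErrors) {
        write "integrity check ran but found errors",!
    } else {
        do $system.Status.DisplayError(sc)  // some portion may not have run
    }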

Using CheckList^Integrity like this is the most straightforward way to achieve the level of control that is of interest to us.  The Management Portal interface and the Integrity Check Task (built-in but not scheduled) use multiple processes, but may not offer sufficient control for our purposes.*

Other integrity check interfaces, notably the terminal user interface, ^INTEGRIT or ^Integrity, as well as Silent^Integrity, perform integrity check in a single process. These interfaces therefore do not complete the check as fast as possible, and they use fewer resources.  An advantage, though, is that their results are visible (logged to a file or output to the terminal) as each global is checked, and in a well-defined order.

Asynchronous I/O

An integrity check process walks through each pointer block of a global, one at a time, validating each against the contents of the data blocks it points to.  The data blocks are read with asynchronous I/O to keep a number of read requests in flight for the storage subsystem to process, and the validation is performed as each read completes. 

On Linux only, async I/O is effective only in combination with direct I/O, which is not enabled by default until InterSystems IRIS 2020.3.  This accounts for a large number of cases where integrity check takes too long on Linux.  Fortunately, it can be enabled on Caché 2018.1, IRIS 2019.1, and later by setting wduseasyncio=1 in the [config] section of the .cpf file and restarting.  This parameter is recommended in general for I/O scalability on busy systems and has been the default on non-Linux platforms since Caché 2015.2.  Before enabling it, make sure that you've configured sufficient memory for the database cache (global buffers), because with direct I/O the databases will no longer be (redundantly) cached by Linux.  When not enabled, reads done by integrity check complete synchronously and it cannot utilize the storage efficiently. 
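For reference, enabling it amounts to one line in the instance's .cpf file (a sketch; the surrounding [config] entries are omitted here):

[config]
wduseasyncio=1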

On all platforms, the number of reads that an integrity check process will put in flight at one time is set to 8 by default.  If you must alter the rate at which a single integrity check process reads from disk, this parameter can be tuned: up to get a single process to complete faster, down to use less storage bandwidth.  Bear in mind that:

  • This parameter applies to each integrity check process.  When multiple processes are used, the number of processes multiplies this number of in-flight reads.  Changing the number of parallel integrity check processes has a much larger impact and is therefore usually the first thing to adjust.  Each process is also limited by computational time (among other things), so increasing the value of this parameter yields a limited benefit.
  • This only works within the storage subsystem’s capacity to process concurrent reads. Higher values have no benefit if databases are stored on a single local drive, whereas a storage array with striping across dozens of drives can process dozens of reads concurrently.

To adjust this parameter from the %SYS namespace, do SetAsyncReadBuffers^Integrity(value). To see the current value, write $$GetAsyncReadBuffers^Integrity(). The change takes effect when the next global is checked.  The setting currently does not persist through a restart of the system, though it can be added to SYSTEM^%ZSTART.
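
For example, from the %SYS namespace (the value 16 here is purely illustrative):

%SYS>do SetAsyncReadBuffers^Integrity(16)

%SYS>write $$GetAsyncReadBuffers^Integrity()
16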

There is a similar parameter to control the maximum size of each read when blocks are contiguous (or nearly so) on disk.  This parameter is less often needed, though systems with high storage latency or databases with larger block sizes may benefit from fine-tuning.  The value has units of 64KB, so a value of 1 is 64KB, 4 is 256KB, etc.  0 (the default) lets the system select; it currently selects 1 (64KB).  The ^Integrity functions for this parameter, parallel to those mentioned above, are SetAsyncReadBufferSize and GetAsyncReadBufferSize.
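
Usage mirrors the functions above; for instance, a value of 4 allows reads of up to 256KB (again, the value is illustrative):

%SYS>do SetAsyncReadBufferSize^Integrity(4)

%SYS>write $$GetAsyncReadBufferSize^Integrity()
4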

Isolating Integrity Check

Many sites run regular integrity checks directly on the production system. This is certainly the simplest to configure, but it’s not ideal.  In addition to concerns about integrity check’s impact on storage bandwidth, concurrent database update activity can sometimes lead to false positive errors (despite mitigations built into the checking algorithm).  As a result, errors reported from an integrity check run on production need to be evaluated and/or rechecked by an administrator.

Oftentimes, a better option exists.  A storage snapshot or backup image can be mounted on another host, where an isolated Caché or IRIS instance runs the integrity check.  Not only does this prevent any possibility of false positives, but if the storage is also isolated from production, integrity check can be run to fully utilize the storage bandwidth and complete much more quickly.  This approach fits well into the model where integrity check is used to validate backups; a validated backup effectively validates production as of the time the backup was made.  Cloud and virtualization platforms can also make it easier to establish a usable isolated environment from a snapshot.

 


* The Management Portal interface, the Integrity Check Task, and the IntegrityCheck method of SYS.Database select a rather large number of processes (equal to the number of CPU cores), lacking the control that's needed in many situations. The Management Portal and the task also perform a complete recheck of any global that reported an error, in an effort to identify false positives that may have occurred due to concurrent updates. This recheck occurs above and beyond the false-positive mitigation built into the integrity check algorithms, and it may be unwanted in some situations due to the additional time it takes (the recheck runs in a single process and checks the entire global). This behavior may be changed in the future.

Article · August 30, 2020 · about 1 min read

How to Restart SMP server

Caused by a conflict in the port assignment, I get this entry in messages.log and the SMP doesn't respond:

08/30/20-12:56:40:714 (15232) 1 [Utility.Event] Private webserver may not start on port 52773, may be in use by another instance
08/30/20-12:56:40:737 (15232) 0 [Utility.Event] Private webserver started on 52773

The first line is true; the second is just wishful thinking.
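
If the collision is permanent, one possible workaround (my assumption, not something from the original post) is to move this instance's private webserver to a free port in the .cpf file before restarting; 52774 is an arbitrary example:

[Startup]
WebServerPort=52774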

demo code on GitHub

Article · August 28, 2020 · about 2 min read

Effective use of Collection Indexing and Querying Collections through SQL

Triggered by a question placed by @Kurro Lopez recently,
I took a closer look at the indexing of collections.
My simple test setup is a serial class and a persistent class with a list of this serial class.

Class rcc.IC.serItem Extends (%SerialObject, %Populate)
{
Property Subject As %String [ Required ];
Property Change As %TimeStamp [ Required ];
Property Color As %String(COLLATION = "EXACT",
   VALUELIST = ",red,white,blue,yellow,black,unknown") [ Required ];
}

Class rcc.IC.ItemList Extends (%Persistent, %Populate) [ Final ]
{
Property Company As %String [ Required ];
Property Region As list Of %String(COLLATION = "EXACT", POPSPEC = ":4",
   VALUELIST = ",US,CD,MX,EU,JP,AU,ZA") [ Required ];
Property Items As list Of rcc.IC.serItem(POPSPEC = ":4") [ Required ];

Index xitm On Items(ELEMENTS);
Index ycol On Items(ELEMENTS).Color;
}

Related Docs
Note that index xitm holds the complete serialized element!
With some records generated by the %Populate utility, I could run this query:

Select ID,Company from rcc_IC.ItemList
Where FOR SOME %ELEMENT(rcc_IC.ItemList.Items) ($list(%Value,3) in ('blue','yellow'))

This works OK, but disassembling every serial object wasn't very promising for my performance considerations.
So I followed a hint from @Dan Pasco seen in this forum a few days ago,
and, expecting better performance, I added 

Index ycol On Items(ELEMENTS).Color;

The result was rather disappointing: no improvement.
Investigation of the query plan (see the sketch below for one way to display it) showed that the new index was simply ignored.
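
To check which index the optimizer chooses, you can display the query plan, for example with SQL EXPLAIN (a sketch; the Show Plan option on the Management Portal's SQL page gives the same information):

EXPLAIN SELECT ID,Company FROM rcc_IC.ItemList
WHERE FOR SOME %ELEMENT(rcc_IC.ItemList.Items) ($list(%Value,3) IN ('blue','yellow'))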


After some trials, this query satisfied my needs:

Select ID,Company 
from %IGNOREINDEX xitm rcc_IC.ItemList
Where FOR SOME %ELEMENT(rcc_IC.ItemList.Items) ('blue,yellow' [ %Value )


During the investigation, trying many variations, I found this rule:

If you have more than one ELEMENTS index on the same property, the query generator always takes the alphabetically first index it finds, and you have to explicitly exclude a non-fitting index.

As there is no hint in the documentation, I would like to know:

Is this observation correct, or is it just an accidental effect in my case?

As the ELEMENTS index was designed for List Of %String, I understand that having
more than one index was simply an unlikely case at the time of design.

GitHub

Question · August 12, 2020

Is there a way to trigger system functions from Alerting in Ensemble

We have a vendor that, every couple of days, will just stop transmitting messages but still hold the TCP/IP connection open. No matter how many times we troubleshoot and talk with them, they don't seem to think it's an issue with their system.  Normally, if I just restart the service, it will get the data flowing again.

I know the ideal is for them to fix the issue, but in the meantime I have set up an inactivity timeout alert.  I was wondering, with the correct filtering, whether there is a way to say: if the inactivity alert is triggered during the business day, have the alert trigger a restart of the service (something like the sketch below)?
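
For illustration only, a minimal sketch of what a custom alert handler might call, assuming a hypothetical business service named "MyVendor.Service" (Ens.Director's EnableConfigItem is the call generally used to disable and enable a production item):

 // hypothetical: restart a business service by disabling, then re-enabling it
 set sc=##class(Ens.Director).EnableConfigItem("MyVendor.Service",0)
 if $system.Status.IsOK(sc) {
     set sc=##class(Ens.Director).EnableConfigItem("MyVendor.Service",1)
 }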

Thanks

Scott

Article · August 5, 2020 · about 8 min read

[Getting Started with InterSystems IRIS] Self-Learning Videos: Access: Working with JSON in IRIS

These videos explain how to work with JSON on the IRIS server side (the material is split across three videos).

Video 1: Practicing operations on dynamic entities

Video 2: Practicing the methods available on dynamic entities

Video 3: SQL functions and how to use %JSON.Adapter

 

In addition, the following related videos are available. Please take a look at them as well.

 

Video 1

The table of contents for this video is as follows.

Start – About the review videos and related videos, etc.

2:05 – What is JSON?

3:26 – JSON objects: creating dynamic entities

//Example using %DynamicObject
set json=##class(%DynamicObject).%New()
set json.Name="テスト太郎"
set json.Address="東京都新宿区"
write json.%ToJSON()
// Output:
//{"Name":"テスト太郎","Address":"東京都新宿区"}
//Example using the literal JSON constructor {}
set json={}   // same as %DynamicObject
set json.Name="テスト太郎",json.Address="東京都新宿区"
write json.%ToJSON()
// Output:
//{"Name":"テスト太郎","Address":"東京都新宿区"}

 

7:25 – JSON arrays: creating them as dynamic entities

//Example using %DynamicArray
set array=##class(%DynamicArray).%New()
set array."0"="配列1"  // array indexes start at 0
set array."1"="配列2"
write array.%ToJSON()
// Output:
//["配列1","配列2"]
//Example using the literal JSON constructor []
set array=[]   // same as %DynamicArray
set array."0"="配列1",array."1"="配列2"
write array.%ToJSON()
// Output:
//["配列1","配列2"]

 

9:49 – Methods for manipulating dynamic entities

(Continued in Video 2)


 

Video 2

The table of contents is as follows.

00:00 – %Set(), %Get(), %Remove() with objects

set obj={}
set obj.Name="山田太郎"
do obj.%Set("Zip","160-0023")
do obj.%Set("Tel","03-5321-6200")
write obj.%ToJSON()
write obj.%Get("Zip")," - ",obj.%Get("Name")
do obj.%Remove("Zip")
write obj.%ToJSON()
set obj.Pref="東京都"
write obj.%ToJSON()
do obj.%Set("City","新宿区")
write obj.%ToJSON()

 

02:02 – %Set(), %Get(), %Remove() with arrays

set array=[]
do array.%Set(0,"最初")
write array.%ToJSON()
set array."4"="最後"   //set array."番号"="値" は array.%Set("番号","値")と同等
do array.%Set(2,"真中")
write array.%ToJSON()
do array.%Pop()
write array.%ToJSON()
do array.%Push("Pushしたデータ")
write array.%ToJSON()
do array.%Remove(1)   // remove the null that is second from the left
write array.%ToJSON()

 

03:34 – Manipulating JSON array elements: %Size(), %Get()

set array=["最初",null,""]
do array.%Set(4,"最後")   // index 3 is filled with JSON null
write array.%ToJSON()
// Output:
["最初",null,"",null,"最後"]

for i=0:1:array.%Size()-1 w array.%Get(i),!
// Output:
最初
 
 
 
最後

 

05:20 – Checking whether a value is defined: %IsDefined()

set array=["最初",null,""]
do array.%Set(4,"最後")   // index 3 is filled with JSON null
write array.%ToJSON()
// Output:
["最初",null,"",null,"最後"]

for i=0:1:array.%Size()-1 {write array.%Get(i)," - 有効値?",array.%IsDefined(i),!}
// Output:
最初 - 有効値?1
 - 有効値?1
 - 有効値?1
 - 有効値?0
最後 - 有効値?1

 

07:38 – Iterating over an array: %GetIterator()

set array=["最初",null,""]
do array.%Set(4,"最後")   // index 3 is filled with JSON null
write array.%ToJSON()
// Output:
["最初",null,"",null,"最後"]

set iter=array.%GetIterator()
while iter.%GetNext(.key,.val) { write key," - value=  ",val,! }
// Output:
0 - value=  最初
1 - value=
2 - value=
4 - value=  最後

 

09:55 – Iterating over an object: %GetIterator()

set obj={"Name":"山田太郎","Zip":"160-0023","Pref":"東京都","City":"新宿区"}
write obj.%ToJSON()
// Output:
{"Name":"山田太郎","Zip":"160-0023","Pref":"東京都","City":"新宿区"}

set iter=obj.%GetIterator()
while iter.%GetNext(.key,.val) { write key," - value=  ",val,! }
// Output:
Name - value=  山田太郎
Zip - value=  160-0023
Pref - value=  東京都
City - value=  新宿区

 

10:11 – JSON null/true/false (how they correspond in ObjectScript)

set array=[null,true,false,1,0,""]
set array."7"="値あり"
for i=0:1:array.%Size()-1 { write i," - ",array.%Get(i),! }
// Output:
0 -
1 - 1
2 - 0
3 - 1
4 - 0
5 -
6 -
7 - 値あり

 

11:40 – Checking the JSON data type: the %GetTypeOf() method

set array=[null,true,false,1,0,""]
set array."7"="値あり"
for i=0:1:array.%Size()-1 {write i," - ",array.%Get(i)," - ",array.%GetTypeOf(i),! }
// Output:
0 -  - null
1 - 1 - boolean
2 - 0 - boolean
3 - 1 - number
4 - 0 - number
5 -  - string
6 -  - unassigned
7 - 値あり - string

 

12:32 – Setting a value with an explicit JSON data type

set array=[]
do array.%Set(0,"","null")  // the second argument is ObjectScript's null (the empty string); stored as JSON null
do array.%Set(1,1,"number")  // stored as the number 1
do array.%Set(2,0,"number")  // stored as the number 0
do array.%Set(3,1,"boolean")  // stored as boolean: 1 = true
do array.%Push(0,"boolean")  // pushed as boolean: 0 = false
for i=0:1:array.%Size()-1 { write i," - ",array.%Get(i)," - ",array.%GetTypeOf(i),! }
// Output:
0 -  - null
1 - 1 - number
2 - 0 - number
3 - 1 - boolean
4 - 0 - boolean

write array.%ToJSON()
// Output:
[null,1,0,true,false]

 

13:38 – Using ObjectScript variables and expressions inside [] and {}

set obj={"日付":($ZDATE($H,16)),"時刻":($ZTIME($PIECE($H,",",2)))}
write obj.%ToJSON()
set mgr=$system.Util.ManagerDirectory()
set array=[($system.Util.InstallDirectory()),(mgr)]
write array.%ToJSON()

 

(Continued in Video 3)

 

Video 3

The table of contents is as follows.

00:00 – Overview: retrieving table data as JSON objects and JSON arrays

 Introduction to a related video

 (a good video for checking how to create Test.Person)
    That video shows the creation steps from 13:20 (working in Studio) and from 18:44 (working in VS Code).

 

00:54 – SQL SELECT: the JSON_OBJECT() function, explanation and demo

 In the Management Portal, open System Explorer → SQL, switch to the target namespace,
 and run the following on the Execute Query tab.

SELECT JSON_OBJECT('Name':Name,'Email':Email ABSENT ON NULL) as json from Test.Person

 

02:42 – JSON_OBJECT() example (embedded SQL)

Class Test.JSONTest
{
ClassMethod GetAllPerson()
{
	//embedded SQL
	&sql(declare C1 cursor for
	select JSON_OBJECT('Name':Name,'Email':Email) as json into :json from Test.Person)
	&sql(open C1)
	set array=[]
	for {
		&sql(fetch C1)
		if SQLCODE'=0 {
			quit
		}
		set obj={}.%FromJSON(json)
		do array.%Push(obj)
	}
	&sql(close C1)
	write array.%ToJSON()
}  
}
//to run:
do ##class(Test.JSONTest).GetAllPerson()

 

07:00 – SQL SELECT: the JSON_ARRAY() function

 In the Management Portal, open System Explorer → SQL, switch to the target namespace,
 and run the following on the Execute Query tab.

SELECT JSON_ARRAY(Name,Email ABSENT ON NULL) as array from Test.Person

 

07:50 – JSON_ARRAY() example (dynamic SQL)

ClassMethod GetAllPersonArray()
{
    set sql="SELECT JSON_ARRAY(Name,Email ABSENT ON NULL) as array from Test.Person"
    set stmt=##class(%SQL.Statement).%New()
    set status=stmt.%Prepare(sql)
    set rset=stmt.%Execute()
    set root=[]
    while rset.%Next() {
        set array=[].%FromJSON(rset.%Get("array"))
        do root.%Push(array)
    }
    do root.%ToJSON()
}
//to run:
do ##class(Test.JSONTest).GetAllPersonArray()

08:59 – The JSON adapter (%JSON.Adapter)

set person=##class(Test.Person).%OpenId(1)
set status=person.%JSONExport()
write status

 

10:08 – Object → a stream containing the JSON string: %JSONExportToStream(), explanation and demo

set person=##class(Test.Person).%OpenId(1)
set st=person.%JSONExportToStream(.jstream)
write st
write jstream.Read()
write jstream.Rewind()
set jobj={}.%FromJSON(jstream.Read())
write jobj.Name
write jobj.Email

 

12:59 – Mapping an object to a JSON string: %JSONExportToString()

set person=##class(Test.Person).%OpenId(1)
set st=person.%JSONExportToString(.jstring)
write st
write jstring
set jobj={}.%FromJSON(jstring)
write jobj.Name
write jobj.Email

 

13:20 – Mapping a JSON string to an object: %JSONImport()

set json={}
set json.Name="ジェイソン", json.Email="json@mail.com"
zwrite json
set p1=##class(Test.Person).%New()
set st=p1.%JSONImport(json)
write st

 

14:24 – What we confirmed

 


(Addendum, 2024/1/16) These are not covered in the videos, but as of version 2023.3, an add() method for appending an item to a JSON array and an addAll() method, convenient for concatenating JSON arrays, have been added.

USER>set a1=["a","b","c"]

USER>do a1.add("追加")

USER>zwrite a1
a1=["a","b","c","追加"]  ; <DYNAMIC ARRAY>
USER>

USER>set a2=[1,2,3]

USER>do a1.addAll(a2)

USER>zwrite a1
a1=["a","b","c","追加",1,2,3]  ; <DYNAMIC ARRAY>

