发布新帖

查找

问题
· 八月 16, 2017

How to get the number of global references for a dynamic sql statement

In $system.SQL.Shell(), it would also report the number of global references for a given query.

Is it possible that I could also get this statistics from the %SQL.Statement interface when I launch a dynamic sql?

Thanks.

4 Comments
讨论 (4)1
登录或注册以继续
文章
· 八月 8, 2017 阅读大约需 1 分钟

Outperforming PostgreSQL and MySQL

In a previous exercise, I was able to show the power of Caché.
A medium-designed set of interdependent tables with some GB of data.
URLs cross reference over some million pages resulting in ~3 billion records

Competition was between

  • Caché
  • PostgreSQL
  • MySQL

Criteria were Speed + Storage consumption
I composed  a customized loader fed over a "raw" TCP connection
Mapping the "objects" into the final table by directly writing to Global Storage.,

Phase 1:  MySQL failed before reaching the 1st million records by it's
Incredible consumption of memory and disks space

Pase2:  Disk consumption of PostgreSQL was higher than Caché
And the load speed was remarkably lower. 
Mapping the "objects" to tables done in PHP resulting in multiple DB INSERTS 

For Caché the input was in principle a string. disassembled in the related Globals.
When Caché was done, PostgreSQL was significantly under 50% of the total.

The slim and effective design o​F Table sin Caché also showed a
dramatically better performance in data retrieval. Even with the unequal dimensions.

Finale furioso: 
The customer ordered a nicely dimensioned license for Caché.​​​​​​

 

3 Comments
讨论 (3)3
登录或注册以继续
文章
· 八月 5, 2017 阅读大约需 3 分钟

Winning in GIS competition

GIS stands for Geographic Information System.
  and it's not a typical arena for Caché. But it's definitely an environment with high data volume.

You see 3 major areas

- Visual front end:
   A mature area well covered by a bunch of commercial and open source products.
   No need for Caché there.

- GIS mathematics:  
  JTS (Java Topology Suite)  is fixed standard that covers all requirements and can be linked to Caché by the Java Gateway
or the C, C++ incarnation of this standard library using Caché Call Out Gateway.
  So far no added value by Caché.

- the Database that holds all the geographic objects and its attributes, especially geographic coordinates
  for road maps, highways, companies, private houses, railways, demographic data, ......

This is the area where Caché can win.
Delivering data based on classic attributes is something we understand quite well. Nothing new.
The challenge starts with comparing locations. The map you work on is represented as collection of bounding rectangles
and your task is to find if a specific element is inside that rectangle. The theory behind is known as R-Tree. https://en.wikipedia.org/wiki/R-tree

 

The basic task is to sort out all points that don't fit  running series of 
SQL SELECT ... WHERE xp BETWEEN x1 AND x2  AND yp BETWEEN y1 AND y2 ....

How can this query be be accelerated:
- Forget about using floating point coordinates for indexing but  keep them for JTS purpose
- Use SQLcomputed values instead. Preferably Integers.
   >> Integer calculation is the fastest in any machine
   >> Integers have the smallest footprint in memory, $LB(),index, ...
- Honor the required precision
>> 1° in latitude is 110....111 km
>> 1° in longitude is depending on latitude between 11km (84°N) ... 111km (0°N)
so if the target of accuracy is about 10mt 4 digits will be enough.
Typical GIS sources feed you with 6...8 decimal digits

For the process itself you iterate from a rather large grid down to your requirements like an inverse pyramid

The picture illustrates it:
You drop those that don't fit at all (gray)
In the next cycle you iterate with higher precision and much less possible targets excluding more candidates (green)

As we came from bounding rectangles the remaining result still requires a final check using JTS
but on a significant lower numer of instances.

Where comes the speed from:
- pure Integer calculation. When I saw the difference first I was beaten down and couldn't believe it.
- keep your indices slim to achieve high buffer hit rates
- filtering down results  in iteration . Larger index but less elements to check
- don't care about number of inidces you generate. It's all static data, that you can compact and that change rarely
- use all the freedom and flexibility offered by Caché  [%PARALLEL wasn't available then]

BTW.
I found this all during a benchmark against my favorite competitor [O. from Redwood City, CA] wink

The exercise:
out of 1 mio places with >1 person and income > $$$ [almost all]
find those that are closer than 300 mt to a highway.
the highway map had ~5 mio elements (GY)

On an identic VM configuration Caché took depending on precision ~ 12...20 min
while O.Spatial was stopped with no result, no sign of progress after ~3 hrs+

I think the principles of this exercise could apply to other big data situations as well.

4 Comments
讨论 (4)3
登录或注册以继续
问题
· 七月 23, 2017

German collation of ß in upper and lower case in Cache 2017

Recently upper case ß was defined as an official character for German
how can this be adjusted in Caché 2017 ?

5 Comments
讨论 (5)1
登录或注册以继续
问题
· 七月 14, 2017

How to specify the UrlMap for Rest call

Hi,

I have a UrlMap like the one below.

<Route Url="/Results/:queryID" Method="POST" Call="QueryResults"/>

What if I have to pass two parameters to QueryResults, how to do configure Url?

Something like Url="/Results/:queryID/:parameter2"   ?

Thanks.

1 Comment
讨论 (1)0
登录或注册以继续