Feed aggregator

Kill a session from database procedure

Tom Kyte - Mon, 2019-10-21 03:45
How can I kill a session from a stored database procedure? Is there some way to do this?
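A minimal sketch of one common approach (a hedged illustration, not the AskTOM answer), assuming the procedure owner has been granted ALTER SYSTEM directly rather than through a role, and that the SID and SERIAL# come from V$SESSION:

<code>
CREATE OR REPLACE PROCEDURE kill_session (
    p_sid    IN NUMBER,
    p_serial IN NUMBER
) AUTHID DEFINER AS
BEGIN
    -- ALTER SYSTEM cannot be called statically from PL/SQL, so build it dynamically.
    EXECUTE IMMEDIATE 'ALTER SYSTEM KILL SESSION '''
                      || p_sid || ',' || p_serial || ''' IMMEDIATE';
END kill_session;
/
</code>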
Categories: DBA Blogs

Solr Sharding – Concepts & Methods

Yann Neuhaus - Sun, 2019-10-20 03:06

A few weeks ago, I published a series of blogs on Alfresco Clustering, including Solr Sharding. At that time, I planned to first explain what Solr Sharding really is and what the different concepts and methods around it are. Unfortunately, I didn't get the time to write this blog, so I had to post the one related to Solr even before explaining the basics. Today, I'm here to right that wrong! Obviously, this blog focuses on Alfresco-related Solr Sharding since that's what I do.

I. Solr Sharding – Concepts

Sharding, in general, is the partitioning of a set of data in a specific way. There are several ways to do that, depending on the technology you are working with. In the scope of Solr, Sharding is the split of the Solr index into several smaller indices. You might be interested in Solr Sharding because it improves the following points:

  • Fault Tolerance: with a single index, if you lose it, then… you've lost it. If the index is split into several indices, then even if you lose one part, all the others will continue working
  • High Availability: it provides more granularity than a single index. You might, for example, want to have a few small indices without HA and then some others with HA because you configured them to contain some really important nodes of your repository
  • Automatic Failover: Alfresco knows automatically (with Dynamic Registration) which Shards are up-to-date and which ones are lagging behind, so it will automatically choose the best Shards to handle the search queries and give you the best results possible. In combination with the Fault Tolerance above, this gives the best possible HA solution with the fewest possible resources
  • Performance improvements: indexing is faster since several Shards index the same repository, so each Shard has less work to do (depending on the Sharding Method). Searches are faster too, since the search query is processed by all Shards in parallel on smaller parts of the index instead of being one single query against the full index

Based on benchmarks, Alfresco considers that a Solr Shard can contain up to 50 to 80 million nodes. This is obviously not a hard limit (you can have a single Shard with 200 million nodes), but it is more of a best practice if you want to keep a fast and reliable index. With older versions of Alfresco (before version 5.1), you couldn't create Shards because Alfresco didn't support it, so at that time there was no other solution than having a single big index.

There is one additional thing that must be understood here: the 50 000 000 nodes soft limit is 50M nodes in the index, not in the repository. Let’s assume that you are using a DB_ID_RANGE method (see below for the explanation) with an assumed split of 65% live nodes, 20% archived nodes, 15% others (not indexed: renditions, other stores, …). So, if we are talking about the “workspace://SpacesStore” nodes (live ones), then if we want to fill a Shard with 50M nodes, we will have to use a DB_ID_RANGE of 100*50M/65 = 77M. Basically, the Shard should be more or less “full” once there are 77M IDs in the Database. For the “archive://SpacesStore” nodes (archived ones), it would be 100*50M/20 = 250M.

Alright, so what are the main concepts in Solr Sharding? There are several terms that need to be understood:

  • Node: It's a Solr Server (a Solr instance installed using the Alfresco Search Services). Below, I will use "Solr Server" instead, because I already use "nodes" (lowercase) for the Alfresco Documents, so using "Node" (uppercase) for the Solr Server as well might be a little bit confusing…
  • Cluster: It’s a set of Solr Servers all working together to index the same repository
  • Shard: A part of the index. In other words, it’s a representation (virtual concept) of the index composed of a certain set of nodes (Alfresco Documents)
  • Shard Instance: It’s one Instance of a specific Shard. A Shard is like a virtual concept while the Instance is the implementation of that virtual concept for that piece of the index. Several Shard Instances of the same Shard will therefore contain the same set of Alfresco nodes
  • Shard Group: It’s a collection of Shards (several indices) that forms a complete index. Shards are part of the same index (same Shard Group) if they:
    • Track the same store (E.g.: workspace://SpacesStore)
    • Use the same template (E.g.: rerank)
    • Have the same maximum number of Shards ("numShards")
    • Use the same configuration (Sharding methods, Solr settings, …)

Shard is often (wrongly) used in place of Shard Instance, which might lead to some confusion… When you are reading "Shard", sometimes it means the Shard itself (the virtual concept) and sometimes it means all of its Shard Instances. This is how these concepts can look:
Solr Sharding - Concepts

II. Solr Sharding – Methods

Alfresco supports several methods for Solr Sharding, and they all have different attributes and different ways of working:

  • MOD_ACL_ID (ACL v1): Alfresco nodes and ACLs are grouped by their ACL ID and stored together in the same Shard. Different ACL IDs will be assigned randomly to different Shards (depending on the number of Shards you defined). Each Alfresco node using a specific ACL ID will be stored in the Shard already containing this ACL ID. This simplifies the search requests from Solr since ACLs and nodes are together, so permission checking is simple. If you have a lot of documents using the same ACL, then the distribution will not be even between Shards. Parameters:
    • shard.method=MOD_ACL_ID
    • shard.instance=<shard.instance>
    • shard.count=<shard.count>
  • ACL_ID (ACL v2): This is the same as MOD_ACL_ID; the only difference is that it changes the method used to assign ACLs to Shards, so the distribution is more even. But if you have a lot of documents using the same ACL, then you still have the same issue. Parameters:
    • shard.method=ACL_ID
    • shard.instance=<shard.instance>
    • shard.count=<shard.count>
  • DB_ID: This is the default Sharding Method in Solr 6, which evenly distributes the nodes across the different Shards based on their DB ID (“alf_node.id“). The ACLs are replicated on each of the Shards so that Solr is able to perform the permission checking. If you have a lot of ACLs, then this will obviously make the Shards a little bit bigger, but this is usually insignificant. Parameters:
    • shard.method=DB_ID
    • shard.instance=<shard.instance>
    • shard.count=<shard.count>
  • DB_ID_RANGE: Pretty much the same thing as DB_ID but instead of looking at each DB ID one by one, it will just dispatch the DB IDs from the same range into the same Shard. The ranges are predefined at the Shard Instance creation and you cannot change them later, but this is also the only Sharding Method that allows you to add new Shards dynamically (auto-scaling) without the need to perform a full reindex. The lower value of the range is included and the upper value is excluded (for Math lovers: [begin-end[ ;)). Since DB IDs are incremental (they increase over time), performing a search query with a date filter might end up being as simple as checking a single Shard. Parameters:
    • shard.method=DB_ID_RANGE
    • shard.range=<begin-end>
    • shard.instance=<shard.instance>
  • DATE: Months will be assigned to a specific Shard sequentially, and nodes are then indexed into the Shard assigned to the month of their date value. Therefore, if you have 2 Shards, each one will contain 6 months (Shard 1 = Months 1,3,5,7,9,11 // Shard 2 = Months 2,4,6,8,10,12). It is possible to assign consecutive months to the same Shard using the “shard.date.grouping” parameter which defines how many months should be grouped together (a semester for example). If there is no date on a node, the fallback method is to use DB_ID instead. Parameters:
    • shard.method=DATE
    • shard.key=exif:dateTimeOriginal
    • shard.date.grouping=<1-12>
    • shard.instance=<shard.instance>
    • shard.count=<shard.count>
  • PROPERTY: A property is specified as the base for the Shard assignment. The first time a node is indexed with a new value for this property, the node will be assigned randomly to a Shard. Each node coming in with the same value for this property will be assigned to the same Shard. Valid properties are either d:text (single line text), d:date (date only) or d:datetime (date+time). It is possible to use only a part of the property's value using “shard.regex” (to keep only the first 4 digits of a date for example: shard.regex=^\d{4}). If this property doesn't exist on a node or if the regex doesn't match (if any is specified), the fallback method is to use DB_ID instead. Parameters:
    • shard.method=PROPERTY
    • shard.key=cm:creator
    • shard.instance=<shard.instance>
    • shard.count=<shard.count>
  • EXPLICIT_ID: Pretty similar to PROPERTY, but instead of using the value of a “random” property, this method requires a specific property (d:text) to define explicitly on which Shard the node should be indexed. Therefore, this will require an update of the Data Model to have one property dedicated to the assignment of a node to a Shard. In case you are using several types of documents, then you will potentially want to do that for all of them. If this property doesn't exist on a node or if an invalid Shard number is given, the fallback method is to use DB_ID instead. Parameters:
    • shard.method=EXPLICIT_ID
    • shard.key=<property> (E.g.: cm:targetShardInstance)
    • shard.instance=<shard.instance>
    • shard.count=<shard.count>

As you can see above, each Sharding Method has its own set of properties. You can define these properties in:

  • The template's solrcore.properties file, in which case they will apply to all Shard Instance creations (see the example after this list)
    • E.g.: $SOLR_HOME/solrhome/templates/rerank/conf/solrcore.properties
  • The URL/Command used to create the Shard Instance, in which case they will only apply to the current Shard Instance creation
    • E.g.: curl -v “http://host:port/solr/admin/cores?action=newCore&…&property.shard.method=DB_ID_RANGE&property.shard.range=0-50000000&property.shard.instance=0”
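As a hedged illustration using only the properties already listed above, the first option (the template's solrcore.properties) could contain something like the following for a DB_ID_RANGE Shard Instance; the range and instance number are example values, not a recommendation:

# $SOLR_HOME/solrhome/templates/rerank/conf/solrcore.properties (excerpt)
shard.method=DB_ID_RANGE
shard.range=0-50000000
shard.instance=0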

Summary of the benefits of each method:
Solr Sharding - Benefits

First supported versions for the Solr Sharding in Alfresco:
Solr Sharding - Availability

Hopefully, this is a good first look into Solr Sharding. In a future blog, I will talk about the creation process and show some examples of what is possible. If you want to read more on the subject, don't hesitate to take a look at the Alfresco documentation; it doesn't explain everything, but it is still a very good starting point.

The article Solr Sharding – Concepts & Methods appeared first on Blog dbi services.

Why is the cost for TABLE ACCESS BY INDEX ROWID so high for only one row

Tom Kyte - Sat, 2019-10-19 15:45
Dear Tom, I have a problem with a query on a table that has a function-based index. Create index: <code> create index customer_idx_idno on Customer (lower(id_no)) ; --- id_no varchar2(40) </code> <b>Query 1:</b> execute time 0.031s but cost 5,149, 1 row ...
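As a hedged illustration only (the question above is truncated): for the function-based index to be considered, the predicate has to use the same expression as the index definition, and the hidden virtual column created by the index needs statistics so the optimizer can estimate the row count sensibly. The bind name below is made up for the example.

<code>
-- The predicate must match the indexed expression exactly.
SELECT *
FROM   Customer
WHERE  lower(id_no) = lower(:search_id);

-- Regathering table stats also collects stats on the hidden virtual column
-- behind the function-based index, which helps the cost estimate.
EXEC DBMS_STATS.gather_table_stats(user, 'CUSTOMER', method_opt => 'FOR ALL HIDDEN COLUMNS SIZE AUTO')
</code>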
Categories: DBA Blogs

How to get unique values/blanks across all columns

Tom Kyte - Sat, 2019-10-19 15:45
Hi, I have a wide table with 200-odd columns. The requirement is to pivot the columns and display the unique values and the count of blanks within each column <code>CREATE TABLE example( c1 VARCHAR(10), c2 VARCHAR(10), c3 VARCHAR(10) ); / INSERT ...
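A hedged sketch for the three-column example table above ("blank" is assumed to mean NULL here); for a 200-column table the same pattern would normally be generated dynamically from USER_TAB_COLUMNS:

<code>
SELECT 'c1' AS column_name,
       COUNT(DISTINCT c1)   AS unique_values,
       COUNT(*) - COUNT(c1) AS blank_count
FROM   example
UNION ALL
SELECT 'c2', COUNT(DISTINCT c2), COUNT(*) - COUNT(c2) FROM example
UNION ALL
SELECT 'c3', COUNT(DISTINCT c3), COUNT(*) - COUNT(c3) FROM example;
</code>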
Categories: DBA Blogs

Elastic search using Oracle 19c

Tom Kyte - Sat, 2019-10-19 15:45
Team, very recently we got a question from our customer: "Can I replace Elasticsearch with Oracle 19c, or any version of the Oracle database prior to that?" Any inputs/directions on that please - kindly advise.
Categories: DBA Blogs

Add NULLABLE column with DEFAULT values to a large table

Tom Kyte - Sat, 2019-10-19 15:45
I'm adding a number of columns with DEFAULT values that are NULLABLE to a large table e.g <code> alter table big_table add (col1 varchar2(1) default 0, col2 varchar2(1) default 0); </code> It's taking a long time to do because Oracle is e...
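One hedged workaround (a sketch, not necessarily the AskTOM answer): add the columns without defaults, which is a quick dictionary-only change, backfill the existing rows in committed batches to limit undo/redo pressure, and only then add the DEFAULT clause, which at that point affects future inserts only:

<code>
ALTER TABLE big_table ADD (col1 VARCHAR2(1), col2 VARCHAR2(1));

-- Backfill existing rows in batches to avoid one huge transaction.
BEGIN
  LOOP
    UPDATE big_table
    SET    col1 = '0', col2 = '0'
    WHERE  col1 IS NULL
    AND    ROWNUM <= 100000;
    EXIT WHEN SQL%ROWCOUNT = 0;
    COMMIT;
  END LOOP;
END;
/

-- Metadata-only change: the defaults now apply to future inserts.
ALTER TABLE big_table MODIFY (col1 DEFAULT 0, col2 DEFAULT 0);
</code>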
Categories: DBA Blogs

Create table from select with changes in the column values

Tom Kyte - Sat, 2019-10-19 15:45
Hello, at work we have an update script that takes around 20 hours, and some of the most demanding queries are updates where we change some values, something like: <code> UPDATE table1 SET column1 = DECODE(table1.column1,null,null,'no...
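A hedged sketch of the usual alternative for this kind of mass update: create a new table and apply the transformation while copying, then swap the names (indexes, constraints, grants and triggers have to be recreated). The DECODE value is truncated in the question, so 'new_value' and the extra column names below are placeholders.

<code>
CREATE TABLE table1_new
NOLOGGING
AS
SELECT DECODE(column1, NULL, NULL, 'new_value') AS column1,  -- 'new_value' is a placeholder
       column2,                                              -- remaining columns copied as-is
       column3
FROM   table1;

-- Then, during a quiet window:
-- RENAME table1 TO table1_old;
-- RENAME table1_new TO table1;
</code>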
Categories: DBA Blogs

Data archival and purging for OLTP database.

Tom Kyte - Sat, 2019-10-19 15:45
Hi Tom, I need your suggestion regarding a data archival and purging solution for an OLTP db. Currently we are planning the approach below. The database size is 150 GB and we plan to run the jobs monthly. 1) Generate flat files from table based on...
Categories: DBA Blogs

Job to end in case connection not establishing with utl_http.begin_request

Tom Kyte - Sat, 2019-10-19 15:45
I am tracking around 17,000 orders through a web service via PL/SQL against a destination server. I am running multiple jobs in batches (500 orders per job) to invoke the web service and get the order status, so around 34 jobs are running (17000/500) ...
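A hedged sketch of one way to make such a job fail fast instead of hanging when the destination server cannot be reached: set a transfer timeout before begin_request and handle the UTL_HTTP exceptions explicitly (the URL below is a placeholder).

<code>
DECLARE
  l_req  UTL_HTTP.req;
  l_resp UTL_HTTP.resp;
BEGIN
  UTL_HTTP.set_transfer_timeout(10);   -- give up after 10 seconds
  l_req  := UTL_HTTP.begin_request('http://destination.example.com/orderstatus');
  l_resp := UTL_HTTP.get_response(l_req);
  UTL_HTTP.end_response(l_resp);
EXCEPTION
  WHEN UTL_HTTP.transfer_timeout OR UTL_HTTP.request_failed THEN
    -- log the failure and let the job end cleanly instead of waiting
    RAISE;
END;
/
</code>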
Categories: DBA Blogs

Inserting values into a table with '&'

Tom Kyte - Sat, 2019-10-19 15:45
Hi, I want to insert a value into a table as follows: create table test (name varchar2(35)); insert into test values ('&Vivek'); I tried the escape character '\' but the system asks for a value of the substitution variable. I also did a...
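The prompt comes from the client (SQL*Plus / SQL Developer), which treats '&' as a substitution-variable marker; a common fix is simply to turn substitution scanning off around the statement:

<code>
SET DEFINE OFF
INSERT INTO test VALUES ('&Vivek');
COMMIT;
SET DEFINE ON
</code>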
Categories: DBA Blogs

SELECT ANY DICTIONARY - What Privileges Does it Have - SELECT_CATALOG_ROLE

Pete Finnigan - Sat, 2019-10-19 15:45
There have been a few blog posts over the years discussing the difference between SELECT ANY DICTIONARY and the SELECT_CATALOG_ROLE. Hemant posted in 2014 about the difference between SELECT ANY DICTIONARY and SELECT_CATALOG_ROLE. This post was a....[Read More]

Posted by Pete On 11/10/19 At 01:59 PM

Categories: Security Blogs

What Privileges Can you Grant On PL/SQL?

Pete Finnigan - Sat, 2019-10-19 15:45
Oracle has a lot of privileges and models; privileges can be granted to users, roles and also, since 12c, roles can be granted to PL/SQL code (I will not discuss this aspect here as I will blog separately about grants....[Read More]

Posted by Pete On 08/10/19 At 01:43 PM

Categories: Security Blogs

ORA-01950 Error on a Sequence - Error on Primary Key Index

Pete Finnigan - Sat, 2019-10-19 15:45
Yesterday I posted a blog about an ORA-01950 error on tablespace USERS raised for a sequence - ORA-01950 Error on a Sequence. I attributed the error to the sequence because that's where the error in Oracle was pointing....[Read More]

Posted by Pete On 01/10/19 At 01:12 PM

Categories: Security Blogs

ORA-01950 Error on a Sequence

Pete Finnigan - Sat, 2019-10-19 15:45
UPDATE: I have updated information for this post and, rather than make this one much longer, I created a new post - please see ORA-01950 Error on a Sequence - Error on Primary Key Index. Wow, it's been a while....[Read More]

Posted by Pete On 30/09/19 At 01:42 PM

Categories: Security Blogs

Machine Learning with SQL

Andrejus Baranovski - Sat, 2019-10-19 10:25
Python (and soon JavaScript with TensorFlow.js) is a dominant language for Machine Learning. What about SQL? There is a way to build and run Machine Learning models in SQL. There could be a benefit to running model training close to the database, where the data stays. With SQL we can leverage strong data analysis out of the box and run algorithms without fetching data to the outside world (which could be an expensive operation in terms of performance, especially with large datasets). This post describes how to do Machine Learning in the database with SQL.



Read more in my Towards Data Science post.

Basic Replication -- 8 : REFRESH_MODE ON COMMIT

Hemant K Chitale - Sat, 2019-10-19 09:26
So far, in previous posts in this series, I have demonstrated Materialized Views that are set to REFRESH ON DEMAND.

You can also define a Materialized View that is set to REFRESH ON COMMIT -- i.e. every time DML against the Source Table is committed, the MV is also immediately updated.  Such an MV must be in the same database  (you cannot define an ON COMMIT Refresh across two databases  -- to do so, you have to build your own replication code, possibly using Database Triggers or external methods of 2-phase commit).

Here is a quick demonstration, starting with a Source Table in the HEMANT schema and then building a FAST REFRESH MV in the HR schema.

SQL> show user
USER is "HEMANT"
SQL> create table hemant_source_tbl (id_col number not null primary key, data_col varchar2(30));

Table created.

SQL> grant select on hemant_source_tbl to hr;

Grant succeeded.

SQL> create materialized view log on hemant_source_tbl;

Materialized view log created.

SQL> grant select on mlog$_hemant_source_tbl to hr;

Grant succeeded.

SQL>
SQL> grant create materialized view to hr;

Grant succeeded.

SQL> grant on commit refresh on hemant_source_tbl to hr;

Grant succeeded.

SQL>
SQL> grant on commit refresh on mlog$_hemant_source_tbl to hr;

Grant succeeded.

SQL>


Note : I had to grant the CREATE MATERIALIZED VIEW privilege to HR for this test case. Also, as the MV is to Refresh ON COMMIT, two additional object-level grants on the Source Table and the Materialized View Log are required as the Refresh is across schemas.

SQL> connect hr/HR@orclpdb1
Connected.
SQL> create materialized view hr_mv_on_commit
2 refresh fast on commit
3 as select id_col as primary_key_col, data_col as value_column
4 from hemant.hemant_source_tbl;

Materialized view created.

SQL>


Now that the Materialized View is created successfully, I will test DML against the table and check that an explicit REFRESH call (e.g. DBMS_MVIEW.REFRESH or DBMS_REFRESH.REFRESH) is not required.

SQL> connect hemant/hemant@orclpdb1
Connected.
SQL> insert into hemant_source_tbl values (1,'First');

1 row created.

SQL> commit;

Commit complete.

SQL> select * from hr.hr_mv_on_commit;

PRIMARY_KEY_COL VALUE_COLUMN
--------------- ------------------------------
1 First

SQL> connect hr/HR@orclpdb1
Connected.
SQL> select * from hr_mv_on_commit;

PRIMARY_KEY_COL VALUE_COLUMN
--------------- ------------------------------
1 First

SQL>


The Materialized View in the HR schema was refreshed immediately, without an explicit REFRESH call.

Remember : An MV that is to REFRESH ON COMMIT must be in the same database as the Source Table.




Categories: DBA Blogs

pgconf.eu – Welcome to the community

Yann Neuhaus - Fri, 2019-10-18 15:41

On Tuesday I started my journey to Milan to attend my first pgconf.eu, which was also my first big conference. I was really excited about what was waiting for me. How would it be to become a visible part of the community? How would it be to give my first presentation in front of so many people?

The conference started with the welcome and opening session. It took place in a huge room, big enough to give all of the participants a seat. It is really amazing how big this community is, and it is still growing. So many people from all over the world (Japan, USA, Chile, Canada…) attended this conference.

And suddenly I realized: this is the room where I have to give my session. Some really strange feelings came up. This was my first presentation at a conference, this was the main stage, and there was space for so many people! I really hoped they would make it smaller for me. But there was something else: anticipation.

But first I want to give you some impressions from my time at the pgconf. It was amazing to talk to one of the main developers of Patroni. I was really nervous when I just went up to him and said: “Hi, may I ask you a question?” Of course he didn't say NO. The same goes for all the other ladies and gentlemen I met (the list is quite long): they are all so nice and really open minded (is this because they all work with an open source database?). And of course a special thanks to Pavel Golub for the great picture. Find it in Daniel's blog.
Besides meeting all these great people, I enjoyed some really informative and cool sessions.


Although I still hoped they were going to make the room smaller for my presentation, of course they didn't. So I had only one chance:

And I did it, and afterwards I booked it under “good experience”. A huge room is not so different from a small one.

As I am back home now, I want to say: Thanks pgconf.eu and dbi services for giving me this opportunity and thanks to the community for this warm welcome.

The article pgconf.eu – Welcome to the community appeared first on Blog dbi services.

CBO Oddities – 1

Jonathan Lewis - Fri, 2019-10-18 12:10

I’ve decided to do a little rewriting and collating so that I can catalogue related ideas in an order that makes for a better narrative. So this is the first in a series of notes designed to help you understand why the optimizer has made a particular choice and why that choice is (from your perspective) a bad one, and what you can do either to help the optimizer find a better plan, or subvert the optimizer and force a better plan.

If you’re wondering why I choose to differentiate between “help the optimizer” and “subvert the optimizer” consider the following examples.

  • A query is joining two tables in the wrong order with a hash join when you know that a nested loop join in the opposite order would be far better, because you know that the data you want is very nicely clustered and there's a really good index that would make access to that data very efficient. You check the table preferences and discover that the table_cached_blocks preference (see end notes) is at its default value of 1, so you set it to 16 and gather fresh stats on the indexes on the table (a hedged sketch of this follows the list). Oracle now recognises the effectiveness of this index and changes plan accordingly.
  • The optimizer has done a surprising transformation of a query, aggregating a table before joining to a couple of other tables when you were expecting it to use the joins to eliminate a huge fraction of the data before aggregating it.  After a little investigation you find that setting hidden parameter _optimizer_distinct_placement to false stops this happening.
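A minimal sketch of the first example above, assuming hypothetical schema and table names: set the preference with DBMS_STATS, then regather the statistics so the clustering factor is recalculated with the new setting.

BEGIN
  DBMS_STATS.set_table_prefs(
    ownname => 'MY_SCHEMA',          -- hypothetical owner
    tabname => 'MY_TABLE',           -- hypothetical table
    pname   => 'TABLE_CACHED_BLOCKS',
    pvalue  => '16'
  );
  DBMS_STATS.gather_table_stats(
    ownname => 'MY_SCHEMA',
    tabname => 'MY_TABLE',
    cascade => TRUE                  -- regather the index statistics too
  );
END;
/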

You may find the distinction unnecessarily fussy, but I'd call the first example “helping the optimizer” – it gives the optimizer some truthful information about your data that is potentially going to result in better decisions in many different statements – and the second example “subverting the optimizer” – you've brute-forced it into not taking a path you didn't like, but at the same time you may have stopped that feature from appearing in other ways or in other queries. Of course, you might have minimised the impact of setting the parameter by using the opt_param() hint to apply the restriction to just this one query; nevertheless, it's possible that there is a better plan for the query that would have used the feature at some other point in the query if you'd managed to do something to help the optimizer rather than constraining it.

What’s up with the Optimizer

It’s likely that most of the articles will be based around interpreting execution plans since those are the things that tell us what the optimizer thinks will happen when it executes a statement, and within execution plans there are three critical aspects to consider –

  1. the numbers (most particularly Cost and Rows),
  2. the shape of the plan,
  3. the Predicate Information.

I want to use this note to make a couple of points about just the first of the three.

  • First – the estimates on any one line of an execution plan are “per start” of the line; some lines of an execution plan will be called many times in the course of a statement. In many cases the Rows estimate from one line of a plan will dictate the number of times that some other line of the plan will be executed – so a bad estimate of “how much data” can double up as a bad estimate of “how many times”, leading to a plan that looks efficient on paper but does far too much work at run-time. A line in a plan that looks a little inefficient may be fine if it executes only once; a line that looks very efficient may be a disaster if it executes a million times. Being able to read a plan and spot the places where the optimizer has produced a poor estimate of Rows is a critical skill – and there are many reasons why the optimizer produces poor estimates. Being able to spot poor estimates depends fairly heavily on knowing the data, but if you know the generic reasons for the optimizer producing poor estimates you've got a head start for recognising and addressing the errors when they appear.
  • Second – Cost is synonymous with Time. For a given instance at a given moment there is a simple, linear relationship between the figure that the optimizer reports for the Cost of a statement (or subsection of a statement) and the Time that the optimizer reports. For many systems (those that have not run the calibrate_io procedure) the Time is simply the Cost multiplied by the time the optimizer thinks it will take to satisfy a single block read request, and the Cost is the optimizer's estimate of the I/O requirement to satisfy the statement – with a fudge factor introduced to recognise the fact that a “single block” read request ought to complete in less time than a “multiblock” read request. Generally speaking the optimizer will consider many possible plans for a statement and pick the plan with the lowest estimated cost – but there is at least one exception to this rule, and it is an unfortunate weakness in the optimizer that there are many valid reasons why its estimates of Cost/Time are poor. Of course, you will note that the values that Oracle reports for the Time column are only accurate to the second – which isn't particularly helpful when a single block read typically operates in the range of a few milliseconds. (A small worked example of the Cost/Time arithmetic follows this list.)
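As a hedged worked example of that Cost/Time relationship, assuming default noworkload system statistics: the assumed single block read time is ioseektim + db_block_size/iotfrspeed = 10 + 8192/4096 = 12 milliseconds, so a plan line with Cost = 1,000 would be reported with a Time of roughly 1,000 * 0.012 = 12 seconds, rounded up to the next whole second.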

To a large degree the optimizer’s task boils down to:

  • What’s the volume and scatter of the data I need
  • What access paths, with what wastage, are available to get to that data
  • How much time will I spend on I/O reading (and possibly discarding) data to extract the bit I want

Of course there are other considerations, like the amount of CPU needed for a sort, the potential for I/O when sorts or hash joins spill to disk, the time to handle a round-trip to a remote system, and RAC variations on the basic theme. But for many statements the driving issue is that any bad estimates of “how much data” and “how much (real) I/O” will lead to bad, potentially catastrophic, choices of execution plan. In the next article I'll list all the different reasons (that I can think of at the time) why the optimizer can produce bad estimates of volume and time.

References for Cost vs. Time

References for table_cached_blocks:

 

OAC v105.4: Understanding Map Data Quality

Rittman Mead Consulting - Fri, 2019-10-18 08:58

Last week Oracle Analytics Cloud v105.4 was announced. One of the features particularly interested me since it reminded me of the story of an Italian couple who wanted to spend their honeymoon in Sydney, Australia, and ended up in a city also called Sydney, but in Nova Scotia, because of a travel agency error. For the funny people out there: don't worry, it wasn't me!

The feature is "Maps ambiguous location matches" and I wanted to write a bit about it.

#OracleAnalytics 105.4 update just about to go live and deploy on your environments. Check-out some of the new features coming. Here is a first list of selected items: https://t.co/Megqz5ekcx. Stay tuned with whole #OAC team (@OracleAnalytics,@BenjaminArnulf...) for more examples pic.twitter.com/CWpj8rC1Bf

— Philippe Lions (@philippe_lions) October 8, 2019

By the way, OAC 105.4 includes a good set of new features like a unified Home page, the possibility to customize any DV font, and more options for security and on-premises connections, amongst others. For a full list of new features check out the related Oracle blog or videos.

Maps: a bit of History

Let's start with a bit of history. Maps have been around for a long time, first in OBIEE and later in OAC. In the earlier stages of my career I spent quite a lot of time writing HTML and JavaScript to include map visualizations within OBIEE 10g. The basic tool was called MapViewer, and the knowledge and time required to create a custom clickable or drillable map was… huge!

With the rise of OBIEE 11g and 12c the mapping capability became easier to use: a new "Map" visualization type was included in Answers, and all we had to do was match the geographical reference coming from one of our Subject Areas (e.g. Country Name) with the related column containing the shape information (e.g. the Country Shape).


After doing so, we were able to plot our geographical information properly: adding multiple layers, drilling capabilities and tooltips was just a matter of a few clicks.

The Secret Source: Good Maps and Data Quality

Perfect, you might think: we can easily use maps everywhere as soon as we have any type of geo-location data available in our dataset! Well, the reality in the old days wasn't like that. Oracle at the time provided some sample maps with a certain level of granularity, covering only some countries in detail. What if we wanted to display all the suburbs of Verona? Sadly that wasn't included, so we were forced to either find a free source online or purchase one from a vendor.

The source of the map shapes was only half of the problem to solve: we always need to create a join with a column coming from our Subject Area! Should we use the Zip Code? What about the Address? Is the City name enough? The deeper we went into the mapping details, the more problems arose.

A common problem (as we saw before with Sydney) was using the City name. How many cities are called the same? How many regions? Is the street name correct? Data quality was, and still is, crucial to providing accurate data and not just a nice but useless map view.

OAC and the Automatic Mapping Capability

Within OAC, DV offers an Automatic Mapping capability: we only need to include in a Project a column containing a geographical reference (lat/long, country name, etc.), select "Map" as the visualization type, and the tool will choose the most appropriate mapping granularity that matches our dataset.


Great! This solves all our issues! Well... not all of them! The Automatic Mapping capability doesn't have all the possible maps in it, but we can always include new custom maps using the OAC Console if we need them.

So What's New in 105.4?

All of the above was available well before the latest OAC release. Version 105.4 adds the "Maps ambiguous location matches" feature, which means that every time we create a Map View, OAC provides us with a Location Matches option.


If we click this option, OAC will provide us with a simple window where we can see:

  • How many locations matched
  • How many locations have issues
  • What's the type of Issue?

The type of issue can be one of:

  • No Match, in case OAC doesn't find any comparable geographical value
  • Multiple Matches, when there are multiple possible associations
  • Partial Matches, when there is a match only to part of the content

We can then take this useful information and start a process of data cleaning to raise the quality of our data visualization.

Conclusion

Maps were and are a really important visualization available in OAC. The Maps ambiguous location matches feature provides a way to understand if our visualization is representative of our dataset. So, if you want to avoid spending your honeymoon in the wrong Sydney or if you just want to provide accurate maps on top of your dataset, use this feature available in OAC!

Categories: BI & Warehousing

In Defence of Best Practices

Tim Hall - Fri, 2019-10-18 03:38

The subject of “Best Practices” came up again yesterday in a thread on Twitter. This is a subject that rears its head every so often.

I understand all the arguments against the term “Best Practices”. There isn’t one correct way to do things. If there were it would be the only way, or automatic etc. It’s all situational etc. I really do understand all that. I’ve been in this conversation so many times over the years you wouldn’t believe it. I’ve heard all the various sentences and terms people would prefer to use rather than “Best Practice”, but here’s my answer to all that.

“Best practices are fine. Get over yourself and shut up!”

Tim Hall : 18th October 2019

I’ve said this more politely in many other conversations, including endless email chains etc.

When it comes down to it, people need guidance. A good best practice will give some context to suggest it is a starting point, and will give people directions for further information/investigation, but it’s targeted at people who don’t know enough about what they are doing and need help. Without a best practice they will do something really bad, and when shit happens they will blame the product. A good best practice can be the start of a journey for people.

I agree that the “Always do this because ‘just bloody do it!'” style of best practice is bad, but we all know that…

I just find the whole conversation so elitist. I spend half of my life Googling solutions (mostly non-Oracle stuff) and reading best practices and some of them are really good. Some of them have definitely improved my understanding, and left me in a position where I have a working production system that would otherwise not be working.

I’m sure this post will get a lot of reactions where people try and “explain to me” why I am wrong, and what I’m not understanding about the problems with best practices. As mentioned before, I really do know all that and I think you are wrong, and so do the vast majority of people outside your elitist echo chamber. Want to test that? Try these…

  • Write a post called “Best Practices for {insert subject of your choice}”. It will get more hits than anything else you’ve ever written.
  • Submit a conference session called “Best Practices for {insert subject of your choice}”. Assuming it gets through the paper selection, you will have more bums on seats than you’ve ever had before for that same subject.

Rather than wasting your life arguing about how flawed the term “Best Practices” is, why don’t you just write some good best practices? Show the world how they should be done, and start people on a positive journey. It’s just a term. Seriously. Get over yourself!

Cheers

Tim…

PS. I hope people from yesterday’s tweets don’t think this is directed at them. It’s really not. It’s the subject matter! This really is a subject I’ve revisited so many times over the years…

Updates

Due to repeatedly having to explain myself, here come some points people have raised and my reactions. I’m sure this list will grow as people insist on “educating me” about why I’m wrong.

I prefer “standard” or “normal” to “best”. As I said at the start of the post, I’ve heard just about every potential variation of this, and I just don’t care. They are all the same thing. They are all best practices. It’s just words. Yes, I know what “best” means, but that’s irrelevant. This is a commonly used term in tech and you aren’t getting rid of it, so own it!

I’ve seen people weaponize best practices. OK. So are you saying they won’t weaponize “standard practices” or “normal practices”? They won’t ever say, “So are you telling me you went against normal practices?”. Of course they will. Stupid people/companies will do stupid things regardless of the name.

But it’s not the “best”! Did you even read my post? I’m so tired of this. It’s a best practice to never use hints in SQL. I think that’s pretty solid advice. I do use hints in some SQL, but I always include a comment to explain why. I have deviated from best practice, but documented the reason why. If a person/company wants no deviation from best practice, they can remove it and have shit performance. That’s their choice. I’ve been transparent and explained my deviation. If this is not the way you work, you are wrong, not the best practice.

Most vendor best practice documents are crap. I have some sympathy for this, but I raise tickets against bad documentation, including best practices, and generally the reception to these has been good. The last one was a couple of weeks ago and the company (not Oracle) changed the docs the same day. I always recommend raising an SR/ticket/bug against bad documentation. It doesn’t take much time and you are improving things for yourself and everyone else. I feel like you can’t complain about the quality of the docs if you never point out the faults.

