HPCC Internal Entity Linking through SALT – A Quick Start Guide – Pt. 3


This post is the third part of a series that aims to provide simple steps for a novice user to “get going” with HPCC’s SALT Internal Linking. Please see here for further information on the series. 

We should by now have a file (people_draft.mod) which contains all the ECL code to calculate the Field Specificities for our dataset. As such, we can proceed with:

  1. Importing the ECL Code into the IDE and calculating the Field Specificities
  2. Performing our first Internal Linking Iteration
  3. Performing subsequent Iterations


Calculating Field Specificities

Import the SALT Specificity ECL Code to ECL IDE

Going back to the ECL IDE, perform the following steps:

  1. Go to Open
  2. Navigate to the location where the people_draft.mod file was saved.
  3. Select the mod file and open it.
  4. In the following popup, select the My Files folder as Target.
  5. Start the Import

You should end up with something looking as follows:

Figure 1: Solution Structure after Specificity ECL import

Execute the Specificity ECL Code

To calculate the Field Specificities, perform the following steps:

  1. Open the attribute BWR_Specificities
  2. Perform a Syntax Check (F7)
  3. Submit the code (Ctrl + Enter)

Assuming that you have performed all the steps as described up to this point, the workunit should start executing and should complete within a few minutes (given the small dataset). Once complete, perform the following:

  1. Go to ECL Watch / Workunits
  2. Identify the completed Workunit (named by default “People.BWR_Specificities – Specificities – SALT Vx.x.x” or something similar)
  3. Select the Workunit and examine the Outputs – one of these should be called Specificities
  4. Open the Specificities Output and examine its contents. The calculated specificities are present for each field. The highlighted example indicates the specificity of the first_name field:

Figure 2: Specificities Output Example

Note: A high specificity means that a field's values are close to unique (specific), so a match on that field is strong evidence that two records belong together. As such, it should result in easier/better clustering.
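
To make the intuition behind the note concrete, here is a minimal ECL sketch. It assumes the common information-theoretic reading of specificity (bits = log2 of the inverse frequency of a value); the code SALT generates does considerably more, so treat this purely as an illustration. The record layout, the inline sample data and all attribute names are hypothetical and are not part of the imported People module.

```ecl
// Hypothetical sample data; NOT part of the SALT-generated module.
NameRec := RECORD
  STRING20 first_name;
END;

sampleNames := DATASET([{'JOHN'}, {'JOHN'}, {'JOHN'}, {'MARY'}], NameRec);

totalCnt := COUNT(sampleNames);

// Count how often each distinct value occurs.
valueFreq := TABLE(sampleNames,
                   {first_name, UNSIGNED4 cnt := COUNT(GROUP)},
                   first_name);

BitsRec := RECORD
  STRING20  first_name;
  UNSIGNED4 cnt;
  REAL8     bits;
END;

// Assumed formula: bits = log2(totalCnt / cnt).
// ECL's LN is the natural log, so divide by LN(2) to convert to base 2.
valueBits := PROJECT(valueFreq,
                     TRANSFORM(BitsRec,
                               SELF.bits := LN(totalCnt / LEFT.cnt) / LN(2),
                               SELF := LEFT));

OUTPUT(valueBits, NAMED('ValueBits'));
```

Under this assumption, a value seen once in four records (MARY) scores about 2 bits, while a value seen three times in four (JOHN) scores well under 1 bit, which is why rare values are the more useful ones for linking.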


Performing our first Linking Iteration

Before we are ready to execute our first Linking Iteration, we need to generate the Iteration ECL Code using the newly calculated Field Specificities. To achieve that, perform the following steps:

Create a copy of the draft SALT spec file “people_draft.salt“, calling it “people_1.salt“.

In the new file, update the specificity values as per the Specificities output (rounding to the nearest integer). For example, using the specificity from Figure 2 (first_name):

FIELD:first_name:8,0

Using SALT.exe (as described here), execute the following command to generate People_1.mod, which contains all the Iteration ECL Code.

salt.exe -ga people_1.salt >People_1.mod

Import the module file into your solution (as described here). This will overwrite some of the files already generated/imported, whilst some new ones will appear. Your solution should now contain the attribute BWR_Iterate.

Open the attribute BWR_Iterate and perform a Syntax Check (F7). If successful, submit the Workunit and wait for its completion! Given the small dataset size, this should not take longer than a few minutes.


Performing subsequent Iterations

To perform subsequent Entity Linking Iterations, all we need to do is “feed” SALT with the output of the first iteration and ask it to repeat the same exercise using that dataset. To identify the name of the output logical file, all we have to do is the following:

  1. Open ECL Watch
  2. Identify the Workunit responsible for the first iteration and access its details by clicking on it.
  3. Select the “Output” tab
  4. Check out the file name of Result 15 (temp::id::people::it1).

This is the name of the file we should “feed” to SALT. Once the second iteration has taken place, the output will be named

temp::id::people::it2

with each subsequent iteration increasing the counter at the end of the filename. To perform the second iteration, proceed as follows:

Open the attribute BWR_Iterate and comment out the following line:

P := People.Proc_Iterate('1');

Add the following line of code:

P := People.Proc_Iterate('2',
        DATASET('~temp::id::people::it1',
        Layout_People,
        THOR));

Here, we are instructing SALT to accept our first iteration output as its input and proceed with the second iteration. Perform a Syntax Check (F7) and submit the code.
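
Further iterations follow the same pattern. As a sketch, assuming the second run wrote its output to ~temp::id::people::it2 (do confirm the actual logical file name in ECL Watch first, as described above), a third iteration would look something like this:

```ecl
// Sketch of a third iteration.
// Assumption: the second iteration produced ~temp::id::people::it2;
// verify the name in the workunit's Outputs before submitting.

// Comment out the previous iteration's line:
// P := People.Proc_Iterate('2',
//         DATASET('~temp::id::people::it1',
//         Layout_People,
//         THOR));

// Feed the second iteration's output into the third iteration.
P := People.Proc_Iterate('3',
        DATASET('~temp::id::people::it2',
        Layout_People,
        THOR));
```

In other words, the iteration label goes up by one each time and the input file is always the output of the previous run.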
