This post is the third part of a series that aims to provide simple steps for a novice user to “get going” with HPCC’s SALT Internal Linking. Please see here for further information on the series.
We should by now have a file (people_draft.mod) which contains all the ECL code to calculate the Field Specificities for our dataset. As such, we can proceed with:
- Importing the ECL Code into the IDE and calculating the Field Specificities
- Performing our first Internal Linking Iteration
- Performing subsequent Iterations
Calculating Field Specificities
Import the SALT Specificity ECL Code to ECL IDE
Going back to the ECL IDE, perform the following steps:
- Go to Open
- Navigate to the location where the people_draft.mod file was saved.
- Select the mod file and open it.
- In the following popup, select the My Files folder as Target.
- Start the Import
You should end up with something looking as follows:
Execute the Specificity ECL Code
To calculate the Field Specificities, perform the following steps:
- Open the attribute BWR_Specificities
- Perform a Syntax Check (F7)
- Submit the code (Ctrl + Enter)
Assuming that you have performed all the steps as described up to this point, the workunit should start executing and should complete within a few minutes (given the small dataset). Once complete, perform the following:
- Go to ECL Watch / Workunits
- Identify the completed Workunit (named by default “People.BWR_Specificities – Specificities – SALT Vx.x.x” or something similar)
- Select the Workunit and examine the Outputs – one of these should be called Specificities
- Open the Specificities Output and examine its contents. The calculated specificities are present for each field. The highlighted example indicates the specificity of the first_name field:
Note: High specificity implies that a field's values are rare (specific) within the dataset, so a match on that field is strong evidence that two records belong together. As such, it should result in easier/better clustering.
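To build some intuition for the note above, here is a minimal ECL sketch that scores each value in a toy field by how rare it is, expressed in bits (log base 2 of the inverse frequency). This is an illustration under stated assumptions, not the code SALT generates: the dataset, the record layouts and all attribute names (toyNames, FreqRec, AddBits and so on) are hypothetical, and SALT's own calculation may differ in its details.

```ecl
// Toy data, invented for illustration only (not the series' People dataset).
NameRec := RECORD
  STRING20 first_name;
END;

toyNames := DATASET([{'JOHN'}, {'JOHN'}, {'JOHN'}, {'JOHN'},
                     {'MARY'}, {'MARY'}, {'ZELDA'}, {'QUENTIN'}], NameRec);

totalRecs := COUNT(toyNames);

// Count how often each distinct value occurs.
FreqRec := RECORD
  toyNames.first_name;
  UNSIGNED4 cnt := COUNT(GROUP);
END;
valueFreq := TABLE(toyNames, FreqRec, first_name);

// Assumed scoring: bits = log2(totalRecs / cnt).
// ECL's LN is the natural logarithm, so divide by LN(2) to get base 2.
ScoreRec := RECORD
  STRING20 first_name;
  UNSIGNED4 cnt;
  REAL8 bits;
END;

ScoreRec AddBits(FreqRec L) := TRANSFORM
  SELF.bits := LN(totalRecs / L.cnt) / LN(2);
  SELF := L;
END;

OUTPUT(PROJECT(valueFreq, AddBits(LEFT)), NAMED('ToySpecificity'));
```

In this toy data JOHN covers half of the records and scores about 1 bit, while ZELDA and QUENTIN each appear once in eight records and score about 3 bits. The rarer the value, the more a match on it tells you.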
Performing our first Linking Iteration
Before we are ready to execute our first Linking Iteration, we need to generate the Iteration ECL Code using the newly calculated Field Specificities. To achieve that, perform the following steps:
Create a copy of the draft SALT spec file “people_draft.salt”, calling it “people_1.salt”.
In the new file, update the specificity values as per the specificity report (rounding up or down to the nearest integer) – for example, using the specificity from Figure 2 (first_name):
Using SALT.exe (as described here), execute the following command to generate People_1.mod, which contains all the Iteration ECL Code:
salt.exe -ga people_1.salt >People_1.mod
Import the module file into your solution (as described here). This will overwrite some of the files already generated/imported, whilst some new ones will appear. Your solution should now contain the attribute BWR_Iterate.
Open the attribute BWR_Iterate and perform a Syntax Check (F7). If successful, submit the Workunit and wait for its completion! Given the small dataset size, this should not take longer than a few minutes.
Performing subsequent Iterations
To perform subsequent Entity Linking Iterations, all we need to do is “feed” SALT with the output of the first iteration and ask it to repeat the same exercise using that dataset. To identify the name of the output logical file, all we have to do is the following:
- Open ECL Watch
- Identify the Workunit responsible for the first iteration and access its details by clicking on it.
- Select the “Output” tab
- Check out the file name of Result 15 (temp::id::people::it1).
This is the name of the file we should “feed” to SALT. Once the second iteration has taken place, its output will be written to a new file, with each subsequent iteration increasing the counter at the end of the filename. To perform the second iteration, proceed as follows:
Open the attribute BWR_Iterate and comment out the following line:
P := People.Proc_Iterate('1');
Add the following line of code:
P := People.Proc_Iterate('2', DATASET('~temp::id::people::it1', Layout_People, THOR));
Here, we are instructing SALT to accept our first iteration output as its input and proceed with the second iteration. Perform a Syntax Check (F7) and submit the code.
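If the second iteration fails because it cannot find its input, a quick pre-flight check can help. The sketch below, run as a separate workunit, uses the ECL Standard Library to confirm that the logical file written by the first iteration exists. The attribute name prevFile and the result name are hypothetical. The file name is the one reported by the first iteration above.

```ecl
IMPORT STD;

// Logical file name reported by the first iteration (see the Workunit outputs).
prevFile := '~temp::id::people::it1';

// TRUE if the logical file is registered in the distributed file system.
OUTPUT(STD.File.FileExists(prevFile), NAMED('PreviousIterationFileExists'));
```

A FALSE result usually means the name was mistyped or the first iteration did not complete, so it is worth re-checking the Workunit outputs before resubmitting.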