MergeDatasets¶
Merges datasets holding overlapping cases but different variables. The merge may be controlled by keys or grouping variables.
Properties¶
Name | Type | Description | |
---|---|---|---|
MergeFiles | MergeFileDescription | 2..n | Description of files to be merged. |
MergeByVariables | VariableReferenceBase | 0..1 | A variable or list of variables that acts as the unique case identifier across datasets. If MergeByVariables is absent, MergeType must be “sequential” on all files. |
FirstVariable | string | 0..1 | The name of a variable set to 1 for the first row of each group of cases with the same value for the MergeByVariables variables and set to 0 for all other rows. |
LastVariable | string | 0..1 | The name of a variable set to 1 for the last row of each group of cases with the same value for the MergeByVariables variables and set to 0 for all other rows. |
Properties Inherited from TransformBase¶
Name | Type | Description | |
---|---|---|---|
ProducesDataframe | DataframeDescription | 0..n | Signify the dataframe which this transform produces. |
ConsumesDataframe | DataframeDescription | 0..n | Signify the dataframe which this transform acts upon. |
Properties Inherited from CommandBase¶
Name | Type | Description | |
---|---|---|---|
Command | string | 1..1 | The type of command |
SourceInformation | SourceInformation | 0..n | Information about the source of the command. |
MessageText | string | 0..n | Adds a message that can be displayed with the command. |
Item Type Hierarchy¶
- CommandBase
- TransformBase
- MergeDatasets
Relationships¶
Merge_options¶
Properties and Options of MergeFileDescription¶
Property name | Description |
---|---|
FileName | The names of the files to be merged. “Active file” means the file current active dataset. |
_ | |
MergeType | Describes the type of merge performed. |
> Sequential: Match rows from each input > dataframe in sequential order. > > OneToOne: Create one row for each value of > the mergeByVariables. If a combination > of the mergeByVariables is repeated, > only one row is matched. Rows with > repeated combinations of the > MergeByVariables may or may not be > included in the output file depending on > the newRow property. > > OneToMany: Create a row in the output > dataframe by matching rows in this > dataframe to every row in other dataframes > with the same value of MergeByVariables. > Note that OneToMany implies that one of > the other input datarames is set to > ManyToOne. > > ManyToOne: Create a row in the output > dataframe by matching all rows in this > dataframe to the one row in the other > dataframe with the same value of > MergeByVariables. > > Cartesian: Create a new row in the output > dataframe for every possible combination > of rows having the same value of > MergeByVariables. This is equivalent to a > many to many merge. R and Python use a > model derived from SQL, which is based on > Cartesian joins. > > Unmatched: Create a new row for every row > that cannot be matched on the > MergeByVariables > > SASmatchMerge: SAS uses a merging approach > that combines matching keys and sequential > merges within groups. | |
MergeFlagVariable | Creates a new variable indicating whether the row came from this file or a different input file. |
RenameVariables | Variables to be renamed |
_ | |
Update | Describes outcome when a variable exists in both this file and another file. |
> Master: This dataframe is the Master > dataframe. > > Ignore: If a column with the same name > exists in the Master dataframe, ignore the > values in this dataframe. > > FillNew: If a column with the same name > exists in the Master dataframe, use the > values from this dataframe only in new > rows created from this dataframe. > > UpdateMissing: If a column with the same > name exists in the Master dataframe, use > values from this dataframe when the value > in the Master dataframe is missing. Rows > not in the Master dataframe are filled > from this dataframe. > > Replace: If a column with the same name > exists in the Master dataframe, use values > from this dataframe. | |
NewRow | When TRUE, generates a new row when not matched to other files |
KeepVariables | List of variables to keep |
DropVariables | List of variables to drop |
KeepCasesCondition | Logical condition for keeping rows. |
DropCasesCondition | Logical condition for dropping rows. |
MergeByNames | > An ordered list of variables used as keys > in this file to be matched to the > variables in the mergeByVariables property > of the MergeDatasets command. This > property is only used when the key > variables in this file have different > names than the variable names listed in > the MergeDatasets command. |
SPSS_merge_examples¶
==================== EXAMPLE 1 ==================================== ``` MATCH FILES
/FILE=’merge_1.sav’
- /file=’merge_2.sav’
- .
- {“command”: “MergeDatasets”,
- “$type”: “MergeDatasets”,
“MergeFiles”: [ “mergeFileDescription”:
{“fileName”: “merge_1.sav”, “mergeType”: “Sequential”,
“newRow”: TRUE},
- “MergeFileDescription”:
{“fileName”: “merge_2.sav”, “mergeType”: “Sequential”
“newRow”: TRUE
}
==================== EXAMPLE 2 ==================================== ``` MATCH FILES
/FILE=’merge_1.sav’/in=from_f1 /file=’merge_3.sav’ /in=from_f3 /RENAME= (VAR3=VARx) /KEEP= id VAR2 VARx /by id /first=firstvar /last=lastvar
.
- {“command”: “MergeDatasets”,
- “$type”: “MergeDatasets”,
- “mergeByVariables”: [ {“$type”: “VariableSymbolExpression”,
- “VariableName”:”id”} ],
“firstVariable”: “firstvar”, “lastVariable”: “lastvar”,
- “mergeFiles”: [
- “mergeFileDescription”:
{“fileName”: “merge_1.sav”, “mergeType”: “OneToOne”, “mergeFlagVariable”:”from_f1”, “renameVariable”:[“RenamePair”:
{“OldVariable”:”VAR3”,”NewVariable”:”VARx”} ],“newRow”: TRUE },
- “mergeFileDescription”:
- {“fileName”: “merge_3.sav”, “mergeType”: “OneToOne”, “mergeFlagVariable”:”from_f3”, “newRow”: FALSE }
},
- {“command”: “KeepVariables”,
“$type”: “KeepVariables”, “variables”: {“$type”: “VariableListExpression”,
- “variables”:
- [ {“$type”: “VariableSymbolExpression”,
- “VariableName”:”id”}, {“$type”: “VariableSymbolExpression”, “VariableName”:”VAR2”}, {“$type”: “VariableSymbolExpression”, “VariableName”:”VARx”} ]
},
“messageText”: “NOTE: This KeepVariables command is after the MergeDatasets command, because it applies to the output dataframe.” }
Stata_merge_examples¶
``` NOTE: These Stata Merge options are not represented in SDTL:
noreport nolabel nonotes sorted
==================== EXAMPLE 1 ====================================
``` use “mergedat1.dta”, clear merge 1:1 _n using “mergedat4.dta” list _all
- {“command”: “MergeDatasets”,
“mergeFiles”: [ “mergeFileDescription”:
{“fileName”: “Active file”, “mergeType”: “Sequential”, “newRow”: TRUE, “mergeFlagVariable”:”_merge”},- “mergeFileDescription”:
- {“fileName”: “mergedat4.dta”, “mergeType”: “Sequential”, “newRow”: TRUE} ]
},
- {“$type”: “SetValueLabels”,
“command”: “SetValueLabels”, “variables”: [
- {“$type”: “VariableSymbolExpression”,
- “variableName”, “_merge”}
],
- “labels”: [
- {“value”: 1, “label”: “master”} {“value”: 2, “label”: “using”} {“value”: 3, “label”: “match”} {“value”: 4, “label”: “match_update”} {“value”: 5, “label”: “match_conflict”} ]
}
`
==================== EXAMPLE 2 ====================================
`
use “mergedat1.dta”, clear
merge 1:1 id using “mergedat3b.dta” , update gener(matchVar)
list _all
- {“command”: “MergeDatasets”,
“MergeFiles”: [ “MergeFileDescription”:
{“fileName”: “Active Dataframe”, “mergeType”: “1:1”, “update”: “Master”, “mergeFlagVariable”:”matchVar”, “newRow”: TRUE},- “MergeFileDescription”:
- {“fileName”: “mergedat3c.dta”, “mergeType”: “1:1”, “update”: “UpdateMissing”, “newRow”:TRUE} ],
- “MergeByVariables”: {“$type”: “VariableSymbolExpression”,
- “VariableName”:”id”}
},
- {“$type”: “SetValueLabels”,
“command”: “SetValueLabels”, “variables”: [
- {“$type”: “VariableSymbolExpression”,
- “variableName”, “matchVar”}
],
- “labels”: [
- {“value”: 1, “label”: “master”} {“value”: 2, “label”: “using”} {“value”: 3, “label”: “match”} {“value”: 4, “label”: “match_update”} {“value”: 5, “label”: “match_conflict”} ]
}
`
==================== EXAMPLE 3 ====================================
`
use “mergedat1.dta”, clear
merge 1:1 id using “mergedat3b.dta” , update replace keepusing(lastname)
- {“command”: “MergeDatasets”,
- “$type”: “MergeDatasets”,
- “mergeByVariables”:{“$type”: “VariableSymbolExpression”,
- “variableName”:”id”},
“mergeFiles”: [ “mergeFileDescription”:
{“mileName”: “Active file”, “mergeType”: “1:1”, “update”: “UpdateMissing”, “mergeFlagVariable”:”matchVar”, “newRow”:”False”},- “mergeFileDescription”:
{“fileName”: “mergedat3b.dta”, “mergeType”: “1:1”, “update”: “Master”, “newRow”:”True”, “keepVariables”:{“$type”: “VariableSymbolExpression”,
“variableName”:”lastname”}
},
- {“$type”: “SetValueLabels”,
“command”: “SetValueLabels”, “variables”: [
- {“$type”: “VariableSymbolExpression”,
- “variableName”, “_merge”}
],
- “labels”: [
- {“value”: 1, “label”: “master”} {“value”: 2, “label”: “using”} {“value”: 3, “label”: “match”} {“value”: 4, “label”: “match_update”} {“value”: 5, “label”: “match_conflict”} ]
}