MergeDatasets
Merges datasets holding overlapping cases but different variables. The merge may be controlled by keys or grouping variables.
Properties
Name |
Type |
Description |
|
---|---|---|---|
MergeFiles |
2..n |
Description of files to be merged. |
|
MergeByVariables |
0..1 |
A variable or list of variables that acts as the unique case identifier across datasets. If MergeByVariables is absent, MergeType must be “sequential” on all files. |
|
FirstVariable |
0..1 |
The name of a variable set to 1 for the first row of each group of cases with the same value for the MergeByVariables variables and set to 0 for all other rows. |
|
LastVariable |
0..1 |
The name of a variable set to 1 for the last row of each group of cases with the same value for the MergeByVariables variables and set to 0 for all other rows. |
Properties Inherited from TransformBase
Name |
Type |
Description |
|
---|---|---|---|
ProducesDataframe |
0..n |
Signify the dataframe which this transform produces. |
|
ConsumesDataframe |
0..n |
Signify the dataframe which this transform acts upon. |
Properties Inherited from CommandBase
Name |
Type |
Description |
|
---|---|---|---|
Command |
1..1 |
The type of command |
|
SourceInformation |
0..n |
Information about the source of the command. |
|
MessageText |
0..n |
Adds a message that can be displayed with the command. |
Item Type Hierarchy
- CommandBase
- TransformBase
MergeDatasets
Relationships
The following identified item types reference this type.
Merge_options
Properties and Options of MergeFileDescription
+----------------------+----------------------------------------------+ |Property name | Description | +======================+==============================================+ | FileName | The names of the files to be merged. | | | "Active file" means the file current | | | active dataset. | +----------------------+----------------------------------------------+ | _ | | +----------------------+----------------------------------------------+ | MergeType | Describes the type of merge performed. | +----------------------+----------------------------------------------+ | | > Sequential: Match rows from each input | | | > dataframe in sequential order. | | | > | | | > OneToOne: Create one row for each value of | | | > the mergeByVariables. If a combination | | | > of the mergeByVariables is repeated, | | | > only one row is matched. Rows with | | | > repeated combinations of the | | | > MergeByVariables may or may not be | | | > included in the output file depending on | | | > the newRow property. | | | > | | | > OneToMany: Create a row in the output | | | > dataframe by matching rows in this | | | > dataframe to every row in other dataframes | | | > with the same value of MergeByVariables. | | | > Note that OneToMany implies that one of | | | > the other input datarames is set to | | | > ManyToOne. | | | > | | | > ManyToOne: Create a row in the output | | | > dataframe by matching all rows in this | | | > dataframe to the one row in the other | | | > dataframe with the same value of | | | > MergeByVariables. | | | > | | | > Cartesian: Create a new row in the output | | | > dataframe for every possible combination | | | > of rows having the same value of | | | > MergeByVariables. This is equivalent to a | | | > many to many merge. R and Python use a | | | > model derived from SQL, which is based on | | | > Cartesian joins. | | | > | | | > Unmatched: Create a new row for every row | | | > that cannot be matched on the | | | > MergeByVariables | | | > | | | > SASmatchMerge: SAS uses a merging approach | | | > that combines matching keys and sequential | | | > merges within groups. | +----------------------+----------------------------------------------+ | MergeFlagVariable | Creates a new variable indicating whether | | | the row came from this file or a different | | | input file. | +----------------------+----------------------------------------------+ | RenameVariables | Variables to be renamed | +----------------------+----------------------------------------------+ | _ | | +----------------------+----------------------------------------------+ | Update | Describes outcome when a variable exists in | | | both this file and another file. | +----------------------+----------------------------------------------+ | | > Master: This dataframe is the Master | | | > dataframe. | | | > | | | > Ignore: If a column with the same name | | | > exists in the Master dataframe, ignore the | | | > values in this dataframe. | | | > | | | > FillNew: If a column with the same name | | | > exists in the Master dataframe, use the | | | > values from this dataframe only in new | | | > rows created from this dataframe. | | | > | | | > UpdateMissing: If a column with the same | | | > name exists in the Master dataframe, use | | | > values from this dataframe when the value | | | > in the Master dataframe is missing. Rows | | | > not in the Master dataframe are filled | | | > from this dataframe. | | | > | | | > Replace: If a column with the same name | | | > exists in the Master dataframe, use values | | | > from this dataframe. | +----------------------+----------------------------------------------+ | NewRow | When TRUE, generates a new row when not | | | matched to other files | +----------------------+----------------------------------------------+ | KeepVariables | List of variables to keep | +----------------------+----------------------------------------------+ | DropVariables | List of variables to drop | +----------------------+----------------------------------------------+ | KeepCasesCondition | Logical condition for keeping rows. | +----------------------+----------------------------------------------+ | DropCasesCondition | Logical condition for dropping rows. | +----------------------+----------------------------------------------+ | MergeByNames | > An ordered list of variables used as keys | | | > in this file to be matched to the | | | > variables in the mergeByVariables property | | | > of the MergeDatasets command. This | | | > property is only used when the key | | | > variables in this file have different | | | > names than the variable names listed in | | | > the MergeDatasets command. | +----------------------+----------------------------------------------+
SPSS_merge_examples
==================== EXAMPLE 1 ====================================
MATCH FILES
/FILE='merge_1.sav'
/file='merge_2.sav'
.
{"command": "MergeDatasets",
"$type": "MergeDatasets",
"MergeFiles": [
"mergeFileDescription":
{"fileName": "merge_1.sav",
"mergeType": "Sequential",
"newRow": TRUE
},
"MergeFileDescription":
{"fileName": "merge_2.sav",
"mergeType": "Sequential"
"newRow": TRUE
}
]
}
==================== EXAMPLE 2 ====================================
MATCH FILES
/FILE='merge_1.sav'
/in=from_f1
/file='merge_3.sav'
/in=from_f3
/RENAME= (VAR3=VARx)
/KEEP= id VAR2 VARx
/by id
/first=firstvar
/last=lastvar
.
{"command": "MergeDatasets",
"$type": "MergeDatasets",
"mergeByVariables": [ {"$type": "VariableSymbolExpression",
"VariableName":"id"} ],
"firstVariable": "firstvar",
"lastVariable": "lastvar",
"mergeFiles": [
"mergeFileDescription":
{"fileName": "merge_1.sav",
"mergeType": "OneToOne",
"mergeFlagVariable":"from_f1",
"renameVariable":["RenamePair":
{"OldVariable":"VAR3","NewVariable":"VARx"} ],
"newRow": TRUE
},
"mergeFileDescription":
{"fileName": "merge_3.sav",
"mergeType": "OneToOne",
"mergeFlagVariable":"from_f3",
"newRow": FALSE
}
},
{"command": "KeepVariables",
"$type": "KeepVariables",
"variables": {"$type": "VariableListExpression",
"variables":
[ {"$type": "VariableSymbolExpression",
"VariableName":"id"},
{"$type": "VariableSymbolExpression",
"VariableName":"VAR2"},
{"$type": "VariableSymbolExpression",
"VariableName":"VARx"}
]
},
"messageText": "NOTE: This KeepVariables command is after the MergeDatasets command, because it applies to the output dataframe."
}
Stata_merge_examples
NOTE: These Stata Merge options are not represented in SDTL:
noreport
nolabel
nonotes
sorted
==================== EXAMPLE 1 ====================================
use "mergedat1.dta", clear
merge 1:1 _n using "mergedat4.dta"
list _all
{"command": "MergeDatasets",
"mergeFiles": [
"mergeFileDescription":
{"fileName": "Active file",
"mergeType": "Sequential",
"newRow": TRUE,
"mergeFlagVariable":"_merge"},
"mergeFileDescription":
{"fileName": "mergedat4.dta",
"mergeType": "Sequential",
"newRow": TRUE}
]
},
{"$type": "SetValueLabels",
"command": "SetValueLabels",
"variables": [
{"$type": "VariableSymbolExpression",
"variableName", "_merge"}
],
"labels": [
{"value": 1, "label": "master"}
{"value": 2, "label": "using"}
{"value": 3, "label": "match"}
{"value": 4, "label": "match_update"}
{"value": 5, "label": "match_conflict"}
]
}
==================== EXAMPLE 2 ====================================
use "mergedat1.dta", clear
merge 1:1 id using "mergedat3b.dta" , update gener(matchVar)
list _all
{"command": "MergeDatasets",
"MergeFiles": [
"MergeFileDescription":
{"fileName": "Active Dataframe",
"mergeType": "1:1",
"update": "Master",
"mergeFlagVariable":"matchVar",
"newRow": TRUE},
"MergeFileDescription":
{"fileName": "mergedat3c.dta",
"mergeType": "1:1",
"update": "UpdateMissing",
"newRow":TRUE}
],
"MergeByVariables": {"$type": "VariableSymbolExpression",
"VariableName":"id"}
},
{"$type": "SetValueLabels",
"command": "SetValueLabels",
"variables": [
{"$type": "VariableSymbolExpression",
"variableName", "matchVar"}
],
"labels": [
{"value": 1, "label": "master"}
{"value": 2, "label": "using"}
{"value": 3, "label": "match"}
{"value": 4, "label": "match_update"}
{"value": 5, "label": "match_conflict"}
]
}
==================== EXAMPLE 3 ====================================
use "mergedat1.dta", clear
merge 1:1 id using "mergedat3b.dta" , update replace keepusing(lastname)
{"command": "MergeDatasets",
"$type": "MergeDatasets",
"mergeByVariables":{"$type": "VariableSymbolExpression",
"variableName":"id"},
"mergeFiles": [
"mergeFileDescription":
{"mileName": "Active file",
"mergeType": "1:1",
"update": "UpdateMissing",
"mergeFlagVariable":"matchVar",
"newRow":"False"},
"mergeFileDescription":
{"fileName": "mergedat3b.dta",
"mergeType": "1:1",
"update": "Master",
"newRow":"True",
"keepVariables":{"$type": "VariableSymbolExpression",
"variableName":"lastname"}
}
]
},
{"$type": "SetValueLabels",
"command": "SetValueLabels",
"variables": [
{"$type": "VariableSymbolExpression",
"variableName", "_merge"}
],
"labels": [
{"value": 1, "label": "master"}
{"value": 2, "label": "using"}
{"value": 3, "label": "match"}
{"value": 4, "label": "match_update"}
{"value": 5, "label": "match_conflict"}
]
}