Properties and Options of MergeFileDescription

MergeDatasets
-------------

Merges datasets holding overlapping cases but different variables.  The merge may be controlled by keys or grouping variables.


Properties
~~~~~~~~~~

.. csv-table::
   :header: "Name","Type","","Description"
   :widths: 15,10,5,100

   "MergeFiles",":doc:`/composite-types/MergeFileDescription/index`","2..n","Description of files to be merged."
   "MergeByVariables",":doc:`/composite-types/VariableReferenceBase/index`","0..1","A variable or list of variables that acts as the unique case identifier across datasets. If MergeByVariables is absent, MergeType must be ""Sequential"" on all files."
   "FirstVariable","`string <https://cogsdata.org/docs/modeler-guide/primitive-types/#string>`_","0..1","The name of a variable set to 1 for the first row of each group of cases with the same values of the MergeByVariables variables and set to 0 for all other rows."
   "LastVariable","`string <https://cogsdata.org/docs/modeler-guide/primitive-types/#string>`_","0..1","The name of a variable set to 1 for the last row of each group of cases with the same values of the MergeByVariables variables and set to 0 for all other rows."
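
As an illustration of how FirstVariable and LastVariable behave, the following Python sketch flags the first and last row of each group of cases. It is not part of SDTL: the function name ``flag_first_last`` and the sample rows are hypothetical, and the rows are assumed to be already sorted by the MergeByVariables.

.. code-block:: python

   from itertools import groupby


   def flag_first_last(rows, by, first="first", last="last"):
       """Return copies of rows with 0/1 flags marking group boundaries.

       Assumption: rows are already sorted by the key variables in `by`.
       """
       out = []
       for _, group in groupby(rows, key=lambda r: tuple(r[k] for k in by)):
           members = [dict(r) for r in group]  # copy so the input is untouched
           for i, row in enumerate(members):
               row[first] = 1 if i == 0 else 0
               row[last] = 1 if i == len(members) - 1 else 0
           out.extend(members)
       return out


   # Hypothetical merged rows keyed by "id".
   rows = [{"id": 1, "x": "a"}, {"id": 1, "x": "b"}, {"id": 2, "x": "c"}]
   flagged = flag_first_last(rows, by=["id"], first="firstvar", last="lastvar")

A group with a single row gets 1 in both flag variables, because that row is both the first and the last row of its group.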

Properties Inherited from TransformBase
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. csv-table::
   :header: "Name","Type","","Description"
   :widths: 15,10,5,100

   "ProducesDataframe",":doc:`/composite-types/DataframeDescription/index`","0..n","Signifies the dataframe that this transform produces."
   "ConsumesDataframe",":doc:`/composite-types/DataframeDescription/index`","0..n","Signifies the dataframe that this transform acts upon."

Properties Inherited from CommandBase
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. csv-table::
   :header: "Name","Type","","Description"
   :widths: 15,10,5,100

   "Command","`string <https://cogsdata.org/docs/modeler-guide/primitive-types/#string>`_","1..1","The type of command"
   "SourceInformation",":doc:`/composite-types/SourceInformation/index`","0..n","Information about the source of the command."
   "MessageText","`string <https://cogsdata.org/docs/modeler-guide/primitive-types/#string>`_","0..n","Adds a message that can be displayed with the command."


Item Type Hierarchy
~~~~~~~~~~~~~~~~~~~

* :doc:`/composite-types/CommandBase/index`
    * :doc:`/composite-types/TransformBase/index`
        * **MergeDatasets**


Relationships
~~~~~~~~~~~~~
The following identified item types reference this type.

.. container:: image

   |stub|

.. |stub| image:: ../../images/MergeDatasets.svg

Merge_options
~~~~~~~~~~~~~

.. csv-table:: Properties and Options of MergeFileDescription
   :header: "Property name","Description"
   :widths: 20,80

   "FileName","The name of the file to be merged. ""Active file"" means the currently active dataset."
   "MergeType","Describes the type of merge performed. The permitted values are listed in the rows that follow."
   "","**Sequential**: Match rows from each input dataframe in sequential order."
   "","**OneToOne**: Create one row for each value of the **MergeByVariables**. If a combination of the **MergeByVariables** is repeated, only one row is matched. Rows with repeated combinations of the MergeByVariables may or may not be included in the output file depending on the **NewRow** property."
   "","**OneToMany**: Create a row in the output dataframe by matching rows in this dataframe to every row in other dataframes with the same value of MergeByVariables. Note that OneToMany implies that one of the other input dataframes is set to ManyToOne."
   "","**ManyToOne**: Create a row in the output dataframe by matching all rows in this dataframe to the one row in the other dataframe with the same value of MergeByVariables."
   "","**Cartesian**: Create a new row in the output dataframe for every possible combination of rows having the same value of MergeByVariables. This is equivalent to a many-to-many merge. R and Python use a model derived from SQL, which is based on Cartesian joins."
   "","**Unmatched**: Create a new row for every row that cannot be matched on the MergeByVariables."
   "","**SASmatchMerge**: SAS uses a merging approach that combines matching keys and sequential merges within groups."
   "MergeFlagVariable","Creates a new variable indicating whether the row came from this file or a different input file."
   "RenameVariables","Variables to be renamed."
   "Update","Describes the outcome when a variable exists in both this file and another file. The permitted values are listed in the rows that follow."
   "","**Master**: This dataframe is the Master dataframe."
   "","**Ignore**: If a column with the same name exists in the Master dataframe, ignore the values in this dataframe."
   "","**FillNew**: If a column with the same name exists in the Master dataframe, use the values from this dataframe only in new rows created from this dataframe."
   "","**UpdateMissing**: If a column with the same name exists in the Master dataframe, use values from this dataframe when the value in the Master dataframe is missing. Rows not in the Master dataframe are filled from this dataframe."
   "","**Replace**: If a column with the same name exists in the Master dataframe, use values from this dataframe."
   "NewRow","When TRUE, generates a new row when not matched to other files."
   "KeepVariables","List of variables to keep."
   "DropVariables","List of variables to drop."
   "KeepCasesCondition","Logical condition for keeping rows."
   "DropCasesCondition","Logical condition for dropping rows."
   "MergeByNames","An ordered list of variables used as keys in this file to be matched to the variables in the MergeByVariables property of the MergeDatasets command. This property is only used when the key variables in this file have different names than the variable names listed in the MergeDatasets command."
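
To make the row-pairing rules behind some of the MergeType values concrete, here is a minimal Python sketch for two inputs held as lists of dictionaries. It is an illustration under stated assumptions, not an SDTL implementation: the helper names (``index_by``, ``sequential``, ``cartesian``, ``unmatched``) and the sample data are hypothetical, and padding the shorter input in ``sequential`` assumes that new rows are allowed.

.. code-block:: python

   from collections import defaultdict
   from itertools import product, zip_longest


   def index_by(rows, keys):
       """Group rows by the tuple of their key values."""
       groups = defaultdict(list)
       for row in rows:
           groups[tuple(row[k] for k in keys)].append(row)
       return groups


   def sequential(left, right):
       """Sequential: pair rows by position; the shorter input is padded."""
       return [{**l, **r} for l, r in zip_longest(left, right, fillvalue={})]


   def cartesian(left, right, keys):
       """Cartesian: every combination of rows that share a key value."""
       lg, rg = index_by(left, keys), index_by(right, keys)
       return [{**l, **r}
               for k in lg if k in rg
               for l, r in product(lg[k], rg[k])]


   def unmatched(left, right, keys):
       """Unmatched: rows whose key value appears in only one input."""
       lg, rg = index_by(left, keys), index_by(right, keys)
       only_left = [r for k, rows in lg.items() if k not in rg for r in rows]
       only_right = [r for k, rows in rg.items() if k not in lg for r in rows]
       return only_left + only_right


   # Hypothetical inputs: "id" 2 is repeated in both, 1 and 3 occur only once.
   people = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"},
             {"id": 2, "name": "Bo"}]
   visits = [{"id": 2, "visit": "a"}, {"id": 2, "visit": "b"},
             {"id": 3, "visit": "c"}]

   pairs = cartesian(people, visits, ["id"])    # 2 x 2 rows for id 2
   leftovers = unmatched(people, visits, ["id"])  # the rows for id 1 and id 3

When the key is unique in one of the two inputs, the pairing produced by ``cartesian`` is the same one described for OneToMany and ManyToOne, since each row on the "many" side then has exactly one partner.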

SPSS_merge_examples
~~~~~~~~~~~~~~~~~~~

.. raw:: html

   <p>====================  EXAMPLE 1   ====================================</p>
   <pre><code>MATCH FILES  
       /FILE='merge_1.sav'
      /file='merge_2.sav'
       .
   
   
   {&quot;command&quot;: &quot;MergeDatasets&quot;,
       &quot;$type&quot;: &quot;MergeDatasets&quot;,
       &quot;mergeFiles&quot;: [
           {&quot;$type&quot;: &quot;MergeFileDescription&quot;,
            &quot;fileName&quot;: &quot;merge_1.sav&quot;,
            &quot;mergeType&quot;: &quot;Sequential&quot;,
            &quot;newRow&quot;: true
           },
           {&quot;$type&quot;: &quot;MergeFileDescription&quot;,
            &quot;fileName&quot;: &quot;merge_2.sav&quot;,
            &quot;mergeType&quot;: &quot;Sequential&quot;,
            &quot;newRow&quot;: true
           }
       ]
   }
   </code></pre>
   <p>====================  EXAMPLE 2   ====================================</p>
   <pre><code>MATCH FILES  
       /FILE='merge_1.sav'
      /in=from_f1
      /file='merge_3.sav'
      /in=from_f3
      /RENAME= (VAR3=VARx)
      /KEEP= id VAR2 VARx
      /by id
      /first=firstvar
      /last=lastvar
     .
   </code></pre>
   <pre><code>  
   [
   {&quot;command&quot;: &quot;MergeDatasets&quot;,
       &quot;$type&quot;: &quot;MergeDatasets&quot;,
       &quot;mergeByVariables&quot;: {&quot;$type&quot;: &quot;VariableSymbolExpression&quot;,
                            &quot;variableName&quot;: &quot;id&quot;},
       &quot;firstVariable&quot;: &quot;firstvar&quot;,
       &quot;lastVariable&quot;: &quot;lastvar&quot;,
       &quot;mergeFiles&quot;: [
           {&quot;$type&quot;: &quot;MergeFileDescription&quot;,
            &quot;fileName&quot;: &quot;merge_1.sav&quot;,
            &quot;mergeType&quot;: &quot;OneToOne&quot;,
            &quot;mergeFlagVariable&quot;: &quot;from_f1&quot;,
            &quot;renameVariables&quot;: [
                {&quot;$type&quot;: &quot;RenamePair&quot;,
                 &quot;oldVariable&quot;: &quot;VAR3&quot;, &quot;newVariable&quot;: &quot;VARx&quot;}
            ],
            &quot;newRow&quot;: true
           },
           {&quot;$type&quot;: &quot;MergeFileDescription&quot;,
            &quot;fileName&quot;: &quot;merge_3.sav&quot;,
            &quot;mergeType&quot;: &quot;OneToOne&quot;,
            &quot;mergeFlagVariable&quot;: &quot;from_f3&quot;,
            &quot;newRow&quot;: false
           }
       ]
   },
   {&quot;command&quot;: &quot;KeepVariables&quot;,
       &quot;$type&quot;: &quot;KeepVariables&quot;,
       &quot;variables&quot;: {&quot;$type&quot;: &quot;VariableListExpression&quot;,
           &quot;variables&quot;: [
               {&quot;$type&quot;: &quot;VariableSymbolExpression&quot;,
                &quot;variableName&quot;: &quot;id&quot;},
               {&quot;$type&quot;: &quot;VariableSymbolExpression&quot;,
                &quot;variableName&quot;: &quot;VAR2&quot;},
               {&quot;$type&quot;: &quot;VariableSymbolExpression&quot;,
                &quot;variableName&quot;: &quot;VARx&quot;}
           ]
       },
       &quot;messageText&quot;: &quot;NOTE: This KeepVariables command is after the MergeDatasets command, because it applies to the output dataframe.&quot;
   }
   ]
   </code></pre>
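
The constraints stated in the Properties table (MergeFiles has cardinality 2..n, and MergeType must be Sequential on all files when MergeByVariables is absent) can be checked mechanically. The Python sketch below does so for a command shaped like Example 1. It is a hedged illustration only: ``check_merge`` is a hypothetical helper, and the camelCase key names are assumed from the examples above rather than taken from a published schema.

.. code-block:: python

   import json

   # MergeType values as listed under Merge_options.
   MERGE_TYPES = {"Sequential", "OneToOne", "OneToMany", "ManyToOne",
                  "Cartesian", "Unmatched", "SASmatchMerge"}


   def check_merge(command):
       """Return a list of problems found in a MergeDatasets command dict."""
       problems = []
       files = command.get("mergeFiles", [])
       if len(files) < 2:
           problems.append("mergeFiles needs at least two entries")
       keyed = "mergeByVariables" in command
       for entry in files:
           name = entry.get("fileName")
           merge_type = entry.get("mergeType")
           if merge_type not in MERGE_TYPES:
               problems.append(f"{name}: unknown mergeType {merge_type!r}")
           elif not keyed and merge_type != "Sequential":
               problems.append(
                   f"{name}: mergeType must be Sequential without mergeByVariables")
       return problems


   # The SPSS Example 1 command, written as strict JSON.
   example_1 = json.loads("""
   {"command": "MergeDatasets",
    "$type": "MergeDatasets",
    "mergeFiles": [
      {"fileName": "merge_1.sav", "mergeType": "Sequential", "newRow": true},
      {"fileName": "merge_2.sav", "mergeType": "Sequential", "newRow": true}
    ]}
   """)

   print(check_merge(example_1))  # prints []

A value such as ``"1:1"`` would be reported as an unknown mergeType, since it is not one of the values listed under Merge_options.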

Stata_merge_examples
~~~~~~~~~~~~~~~~~~~~

.. raw:: html

   <pre><code>NOTE:  These Stata Merge options are not represented in SDTL:  
   	noreport   
   	nolabel   
   	nonotes   
   	sorted   
   </code></pre>
   <p>====================  EXAMPLE 1   ====================================</p>
   <pre><code>use &quot;mergedat1.dta&quot;, clear
   merge 1:1 _n using &quot;mergedat4.dta&quot;
   list _all
   
   [
   {&quot;command&quot;: &quot;MergeDatasets&quot;,
       &quot;$type&quot;: &quot;MergeDatasets&quot;,
       &quot;mergeFiles&quot;: [
           {&quot;$type&quot;: &quot;MergeFileDescription&quot;,
            &quot;fileName&quot;: &quot;Active file&quot;,
            &quot;mergeType&quot;: &quot;Sequential&quot;,
            &quot;newRow&quot;: true,
            &quot;mergeFlagVariable&quot;: &quot;_merge&quot;},
           {&quot;$type&quot;: &quot;MergeFileDescription&quot;,
            &quot;fileName&quot;: &quot;mergedat4.dta&quot;,
            &quot;mergeType&quot;: &quot;Sequential&quot;,
            &quot;newRow&quot;: true}
       ]
   },
   {&quot;$type&quot;: &quot;SetValueLabels&quot;,
       &quot;command&quot;: &quot;SetValueLabels&quot;,
       &quot;variables&quot;: [
           {&quot;$type&quot;: &quot;VariableSymbolExpression&quot;,
            &quot;variableName&quot;: &quot;_merge&quot;}
       ],
       &quot;labels&quot;: [
           {&quot;value&quot;: 1, &quot;label&quot;: &quot;master&quot;},
           {&quot;value&quot;: 2, &quot;label&quot;: &quot;using&quot;},
           {&quot;value&quot;: 3, &quot;label&quot;: &quot;match&quot;},
           {&quot;value&quot;: 4, &quot;label&quot;: &quot;match_update&quot;},
           {&quot;value&quot;: 5, &quot;label&quot;: &quot;match_conflict&quot;}
       ]
   }
   ]
   </code></pre>
   <p>====================  EXAMPLE 2   ====================================</p>
   <pre><code>use &quot;mergedat1.dta&quot;, clear
   merge 1:1 id using &quot;mergedat3b.dta&quot; ,  update  gener(matchVar)
   list _all
   
   [
   {&quot;command&quot;: &quot;MergeDatasets&quot;,
       &quot;$type&quot;: &quot;MergeDatasets&quot;,
       &quot;mergeFiles&quot;: [
           {&quot;$type&quot;: &quot;MergeFileDescription&quot;,
            &quot;fileName&quot;: &quot;Active file&quot;,
            &quot;mergeType&quot;: &quot;OneToOne&quot;,
            &quot;update&quot;: &quot;Master&quot;,
            &quot;mergeFlagVariable&quot;: &quot;matchVar&quot;,
            &quot;newRow&quot;: true},
           {&quot;$type&quot;: &quot;MergeFileDescription&quot;,
            &quot;fileName&quot;: &quot;mergedat3b.dta&quot;,
            &quot;mergeType&quot;: &quot;OneToOne&quot;,
            &quot;update&quot;: &quot;UpdateMissing&quot;,
            &quot;newRow&quot;: true}
       ],
       &quot;mergeByVariables&quot;: {&quot;$type&quot;: &quot;VariableSymbolExpression&quot;,
                            &quot;variableName&quot;: &quot;id&quot;}
   },
   {&quot;$type&quot;: &quot;SetValueLabels&quot;,
       &quot;command&quot;: &quot;SetValueLabels&quot;,
       &quot;variables&quot;: [
           {&quot;$type&quot;: &quot;VariableSymbolExpression&quot;,
            &quot;variableName&quot;: &quot;matchVar&quot;}
       ],
       &quot;labels&quot;: [
           {&quot;value&quot;: 1, &quot;label&quot;: &quot;master&quot;},
           {&quot;value&quot;: 2, &quot;label&quot;: &quot;using&quot;},
           {&quot;value&quot;: 3, &quot;label&quot;: &quot;match&quot;},
           {&quot;value&quot;: 4, &quot;label&quot;: &quot;match_update&quot;},
           {&quot;value&quot;: 5, &quot;label&quot;: &quot;match_conflict&quot;}
       ]
   }
   ]
   </code></pre>
   <p>====================  EXAMPLE 3   ====================================</p>
   <pre><code>use &quot;mergedat1.dta&quot;, clear
   merge 1:1 id using &quot;mergedat3b.dta&quot; , update replace  keepusing(lastname) 
   
   
   [
   {&quot;command&quot;: &quot;MergeDatasets&quot;,
       &quot;$type&quot;: &quot;MergeDatasets&quot;,
       &quot;mergeByVariables&quot;: {&quot;$type&quot;: &quot;VariableSymbolExpression&quot;,
                            &quot;variableName&quot;: &quot;id&quot;},
       &quot;mergeFiles&quot;: [
           {&quot;$type&quot;: &quot;MergeFileDescription&quot;,
            &quot;fileName&quot;: &quot;Active file&quot;,
            &quot;mergeType&quot;: &quot;OneToOne&quot;,
            &quot;update&quot;: &quot;Master&quot;,
            &quot;mergeFlagVariable&quot;: &quot;_merge&quot;,
            &quot;newRow&quot;: false},
           {&quot;$type&quot;: &quot;MergeFileDescription&quot;,
            &quot;fileName&quot;: &quot;mergedat3b.dta&quot;,
            &quot;mergeType&quot;: &quot;OneToOne&quot;,
            &quot;update&quot;: &quot;Replace&quot;,
            &quot;newRow&quot;: true,
            &quot;keepVariables&quot;: {&quot;$type&quot;: &quot;VariableSymbolExpression&quot;,
                             &quot;variableName&quot;: &quot;lastname&quot;}
           }
       ]
   },
   {&quot;$type&quot;: &quot;SetValueLabels&quot;,
       &quot;command&quot;: &quot;SetValueLabels&quot;,
       &quot;variables&quot;: [
           {&quot;$type&quot;: &quot;VariableSymbolExpression&quot;,
            &quot;variableName&quot;: &quot;_merge&quot;}
       ],
       &quot;labels&quot;: [
           {&quot;value&quot;: 1, &quot;label&quot;: &quot;master&quot;},
           {&quot;value&quot;: 2, &quot;label&quot;: &quot;using&quot;},
           {&quot;value&quot;: 3, &quot;label&quot;: &quot;match&quot;},
           {&quot;value&quot;: 4, &quot;label&quot;: &quot;match_update&quot;},
           {&quot;value&quot;: 5, &quot;label&quot;: &quot;match_conflict&quot;}
       ]
   }
   ]
   </code></pre>
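
The Stata examples above rely on the Update property to decide which value wins when a variable exists in both files. The following Python sketch follows the wording of the Update values under Merge_options for a single cell. It is an illustration, not an SDTL implementation: ``resolve`` is a hypothetical helper, ``None`` stands in for a missing value, and the handling of Ignore in rows that exist only in the non-Master file is an assumption.

.. code-block:: python

   MISSING = None  # stand-in for a missing value


   def resolve(update, master_value, this_value, new_row=False):
       """Pick the output value for a column present in both dataframes.

       update  -- the Update setting of the non-Master file
       new_row -- True when the row exists only in the non-Master file
       """
       if update not in ("Ignore", "FillNew", "UpdateMissing", "Replace"):
           raise ValueError(f"unknown Update value: {update!r}")
       if new_row:
           # No Master row exists. Assumption: Ignore leaves the cell missing,
           # while the other settings take the value from this file.
           return MISSING if update == "Ignore" else this_value
       if update in ("Ignore", "FillNew"):
           return master_value
       if update == "UpdateMissing":
           return this_value if master_value is MISSING else master_value
       return this_value  # Replace


   # Hypothetical cell values: the Master file holds "Smith" or nothing.
   kept = resolve("UpdateMissing", "Smith", "Smyth")    # Master value kept
   filled = resolve("UpdateMissing", MISSING, "Smyth")  # gap filled
   replaced = resolve("Replace", "Smith", "Smyth")      # this file wins

The file whose Update value is Master never needs this function, because its values are the baseline that the other files are compared against.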