The NETFLOW Procedure
This statement invokes the procedure. The following options and the options listed with the RESET statement can appear in the PROC NETFLOW statement.
The options available with the PROC NETFLOW statement are summarized by purpose in Table 4.18.
Table 4.18: Functional Summary, PROC NETFLOW Statement

| Description | Statement | Option |
|---|---|---|
| Input Data Set Options | | |
| arcs input data set | NETFLOW | ARCDATA= |
| nodes input data set | NETFLOW | NODEDATA= |
| constraint input data set | NETFLOW | CONDATA= |
| Output Data Set Options | | |
| unconstrained primal solution data set | NETFLOW | ARCOUT= |
| unconstrained dual solution data set | NETFLOW | NODEOUT= |
| constrained primal solution data set | NETFLOW | CONOUT= |
| constrained dual solution data set | NETFLOW | DUALOUT= |
| Options for Networks | | |
| default arc cost | NETFLOW | DEFCOST= |
| default arc capacity | NETFLOW | DEFCAPACITY= |
| default arc lower flow bound | NETFLOW | DEFMINFLOW= |
| network's only supply node | NETFLOW | SOURCE= |
| SOURCE's supply capability | NETFLOW | SUPPLY= |
| network's only demand node | NETFLOW | SINK= |
| SINK's demand | NETFLOW | DEMAND= |
| excess supply or demand is conveyed through network | NETFLOW | THRUNET |
| find maximal flow between SOURCE and SINK | NETFLOW | MAXFLOW |
| cost of bypass arc when solving MAXFLOW problem | NETFLOW | BYPASSDIV= |
| find shortest path from SOURCE to SINK | NETFLOW | SHORTPATH |
| Miscellaneous Options | | |
| infinity value | NETFLOW | INFINITY= |
| constraint row coefficient scaling, nonarc variable column coefficient scaling, both, or neither | NETFLOW | SCALE= |
| maximization instead of minimization | NETFLOW | MAXIMIZE |
| use warm start solution | NETFLOW | WARM |
| all-artificial starting solution | NETFLOW | ALLART |
| Data Set Read Options | | |
| CONDATA has sparse data format | NETFLOW | SPARSECONDATA |
| default constraint type | NETFLOW | DEFCONTYPE= |
| special COLUMN variable value (constraint type observation) | NETFLOW | TYPEOBS= |
| special COLUMN variable value (right-hand-side observation) | NETFLOW | RHSOBS= |
| interpretation of arc and nonarc variable names in CONDATA | NETFLOW | NAMECTRL= |
| no new nonarc variables | NETFLOW | SAME_NONARC_DATA |
| no nonarc data in ARCDATA | NETFLOW | ARCS_ONLY_ARCDATA |
| data for an arc found in only one observation of ARCDATA | NETFLOW | ARC_SINGLE_OBS |
| data for a constraint found in only one observation of CONDATA | NETFLOW | CON_SINGLE_OBS |
| data for a coefficient found only once in CONDATA | NETFLOW | NON_REPLIC= |
| data are grouped, exploited during data read | NETFLOW | GROUPED= |
| Problem Size (Approximate) Options | | |
| number of nodes | NETFLOW | NNODES= |
| number of arcs | NETFLOW | NARCS= |
| number of nonarc variables | NETFLOW | NNAS= |
| number of coefficients | NETFLOW | NCOEFS= |
| number of constraints | NETFLOW | NCONS= |
| Memory Control Options | | |
| issue memory usage messages to the SAS log | NETFLOW | MEMREP |
| number of bytes to use for main memory | NETFLOW | BYTES= |
| proportion of memory used by frequently accessed arrays | NETFLOW | COREFACTOR= |
| memory allocated for LU factors | NETFLOW | DWIA= |
| linked list for updated column | NETFLOW | SPARSEP2 |
| use 2-dimensional array instead of LU factors for basis matrix | NETFLOW | INVD_2D |
| maximum bytes for a single array | NETFLOW | MAXARRAYBYTES= |
| Interior Point Algorithm Options | | |
| use Interior Point algorithm | NETFLOW | INTPOINT |
The following options can be specified only in the PROC NETFLOW statement and are relevant to the start of the procedure. Once specified, they cannot be changed.
The NODEDATA= data set is optional in the PROC NETFLOW statement, provided that, when it is not used, supply and demand details are specified by other means. Other means include using the MAXFLOW or SHORTPATH option, SUPPLY or DEMAND list variables (or both) in the ARCDATA= data set, and the SOURCE=, SUPPLY=, SINK=, or DEMAND= option in the PROC NETFLOW statement.
If you specify ARC_SINGLE_OBS, PROC NETFLOW automatically works as if GROUPED=ARCDATA is also specified. See the "How to Make the Data Read of PROC NETFLOW More Efficient" section.
If every arc in the MAXFLOW problem has zero cost and you do not specify the BYPASSDIV= option, the cost of the bypass arc is set to 1.0 (-1.0 if maximizing). In that case, the reduced costs in the ARCOUT= and CONOUT= data sets correctly reflect the value that would be added to the maximal flow if the capacity of the arc were increased by one unit. If there are nonzero costs, or if you specify the BYPASSDIV= option, the reduced costs may be contaminated by the cost of the bypass arc, and no economic interpretation can be given to the reduced cost values. The default value of the BYPASSDIV= option (in the presence of nonzero arc costs) is 100.0.
PROC NETFLOW uses memory in addition to the main working memory, and these additional requirements cannot be determined at the time the main working memory is allocated. For example, every time an output data set is created, some additional memory is required. Therefore, do not specify a value for the BYTES= option equal to the size of available memory.
If CONDATA has the sparse format, and the data for each arc and nonarc variable can be found in only one observation of CONDATA, then specify the CON_SINGLE_OBS option. If there are n SAS variables in the ROW and COEF lists, each arc or nonarc variable can have at most n constraint coefficients in the model. See the "How to Make the Data Read of PROC NETFLOW More Efficient" section.
Some of the arrays and buffers used during constrained optimization either vary in size, are not required as frequently as other arrays, or are not required throughout the Simplex iteration. Let a be the amount of memory in bytes required to store frequently accessed arrays of nonvarying size. Specify the MEMREP option in the PROC NETFLOW statement to get the value for a and a report of memory usage. If the size of the main working memory BYTES=b multiplied by COREFACTOR=c is greater than a, PROC NETFLOW keeps the frequently accessed arrays of nonvarying size resident in core throughout the optimization. If the other arrays cannot fit into core, they are paged in and out of the remaining part of the main working memory.
If b multiplied by c is less than a, PROC NETFLOW uses a different memory scheme. The working memory is used to store only the arrays needed in the part of the algorithm being executed. If necessary, these arrays are read from disk into the main working area. Paging, if required, is done for all these arrays, and sometimes information is written back to disk at the end of that part of the algorithm. This memory scheme is not as fast as the other memory schemes. However, problems can be solved with memory that is too small to store every array.
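As an illustration only, the choice between the two memory schemes just described can be written as a small Python function. The function name, parameter names, and return labels are hypothetical, and treating the tie case (b multiplied by c exactly equal to a, which the text does not cover) like the "less than" case is an assumption; PROC NETFLOW makes this decision internally.

```python
def choose_memory_scheme(bytes_b: int, corefactor_c: float, a: int) -> str:
    """Hypothetical helper: report which memory scheme the rule selects.

    bytes_b      -- size of the main working memory (BYTES=b)
    corefactor_c -- COREFACTOR=c, valid between 0.0 and 0.95 inclusive
    a            -- bytes needed by the frequently accessed arrays of
                    nonvarying size (reported when MEMREP is specified)
    """
    if not 0.0 <= corefactor_c <= 0.95:
        raise ValueError("COREFACTOR= must be between 0.0 and 0.95")
    if bytes_b * corefactor_c > a:
        # Fixed-size arrays stay resident in core; the other arrays are
        # paged in and out of the remaining working memory.
        return "resident"
    # Assumption: the tie case is grouped with the "less than" case.
    # Only the arrays needed by the current part of the algorithm are
    # held in memory, read from disk as required.
    return "per-phase"
```

For example, `choose_memory_scheme(1_000_000, 0.75, 500_000)` returns `"resident"` because 750,000 exceeds 500,000.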
PROC NETFLOW is capable of solving very large problems in a modest amount of available memory. However, as more time is spent doing input/output operations, the speed of PROC NETFLOW decreases. It is important to choose the value of the COREFACTOR= option carefully. If COREFACTOR is too small, the memory scheme that needs to be used might not be as efficient as another that could have been used had a larger COREFACTOR been specified. If COREFACTOR is too large, too much of the main working memory is occupied by the frequently accessed, nonvarying sized arrays, leaving too little for the other arrays. The amount of input/output operations for these other arrays can be so high that another memory scheme might have been used more beneficially.
The valid values of COREFACTOR=c are between 0.0 and 0.95, inclusive. The default value of c is 0.75 when there are more than 200 side constraints and 0.9 when there is only one side constraint. When the problem has between 2 and 200 side constraints, the default value of c lies between the two points (1, 0.9) and (201, 0.75), where the first coordinate is the number of side constraints and the second is c.
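The following Python sketch computes the default COREFACTOR= value under the assumption that "lies between the two points" means straight-line interpolation; the function name is hypothetical, and problems without side constraints are rejected because the rule only covers side-constrained problems.

```python
def default_corefactor(n_side_constraints: int) -> float:
    """Hypothetical helper: default COREFACTOR= value for a problem with
    n_side_constraints side constraints, assuming linear interpolation."""
    if n_side_constraints < 1:
        raise ValueError("rule covers problems with at least one side constraint")
    if n_side_constraints > 200:
        return 0.75
    # Straight line through (1, 0.9) and (201, 0.75).
    slope = (0.75 - 0.9) / (201 - 1)
    return 0.9 + slope * (n_side_constraints - 1)
```

Under this assumption, a problem with 101 side constraints gets a default of about 0.825.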
The DWIA= option controls the memory allocated for the LU factors. Occasionally, it is necessary to compress the U factor so that it again occupies contiguous memory. Specifying too large a value for DWIA= means that PROC NETFLOW requires more memory, which might cause more expensive memory mechanisms to be used than if a smaller but adequate value had been specified. Specifying too small a value for DWIA= can make time-consuming compressions more numerous. The default value of the DWIA= option is eight times the number of side constraints.
If the CONDATA= data set has the dense format, GROUPED=CONDATA indicates that the CONDATA= data set has been grouped by values of the ROW list variable. If _ROW_ is the name of the ROW list variable, you could use PROC SORT DATA=CONDATA; BY _ROW_; prior to calling PROC NETFLOW. Technically, you do not have to sort the data; you only need to ensure that all observations with the same value of the ROW list variable are grouped together. If you specify the CON_SINGLE_OBS option, or if there is no ROW list variable, PROC NETFLOW automatically works as if GROUPED=CONDATA has been specified.
If CONDATA has the sparse format, GROUPED=CONDATA indicates that CONDATA has been grouped by values of the COLUMN list variable. If _COL_ is the name of the COLUMN list variable, you could use PROC SORT DATA=CONDATA; BY _COL_; prior to calling PROC NETFLOW. Technically, you do not have to sort the data; you only need to ensure that all observations with the same value of the COLUMN list variable are grouped together.
A data set like

   ... _XXXXX_ ...
       bbb
       bbb
       aaa
       ccc
       ccc

is a candidate for the GROUPED= option.
Equal values are grouped together. When PROC NETFLOW is reading the ith observation, either the value of the _XXXXX_ variable is the same as the (i-1)th (that is, the previous observation's) _XXXXX_ value, or it is a new _XXXXX_ value not seen in any previous observation. This also means that if the ith _XXXXX_ value is different from the (i-1)th _XXXXX_ value, the (i-1)th _XXXXX_ value will not be seen in any of observations i, i+1, ... .
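The grouping condition above can be checked with a short Python function, shown here only as a sketch for testing a column of values before using GROUPED=; the name `is_grouped` is hypothetical and is not part of PROC NETFLOW. It returns True exactly when every value that changes from the previous observation has not been seen before.

```python
from typing import Hashable, Iterable


def is_grouped(values: Iterable[Hashable]) -> bool:
    """True if equal values occur in one contiguous run (the GROUPED= condition)."""
    seen = set()
    previous = object()  # sentinel that compares unequal to any real value
    for value in values:
        if value != previous:
            if value in seen:
                # The value reappears after a different value intervened.
                return False
            seen.add(value)
            previous = value
    return True
```

The example column `["bbb", "bbb", "aaa", "ccc", "ccc"]` passes even though it is not sorted, while `["bbb", "aaa", "bbb"]` fails.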
If INVD_2D is not specified, lower (L) and upper (U) factors of the working basis matrix are used. U is an upper triangular matrix and L is a lower triangular matrix corresponding to a sequence of elementary matrix row operations. The sparsity-exploiting variant of the Bartels-Golub decomposition is used to update the LU factors. This scheme works well when the side constraint coefficient matrix is sparse or when many side constraints are nonbinding.
There is one array that contains information about nodes and the network basis spanning tree description. This tree description enables computations involving the network part of the basis to be performed very quickly and is why PROC NETFLOW is better suited than PROC LP to solving constrained network problems. It is beneficial for this array to be stored in core when possible; otherwise, it must be paged, which slows down the computations. Try not to specify a MAXARRAYBYTES=m value smaller than the amount of memory needed to store the main node array. If you specify the MEMREP option in the PROC NETFLOW statement, this amount is reported on the SAS log.
You can use the MAXFLOW option when solving any flow problem (not necessarily a maximum flow problem) when the network has one supply node (with infinite supply) and one demand node (with infinite demand). The MAXFLOW option can be used in conjunction with all other options (except SHORTPATH, SUPPLY=, and DEMAND=) and capabilities of PROC NETFLOW.
In the CONDATA= data set, if the dense data format is used (described in the "CONDATA= Data Set" section), a name of an arc or a nonarc variable is the name of a SAS variable listed in the VAR list specification. If the sparse data format of the CONDATA= data set is used, a name of an arc or a nonarc variable is a value of the SAS variable listed in the COLUMN list specification. The NAMECTRL= option is used when a name of an arc or nonarc variable in the CONDATA= data set (either a VAR list SAS variable name or a value of the COLUMN list SAS variable) is in the form tail_head and there exists an arc with these end nodes. If tail_head has not already been tagged as belonging to an arc or nonarc variable in the ARCDATA= data set, PROC NETFLOW needs to know whether tail_head is the name of the arc or the name of a nonarc variable.
If you specify NAMECTRL=1, a name that is not defined in the ARCDATA= data set is assumed to be the name of a nonarc variable. NAMECTRL=2 treats tail_head as the name of the arc with these end nodes, provided no other name is used to associate data in the CONDATA= data set with this arc. If the arc does have other names that appear in the CONDATA= data set, tail_head is assumed to be the name of a nonarc variable. If you specify NAMECTRL=3, tail_head is assumed to be a name of the arc with these end nodes, whether the arc has other names or not. The default value of NAMECTRL is 3. Note that if you use the dense side constraint input format, the default arc name tail_head is not recognized (regardless of the NAMECTRL value) unless the head node and tail node names contain no lowercase letters.
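The three NAMECTRL= settings can be summarized as a decision function. This Python sketch is illustrative only: the function name and the boolean parameter are hypothetical, and it covers just the case the text describes (an untagged name tail_head for which an arc with those end nodes exists), ignoring the dense-format lowercase restriction.

```python
def classify_tail_head(arc_has_other_condata_names: bool, namectrl: int = 3) -> str:
    """Hypothetical helper: decide whether an untagged CONDATA name of the
    form tail_head, for which an arc with those end nodes exists, names
    that arc ("arc") or a nonarc variable ("nonarc")."""
    if namectrl == 1:
        # Names not defined in ARCDATA are taken to be nonarc variables.
        return "nonarc"
    if namectrl == 2:
        # The arc gets the name only if no other name links CONDATA to it.
        return "nonarc" if arc_has_other_condata_names else "arc"
    if namectrl == 3:
        # Default: always the arc, whether or not it has other names.
        return "arc"
    raise ValueError("NAMECTRL= must be 1, 2, or 3")
```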
The SAS System converts SAS variable names in a SAS program to uppercase, so if the dense format is used for the CONDATA= data set, the VAR list variable names are uppercased. Because of this, PROC NETFLOW automatically uppercases names of arcs and nonarc variables (the values of the NAME list variable) in the ARCDATA= data set. These names appear uppercased in the ARCOUT= and CONOUT= data sets and in the PRINT statement output.
Also, if the dense format is used for the CONDATA= data set, be careful with default arc names (names in the form tailnode_headnode). Node names (values in the TAILNODE and HEADNODE list variables) in the ARCDATA= data set are not uppercased by PROC NETFLOW. Consider the following code:
data arcdata;
   input _from_ $ _to_ $ _name_ $ ;
   datalines;
from to1 .
from to2 arc2
TAIL TO3 .
;
data densecon;
   input from_to1 from_to2 arc2 tail_to3;
   datalines;
2 3 5
;
proc netflow
   arcdata=arcdata condata=densecon;
   run;

The SAS System does not uppercase character string values. PROC NETFLOW never uppercases node names, so the arcs in observations 1, 2, and 3 in the preceding ARCDATA= data set have the default names "from_to1", "from_to2", and "TAIL_TO3", respectively. When the dense format of the CONDATA= data set is used, PROC NETFLOW does uppercase values of the NAME list variable, so the name of the arc in the second observation of the ARCDATA= data set is "ARC2". Thus, the second arc has two names: its default name "from_to2" and the specified name "ARC2".
Because the SAS System uppercases program code, you must think of the INPUT statement

input from_to1 from_to2 arc2 tail_to3;

as really being

INPUT FROM_TO1 FROM_TO2 ARC2 TAIL_TO3;

The SAS variables named "FROM_TO1" and "FROM_TO2" are not associated with any of the arcs in the preceding ARCDATA= data set. The values "FROM_TO1" and "FROM_TO2" are different from all of the arc names "from_to1", "from_to2", "TAIL_TO3", and "ARC2". "FROM_TO1" and "FROM_TO2" could end up being the names of two nonarc variables. It is sometimes useful to specify PRINT NONARCS; before commencing optimization to ensure that the model is correct (has the right set of nonarc variables).
The SAS variable named "ARC2" is the name of the second arc in the ARCDATA= data set, even though the name specified in the ARCDATA= data set looks like "arc2". The SAS variable named "TAIL_TO3" is the default name of the third arc in the ARCDATA= data set.
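The name matching in the arcdata/densecon example can be reproduced with a simplified Python model. This is a sketch under stated assumptions, not PROC NETFLOW's implementation: the names are hypothetical, default names keep the node names' case, NAME values and VAR names are uppercased, matching is exact string comparison, and NAMECTRL= handling of unmatched names is ignored.

```python
from typing import Dict, List, Optional, Tuple

Arc = Tuple[str, str, Optional[str]]  # (tail node, head node, NAME value or None)


def dense_name_matches(arcs: List[Arc], var_names: List[str]) -> Dict[str, Optional[int]]:
    """Map each VAR list name (uppercased, as SAS does) to the 0-based index
    of the arc it names, or None if it matches no arc name."""
    arc_names: Dict[str, int] = {}
    for i, (tail, head, name) in enumerate(arcs):
        # Default name tail_head: node names are never uppercased.
        arc_names[f"{tail}_{head}"] = i
        if name is not None:
            # NAME values are uppercased when CONDATA is dense.
            arc_names[name.upper()] = i
    return {v.upper(): arc_names.get(v.upper()) for v in var_names}
```

With the three arcs of the example and the VAR names `from_to1 from_to2 arc2 tail_to3`, the result maps "ARC2" to index 1 (the second observation) and "TAIL_TO3" to index 2, while "FROM_TO1" and "FROM_TO2" match no arc.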
If you use an unconstrained warm start and SAME_NONARC_DATA is not specified, any nonarc variable objective function coefficient, upper bound, or lower bound can be changed. Any nonarc variable data in the CONDATA= data set overrides (without warning messages) corresponding data in the ARCDATA= data set. You can also introduce new nonarc variables to the problem, that is, nonarc variables that were not in the problem when the warm start was generated.
Specify SAME_NONARC_DATA if nonarc variable data in the CONDATA= data set are to be deliberately ignored. Consider the following example:
proc netflow arcdata=arc0 nodedata=node0
     condata=con0
     /* this data set has nonarc variable */
     /* objective function coefficient data */
     future1 arcout=arc1 nodeout=node1;
     run;
data arc2;
     set arc1; /* this data set has nonarc variable obs */
     if _cost_<50.0 then _cost_=_cost_*1.25;
     /* some objective coefficients of nonarc */
     /* variables might be changed */
     run;
proc netflow
     warm arcdata=arc2 nodedata=node1
     condata=con0 same_nonarc_data;
     /* con0 has the old nonarc variable */
     /* objective function coefficients; same_nonarc_data */
     /* indicates that the "new" coefs in the */
     /* arcdata=arc2 are to be used. */
     run;
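The override rule for nonarc variable data can be modeled with dictionaries. In this Python sketch, the function name and attribute keys such as `"cost"` are hypothetical; it assumes that without SAME_NONARC_DATA, CONDATA values silently replace ARCDATA values and unseen names become new nonarc variables, and that with SAME_NONARC_DATA the CONDATA nonarc data are ignored.

```python
from typing import Dict

NonarcData = Dict[str, Dict[str, float]]  # nonarc name -> attribute -> value


def merge_nonarc_data(arcdata: NonarcData, condata: NonarcData,
                      same_nonarc_data: bool = False) -> NonarcData:
    """Hypothetical model of which nonarc variable data take effect."""
    merged = {name: dict(attrs) for name, attrs in arcdata.items()}
    if same_nonarc_data:
        # CONDATA nonarc data deliberately ignored; no new nonarc variables.
        return merged
    for name, attrs in condata.items():
        # CONDATA values override ARCDATA values without warning;
        # names not seen before become new nonarc variables.
        merged.setdefault(name, {}).update(attrs)
    return merged
```

In the example above, arc2 plays the role of `arcdata` and con0 the role of `condata`, so specifying same_nonarc_data keeps the adjusted costs from arc2.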
If a network has one supply node (with a supply of one unit) and one demand node (with a demand of one unit), you could specify the SHORTPATH option, with the SOURCE= and SINK= nodes, even if the problem is not a shortest path problem. In that case, do not provide any supply or demand data in the NODEDATA= data set or the ARCDATA= data set.
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.