A Maxent Predictive Model for Hunter-Gatherer Sites in the Southern Pampas, Argentina

The following paper presents the results of a Species Distribution Model (SDM) for grassland hunter-gatherer archaeology sites in the southern Pampas region of Argentina. The goal of this exploratory model is to provide a complementary survey model for the detection of archaeological sites in this region, which will also help characterize and discuss site locations and regional distribution patterns of hunter-gatherer occupations. Even in this largely homogenous and highly dynamic landscape, SDMs can help guide archaeological surveys by identifying some environmental variables affecting hunter-gatherer decisions, and can provide insights into mobility and archaeological settlement patterns. Among the available tools for SDM, Maximum Entropy Modeling (Maxent) is one of the most widely used approaches in archaeological predictive modelling. After controlling for bias and adjustment of several modifiable parameters, the Maxent software provided a potentially effective predictive model to direct future archaeological survey and heritage management projects. The results of this research suggest that watercourses and slope were the key environmental factors influencing the distribution of hunter-gatherer archaeological sites in the southern Pampas region.


INTRODUCTION
Species Distribution Modeling (SDM) is used to derive spatially explicit predictions of environmental suitability for a species (plant or animal). This is typically achieved by classifying grid cells (also called "background locations") according to the degree in which they are suitable for a species in the study region (Elith & Leathwick 2009;Guisan & Thuiller 2005;Miller 2010;Phillips, Anderson & Schapire 2006). Among the available tools for SDM, Maximum Entropy Modeling (Maxent) is one of the most widely used approaches in ecology for predicting habitat suitability from presence-only records (Phillips, Dudík & Schapire 2021). This method involves using a collection of known species locations as sample data, along with relevant environmental factors to model the distribution of that species within a known geographical extent (Phillips, Dudík & Schapire 2021). Having the ability to derive predictive models from presence-only records is particularly relevant in archaeology, as archaeologists often know where some, but not all, sites are located, and they rarely know for certain where past sites are absent (Howey, Palace & McMichael 2016). Maxent has been broadly used by archaeologists to predict archaeological site locations in different environments and time-periods around the world (Conolly et al. 2012;Franklin et al. 2015;Galletti et al. 2013;Howey, Palace & McMichael 2016;Howey et al. 2020;McMichael et al. 2014;Muttaquin, Heru & Susilo 2019;Noviello et al. 2018;Politis et al. 2011Politis et al. , 2019. When compared to traditional predictive models, such as GIS-based multiparametric spatial analysis (i.e., logistic regression), Maxent has been shown to perform well for archaeology predictive models (Noviello et al. 2018;Wachtel et al. 2018). The Maxent modeling provides an initial platform that is flexible and can be run anytime there is additional input data, such as new archaeology site locations or refined environmental layers.
Few archaeological predictive models have been built for the Pampas region of Argentina. One of the reasons for this is related to the properties of the current landscape, which is largely spatially homogenous, temporally dynamic with variations on a year-to-year basis due to extreme weather conditions (e.g., flooding), and highly affected by modern farming activities. Apart from the hills (sierras), the environmental characteristics of the Pampas grasslands are "less than optimal" for predictive models, which work best in heterogeneous landscapes that are relativity stable over long periods. Although the study area is not ideal, refined high resolution and freely available geo-environmental resource data are available. These, combined with powerful geostatistical software are providing the opportunity to build predictive models in less-than-optimal environments, such as forests with high tree coverage, and wetland areas (Jones 2017). At the local and sub-regional scale, archaeological predictive modeling has been used along the north coast of the Pampas region (Eugenio, Aldazabal & Macchi 2013;Macchi 2010), and in different sectors of the Tandilia hills (Mariano et al. 2014;Mazzia & Gómez, 2013). At the regional scale, a Maxent model was built to infer paleoenvironmental conditions during prehistoric times using multiple environmental variables and the presence data of three native fauna species (i.e., Lama guanicoe, Ozotoceros bezoarticus and Blastocerus dichotomus) (Politis et al. 2011). For the southern Pampas and northeast Patagonia, prediction maps were built to study the distribution of lithic artifacts using geostatistical interpolation tools to discuss issues related with the archaeological lithic landscapes (Barrientos, Catella & Oliva 2015). Finally, just north of the Pampas region, in the Northeast of Argentina, a Maxent predictive model was built for archaeological sites of the Goya-Malabrigo archaeological entity (Politis et al. 2019). This last study, which uses geographically referenced sites for the occurrence data (i.e., species), is the closest in terms of methodology to the present one. All of these studies demonstrated the utility of geospatial software to quantify large archaeological databases containing multiple variables, and to integrate different sources of information, allowing a better understanding of prehistoric hunter-gatherer groups from the Pampas region.
The southern Pampas offered abundant and diverse resources for hunter-gatherer groups, including -but not limited to-large sized game in the inter-hill Pampas (or grasslands), shelter and good quality stones for making tools in the hills (Tandilia and Ventania ranges), and marine faunal resources and secondary lithic raw materials along the Atlantic coast. Evidence of consumption of marine resources has been found in sites at least 50 km from the coast (Politis, Scabuzzo & Tykot 2009); however, groups from the Pampas region mainly consumed terrestrial herbivores, such as the guanaco (Lama guanicoe) and the Pampas deer (Ozotoceros bezoarticus) (Gutiérrez & Martínez, 2008;Martínez et al. 2016). Archaeological data suggests that huntergatherers occupied diverse environments within the annual cycle of mobility, and maintained social networks over large geographic areas, which facilitated access to distant resources and information (Armentano, Martínez & Gutiérrez 2007;Barros 2012;Barros, Messineo & Colantonio 2015;Bayón & Flegenheimer 2004;Martínez et al. 2015;Mazzanti 2006;Politis 2008). For the later part of the Holocene, in relation to a growing increase in population density, data suggests a reduction of residential mobility; though, a hunter-gatherer way of life persisted until the Spanish conquest.
In the immense extent of the southern Pampas, where the ground surface is covered by vegetation and soil formation predominates, with thousands of shallow lakes, multiple levels of streams, and variable changes in elevations; archaeological surveying is challenging.
Traditional survey strategies are usually driven by informants, such as landowners or farmers who come across isolated remains or sites. The riverbanks, streams, and shallow lakes are particularly useful for exposing in-situ archaeological material. During wet periods, the riverbanks and streams erosion expose older deposits, giving archaeologists a window into the buried record, and guiding most archaeological pedestrian survey when no prior information is provided. Along the hills, survey strategies are similar and mainly oriented to the exploration of caves or rock shelters where there is a better preservation of sedimentary deposits. Although animal bone does not preserve well in these sites; most of them present exceptional records of stone tool technology (Flegenheimer, Mazzia & Weitzel 2015;Martínez 2017).
The following paper presents the results of a SDM for hunter-gatherer archaeological sites in the southern Pampas, using the geographical location of archaeological sites as the occurrence data (species). The objective is to build a probability distribution of pre-historic hunter-gatherer sites that helps advance pedestrian surveys towards sectors of the landscape with greater archaeological potential (favorable characteristics for human settlement in the past). The SDM will also help characterize and discuss site locations and regional distribution patterns of hunter-gatherer occupations; and may also serve as a potential model for future heritage management projects. Research on mobility and land use is of great importance for understanding the way of life of the hunter-gatherer societies that inhabited the Pampas region. If human behavior is modeled with respect to the natural and social environment, we can learn a lot about how people interacted and moved in space by observing the relationships between the archaeological record and the environment.

STUDY AREA
One of the major challenges of SDM studies is to appropriately determine the extent of the study area. The study area should be objectively determined to cover accessible areas by the species within its known range, allowing for an assortment of environmental variation and extremes occupied by the species (Barve et al. 2011;Kantner 2008;Sánchez-Fernández, Lobo & Hernández-Manrique 2011). It should also be constrained enough not to introduce a spatial sampling bias. The SDM presented here is designed to capture hunter-gatherer sites from the southern Pampas region of Argentina; specifically, it will be centered on the inter-hill Pampas eco-complex and will capture neighboring areas such as the Sierra Tandilia and Sierra Ventania Hills, as well as parts of the Flooding Pampas, Chained Lakes, and Sandy Pampas; for a total area of approximately 124,740 km 2 (not including the ocean) (Figure 1).   Phillips et al. 2017). There are several discussions as to how to objectively set these parameters; ultimately, we need to have a clear understanding of the specific dataset (species) and knowledge of the spatial distribution of sampling effort (Araújo & Guisan 2006;Pearce & Boyce 2006). Here, I adjusted three parameters to optimize the predictive accuracy: (1) bias correction, (2) feature combination, and (3) the regularization multiplier. The remaining parameters (see Table 1 and Data Availability section) were left with the default software values (Phillips & Dudík 2008). For example, for the replicate type, "cross-validation" was used, which has been shown to work better with small data sets and helps with spatial autocorrelations and overfitting (Merckx et al. 2011;Phillips 2017). In order to use this re-sampling procedure, the value of "random test percentage" must be set to zero.

MAXENT PARAMETERS
When using presence-only occurrence data in SDMs, sampling bias is a serious issue (see discussion in Phillips et al. 2009), especially in archaeology. The only way to remove sampling bias is through study design and/or modelling of that bias (Yackulic et al. 2013: 238). Few archaeological surveys are made using planned random survey coverage strategies (Banning 2021;Orton 2000). Without clear records on survey effort across the landscape, we cannot distinguish between areas that are environmentally unsuitable and those that are under-sampled (Elith, Kearney & Phillips 2010: Appendix S1). Constraints and opportunities of survey design result in many areas that are overrepresented, or alternatively, underrepresented because of accessibility (e.g., distance to towns, roads, land ownership, etc.) or research bias (projects dedicated to studying specific areas of the landscape, well known hotspots, etc.). To help correct sampling bias, Maxent has three main parameter settings. First, the program discards redundant records that occur in a single grid cell; second, 10,000 pseudo-absence sample background points are generated (see Barbet-Massin et al. 2012); and third, an optional bias grid file can be added to the model. Using the location of known archaeological sites, I decided to build a bias grid file in QGIS (QGIS Development Team, 2021), following the methods proposed by Elith, Kearney & Phillips (2010). The map was derived using kernel density estimation, and set to a maximum radius of 10 km around each site. The choice of that radius is based on information of hunter-gatherers daily foraging trips (see : Kelly 1995;Lee 1979;Morgan 2008). This is admittedly variable for different regions and groups, as logistical trips greater than 10 km were likely to occur in some circumstances; however, this method provides a reasonable estimate for the purpose of this exploratory model. The result is a continuous surface layer with values ranging between 1 thru 12.
To avoid overfitting (i.e., matching the input data too closely), Maxent allows the user to adjust the feature classes (also called types) used to build the model (linear, quadratic, product, threshold, and hinge features). The default settings allow certain feature classes to be used based on the sample size of occurrence points. Allowing more feature classes enables a more flexible fit to the observed data; however, higher flexibility has been shown to result is an overfitted model (Hastie, Tibshirani & Friedman 2009;Muscarella et al. 2014;Peterson et al. 2011;Phillips & Dudík 2008). The regularization multiplier (β) also affects how focused the output distribution is. A small regularization multiplier will result in a more localized output (Phillips et al., 2006;Phillips & Dudík 2008). Essentially, adjusting both the regularization multiplier and feature classes will provide the best predictive performance. The only way to know which to use requires multiple runs. Here, I ran all possible combinations of feature classes (n = 15), excluding threshold (which is not recommended for smaller samples), against five regularization multiplier increments (0.25, 0.5, 0.75, 1.0 and 1.25). This led to 75 unique combinations. The optimal parameters were selected based on the lowest Akaike Information Criterion (AICc) with a correction for small sample size value (Muscarella et al. 2014;Warren & Seifert 2011), which I calculated using the executable software ENMTools (Warren, Glor & Turelli 2010;Warren & Seifert 2011). The optimal parameters correspond to the combination of feature classes and a regularization multiplier with the smallest AICc value. A low AICc value estimates the relative importance of variables, the suitability of habitat, and the transferability of the model to different periods (Warren & Seifert, 2011); the latter of which is particularly relevant in archaeology. The AICc value makes a useful criterion to objectively choose appropriate settings for Maxent, and it has also been shown to help reduce overfitting issues and complexity (Li et al. 2020); however, it is not fully related with the predictive accuracy of the SDM (see discussion in Velasco & González-Salazar 2019).
Once the optimal parameters were determined, I made a final run of the model in Maxent with response curves and a jackknife variable response to display and determine how each environmental variable contributed to the model (marginal response) and performed on its own (corresponding response). I ran 30 replicates with cross-validation to bring the average of the outputs to a suitable final model (Morales, Fernández & Baca-González 2017;Spiers et al. 2018). Each replicate randomly removed between 2 to 3 presence locations for testing. To help characterize the model performance, I used the Area Under the receiver operator characteristic Curve (AUC) evaluation metric (Warren & Seifert 2011). The AUC is the probability that a randomly chosen presence site will be ranked above a randomly chosen absence site (Merow, Smith & Silander 2013;Phillips and Dudík 2008). The values range from 0 to 1. A random ranking has on average an AUC of 0.5, and a perfect ranking achieves the best possible AUC of 1.0. Models with values above 0.75 are considered potentially useful (Elith 2000).

SPECIES OCCURRENCE DATA
Georeferenced archaeological sites (n = 66) were used as the presence-only occurrence data (Figure 2 and Data Availability section). The samples used in this model included all pre-contact hunter-gatherer sites dated between 12,240 and 470 14 C years BP. Only sites with radiocarbon dates were used in this model. While site density clearly fluctuates with time, given the low sample size for some time periods, and the mixed nature of much of the regional archaeological record; it was appropriate to begin to understand the patterning from a location-based perspective rather than a period specific approach (Conolly 2018: 197). In four cases, there were no published coordinates, just location descriptions or maps, which I georeferenced in Google Earth (Google Earth Inc. 2020). It was imperative to maintain these sites, not only because they are significant archaeology findings, but also because it is important to maximize the sample size for the SDM (van Proosdij et al. 2016;Wisz et al. 2008). From the total number of sites (n = 76), 10 sites were removed because they share a single cell with a neighboring site, which is part of the bias correction method as described above. Deciding which neighboring site to remove was a random selection.

ENVIRONMENTAL PREDICTORS
Determining which environmental layers to use in any SDM is dependent on spatial scale, time-period, known environmental changes (datasets), and the species under investigation. Bioclimatic variables, such as seasonality (e.g., annual range in temperature and precipitation) and extreme or limiting environmental factors (e.g., temperature of the coldest and warmest month) are key to determine the distribution of many plants and animals (Fick & Hijmans 2017); however, humans are less susceptible to these variables, particularly at a smaller scale of analysis such as the southern Pampas, where humans where likely able to adapt to ongoing changes in their environment (Bettinger 1981;Kelly 1983;Yellen 1977). In the Pampas region, there is evidence of climate shifts during the last ca. 12,000 14 C years BP (known date for the initial arrival of humans). The most dramatic changes occurred during the Late Pleistocene, when dry and cold weather and lower precipitation levels (ca. 100 mm) resulted in a sea-level drop of ca. 60 m below present day (Cione, Tonni, & Soibelzon 2003;Ponce et al. 2011;Quattrocchio et al. 2008;Zárate & Blasi 1991). During the Early to Middle Holocene, a warm period (called Hypsithermal), which lasted from ca. 8000 to 4500 14 C years BP., brought with it more humid conditions accompanied by marine ingression and flooding (Aguirre & Whatley, 1995;Bonadonna, Leone, & Zanchetta 1999;Cruz, Prado & Arroyo-Cabrales 2021;Grill et al. 2007;Iriondo, Brunetto & Kröhling 2009;Mancini et al. 2005;Prado & Alberdi 1999;Prieto 2000;Tonni, Cione & Figini 1999;Zárate, Espinosa & Ferrero 1998). After this, the climate eventually stabilized.
High-resolution paleo-environmental data in the study area is fragmentary, and in some cases controversial (see references above). Most datasets are localized models, limited to specific sites or localities. Ideally, all environmental layers should be based on models of the past climate (e.g., paleo-SDM, see Franklin et al. 2015); however, these datasets are often geographically sparse and may not intersect with the time of interest. Projecting this data onto a larger scale can untimely have negative consequences on the effectiveness of the predictive model. Due to the dynamic formation processes in the landscape (both past and present), chronological resolution is also typically poor, as the archaeological record is frequently time-averaged palimpsests of multiple occupations.
Taking into consideration the above information, I decided to focus on some physical attributes of the landscape that remained stable over extended periods of time (Table 2, Figure 3). These include elevation, slope, aspect, permanent water sources, topographic wetness index (TWI) (an index in hydrological analysis that describes the tendency of an area to accumulate water) (Mattivi et al. 2019), and toolstone sources (areas where hunter-gatherers went to collect raw material for making tools). Elevation, slope, aspect, and the TWI were processed using a 3 arc second (~90-meter) void filled digital elevation model (DEM) in QGIS. While higher resolution DEMs are available for the region (e.g., 1 arc), the 90-meter DEM model is considered the most appropriate here primarily because finer scale DEMs tend to capture non-essential features (noise), such as buildings or roads which can skew the results.
The permanent water sources (watercourses and bodies of water, not including the ocean) were derived from vectorized layers provided by the Instituto Geográfico Nacional (IGN), Argentina. The watercourses include permanent rivers, arms, streams, ravines, and estuaries. The bodies of water include both lakes and shallow lakes. Essentially, the size, shape and formation of these water sources changed throughout the Holocene (Iriondo & Garcia 1993;Johnson et al. 2012;Stutz et al. 2010;Stutz & Prieto, 2003;Zárate et al. 2000). For example, along the Quequén Grande River, evidence suggests that in some areas during the Early and Middle Holocene, a channel simply did not exist; just a broad, marshy, lowenergy environment (e.g., paleo-swamps) represented by interconnected longitudinal pools of flowing water (Martínez & Gutiérrez 2018;Zárate et al. 2000). Apart from some dry pulses (see discussion in Martínez & Gutiérrez 2018), paleo-environmental data indicates that throughout the Holocene, water was available in these locations (in one form or another).
Toolstone sources are highly localized and heterogeneously distributed across the southern Pampas (Bayón

MODEL EVALUATION
The Maxent predictive model was evaluated using the average test AUC output, the estimates of relative contributions of the environmental variables, and the marginal and corresponding response curves. The average test AUC output is calculated over the replicate runs (30), and provides the most comprehensive descriptive analysis for interpreting the results, including the standard deviation, which is used to evaluate the model performance. All of these datasets are reported in the Maxent output files.
To help further evaluate the Maxent model performance, I used information from previously completed pedestrian transects in a section of the inter-hill Pampas (Figure 4). Transects were purposely directed towards environmental areas with watercourses, bodies of water, and ease of access (Rafuse & Massigoge 2018). Currently, the total lineal distance covered by the transects is approximately 81 km. This work has resulted in the discovery of various isolated artifacts, and artifact clusters or surface sites (i.e., assemblages of at least five remains in approximately 10 m 2 ). To avoid bias, any transects which were in the same grid cell as previously known sites (occurrence data used in the Maxent model) where not included in this evaluation. While the Maxent prediction model is not specifically designed to locate small scale isolated materials or clusters of archaeological materials, this field evidence is what generally leads to sub-surface exploration (e.g., shovel tests) and the eventual finding of buried archaeological sites (i.e., occurrence data).

SITE AND ENVIRONMENTAL DISTRIBUTION
Archaeological sites from the study area show a dispersed trend (Nearest neighbor index = 72146, z score = 1121277). The average distance between sites is 163 km, with a minimum distance of 21 m and a maximum distance of 465 km. Many of the sites are located within a range of 89-223 km from each other.
The density of sites is 0.0005 sites per km 2 . The highest density per km 2 is in the inter-hill Pampas (n = 25; 38%; 0.0009 per km 2 ); while the greater number (n) of sites is in the hills (n = 35; 53%; 0.0007 per km 2 ). However, if we considered Tandilia and Ventania as two separate areas, the highest density per km 2 is recorded in the Tandilia range (n = 25; 0.0019 per km 2 ). This range shows some possible clusters in the southern sector (as shown in the Bias grid map, Figure 3). These clusters have been interpreted by some researchers as a result of huntergatherer decisions based on geographic features and resource availability (Mazzanti & Bonnat 2013; Politis et al. 2004); while others suggest geological bias created by a greater preservation of archaeological materials in caves and rock shelters (Martínez et al. 2015). Research bias may also be a factor, as this sector of the hill has been under systematic investigation since the 1990s (Mazzanti 1993(Mazzanti , 1997. The low-lying Pampas is the ecocomplex with the lowest density of sites (9%; 0.0002 per km 2 ). While not an independent eco-complex, it is important to mention that along the Atlantic coast, there are 10 sites located within 5 km of the ocean.
The study area presents a mostly gradual change in elevation, from 0 m along the coast to a maximum height of 1041 m in the Ventania range. The sites contain elevation values from 9-533 m; with an average of 147.3 m ( Table 3). Around half of the sites (n = 34; 51%) have an elevation between 39-208 m (Figure 5), and all of the sites over 200 m are located in the hills.
The average slope angle of sites is 3.6°, with a minimum of 0.1° and a maximum of 15.2°. The majority of sites (n = 51; 77%) present slopes between 1.2° to 4.8°. All of the sites above 4.8° correspond to caves and rock shelters located in the hills. Areas with low slope angle are prone to accumulate water, which could be linked to a high topographic wetness index (TWI). The TWI values for the study area range from 3.5 to 32 (mean = 12; sd = 6.2). Half of the sites (50.7%) have values between 6.3 and 9.1, which suggests that sites are mostly located Aspect ranged from 38° to 356°, with an average of 190°. Half of the sites (n = 34; 51%) have an aspect between 109° to 266°, which is roughly equal to a northern direction (in the southern hemisphere). There is no clear pattern in the distribution of sites and aspect, as sites in the hills have a similar range of aspect as those in the inter-hill and low-lying Pampas. There are also no outliers in this variable (Figure 5), which suggests a homogenous range in aspect.
The average distance to the watercourses is 3.1 km. The minimum value is slightly overestimated, as the distances were calculated from the center vertex of the watercourse, not the banks or edges, which is variable and dependent on water flow and accumulation during wet and dry seasons. This however, is not an issue with the model because the minimum scale for this analysis is 90 m, so all sites located less than 90 m from the watercourses have a cell value equal to 1. There is at least one outlier from the low-lying Pampas, at 34.8 km.
The average distance to permanent bodies of water is 8.7 km, with a minimum distance of 6.1 m and maximum distance of 37 km. Half of the sites (n = 34; 51%) are between 3.6-12.9 km. With the exception of one site located along the coast, all sites at distances greater than 12.9 km are located in the hills. Similar to watercourses, the minimum value is slightly overestimated, as the polygon vertices are irregular; however, once again this is not an issue for sites under the 90 m cell size (n = 4; 6%).
Finally, the average distance to the primary toolstones sources is 87 km, with a minimum of 0 m (6 sites located directly in the polygon layer) and a maximum of 248 km. The majority of the sites are located between 37-105 km of tool stones sources. Only 9 sites (all in the Tandilia range) are located under the 10 km foraging range from these sources.

MODELLING RESULT
From the 75 model runs, the average AICc value was 1725.39 (sd = 322.59). The parameter with the lowest AICc value (1544.83) had a regularization multiplier of 0.75 and used the feature combination of just linear and quadratic. There was no one particular parameter or regulation multiplier that outperformed, however the β = 0.50 had some of the highest values, and the hinge feature on its own did not perform well (Figure 6). Using the parameters indicated above (β = 0.75; feature classes = LQ), 30 model replicates were run. The average test AUC for the 30 runs was 0.857 (sd = 0.128), meaning that this is a potentially useful model. In other words, the model shows moderate performance, which is able to discriminate between random points and the environment associated with site locations (Franklin 2010;Galletti et al. 2013). All seven of the environmental variables were uncorrelated (r < 0.99) (see Data Availability section). The environmental variable with highest percent contribution gain were watercourses (55.3%), followed by slope (29.7%) ( Table 4). The remaining variables contributed to 5.7% or less. The permutation importance (i.e., the measure of the change in model fit when values of a given covariate are randomly shuffled among sites) show qualitatively similar results; with the exception of bodies of water, which increases from 5.4% to 10.6%; suggesting that bodies of water may play an important role in the distribution of sites. Fundamentally, results suggest that areas close to water, in particular watercourses, are the most suitable locations to detect new archaeological sites. Figure 7 shows the mean response curves of the 30 replicate Maxent runs (red) and the mean +/-one standard deviation (blue) of the three variables with the highest permutation importance (watercourses, slope, and bodies of water). These curves show how each environmental variable influences the prediction of the hunter-gatherer site location. The x-axis represents the grid cell values of the environmental variable (e.g., distance in meters), and the y-axis gives the probability of occurrence (cloglog output) on a scale from 0 (low probability) to 1 (high probability). The distance to watercourses shows an abrupt curve, in both the marginal and corresponding response curves, staring at grid cell values of 0.60 (with a less than 10% probability). Grid cells with a value of 0.95 to 1.00 (which can be roughly translated to a distance 40  Figure 6 Results of the Akaike information criterion (AICc) score for the 75 model runs to determine the optimal parameters. Feature classes: L., linear; Q., quadratic; P., product; and H., hinge. meters or less from the watercourse) have 80% or more probability of sites. This shows that suitable distance for hunter-gatherer archaeology sites is less than 40 meters from watercourses. The response curve for slope suggests higher probably of sites occurring between 10° to 30°. However, as mentioned above, almost 80% of the presence-only sites are located on slopes under 5°, and no sites are located on slopes greater than 15.2°. Looking at the response curve created using only the corresponding variable (slope), we see how site probably begins to decrease after 20°. Therefore, this corresponding model appears to more accurately describe the response of sites to slope. A less abrupt curve is shown for bodies of water, with a general correlation between grid cell values and probability. For example, sites with grid cell values between 0.80 and 0.90 (which can be roughly translated to a distance 10 km from the body of water), have a site probability of close to 85%. This is consistent with the average distance to bodies of water for the study area, which is 8.7 km. When using only the corresponding variable, these values decrease (e.g., cell values of 0.80 = 60% probability). Both curves show how this variable plays a lesser role in the contribution to the model.
The Maxent model generated a map using the mean and standard deviation of the 30 output grids (Figure 8). Visual analysis once again suggests that watercourses are the primary contributor to the model. The average AUC value for the individual sites across the study area was 0.74 (sd = 0.25). The majority (n = 40; 59.7%) contained AUC values greater than 0.80 (Figure 9). Of these, 4 sites (all corresponding to sites in the hills; specifically, caves/ rock shelters) had AUC values of 1.00; 22 sites (13 in the hills, and 9 in the inter-hill Pampas) had AUC values between 0.90 and 0.99; and 13 sites (7 in the hills, 6 in inter-hill Pampas) had AUC values between 0.80 and 0.89. A Wilcox rank sum test was unable to reject the null hypothesis using the AUC values from the hills and the inter-hill Pampas (W = 523, p-value = 0.2876; r = 0.137; see Data Availability section for Rscript and raw data). The environmental region with the lowest AUC values was the low-lying Pampas, with a maximum value of just 0.60. Sample size is likely an issue in this region.

FIELD VALIDATION
The pedestrian transects that were completed previous to the Maxent model included a total of 104 grid cells. Only 11 grid cells contained archaeological material in the form of isolated remains (4 grid cells) or clusters (7 grid cells). The mean AUC for the grid cells with archaeological material was 0.68. The remaining 93 grid cells, which contained no archaeological material had a similar mean AUC value (0.66). Over half of these grid cells (n = 57; 55%) had AUC values between 0.60 and 0.80; and just 14 grid cells (13%) had AUC values  above 0.80. This suggests the probability of finding archaeological materials in this survey area is closer to random. Looking at some specific transects, the model did successfully predict the location of archaeological materials along grid cells near watercourses; as well as locations with no archaeological materials (see example in Figure 10). In other words, the model does appear to be useful at finer scales of analysis (localized areas), identifying environmental variables such as bodies of water with low average probabilities.

DISCUSSION
Despite a non-optimal regional environment for predictive models, and a challenging archaeological record (e.g., nonrepresentative sampling), the Maxent model produced a reliable exploratory model for the study area. This is promising not only for archaeology predictive models, but also for the Maxent model performance in this type of environment (Jones 2017; Noviello et al. 2018). The results show that the key environmental factors influencing the habitat distribution of hunter-gatherer archaeological sites in this region are watercourses; specifically, classified grid cells which are close or less than 40 meters from this variable. Additionally, sites with slope between 5° and 20°, and that are less than 10 km from a permanent body of water are also promising. What seems to be less relevant to site distribution are the distances to high quality toolstones, ground wetness, aspect, and elevation.
The influence of watercourses in archaeological site distribution from the southern Pampas has two possible, but not mutually exclusive, explanations: (1) huntergatherer settlement choices, and (2) research bias. First, hunter-gatherers logically occupied areas near watercourses for resource availability, including fresh water, and associated animal and plants resources. The watercourses also likely acted as natural travel routes (waterway networks) for groups. In many cases, these watercourses are interconnected with permanent bodies of water, and both created the structure for a foraging territory (Conolly 2018;Kelly 1995). In regards to hunting strategies, the water attracted different mammals, like the guanaco (the most prominent game recovered in archaeological sites in the region), providing huntergatherers the opportunity to stalk or trap their prey using these natural features of the landscape (Kaufmann et al. 2021). Many archaeological sites from the interhill Pampas near water show evidence of repeated and frequent use, and are interpreted as persistent places (sensu Schlanger 1992: 92) (e.g., Arroyo Seco 2, Paso Mayor, and Las Toscas 5; Bayón et al. 2010;Massigoge et al. 2021;Politis et al. 2016). While bodies of water did not show high relative contributions to the Maxent model, they were clearly a vital resource. One positive characteristic that could be taken from the model is its effectiveness to identify bodies of water with low probability, as shown in the field validation results. This is potentially an effective method to deciding which bodies of water to survey, which is particularly challenging in the inter-hill Pampas, considering the immense number of shallow lakes.
A second possibility for the abundance of sites near watercourses is bias related. While the bias grid used in the Maxent model was designed to counter this variable, the fact that pedestrian survey along watercourses is one of the most widely used methods in the region for the detection of archaeological sites is a good reason for the abundance of sites associated with this variable. One option would be to create a bias grid that used not only the location of known archaeological sites, but also the location of watercourses. This can be tested in future models; however, the expected result will reduce the contribution of water resources, which as mentioned above, was an important variable in hunter-gatherer ways of life. To avoid this bias, a better understanding of the paleo landscape; in particular the paleochannel network during different periods, may be a possible solution.
Slope appears to be relevant for the distribution of archaeological sites, although this seems to play a minor role in the inter-hill and low-lying Pampas. Sites in the hills are naturally located on sloped terrain, and thus show higher slope gradient. From a hunter-gatherer perspective, settlement areas (habitable terrain) and travel routes (the cost of movement) were constrained by slope (Llobera 2000). The slope of the land can also impact behavior of grazing animals and act as a natural feature for trapping prey (similar to watercourses). According to the Maxent model, archaeologists working in the southern Pampas should focus on classified grid cells with less than a 20° slope gradient. Slope is directly related to the flatness of the landscape, and hunter-gatherers will prefer dry terrain for settlement areas. The Maxent model showed a very low percentage contribution to the topographic wetness. TWI has been shown to perform badly in a flat landscape and areas characterized by gentle slope (Mattivi et al. 2019). The flat landscape of the inter-hill Pampas may also have reduced the contribution of the elevation variable. Regardless of the fact that there are a greater number of sites in the hills, elevation appears to be less relevant when predicting site locations in the southern Pampas. This result is somewhat inconsistent with the Maxent predictive model by Politis et al. (2019) in the Lower Paraná River. Their research suggested that elevation is the most relevant environmental variable. Since the environment in that region is mostly fluvial, settlement elevation is considered a relevant part of an adaptation to these types of environments (Politis et al. 2019). Conversely, the Maxent model for the southern Pampas region suggests that hunter-gatherers preferred lower elevations, which, as some authors propose would be better for domestic activities (Mazzanti 2003). Furthermore, most locations along the hills are easily accessible, and social practices were not necessarily related to altitude (Mazzia & Flegenheimer 2012).
Regarding the distance to the high-quality raw materials, the low contribution of this variable is consistent with the results of previous regional archaeological studies, which suggests that hunter-gatherers from the Pampas region kept themselves supplied with artifacts through diverse provisioning strategies or used alternative raw materials from outcrops located near the sites (Messineo & Barros 2015). If we consider the daily foraging trip of ca. 10 km, and the known location of archaeological sites, grassland hunter-gatherers at present day conditions would have had good access to fresh watercourses (average distance = 3.1 km) or bodies of water (average distance = 8.7 km); but the distance to high-quality SBG orthoquartzite would not be accessible in most areas (average distance = 87 km). The average distance between archaeological sites in the southern Pampas is currently 163 km, meaning that hunter-gatherers accessed raw materials through either provisioning places, or provisioning individuals of long-distance exchange networks (Armentano, Martínez & Gutiérrez 2007;Barros 2012;Flegenheimer, Mazzia & Weitzel 2015;Martínez & Mackie 2003).
One last factor to consider when interpreting the results of the Maxent model is the detectability issues. On the one hand, portions of the geographic range where the population is most dynamic (low densities, high turnover, etc.) are precisely where non-detection is most likely to occur (Yackulic et al. 2013). Regions of rapid change in occupancy with corresponding low-detection probabilities are expected to occur in archaeology. Hunter-gatherer groups form the southern Pampas region were likely highly mobile, small sized family groups, and occupied short-term camps and hunting stands; therefore, non-detection is probable in most locations. The occurrence of greater seasonal or interannual fluctuations in water availability in smaller basins could have contributed to a shorter duration and high turnover of human occupations (Massigoge 2009;Massigoge & Pal, 2011). On the other hand, some watercourses have, and continue to undergo significant fluvial processes that negatively affect the preservation and visibility of archaeological deposits, which help explain in part the lack of buried archaeological material in certain sectors of the landscape (Dubois 2006; Favier-Dubois, Massigoge & Messineo 2017).

FUTURE STRATEGIES AND CONCLUSIONS
The Maxent model is a flexible cost-effective way to identify areas most likely to reveal the presence of archaeological sites. Initial results of the model for the southern Pampas are promising, with a moderate performance. When complete random sampling is not feasible, this information can be used by archaeologists to focus their efforts on areas with greater probabilities of site discovery. In other words, this method should be considered a complement, and not a replacement to traditional archaeological survey methods. Random sampling strategies should not be discarded, as this is needed to improve the overall model performance.
Archaeologists must also reinforce a standardized field transect recording method. Recording track survey data with information of occurrence data (i.e., positive and negative grid cells) will help to improve field validation. The Maxent model can also be used to complement heritage cultural management strategies.
With new environmental and archaeological data regularly being gathered and updated, the model needs to be constantly adjusted. Paleoenvironmental data will help improve the performance of the model, in particular, geographic mapping of ancient watercourses. In order to capture the range of variability in the use of the environment and settlement strategies, future analysis should develop a period specific approach (although this may reduce sample size significantly), in addition to site classifications (e.g., base camp vs hunting camps; mortuary vs non-mortuary sites; etc.), and the correlation between distance to major watercourses and the proportions of materials. This will help to test hypothesis and explain archaeological patterns along the watercourses (e.g., waterway-based network models; see Conolly, 2018). The model should also attempt to consider nonenvironmental variables. Many times, human settlement selection is based on cultural factors, such as symbology or taboo, or social factors, such as the presence of other groups. For this reason, we should be cautious and modest in our expectations for inference when using predictive modelling based exclusively on environmental variables (Grøn 2018;Wheatley 2004). Currently, site databases for the southern Pampas are inconsistent, where some site records are very detailed and provide data from sampling strategies, whereas other site records are basic qualitative descriptions. Sample size is still geographically sparse for certain areas (e.g., north of the Tandilia hills for instance), which makes interpretations problematic.
In conclusion, the site and environmental distributions described in this paper, along with a new Maxent predictive model, allowed us to start to identify key landscape variables critical to hunter-gatherer settlement decisions in the southern Pampas. This information can ultimately lead to the detection of new archaeology sites. This is an important advance in the regional analysis, and for similar environments around the world (i.e., grassland hunter-gatherer settings). This work also reinforced the need for using standardized protocols when running and analyzing distribution models. In order for the model to improve we need to continue to use these protocols to enhance transparency, reproducibility, evaluation and reuse in the research (Merow, Smith & Silander 2013;Yackulic et al. 2013;Zurell et al. 2020Zurell et al. : 1273. We also need to continue to study the uncertainty of the model through more calibration (adjustment of the input parameters), sensitivity analysis (examination of the effects of different input parameters), and model validation (evaluation of how well the model provides reproducible results that are close to reality) (Brouwer Burg 2017; Brouwer Burg, Peeters & Lovis 2016).