introduction, crawling, common, method, extracting, information, sites, archiving, indexing, future, looking, specific, interact, results, basis, current, crawlers, traverse, links, visited, poses, limitation, linked, site, reachable, accessible, crawler, collectively, referred, user, visiting, interacting, dynamically, entering, data, finding, directly, individually, expected, operator, create, link, based, index, content, approach, detect, topic, generate, relevant, yield, submitting, filtering, methods, designed, specifically, entity, focuses, article, type, display, individual, actually, includes, outdated, manually, sharing, deleted, included, targeting, objective, thesis, develop, generating, urls, queries, templates, generated, produced, contain, elements, parameters, difficult, predict, working, existing, contains, unique, identifier, incrementing, random, text, useless, purpose, predicting, identifiers, using, relative, compared, detecting, entities, missing, described, trying, producing, variable, numeric, refer, produce, template, representing, optimization, techniques, ordinary, human, readable, caused, system, additional, varying, quality, naively, detected, merged, represent, prefix, suffix, detection, phase, unused, element, themselves, actual, related, previously, unindexed, analysing, former, similarities, approaches, context, barbosa, freire, developed, algorithm, matching, main, focus, accurately, userdefined, components, efficiently, called, classifier, frontier, determines, collects, tracked, maintains, priority, retrieved, searchable, detects, defined, harvested, unrelated, excludes, example, login, email, posting, subscription, productive, maximize, efficiency, distinguishing, factor, adaptive, updates, keywords, operate, proceed, role, filled, result, considered, reverse, engineering, limit, scope, extended, purposes, via, limited, generation, query, accessing, simpler, latter, convenient, sufficient, et, al, utilizes, perform, semantic, 
crawled, resulting, filtered, extracted, combined, deduplicated, representative, minimize, duplicate, iteration, similar, category, listings, therefore, deduplication, strategy, completely, agarwal, deduplicating, reference, initial, requires, tokenization, addition, tokens, separated, standard, parameter, delimiters, implemented, multiple, repeating, substrings, considering, reduce, processed, ranked, normalized, format, ranking, selected, analysis, shorter, less, preferred, provide, generalized, grouping, replaced, wildcards, normalization, removal, superfluous, final, replacing, constant, achieved, verifying, integer, mentioned, possibility, manica, discovering, input, considers, website, nested, starting, process, reached, entire, categorize, pagination, descendant, arbitrary, ancestor, removed, structurally, potential, sibling, contained, within, dedicated, presented, inverse, verifies, affect, blanco, sample, finder, module, locates, locating, collections, collection, location, candidates, parallel, weninger, original, similarity, distance, schemas, jaccard, coefficient, predefined, threshold, calculating, differ, expecting, dynamic, easily, variations, structure, whereas, calculated, sensitive, consists, outputs, ratio, application, highly, effectiveness, varies, greatly, domain, listing, output, http, com, items, keychain, glowing, expensive, forum, grouped, identified, containing, prevalence, various, represents, added, prefixed, suffixed, served, later, optimize, initially, partitioned, differing, analysed, independently, consist, checked, split, effect, displayed, doesnt, whether, comparison, def, findurltemplates, initialtemplates, createinitialtemplates, templategroups, createtemplategroups, prefixsuffixnumericparameterssplit, filtertemplategroups, numericparamfilter, mergebyunusedtextelements, createfinaltemplates, subjected, marked, static, intended, landing, knowing, excluded, code, parsed, separator, stored, performed, reordered, alphabetically, 
special, treatment, buildelements, url, locationtext, paramstext, spliturl, [type, [domain]], possiblenumericparameter, num_value, isnumber, text_static, append, [location], startsorendswithdigit, params, param, [separator], [key], [], [value], pagetopic, mergetemplates, target, range, len, elements[i], [num_value, text_value], generatekey, templatekeypart, templates[key], concatenated, fixed, merging, groupkeypart, groupkey, groups[groupkey], restriction, existence, non, significantly, increase, unnecessary, reduces, total, needed, lower, increases, topicpage, consisted, entirely, digits, usually, description, vital, redirects, obvious, redirect, appropriate, handling, suffixes, prefixes, ignored, changing, identical, tricky, chapters, actions, looked, constants, verified, increased, moved, applysplitmergepertextgroup, merge, groupsbytext, bytext, groupbytext, merge_minimum_values, createsplittemplate, removefromgroup, applysplitmergetoall, cansplitmergeall, min, merge_verify_depth, cansplitmerge, groupsbytext[i], verifyandapplyprefixsuffixmerge, elif, getbytextgrouping, prefixsuffixprocessgroupelement, elementindex, groupsplit, mergepairs, elements[elementindex], splittype, splitwithtype, mergekey, generatemergekey, findorcreatemergefromgroupsplit, pair[splittype], numberset, merges, pair[], prefixsuffixprocessgroup, getgrouppathelementcount, finalgroups, popitem, finalgroups[key], problem, sake, simplicity, address, increasing, interchangeable, followed, categories, previous, chapter, per, indices, currently, candidate, respond, nonsuccess, comparing, produces, assuming, redirections, error, codes, occurring, reload, source, verifycategorymerge, sorted, keylambda, cancategorymerge, templates[i], verifynamemerge, cannamemerge, applyunusedmerge, newelements, elements[merge, index], changedelement, text_value, newelements[merge, templatekey, newtemplate, verifyandapplymerge, bestuniquevalueratio, totalcount, gettotalurlcount, [templatekeypart, elements], 
parts[elementindex], mergeunusedprocessgroupelement, groupmerge, findorcreatemergeinmergegroup, mergeunusedprocessgroup, series, finalizing, remaining, refers, larger, largest, finalizetemplate, numindex, highestuniquevaluecount, numericelementcount, finaltemplates, uniquevaluecount, numelement, elements[numindex], singlenumberkey, generatesinglenumberkey, finaltemplate, finaltemplates[singlenumberkey], values[i], buildsinglenumberelements, finalization, finalvalues, [int, values], urlwithplaceholder, nthurlwithplaceholder, knownvalue, int, correspond, applied, require, ability, simplest, server, responds, redirecting, supposed, inspection, unlikely, valid, invalid, necessary, indicates, essentially, differs, corresponding, attributes, automatically, exclusions, include, contents, meta, loaded, occur, randomly, negatives, excluding, removing, detector, knowledge, response, solutions, alternative, acting, validating, exactly, traversed, gaps, gap, exist, continuous, covered, sampled, interval, exceeds, decreased, repeated, versus, success, operation, halted, guarantees, recordgeneratedurl, aggregate, generatedurl, validurls, totalqueries, checkindices, ids, idlist, shuffle, originalurl, replace, emptyurl, very_high_value, str, compareotherentityurl, generatefromtemplate, edgex, initial_x, intervaly, initial_y, processedids, minimum_y, maximum_x, currentids, expectedvalue, addgapindices, difference_update, checkedids, update, generatenewurls, isfailedcondition, failed_minimum, failed_ratio, aggregateresults, evaluation, evaluating, websites, estonian, july, august, partial, comparable, internet, focused, metadata, archives, heritrix, mime, archive, returned, passed, filter, domains, kilobyte, indicating, kilobytes, verify, meaningful, patterns, testing, java, apache, httpcomponents, jsoup, representation, commons, utilities, logging, gradle, script, command, started, commands, entered, named, splitting, warc, performs, specified, precision, configured, lowering, 
average, demonstrated, inspected, filters, offset, instance, heavily, outnumbered, paginated, tied, applies, perspective, class, modified, indicator, redirected, action, prior, causes, positive, duplicated, existed, manage, positives, types, approximately, median, higher, accuracy, detected, recall, insight, reliably, manual, statistical, happened, changed, wholly, splits, requiring, pattern, possibly, responsible, zero, uncommon, minimum, created, reduced, reducing, appeared, noticeable, catching, improving, decreasing, maximum, ordered, matched, significant, attempted, subset, processing, measured, evaluate, validate, correctly, tested, classification, visual, rare, classifications, shown, worked, discovered, validation, improvement, indicate, successfully, conclusions, intentionally, slash, required, experiments, inserting, experiment, relies, limitations, bottlenecks, reliable, reliability, provides, especially