introduction, web, crawling, common, method, extracting, information, sites, archiving, indexing, future, searches, looking, specific, interact, results, engines, used, daily, basis, current, crawlers, traverse, following, links, visited, poses, limitation, pages, linked, reachable, accessible, collectively, referred, user, visiting, interacting, dynamically, entering, data, form, finding, directly, individually, expected, operator, reason, create, based, index, content, approach, detect, topic, generate, relevant, likely, yield, submitting, filtering, hidden, designed, specifically, entity, focuses, single, article, movie, type, display, individual, actually, includes, being, outdated, manually, sharing, deleted, due, included, targeting, objective, thesis, develop, generating, urls, queries, describes, templates, generated, produced, making, contain, elements, parameters, difficult, predict, working, existing, unique, identifier, incrementing, random, text, useless, purpose, predicting, set, using, relative, age, compared, detecting, missing, described, trying, producing, variable, numeric, refer, otherwise, produce, representing, optimization, techniques, helpful, ordinary, human, readable, caused, system, additional, varying, quality, naively, detected, merged, represent, prefix, suffix, detection, phase, unused, stage, themselves, actual, related, previously, unindexed, analysing, former, similarities, context, barbosa, freire, developed, algorithm, matching, main, focus, possible, accurately, userdefined, components, efficiently, called, classifier, frontier, manager, determines, collects, tracked, maintains, priority, retrieved, improve, searchable, given, defined, harvested, unrelated, excludes, example, login, email, posting, subscription, allows, productive, maximize, efficiency, distinguishing, factor, adaptive, updates, keywords, useful, difference, operate, proceed, role, filled, considered, reverse, engineering, major, limit, scope, certainly, 
extended, fit, via, limited, generation, accessing, simpler, latter, simple, convenient, customers, sufficient, et, al, utilizes, perform, semantic, crawled, resulting, filtered, extracted, combined, deduplicated, representative, minimize, duplicate, iteration, similar, category, listings, rather, therefore, deduplication, strategy, completely, agarwal, deduplicating, reference, initial, requires, tokenization, addition, tokens, separated, standard, delimiters, implemented, multiple, repeating, substrings, considering, separate, reduce, processed, ranked, normalized, format, ranking, selected, analysis, shorter, less, preferred, provide, generalized, grouping, replaced, wildcards, normalization, removal, superfluous, replacing, values, constant, achieved, verifying, integer, beginning, authors, mentioned, possibility, manica, discovering, input, considers, website, nested, starting, level, repeat, process, reached, root, entire, compare, categorize, pagination, below, descendant, arbitrary, ancestor, removed, structurally, checks, potential, sibling, contained, within, dedicated, presented, inverse, verifies, affect, itself, blanco, taking, sample, finder, module, locates, locating, collections, location, candidates, parallel, weninger, original, further, measure, schemas, jaccard, coefficient, predefined, threshold, calculating, differ, known, expecting, dynamic, easily, variations, structure, whereas, calculated, sensitive, consists, outputs, ratio, addresses, application, highly, effectiveness, varies, greatly, domain, http, com, cheap, items, keychain, carpet, glowing, onions, expensive, fancy, forum, grouped, identified, containing, prevalence, various, added, prefixed, suffixed, treat, served, later, optimize, initially, partitioned, equal, differing, analysed, independently, checked, split, really, effect, displayed, doesnt, whether, comparison, def, findurltemplates, initialtemplates, createinitialtemplates, templategroups, createtemplategroups, 
prefixsuffixnumericparameterssplit, filtertemplategroups, numericparamfilter, mergebyunusedtextelements, createfinaltemplates, subjected, remove, marked, static, intended, serve, landing, knowing, excluded, code, parsed, separator, stored, performed, reordered, alphabetically, treatment, buildelements, locationtext, paramstext, spliturl, [type, false, [domain]], possiblenumericparameter, num_value, isnumber, text_static, append, [location], startsorendswithdigit, params, sort, [separator], [key], [], [value], pagetopic, mergetemplates, target, range, len, elements[i], [num_value, text_value], generatekey, templatekeypart, templates[key], concatenated, fixed, nor, merging, groupkeypart, groupkey, groups[groupkey], restriction, existence, non, cannot, significantly, increase, amount, unnecessary, however, total, needed, lower, topicpage, consisted, entirely, digits, usually, extra, description, vital, redirects, correct, obvious, having, appropriate, anyway, expect, handling, ignored, changing, identical, tricky, remains, chapters, actions, taken, looked, verified, increased, moved, applysplitmergepertextgroup, merge, groupsbytext, bytext, groupbytext, merge_minimum_values, createsplittemplate, removefromgroup, applysplitmergetoall, cansplitmergeall, min, merge_verify_depth, cansplitmerge, groupsbytext[i], verifyandapplyprefixsuffixmerge, elif, getbytextgrouping, prefixsuffixprocessgroupelement, elementindex, groupsplit, mergepairs, elements[elementindex], splittype, splitwithtype, mergekey, generatemergekey, findorcreatemergefromgroupsplit, pair[splittype], numberset, pair[], prefixsuffixprocessgroup, getgrouppathelementcount, finalgroups, popitem, finalgroups[key], sake, simplicity, address, increasing, ideal, interchangeable, tried, followed, previous, per, currently, respond, nonsuccess, comparing, assuming, redirections, error, occurring, reload, source, verifycategorymerge, sorted, keylambda, cancategorymerge, templates[i], verifynamemerge, cannamemerge, 
applyunusedmerge, newelements, elements[merge, index], changedelement, text_value, newelements[merge, templatekey, newtemplate, verifyandapplymerge, bestuniquevalueratio, totalcount, gettotalurlcount, [templatekeypart, elements], parts[elementindex], mergeunusedprocessgroupelement, groupmerge, findorcreatemergeinmergegroup, mergeunusedprocessgroup, series, finalizing, leaves, remaining, larger, largest, finalizetemplate, numindex, highestuniquevaluecount, numericelementcount, finaltemplates, uniquevaluecount, numelement, elements[numindex], singlenumberkey, generatesinglenumberkey, finaltemplates[singlenumberkey], values[i], buildsinglenumberelements, finalization, finalvalues, [int, values], urlwithplaceholder, nthurlwithplaceholder, knownvalue, int, correspond, applied, chosen, ability, simplest, server, fixes, redirecting, supposed, setting, inspection, unlikely, valid, proper, invalid, necessary, indicates, essentially, corresponding, tag, attributes, match, automatically, exclusions, meta, title, direct, loaded, twice, record, occur, randomly, negatives, excluding, removing, detector, knowledge, response, solutions, alternative, acting, validating, exactly, traversed, gaps, exist, continuous, ending, covered, sampled, interval, exceeds, decreased, repeated, versus, success, operation, halted, guarantees, getting, recordgeneratedurl, aggregate, generatedurl, validurls, totalqueries, checkindices, ids, idlist, shuffle, originalurl, replace, emptyurl, very_high_value, str, compareotherentityurl, generatefromtemplate, edgex, initial_x, intervaly, initial_y, processedids, minimum_y, maximum_x, currentids, expectedvalue, addgapindices, difference_update, checkedids, generatenewurls, isfailedcondition, failed_minimum, failed_ratio, aggregateresults, evaluation, evaluating, estonian, july, august, partial, comparable, internet, focused, raw, metadata, archives, heritrix, files, mime, test, returned, passed, filter, kilobyte, indicating, length, meaningful, patterns, 
testing, written, java, apache, httpcomponents, library, jsoup, representation, utilities, logging, built, gradle, script, command, started, entered, named, prints, splitting, warc, specified, precision, configured, lowering, average, demonstrated, inspected, offset, instance, heavily, outnumbered, paginated, tied, applies, language, perspective, class, modified, indicator, redirected, prior, causes, positive, duplicated, existed, manage, approximately, median, higher, accuracy, sense, detected, recall, insight, reliably, manual, statistical, happened, changed, wholly, requiring, interesting, possibly, responsible, zero, uncommon, minimum, created, prove, reduced, hint, reducing, appeared, noticeable, catching, improving, decreasing, overall, maximum, ordered, matched, significant, attempted, subset, processing, measured, evaluate, validate, correctly, tested, classification, visual, rare, shown, worked, discovered, validation, improvement, successfully, conclusions, intentionally, slash, required, experiments, inserting, relies, bottlenecks, reliable, reliability, general, especially