What Are URL Parameters (Query Strings)?
URL parameters are pieces of information located in the query string of a URL. The query string is the portion of the URL that follows a question mark. For example: domain.com/shoes?type=sneakers
The format of URL Parameters
Upon first look, you might just see a random string of letters, numbers, and symbols, but once you understand the anatomy of a URL parameter, you'll be able to make sense of these long strings of code.
Question Mark: This starts the URL parameter (domain.com/shoes?type=sneakers)
Ampersand: This separates parameters when you have multiple in one URL (domain.com/shoes?type=sneakers&sort=price_ascending)
Variable Name: (or "key") is like the title or label of the parameter (domain.com/shoes?type=sneakers)
Value: This is the specific value that the key identifies (domain.com/shoes?type=sneakers). For grammar nerds, it's like the predicate nominative in a sentence. In the sentence "Bagel is my dog," "Bagel" is like the key/variable name while "my dog" is the value.
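To see this anatomy in code, here is a quick sketch using Python's standard urllib.parse module to split the example URL from above into its query string and its key/value pairs:

```python
from urllib.parse import urlparse, parse_qs

url = "https://domain.com/shoes?type=sneakers&sort=price_ascending"
parsed = urlparse(url)

# The query string is everything after the question mark.
print(parsed.query)  # type=sneakers&sort=price_ascending

# parse_qs splits it into key/value pairs (values come back as lists,
# because a key can legally appear more than once in one URL).
params = parse_qs(parsed.query)
print(params)  # {'type': ['sneakers'], 'sort': ['price_ascending']}
```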
What are query strings used for?
Query strings are used to either change the page content or to track information about a click through the URL. These two main uses indicate two main types of URL parameters: active and passive.
Active URL parameters modify the content on the page. Some examples include:
Narrowing content (ex: ?type=yellow to display only “yellow” results on a page)
Reordering content (ex: ?sort=price_ascending to display items in a particular order)
Segmenting content (ex: ?page=1, ?page=2, ?page=3 to break up one large piece of content into three smaller parts)
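As an illustration of how a server might interpret active parameters like the ones above, here is a small Python sketch. The product data and the exact parameter names are hypothetical, modeled on the examples in this list:

```python
from urllib.parse import parse_qs

# Hypothetical product data for a /shoes page.
products = [
    {"name": "Runner", "type": "sneakers", "price": 80},
    {"name": "Classic", "type": "loafers", "price": 120},
    {"name": "Dash", "type": "sneakers", "price": 60},
]

def apply_params(items, query_string):
    """Apply a hypothetical 'type' (narrowing) and 'sort' (reordering)
    parameter to the page content, the way a server-side handler might."""
    params = parse_qs(query_string)
    if "type" in params:
        items = [p for p in items if p["type"] == params["type"][0]]
    if params.get("sort") == ["price_ascending"]:
        items = sorted(items, key=lambda p: p["price"])
    return items

result = apply_params(products, "type=sneakers&sort=price_ascending")
print([p["name"] for p in result])  # ['Dash', 'Runner']
```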
Passive URL parameters do not change the content of the page. They are used for tracking and attribution. Some examples include:
Session IDs (A website’s server assigns a unique number to a specific user for the duration of their website visit.)
UTM codes (Stands for “urchin tracking module” and helps identify the source, medium, and campaign a website visit came from.)
Affiliate IDs (Often used by bloggers/influencers. The owner of the website with this type of link generates revenue per click.)
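As an example of a passive parameter, here is how a UTM-tagged link might be constructed with Python's urllib.parse.urlencode. The campaign values are made up; the point is that nothing about the page content depends on them:

```python
from urllib.parse import urlencode

# Hypothetical campaign values. UTM parameters don't change the page;
# they just ride along in the URL for analytics attribution.
utm = {
    "utm_source": "newsletter",
    "utm_medium": "email",
    "utm_campaign": "spring_sale",
}
tracked_url = "https://domain.com/shoes?" + urlencode(utm)
print(tracked_url)
# https://domain.com/shoes?utm_source=newsletter&utm_medium=email&utm_campaign=spring_sale
```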
URL Parameters & Duplicate Content
One of the biggest concerns with URL parameters is that, if they are not handled properly, multiple versions of a single piece of content can get crawled and indexed. There are a few things webmasters can do to keep non-unique content URLs from getting indexed. These options should only be used for parameter-containing URLs that don't contain unique content and that you don't want in the index.
Indicate what version of your page Google should index with rel=canonical
Implement a canonical tag on URLs with parameters that points to the real/primary version of that page. For example, you would place a canonical tag in the head of domain.com/shoes?sessionid=1234 that points to domain.com/shoes, indicating that the latter is preferred and should be the indexed version. Google doesn't have to respect your canonical tag, but setting one up can influence Google's choice.
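One way to derive the canonical target is simply to strip the query string. This Python sketch assumes every parameter on the URL is passive (like a session ID) and therefore safe to drop, which is not true for active parameters:

```python
from urllib.parse import urlparse, urlunparse

def canonical_url(url):
    """Strip the query string (and fragment) to get the parameter-free
    URL you might point a rel=canonical tag at. Simplified: assumes all
    parameters on this URL are passive, e.g. session IDs."""
    p = urlparse(url)
    return urlunparse((p.scheme, p.netloc, p.path, "", "", ""))

print(canonical_url("https://domain.com/shoes?sessionid=1234"))
# https://domain.com/shoes
```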
Control what Googlebot crawls and doesn't crawl with robots.txt
Block bots from crawling certain sections of your site (ex: Disallow:/*?tag=*) if you want them to avoid certain sections completely. The robots.txt file is what crawlers look at first before crawling your site. If they see something is disallowed, they won't even go there.
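Note that patterns like Disallow: /*?tag=* rely on wildcard matching, which Googlebot supports but Python's built-in urllib.robotparser does not. Here is a simplified, hand-rolled illustration of how such a wildcard pattern matches URL paths:

```python
import re

def matches_disallow(pattern, url_path):
    """Googlebot-style robots.txt matching, simplified: '*' matches any
    run of characters, and patterns match from the start of the path.
    This is an illustration, not a full robots.txt implementation."""
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.match(regex, url_path) is not None

print(matches_disallow("/*?tag=*", "/blog?tag=seo"))  # True (blocked)
print(matches_disallow("/*?tag=*", "/blog/post-1"))   # False (crawlable)
```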
Also, be sure to use consistent internal linking! Link to the canonicalized version of the URL and never the version with the parameter to avoid sending inconsistent signals to Google regarding which version of the page is preferred.
URL Parameters & Crawl Budget
Crawl budget is the number of pages Googlebot will crawl on your site before leaving. That means that when Googlebot crawls your site, it may not get through all of your pages. Every site has a different crawl budget.
Here’s how to find your site’s crawl budget:
Go to your Google Search Console account (old version).
Select Crawl > Crawl Stats.
On the “Pages crawled per day” chart, look for the number listed under “Average.”
Compare that number to the number of pages on your site.
If there are a lot more pages on your site than are being crawled, you may be wasting your crawl budget on unimportant pages, such as URL parameters, meaning Google might not be crawling pages that are actually important to you. To learn more, check out Yoast’s article on crawl budget optimization.
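The comparison in the last step is simple arithmetic. With hypothetical numbers plugged in:

```python
# Hypothetical figures: one from the "Crawl Stats" report, one from
# your own site inventory (e.g. a Screaming Frog crawl).
avg_pages_crawled_per_day = 500   # the "Average" figure from the chart
total_pages_on_site = 40_000

days_for_full_crawl = total_pages_on_site / avg_pages_crawled_per_day
print(f"~{days_for_full_crawl:.0f} days for Googlebot to cover every URL once")
# ~80 days for Googlebot to cover every URL once
```

The bigger that number, the more it pays to stop spending budget on low-value parameter URLs.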
According to Google, one of the main factors affecting crawl budget is having crawlable, low-value URLs, like those with URL parameters created from faceted navigation. The most direct way to control how spiders like Googlebot crawl your site is with a robots.txt file. If you know you use things like sorting/filtering parameters that create endless URLs with non-unique content, you should disallow crawlers from accessing them via robots.txt.
Moving URL parameters to static URLs
If the current site has indexed URL parameter pages and you're moving to a static URL structure, you should redirect any indexed URL parameter pages to their corresponding new, static location.
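Building that redirect map can be scripted. This sketch assumes a hypothetical new structure where ?type=X becomes /shoes/X/; the mapping logic will depend entirely on your own URL scheme:

```python
from urllib.parse import urlparse

# Hypothetical indexed parameter URLs found on the old site.
indexed_urls = [
    "https://example.com/shoes?type=sneakers",
    "https://example.com/shoes?type=boots",
]

def static_target(url):
    """Sketch: map a ?type=X parameter URL onto a /shoes/X/ style
    static URL. Assumes a single key=value pair in the query string."""
    p = urlparse(url)
    value = p.query.split("=", 1)[1]
    return f"{p.scheme}://{p.netloc}{p.path}/{value}/"

redirect_map = {u: static_target(u) for u in indexed_urls}
for old, new in redirect_map.items():
    print(f"301: {old} -> {new}")
```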
To check if a URL is indexed, navigate to “URL Inspection” in the new version of Google Search Console. Input the URL to inspect, and look at the “Coverage” card. If the URL is in Google’s index, it will say “Indexed.”
Alternatively, if you don’t have access to Google Search Console data, use the info: operator in Google search. This will yield which version of the URL is indexed. In the example below, I searched for a Target URL with parameters, and got the canonical version of the page as a result.
If the search engine returns a "did not match any documents" message, it’s not indexed. If the search engine returns a different version of the URL (its canonical), then the parameter-containing URL is not indexed. If a URL is in Google's index, it could be ranking, bringing in traffic, and could possibly have earned backlinks that are adding value to the site. If a URL is not in Google's index, it could still be accessed if someone bookmarked that link, for example. It's most critical to include indexed URLs in your redirect mapping though.
Checking Canonicalization in Screaming Frog
I like to use Screaming Frog to easily see which URLs on my site are canonicalized to a different URL. Run your site, or a section of pages, through the Screaming Frog site spider, then go to Directives > Canonicalised to see the URL address next to its canonical link.
In the example above, you can see that the URL for page 2 of the blog is canonicalized to the main blog page. In this example, you’re telling Google that the preferred version of the page is /blog/ and not to index /blog/?page=2 separately. Only do this if you know the two pages have the same content.
Pagination That Creates URL Parameters
I’ve seen pagination on sites that takes the form of URL parameters, for example: example.com/article?page=2. Should you canonicalize these pages or block them from crawlers in your robots.txt? The answer is “neither”!
Pagination tells Google that a set of pages are meant to be viewed in sequence. They contain unique content, but are parts in a series. An easy way to think about pagination is like pages in a book (unless you’re Jack Kerouac, in which case you’d just use one long roll of paper). Only canonicalize your paginated pages if you have a “view all” version, in which case all paginated URLs would canonicalize back to that version. Otherwise, these are unique pages, and should not be canonicalized to page 1.
Sites typically paginate their content to make it easier to browse: for example, a video page with 200 videos, or a product page with 1,000 results. The rel=”next” and rel=”prev” tags are a way to ensure that visitors can easily click to the next page of results or back to the previous page of results. If you have pages in a series, you don’t actually have to do anything. Google says they can usually figure out pagination, sending searchers to the appropriate page and consolidating link signals. But why chance it? If Google gives you the option to clarify content on your site, I would take it.
Here’s a breakdown of how you would implement pagination on your site.
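As a rough sketch of the markup involved, here is a Python helper that generates rel=”prev”/rel=”next” link tags for a ?page=N series. The /article path is hypothetical, and I'm assuming page 1 lives at the parameter-free URL:

```python
def pagination_links(base_path, page, last_page):
    """Generate rel=prev/next link tags for a paginated series that
    uses ?page=N style URL parameters."""
    tags = []
    if page > 1:
        # Assumption: page 1 is the plain URL with no parameter.
        prev_url = base_path if page == 2 else f"{base_path}?page={page - 1}"
        tags.append(f'<link rel="prev" href="{prev_url}">')
    if page < last_page:
        tags.append(f'<link rel="next" href="{base_path}?page={page + 1}">')
    return tags

for tag in pagination_links("/article", 2, 3):
    print(tag)
# <link rel="prev" href="/article">
# <link rel="next" href="/article?page=3">
```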
Categorizing Your URL Parameters in Google Search Console
Google does give you the option within Google Search Console to categorize your parameters. This essentially tells Google which versions of your pages are preferred. If you tell Google to ignore certain parameters, those URLs could be de-indexed (they will no longer show up in search), so do this only if you’re sure how your site’s parameters work.
To locate the URL Parameters tool, log into Google Search Console and select “Crawl” then “URL Parameters.”
If your site contains URL parameters, Google will typically already have them listed. You also have the option to “add parameter.” Select “edit” next to the parameter you want to instruct Google on.
You’ll get a pop-up window (pictured below) asking for more information about your page. In this example, Google wants to know if my “?tag=” parameter changes the page content or doesn’t affect the page content. On my site, selecting a tag will narrow the page content, allowing visitors to see only posts with that particular tag, so I’m going to select “yes.”
Once I select “yes,” Google wants me to specify how the parameter changes my content. Here, I have the option to choose between: sorts, narrows, specifies, translates, paginates, and other. Clicking on a tag will narrow my content; in other words, it displays a subset of that page’s content.
Next, I have the option to request how I want Googlebot to treat these URLs. By default, this is set to “let Googlebot decide” but if you want more control over it, you can have Googlebot crawl every URL, crawl only some of those URLs, or crawl none of those URLs. Since my tags create a bunch of thin pages on my site, I’m going to select “No URLs.”
By expanding the “Show example URLs” option, I can see exactly what my request will do. In my example, I can see that Google will not crawl these tag pages. Bingo! That’s what I want. Just keep in mind that Googlebot finds content through links, which means that even if you tell Googlebot to ignore these pages, the crawler could still find them through a link and index them anyway, so if you don’t want them crawled, don’t link to them.
If you have multiple parameters in a single URL (for example, maybe your URL sorts your page content by size and color), you can specify crawl settings for each one. Google does give the disclaimer, though, that you can inadvertently set conflicting parameter settings, in which case Googlebot will respect the most restrictive setting over the less restrictive ones. For more information, visit their documentation “Manage URLs with multiple parameters.”
In sum, URLs with parameters can do a lot of different things. Due to the unique nature of query strings, the intent behind them, and the type of sites they’re on, how you handle them will vary by situation.