Trends and prospects for the development of search services. Prospects for the development of search systems

KOVROV STATE TECHNOLOGICAL ACADEMY

Information and analytical reference on computer science

on the topic: “Modern search engines, development trends of one of the market leaders Yandex.”

Completed by: 1st year student

3 academic groups

Makarov Ivan

Introduction.

Main part.

Conclusion.

Introduction.

Yandex is a Russian IT company that owns the Internet search system of the same name and an Internet portal. The Yandex search engine is the eighth largest search site in the world in terms of the number of search queries processed (1.290 billion, statistics for August 2009) and the second largest non-English search engine after the Chinese Baidu.

The company's website was opened on September 23, 1997, and the Yandex company itself was founded in 2000 by CompTek, the company that developed the Yandex search engine and supported it. The company became self-sufficient in 2002. Turnover for 2006 was 72.6 million dollars with a net profit of 29.9 million; for 2005, turnover was 35.6 million dollars with a net profit of 13.6 million.

The main and priority direction of the company is the development of a search engine, but over the years, Yandex has become a multi-portal. In 2009, Yandex included more than 30 services. The most popular are: Yandex.News, Yandex.Photos, Yandex.Toys and others.

The main office of the company is located in Moscow. The company has offices in St. Petersburg, Yekaterinburg, Odessa, Simferopol and Kyiv. In mid-June 2008, the company announced the opening of Yandex Labs, an office in the USA, California.

Main part.

History of the company's creation.

The Yandex.Ru search engine was officially announced on September 23, 1997 at the Softool exhibition. Its main distinctive features at that time were checking the uniqueness of documents (excluding copies in different encodings) and the key properties of the Yandex search engine: taking into account the morphology of the Russian language (including search by exact word form), search taking into account distances (including within a paragraph and by exact phrase), and a carefully developed algorithm for assessing relevance (how well the response matches the query). This algorithm takes into account not only the number of query words found in the text, but also the “contrast” of a word (its relative frequency in a given document), the distance between words, and the position of a word in the document.

A little later, in the “Fairy Tales” section (observations on the content of the Russian Internet), the first Runet fairy tale appeared - “Web - humanism or chernukha?” And in the “Numbers” section there is the first estimate of the volume of the Runet, 5 thousand servers and 4 GB of texts.

Two months later, in November 1997, natural language queries were implemented. Since then, users have been able to address Yandex.Ru simply “in Russian” and ask long queries, for example “where to buy a computer”, “genetically modified products” or “international telephone communication”, and receive accurate answers. The average length of a query in Yandex.Ru is now 2.7 words. In 1997 it was 1.2 words, because search engine users were then accustomed to the telegraph style.

In 1998, Yandex.Ru introduced the ability to “find a similar document,” a list of found servers, search within a specified date range, and sort search results by last modified time. During this year, the “volume” of the Russian Internet has doubled, which has led to the need to optimize search engines. Both then and now (with a volume of 200 GB) the search speed on Yandex.Ru is a fraction of a second.

During 1999, Runet grew by an order of magnitude, both in the volume of texts and in the number of users. This was a year of rapid development for Yandex.Ru. The new search robot made it possible to optimize and speed up the crawling of Runet sites. Today, the search base of Yandex.Ru is twice as large as that of its closest competitors.

The new robot made it possible to provide users with new opportunities - search in different areas of text (headings, links, annotations, addresses, captions for pictures), limiting the search to a group of sites, searching by links and images, and also highlighting documents in Russian. A search in catalog categories appeared and for the first time in Runet the concept of “citation index” was introduced - the number of resources referencing a given one.

Throughout the year, work continued on quantitative and qualitative analysis of the Runet. The NINI index (index of “Inconstancy of Interests of the Internet Population”) was opened, showing the dynamics of changes in the interests of Internet users. The search Forum and a new service have opened - request subscription, that is, you can leave your request on Yandex.Ru and regularly receive information by e-mail about the appearance of new and/or changed documents that correspond to this request. By the beginning of the school year, “Family Yandex” was opened, filtering search results from obscenities and pornography.

Origin of the word "Yandex".

Today “Yandex” is a word from the everyday life of an Internet user. On the Internet you often see “What, Yandex has already been cancelled?”, “Loneliness is when Yandex is the first to congratulate you on your birthday”, “All questions to Yandex”. Many people already think that it has always been this way. In some ways, this is true - Yandex really appeared simultaneously with the mass Internet, when access to the network ceased to be the lot of select technical specialists. But the word “Yandex” itself is artificial, has its own authors and its own history.

In 1993, Arkady Volozh, the future general director of the future Yandex company, and Ilya Segalovich, the future technology director of the company, developed, as it later turned out, the main technology - the search for unstructured information taking into account the Russian language.

The development had to be called something. Ilya remembers how he wrote out in a column various derivatives of words that described the meaning of the technology. Quite quickly it became clear that the English word “search” sounds too dissonant in Russian and that no successful combination could be based on it. The word “index” was more suitable. So “yandex” appeared in the list of names, standing for “yet another indexer” or “Language index”. Both Ilya and Arkady liked the option: easy to pronounce, easy to write. In addition, Arkady suggested that the letter “Я” (“Ya”) in the name, a specifically Russian letter, should be left Russian for clarity. This is how the word “Yandex” was invented. And the program file, accordingly, was called yandex.exe.

In 1996, when search was first offered to the general public as a technology rather than as part of a content product (before that there were the International Classification of Inventions and the Bible Computer Reference), the line of programs was called Yandex, and the name was explained as Language iNDEX. The first programs in the line were Yandex.Site (search on your own site; this product is now called Yandex.Server) and Yandex.Dict (a morphological add-on for AltaVista, the only search engine that at that time could somehow work with the Cyrillic alphabet).

But of course, the word “Yandex” came into wide use in September 1997, after the launch of the search engine www.yandex.ru. Since then, users of the system have been offering us their interpretations. For example, Tyoma Lebedev, preparing to draw the first version of the Yandex home page, said: “Oh, I understand: if the first “I” in the word index is translated into Russian, it will be “Я”, that is, it will turn out to be “Yandex”.” The authors honestly admitted that they had not thought about this, but it is a good interpretation and it was accepted. Then someone on the Internet suggested another option, having seen two sides of the Internet, INDEX and YANDEX. The word has already acquired derivatives; for example, Yandex employees are often called “Yandexoids” and, less often, “Yandex people.”

Search "Yandex".

Yandex search allows you to search on Runet, Uanet, and Kaznet (since October 14, 2009) for documents in Russian, Ukrainian, Belarusian, Romanian, English, German and French, taking into account the morphology of Russian and English languages and proximity of words in a sentence. Since the beginning of 2006, Yandex search has been installed on the Mail.ru portal.

In addition to web pages in HTML format, Yandex indexes documents in PDF (Adobe Acrobat), Rich Text Format (RTF), binary formats Microsoft Word, Microsoft Excel, Microsoft PowerPoint, SWF (Macromedia Flash), RSS (blogs and forums).

A distinctive feature of Yandex is the ability to fine-tune the search query through a flexible query language. For example, for an exclusion operation you can specify the scope: the query A ~~ B finds documents (pages) in which A is present but B is not, while the query A ~ B finds documents in which the word B does not occur in the same sentence as the word A. Similarly, the & operator looks for combinations of keywords within a sentence, and && within the whole document.

The ! operator disables morphology for a specific word, and the !! operator lets you specify the normal form, which avoids some problems associated with homonymy. For example, the query !!Ivanov will find Ivanov and Ivanovs, but not Ivan.
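The difference in scope between the sentence-level and document-level operators can be illustrated with a minimal Python sketch. It is not Yandex's implementation: the function names, the naive sentence splitter and the exact-word matching (no morphology) are all assumptions made for illustration, and A ~ B is read here as "some sentence contains A without B".

```python
import re

def sentences(text):
    """Split text into sentences, each a set of lowercase words (naive splitter)."""
    parts = re.split(r"[.!?]+", text.lower())
    return [set(re.findall(r"\w+", p)) for p in parts if p.strip()]

def doc_and(text, a, b):
    """A && B: both words occur somewhere in the document."""
    words = set().union(*sentences(text))
    return a in words and b in words

def sent_and(text, a, b):
    """A & B: both words occur in the same sentence."""
    return any(a in s and b in s for s in sentences(text))

def doc_not(text, a, b):
    """A ~~ B: A occurs in the document and B occurs nowhere in it."""
    words = set().union(*sentences(text))
    return a in words and b not in words

def sent_not(text, a, b):
    """A ~ B (assumed reading): some sentence contains A but not B."""
    return any(a in s and b not in s for s in sentences(text))

# Hypothetical two-sentence document used only for demonstration.
sample = "Cats chase mice. Dogs chase cars."
```

For the sample text, "cats" and "cars" match with && but not with &, because they occur in different sentences.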

By default, Yandex displays 10 links on each results page; in the search results settings, you can increase the page size to 20, 30 or 50 found documents. Sometimes the order of sites on these pages may differ because the databases for these results are not updated at the same time.

If a query finds a lot of links, the results page offers to limit the search range - by region (that is, by IP range) or by date. If nothing is found for a word or words, it is proposed to replace it/them with similar ones (since the proposed options depend on the frequency of finding similar words, sometimes funny situations arise). Also, it is proposed to correct words typed in the wrong keyboard layout.

From time to time, Yandex algorithms responsible for the relevance of search results change, which leads to changes in the results of search queries. The last officially announced changes occurred in March 2004, April 2005 and January 2007; according to unofficial information, there are much more of them (for example, the last one in August-September 2007).

In particular, these changes are aimed against search spam, which leads to irrelevant results for some queries (less often, for entire families of queries). Semi-automatic and manual moderation of search results (using so-called “white hat optimizers”), as well as direct refusal to index “malicious” sites, are used against search spam that is not automatically screened out.

Owners, management and performance indicators.

More than 30% of the company, according to its own data, belongs to the investment funds ru-Net Holdings and Baring Vostok Capital Partners, 15% to the Tiger Technologies fund, about 30% to the company’s founders and 20% to managers and other minority shareholders.

In mid-September 2009, it became known that the parent company of Yandex, the Dutch company Yandex N.V., issued a priority share, which was transferred to Sberbank for a symbolic 1 euro. The only right that the share gives is to veto the sale of more than 25% of the company's shares.

Management: Arkady Volozh - General Director, Ilya Segalovich - Technical Director, Elena Kolmanovskaya - Editor-in-Chief, Alexey Tretyakov - Commercial Director, Svetlana Kondrashova - Advertising Director.

All Yandex services.

Information retrieval:

Search and ya.ru

Directory - directory of websites sorted by citation index. It is replenished manually by catalog editors, and there is the possibility of paid registration.

News - The top stories of the day, sourced from mainstream media outlets found on the Internet. It is possible to search by news, as well as subscribe to news for a given search query.

Yandex.XML - using this service you can make automatic search queries to Yandex in xml format.

Search blogs and forums - search through resources that have RSS representation, as well as ratings of current queries, popular categories and news.

Market - search for offers for the sale of goods and services, selection of models.

“Meditative” search is the only search service in the world that has a “Find” button, but no search bar.

Dictionaries - encyclopedias, reference books, dictionaries-translators.

Pictures - image search.

Video - video search.

Maps - maps of Europe and Russia, maps of major cities of the Russian Federation (accurate to the house), search on the map, as well as the ability to “wander” along the streets of some cities.

Addresses - search for contact information by names of companies and organizations.

Poster - information about available events: cinema, theater, concerts, sports, clubs, etc.

Weather - weather forecast.

TV program - programs of central, regional and satellite TV channels.

Timetables - train and plane timetables.

Personalized:

Yandex.Video - video hosting and video search.

Mail - email.

Ya.ru is a blogging service.

Yandex.Photos - photo hosting.

Spam defense - spam filtering.

People - free hosting for personal Internet pages, as well as a file storage service.

Yandex money - payment system, allowing you to pay for goods and services online.

Bookmarks is a bookmark storage system integrated with Yandex.Bar.

Subscriptions - subscription to news.

Lenta - online RSS reader

Yandex.Direct is a system for placing contextual advertising with payment by clicks.

Cup - regular Internet search competitions.

Cities - Internet indexes of Russian cities.

Tariff - search by tariffs of Internet providers.

Postcards

Spring - automatic generation of philosophical essays.

Internet - measures the speed of the Internet connection.

Mirror - Mirror of the main Linux OS distributions, as well as FreeBSD and other projects.

Yandex.Local network - provides the opportunity to use all Yandex services not at the federal, but at the local rate.

Metrics - allows you to measure traffic, analyze user behavior and evaluate the effectiveness of advertising campaigns.

Software products:

Spam defense - a spam filter for corporate use (paid).

Yandex Desktop Search - a program for searching files on your computer.

Ya.Online - an instant messaging program based on Jabber. It also lets you receive notifications about new emails in Yandex.Mail and about new events from the Odnoklassniki.ru and VKontakte sites.

Punto Switcher - an automatic keyboard layout switcher.

Widgets for the Mac OS X and Windows Vista operating systems, and also for the Opera browser: Search, Traffic, Clock, News.

Yandex ICQ is a special version of the ICQ client with symbols and integration of some services from Yandex.

Interesting facts.

1) The average length of a request in Yandex.Ru is now 2.7 words. In 1997, it was 1.2 words, then search engine users were accustomed to the telegraph style.

2) Yandex appeared before www.yandex.ru. The word Yandex was invented in 1993 and first used publicly in 1996. At that time it meant not a company or a search engine, but a technology for searching on one's own server and a morphological add-on to the Altavista.com search engine.

3) www.yandex.ru was launched to demonstrate the capabilities of Yandex technology; no one thought about making money from advertising.

4) The slogan “Everything can be found” was invented in 2000. In the same year, Yandex launched the first advertisement for an Internet site on Russian television.

5) According to Yandex itself, about 80 percent of its audience is from Russia, about 3 percent from Europe, and just over 1 percent from the USA.

6) Some of the Yandex technical support employees operate under the collective pseudonym “Platon Shchukin”.

Conclusion.

So, now we have complete information about Yandex. We know who runs it, how it works from the inside, what the history of the company’s development is, and much more. Now we can easily understand why Yandex is a leader in the Russian and global markets. I think the main reason for the success of Yandex is that the search engine copes well with the complexities of the Russian language. Search engines that were developed for the English language cannot index and rank Russian-language documents as well. The second advantage I see is the creative, friendly, cheerful slogans with which Yandex attracts users to its services. The thematic pictures that Yandex places near its search bar are also much more accessible to the Russian user.


      To search the index, the user must formulate a query and send it to the search engine. The query can be very simple, but it must consist of at least one word. To build a more complex query, you need to use Boolean operators, which allow you to refine and expand your search terms.

      The most commonly used Boolean operators are:

      • AND - all expressions connected by the “AND” operator must be present on the searched pages or documents. Some search engines use the “+” operator instead of the word AND.
      • OR - at least one of the expressions joined by the "OR" operator must be present in the pages or documents being searched.
      • NOT - the expression or expressions following the "NOT" operator must not appear on the searched pages or documents. Some search engines use the "-" operator instead of the word NOT.
      • FOLLOWED BY - one of the expressions must immediately follow the other.
      • NEAR - one of the expressions must be at a distance from the other no greater than the specified number of words.
      • Quotation marks - words enclosed in quotation marks are treated as a phrase to be found in the document or file.
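How these operators act on an index can be shown with a minimal Python sketch. It assumes a small positional inverted index built in memory. The function names and the sample documents are hypothetical, and real search engines use far more elaborate structures.

```python
from collections import defaultdict

def build_index(docs):
    """Map each word to {doc_id: [positions]} (a positional inverted index)."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def op_and(index, a, b):
    """AND: documents containing both words."""
    return set(index.get(a, {})) & set(index.get(b, {}))

def op_or(index, a, b):
    """OR: documents containing at least one of the words."""
    return set(index.get(a, {})) | set(index.get(b, {}))

def op_not(index, all_ids, a):
    """NOT: documents that do not contain the word."""
    return set(all_ids) - set(index.get(a, {}))

def op_near(index, a, b, max_dist):
    """NEAR: some occurrence of a lies within max_dist words of b."""
    return {d for d in op_and(index, a, b)
            if any(abs(i - j) <= max_dist
                   for i in index[a][d] for j in index[b][d])}

def op_followed_by(index, a, b):
    """FOLLOWED BY: b occurs immediately after a (a two-word phrase)."""
    return {d for d in op_and(index, a, b)
            if any(i + 1 in index[b][d] for i in index[a][d])}

# Hypothetical sample collection used only for demonstration.
docs = {
    1: "garden bed with flowers",
    2: "bed and breakfast",
    3: "flowers in the river bed",
}
index = build_index(docs)
```

With this collection, op_and(index, "bed", "flowers") selects documents 1 and 3, while op_near(index, "flowers", "bed", 2) keeps only document 1, where the two words are two positions apart.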

      Prospects for the development of search engines

      The search specified by Boolean operators is literal: the machine searches for words or phrases exactly as they were entered. This can cause problems when the words entered are ambiguous. For example, the English word “bed” can mean a bed, a flower bed, a place where fish spawn, and much more. If the user is interested in only one of these meanings, pages that use the word in its other meanings are of no use to him. It is possible to construct a literal search query aimed at cutting out unwanted meanings, but it would be nice if the search engine itself could provide appropriate assistance.

      One of the options for how a search engine works is conceptual search. Part of this search involves statistical analysis of pages containing the words or phrases entered by the user, in order to find other pages that might be of interest to that user. It is clear that conceptual search requires storing more information about each page, and each search query requires more calculations. Currently, many development teams are working on improving the efficiency and performance of these types of search engines. Other researchers have focused on a different area called natural-language queries.

      The idea behind natural language queries is for the user to formulate a query the same way they would ask the person sitting next to them, without having to keep track of Boolean operators or complex query structures. The most popular modern natural-language query website is AskJeeves.com, which analyzes a query to identify keywords that are then used to search the site index of the search engine. The site works only with simple search queries, but developers, in a highly competitive environment, are working on a natural language search engine that can handle very complex queries.

      Modern search engines are the most powerful hardware and software systems, the purpose of which is to index documents on the Internet to provide data at the request of users.

      To provide high-quality and relevant information, search engines have to constantly improve their ranking formulas. Ensuring the highest quality of search results for users and preventing manipulation by optimizers are the key goals of search engine development.

      At a time when search engines were just beginning to emerge, their ranking algorithms were very primitive. Because of this, the most resourceful optimizers began to promote their sites so that they appeared in the search results for the queries that interested them. As a result, resources that often did not provide the user with any useful information took the first places, relegating more useful sites to the background.

      In response to these actions, search engines began to defend themselves by improving their ranking algorithms, introducing more and more variables into the formulas and taking into account more and more factors. Over time, this struggle between optimizers and search engines moved to a new level and contributed to the emergence of more advanced algorithms, based, among other things, on machine learning.

      Stages of search engine development:

      As you can see from the diagram, the development of search engines and their algorithms goes in circles. Some create new algorithms, others adapt to them. It is difficult to say whether this process will ever stop, but personally I am inclined to believe that it will not. Despite the fact that search engine ranking algorithms have recently not only changed the significance of various factors, but also changed qualitatively, this does not frighten optimizers: their arsenal is constantly being replenished with more and more new techniques.

      How often do search engines change their algorithms?

      Let's turn to the main search engine of the Runet - Yandex. Qualitative and fundamental changes in ranking formulas occur on average once a year. Not long ago, Yandex introduced a new search platform called “Kaliningrad”. Its essence is to generate personal results for each user based on his search history and preferences.

      In addition, we should not forget that every search engine, including Yandex, constantly “tweaks” its ranking formulas, when in automatic or semi-automatic mode the influence of certain factors is reduced, while that of others, on the contrary, is increased. All this is done with only one goal: to improve search results as much as possible, ridding them of sites that do not satisfy user needs and thereby increasing their relevance.

      Looking at changes to the Google search system, you can see that transformations of the ranking formula also occur constantly, and Google itself reports hundreds of small changes from year to year. But if we talk not about the ranking formula, but about the filters that help Google clear the results of low-quality sites, then new versions of algorithms, such as Panda or Penguin, appear every 3-6 months.

      The answer to the question posed above can be this: search engines are constantly improving their ranking algorithms, and dramatic changes occur on average once every 6-12 months.

      Which search engine algorithms pose a real threat to promotion?

      I would like to answer right away: none. Still, let's figure it out. To do this, we need to ask the question: do search engines set themselves the goal of preventing search engine promotion?

      I think not. There are several justifications for this:

      1. Optimizers help search engines improve their algorithms, which ultimately leads to improved quality of search results. After all, if there were no optimizers, then search engines, most likely, would have stopped their development in 2000.

      2. Without optimizers, the results for many commercial queries would look like a collection of abstracts and useless information articles.

      If search engine promotion did not exist in principle, then it would not make sense for search engines to grow and develop as intensively as they do now.

      Thus, we come to the following conclusion:

      Search engines and SEO are closely and inextricably linked with each other. That is why, by following the rules they set, you can have absolutely no fear of algorithms, because search engines do not set out to destroy SEO as such.

      Development of search engine services

      Speaking about search engines, do not forget that Yandex, Google or Bing have their own services designed to help users. In addition to search results, over the years of evolution, search engines have studied the behavior of their users in order to increase satisfaction with search results.

      Actually, for this purpose the Yandex search engine came up with the so-called “Wizards”, a mechanism that helps the user quickly get an answer to their question. For example, when you enter the query “weather forecast”, Yandex displays the weather for the current date directly on the search results page, thereby relieving the user of the need to navigate through the search results.

      Other search engines, for example Google, went further and instead of “Wizards” offered a more interesting solution: the “Knowledge Graph”.

      The “Knowledge Graph” is the first step on Google's path to intelligent search. Thanks to this innovation, the search engine displays not only standard links, but also direct answers to user questions, brief information about the object of the query and facts related to it. Technically, the “Knowledge Graph” is a semantic network that links together various entities: individuals, events, spheres of life, things, categories. The information base for the “Knowledge Graph” comes from a number of sources: the open semantic database Freebase, Wikipedia, the CIA open data collection and others.
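The idea of a semantic network that links entities can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions: the class name, the relation names and the three sample facts are hypothetical and say nothing about how Google stores its data.

```python
from collections import defaultdict

class KnowledgeGraph:
    """A toy semantic network: entities linked by named relations."""

    def __init__(self):
        # subject -> list of (relation, object) pairs
        self._edges = defaultdict(list)

    def add(self, subject, relation, obj):
        """Record one fact as a (subject, relation, object) triple."""
        self._edges[subject].append((relation, obj))

    def facts(self, subject):
        """All facts known about an entity (a 'brief information' card)."""
        return list(self._edges.get(subject, []))

    def answer(self, subject, relation):
        """Objects linked to subject by relation (a 'direct answer')."""
        return [o for r, o in self._edges.get(subject, []) if r == relation]

# Hypothetical sample facts used only for demonstration.
kg = KnowledgeGraph()
kg.add("Yandex", "type", "search engine")
kg.add("Yandex", "headquartered_in", "Moscow")
kg.add("Moscow", "type", "city")
```

A query about where Yandex is headquartered then becomes a lookup, kg.answer("Yandex", "headquartered_in"), rather than a list of links to pages.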

      What conclusions can be drawn, you ask?

      The answer is simple: search and search services will continue to develop towards quick and relevant answers to user questions, providing the opportunity to get all the necessary information directly in the SERP and eliminating the need to go to other sites.

      There is an opinion that search engines, with their desire to answer the user’s question here and now, can destroy search engine optimization by becoming a kind of global knowledge base. But such fears are unfounded: to become global knowledge bases, search engines need information, and it is stored by the very sites that optimizers work on, the same optimizers whose work keeps search engines from standing still and makes them constantly evolve.

      As you can see, both SEO and search engines are links in the same chain that cannot exist without each other. Therefore, thoughts about the imminent death of SEO are unfounded. It is quite possible that over time search engine optimization will evolve, for example, into consulting, but it certainly won’t die. I wish everyone successful promotion to the TOP!

      A variety of technologies and methods created over the years of development of the theory and practice of information retrieval find their application in modern information retrieval systems (IRS). Along with classic library information systems, which continue to improve, intensive development is taking place in the field of global information systems on the Internet, which has become the main driving force of modern information retrieval technologies. The enormous volume of available information resources requires the use of scalable search algorithms. Hypertexts allow the use of fundamentally new search models based on semantic analysis of document collections. The high speed of updating pages, their free placement and the lack of any guarantee of constant access lead to the need for constant re-indexing of current information resources.

      Finally, the heterogeneous composition of users, who often do not have the skills to work with a search engine, forces us to look for effective ways to formulate queries that work with minimal initial information.

      6.1. Dictionary information retrieval systems

      Dictionary information retrieval systems today are the fastest and most effective search engines that are most widespread on the Internet. Searching for the necessary information in dictionary information systems is carried out using keywords. Search results are generated during the work of one or another search algorithm with a dictionary and a query compiled by the user in the IP language.

      The structure of a dictionary IRS (Fig. 13) consists of the following components: a document viewer, a user interface, a search engine, a database of search images and an indexing agent.

      The information array includes the information resources potentially available to the user: text and graphic documents, multimedia information, etc. For a global IRS this is the entire Internet, where every document is identified by a unique URL (Uniform Resource Locator).

      The search engine interface determines the way the user interacts with the search engine. This includes rules for forming queries, a mechanism for viewing search results, etc. The interface of Internet search engines is usually implemented in a web browser environment. Appropriate software is used to work with audio and video information.

      The main function of a search engine is to implement the adopted search model. First, the user's request, formulated in the information retrieval language, is translated according to established rules into a formal query. Then, as the search algorithm runs, the query is compared with the search images of documents from the database. Based on the comparison results, a final list of found documents is generated. Typically, each entry contains the title, size, creation date and a brief annotation of the document, a link to it, and the value of the similarity measure between the document and the query.

      Fig. 13. Structure of a dictionary IRS.

      The list is subject to ranking (ordering according to some criterion, usually according to the value of formal relevance).
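
      The steps described above (building search images, comparing the query with them, and ranking the list) can be illustrated with a minimal sketch. It assumes the simplest possible similarity measure, the number of occurrences of the query terms in a document; the names build_search_image, search, DOCS and IMAGES are hypothetical and are not taken from any real search engine.

```python
from collections import Counter


def build_search_image(text):
    """Reduce a document to a bag of lowercase terms (its 'search image')."""
    return Counter(text.lower().split())


def search(query, images):
    """Return (doc_id, score) pairs ordered by descending similarity.

    The score is an assumed toy measure: how many times the query
    terms occur in the document.
    """
    terms = query.lower().split()
    results = []
    for doc_id, image in images.items():
        score = sum(image[term] for term in terms)  # missing terms count as 0
        if score > 0:
            results.append((doc_id, score))
    # Ranking: highest score first, ties broken by document id.
    return sorted(results, key=lambda pair: (-pair[1], pair[0]))


# A tiny illustrative collection (invented for this example).
DOCS = {
    "a": "search engine ranks documents",
    "b": "search search engine",
    "c": "cooking recipes",
}
IMAGES = {doc_id: build_search_image(text) for doc_id, text in DOCS.items()}
RANKED = search("search engine", IMAGES)
```

      Real systems use far more elaborate relevance formulas, but the order of operations is the same: the query is matched against stored search images, not against the documents themselves.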

      The database of document search images is designed to store descriptions of indexed documents. The database structure of a typical dictionary IRS is described in detail in Part 1 of the guidelines.

      The indexing agent performs indexing of available documents in order to compile their search images. In local systems, this operation is usually carried out once: after the formation of an array of documents is completed, all information is indexed and search images are entered into the database. In the dynamic decentralized information array of the Internet, a different approach is used. A special robot program, called a spider or crawler, continuously crawls the network. Transitions between different documents are made using the hyperlinks they contain. The speed of updating information in the search engine database is directly related to the speed of network scanning. For example, a powerful indexing robot can crawl the entire Internet in a few weeks. With each new crawl cycle, the database is updated and old invalid addresses are removed.
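
      The way a crawler moves from document to document along hyperlinks can be sketched as a breadth-first traversal. This is only an illustration under simplifying assumptions: the "network" is a small in-memory dictionary named WEB with invented addresses, and fetch_links is a hypothetical callback that stands in for downloading a page and extracting its links.

```python
from collections import deque


def crawl(start_url, fetch_links):
    """Visit every page reachable from start_url, breadth-first.

    fetch_links(url) must return the list of URLs linked from that page.
    """
    seen = {start_url}
    queue = deque([start_url])
    order = []
    while queue:
        url = queue.popleft()
        order.append(url)  # here a real robot would index the page
        for link in fetch_links(url):
            if link not in seen:  # never queue the same address twice
                seen.add(link)
                queue.append(link)
    return order


# Invented link graph used instead of real network access.
WEB = {
    "http://example.org/": ["http://example.org/a", "http://example.org/b"],
    "http://example.org/a": ["http://example.org/b", "http://example.org/"],
    "http://example.org/b": ["http://example.org/c"],
    "http://example.org/c": [],
}
VISITED = crawl("http://example.org/", lambda url: WEB.get(url, []))
```

      The set of already seen addresses is what keeps the robot from looping forever on pages that link back to each other.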

      Some documents are closed to search engines. This is information that requires authorization or that is reached not through a link but by submitting a form. Intelligent methods for scanning this hidden part of the Internet are currently being developed, but they have not yet come into widespread use.

      To index hypertext documents, agent programs use the following sources: hypertext links (href), titles (title), headings (H1, H2, etc.), annotations, lists of keywords (keywords) and image captions. URLs are used to index non-text information (for example, files transferred via FTP).

      Semi-automatic or manual indexing capabilities are also used.

      In the first case, administrators leave messages about their documents, which the indexing agent processes after some time; in the second, administrators independently enter the necessary information into the IRS database.

      An increasing number of information retrieval systems perform full-text indexing. In this case, the entire text of the document is used to compose the search image. Formatting, links, etc. then become an additional factor influencing the significance of a particular term: a term from the title receives more weight than a term from a figure caption.
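
      A small sketch can show how the position of a term affects its weight during full-text indexing. The numeric weights in FIELD_WEIGHTS are assumptions chosen only for illustration (the text states only that a title term outweighs a caption term), and the function name weighted_terms is hypothetical.

```python
from collections import defaultdict

# Assumed weights: only their order (title > body > caption) follows the text.
FIELD_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0, "caption": 0.5}


def weighted_terms(fields):
    """Build a search image in which each term accumulates the weight
    of every field where it occurs."""
    weights = defaultdict(float)
    for field, text in fields.items():
        field_weight = FIELD_WEIGHTS.get(field, 1.0)
        for term in text.lower().split():
            weights[term] += field_weight
    return dict(weights)


# Invented document split into fields.
DOC = {
    "title": "search engines",
    "body": "engines index pages",
    "caption": "index diagram",
}
IMAGE = weighted_terms(DOC)
```

      In this example the term "engines" gets the highest weight because it appears both in the title and in the body, while "diagram", found only in a caption, gets the lowest.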

      Modern large information retrieval systems must process hundreds of requests within a second. Therefore, any delay can lead to an outflow of users and, as a consequence, to the unpopularity of the system and commercial failures. From an architectural point of view, such information systems are implemented in the form of distributed computing systems consisting of hundreds of computers located around the world. Search algorithms and program code undergo extremely careful optimization.

      In information retrieval systems with a large document database, two techniques are used to speed up their work: separation and pruning.

      Separation consists in dividing the database into parts known in advance to be more relevant and less relevant. First, the IRS searches for documents in the first part of the database. If no documents are found, or not enough are found, the search is then performed in the second part.

      With pruning (from the English "pruning": trimming, removal), query processing stops automatically once a sufficient number of relevant documents has been found.
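
      Both techniques can be shown together in a short sketch. It assumes that a document is relevant when it contains every query term, that the database has already been split into a primary and a secondary part, and that the parameter enough sets how many documents count as sufficient; all names and sample data are hypothetical.

```python
def matches(text, terms):
    """Assumed toy relevance test: the document contains every query term."""
    words = text.lower().split()
    return all(term in words for term in terms)


def search_tiered(query, primary, secondary, enough=2):
    """Search the more relevant part first and fall back to the second
    part only if too few documents were found (separation). Stop as soon
    as `enough` documents have been collected (pruning)."""
    terms = query.lower().split()
    found = []
    for tier in (primary, secondary):
        for doc_id, text in tier:
            if matches(text, terms):
                found.append(doc_id)
                if len(found) >= enough:
                    return found  # pruning: the rest is never examined
    return found


# Invented two-part database of (doc_id, text) pairs.
PRIMARY = [("p1", "yandex search engine"), ("p2", "weather forecast")]
SECONDARY = [
    ("s1", "search engine history"),
    ("s2", "search engine market"),
    ("s3", "search engine ranking"),
]
TWO = search_tiered("search engine", PRIMARY, SECONDARY, enough=2)
ONE = search_tiered("search engine", PRIMARY, SECONDARY, enough=1)
```

      With enough=1 the second part of the database is never touched, which is exactly where the saving in processing time comes from.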

      Threshold search models are also widely used. They define threshold values for the characteristics of the documents returned to the user. For example, the relevance of returned documents is usually bounded from below by some threshold value: only documents whose relevance reaches this threshold are brought to the user's attention.

      If search results are ranked by date, the thresholds define the time interval in which the documents were modified. For example, the IRS can automatically cut off documents that have not been changed in the last three years.
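
      A threshold model of this kind reduces to a simple filter. The sketch below assumes that a document passes when its relevance is not lower than the threshold and its modification date is not earlier than the cutoff; the field names, the sample data and the fixed "current" date are invented for the example.

```python
from datetime import date


def apply_thresholds(results, min_relevance, oldest_allowed):
    """Keep only documents that meet both the relevance threshold
    and the modification-date threshold."""
    return [
        doc for doc in results
        if doc["relevance"] >= min_relevance and doc["modified"] >= oldest_allowed
    ]


# A fixed date is used instead of date.today() so the example is repeatable.
TODAY = date(2010, 1, 1)
CUTOFF = TODAY.replace(year=TODAY.year - 3)  # three years back

RESULTS = [
    {"id": "a", "relevance": 0.9, "modified": date(2009, 5, 1)},
    {"id": "b", "relevance": 0.2, "modified": date(2009, 6, 1)},  # too weak
    {"id": "c", "relevance": 0.8, "modified": date(2004, 3, 1)},  # too old
]
KEPT = [doc["id"] for doc in apply_thresholds(RESULTS, 0.5, CUTOFF)]
```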

      The main advantage of a dictionary-type IRS is its almost complete automation. The system independently analyzes search resources, compiles and stores their descriptions, and searches among these descriptions. Wide coverage of Internet resources is another advantage of such systems. Their large databases make dictionary information systems especially useful for exhaustive searches, complex queries, or locating obscure information.

      At the same time, the huge number of documents in the system database often leads to too many documents being found. This makes it difficult for most users to analyze the information found and rules out a quick search. Automatic indexing methods cannot take into account the specifics of individual documents, and the number of non-pertinent documents among those found by such a system is often large.

      Another disadvantage of a dictionary information system is the need to formulate queries in a special language. Although information retrieval languages are tending to converge with natural languages, today the user must still have certain skills in formulating queries.