How to create own Search Engine in C# – Part 2

Introduction

As we demonstrate in first part of this article, searching big data in List will not be efficient, therefore we have several options to optimize our searching. One way is to use Dictionary or SortedDictionary. Both Data structures are Key and Value based and the lookup time for the Key for both Data structure is O(1).

The reason for that is we convert all words, also repeated word to a Key that is has a hashed value. Each Key has a Value, the value has a list of all Urls, as you can see in the following figure.

Example

Hash table example for words and Urls

In my solution I have chosen to play with ILookup<string, SearchData> which is kind of equitant to Dictionary <string, List<SearchData>>. You might not heard about ILookup or have using it and not really 100% sure, so this will be a good use case. The string here is our key, for instance if we search for word Denmark and want to find all Url contains Denmark, so our key will be Denmark, and our value will be a list of SearchData that contains url pages with word Denmark.

Implementation

The main twist we need to do comparing to our last article, is we need to change our data structure from List<SearchData> to ILookup<string, SearchData>.

List _searchContent;

with

private ILookup _searchContent;

and

adding following line at the end of CreateDataStructure parsing loop:

_searchContent = searchContent.ToLookup(x => x.Word, x => x);

We change our FindWord method to following:

public List<SearchData> FindWord(string word)
{
    IGrouping<string, SearchData> result = _searchContent.FirstOrDefault(e => e.Key.ToLower() == word.ToLower());
    return result?.ToList();
}

And that is it. So now we search in bigger list, it goes very fast. I can of course spend more hours improving this, and measure the performance. This is some thing I leave for your imagination.

Benchmarking

In my benchmarking, it took me following second to search in my files:

  • 0 milliseconds for the small file search-data-S.txt
  • 0 milliseconds for the medium file search-data-M.txt
  • 0 milliseconds for the large file search-data-L.txt

Complexity analysis

Search Using Dictionary, SortedDictionary or ILookup as mentioned has a lookup time O(1). This means it calculate the hash of the word we search for and directly point us to the word reference with out needing to go through all words elements.

This is exactly what is in our benchmarking, that it does not take any time to find words in all file size comparing with previously article, where we used list.

Conclusion

As you can see here, with using the right data structure we can improve efficiency of searching results.

Implementing clever code and optimized data-structure has not only positive effect on faster results, but will also lead to better customer experience, better environment and not least a positive economical impact.

Now some one might maybe ask, but some time in google, I write word like dnemark, it will automatically give me suggestion like denmark. I am not going to make google search engine, but in my next article I am going to show you a very simple Algorithm that return you a suggestion.

Google Example of Search for word dnemark and google return suggestion denmark.
Bing Example of Search for word dnemark and bing return suggestion denmark.

Leave a Comment