How to Create Your Own Search Engine in C# – Part 1

Introduction

I was always curious about how search engines work in general, and I often wondered how I could build my own personal search engine for my own data source. Of course I am not trying to build a new Google or reinvent the wheel, but there is a good feeling you get when your curiosity is satisfied, especially about a question you find interesting or important enough to want to know more about.

I know that today you can find a lot of software like Elasticsearch, Apache Lucene, and others, but it was important for me to understand the concepts and basic principles behind a search engine.

In this article series I will design and build a search engine in C#. It starts with a simple, pedagogical solution and moves on to more advanced solutions later, so if you find this first part easy, you can skip it and check parts 2 and 3.

A search engine has a data source that contains the content we are searching. For our example I have created a simple text file with the following content:

*PAGE:https://web1.web
Web1 Title
west
one
same
*PAGE:https://web2.web
Web2 Title
north
news
two
same
*PAGE:https://web3.web
Web3 Title
south
news
three
same

Implementation

For this we create a model and call it SearchData:

public class SearchData
{
    public string Word { get; set; }
    public string Title { get; set; }
    public string Url { get; set; }
}

So each URL has one title and multiple words.
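To make the mapping concrete, here is roughly what the first page of the sample file looks like as SearchData entries (hand-built for illustration; the variable name is my own):

```csharp
// Hypothetical, hand-built entries for https://web1.web from the sample file:
// one SearchData per word, all sharing the same URL and title.
var web1Entries = new List<SearchData>
{
    new SearchData { Url = "https://web1.web", Title = "Web1 Title", Word = "west" },
    new SearchData { Url = "https://web1.web", Title = "Web1 Title", Word = "one"  },
    new SearchData { Url = "https://web1.web", Title = "Web1 Title", Word = "same" },
};
```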

Now let's parse our text file into a data structure.

public void CreateDataStructure(string path)
{
    _searchContent = new List<SearchData>();

    string page = null;
    string title = null;

    foreach (var line in File.ReadLines(path))
    {
        if (line.StartsWith("*PAGE:"))
        {
            // A new page starts: remember its URL and reset the title.
            page = line.Substring(6);
            title = null;
            continue;
        }

        if (title == null)
        {
            // The first line after *PAGE: is the page title, not a word.
            title = line.Trim();
            continue;
        }

        // Every remaining line is a searchable word on the current page.
        _searchContent.Add(new SearchData
        {
            Url = page,
            Title = title,
            Word = line.Trim()
        });
    }
}

So what we do here is parse the file content and move it into a List of SearchData entries.
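The method refers to a _searchContent field, so it needs a hosting class. A minimal skeleton could look like this (SearchEngine is my name for it, not the article's):

```csharp
public class SearchEngine
{
    // Backing store filled by CreateDataStructure and scanned by FindWord.
    private List<SearchData> _searchContent;

    // CreateDataStructure(string path) from above goes here.
}
```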

Now we have our data parsed, the next step is to search in it.

Let's create a FindWord method that takes the search input as a string. The method goes through all elements in the list, finds all matching words, adds them to a new list, and returns the found results as a list of SearchData.

public List<SearchData> FindWord(string word)
{
    List<SearchData> list = new List<SearchData>();

    foreach (var searchData in _searchContent)
    {
        if (string.Equals(searchData.Word, word, StringComparison.CurrentCultureIgnoreCase))
        {
            list.Add(searchData);
        }
    }

    return list;
}

Let's try it.
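A minimal console loop for trying it out might look like this (a sketch: the SearchEngine class name and the search-data.txt file name are my assumptions):

```csharp
var engine = new SearchEngine();               // assumed class holding the methods above
engine.CreateDataStructure("search-data.txt"); // assumed file name for the sample data

while (true)
{
    Console.Write("Enter a word to search: ");
    var word = Console.ReadLine();
    if (string.IsNullOrWhiteSpace(word)) break;

    foreach (var result in engine.FindWord(word))
    {
        Console.WriteLine($"{result.Word} found with title {result.Title} with url {result.Url}");
    }
}
```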

When we type "north" we get one result, and when we type "same" we get three results, as expected.

Enter a word to search: north
north found with title Web2 Title with url https://web2.web
Enter a word to search: same
same found with title Web1 Title with url https://web1.web
same found with title Web2 Title with url https://web2.web
same found with title Web3 Title with url https://web3.web

As you can see, the returned results match what we have in the simple text file we created earlier.

Benchmarking

In my benchmarking, searching my files took the following times:

  • 5 milliseconds for the small file search-data-S.txt
  • 50 milliseconds for the medium file search-data-M.txt
  • 2000 milliseconds for the large file search-data-L.txt
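The timings can be measured with System.Diagnostics.Stopwatch, roughly like this (the file names match the list above; the SearchEngine class name and the searched word are my assumptions):

```csharp
using System.Diagnostics;

foreach (var file in new[] { "search-data-S.txt", "search-data-M.txt", "search-data-L.txt" })
{
    var engine = new SearchEngine();
    engine.CreateDataStructure(file);

    var sw = Stopwatch.StartNew();
    engine.FindWord("same");  // any word works; the scan always visits the whole list
    sw.Stop();

    Console.WriteLine($"{file}: {sw.ElapsedMilliseconds} ms");
}
```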

Complexity analysis

As we can see in our implementation, each time we search for a word the algorithm scans the whole list: in the worst case we have to visit every element to find a word (or all repetitions of a word). This gives a linear search time in n, where n is the size of the list; in other words, O(n).

This is exactly what the benchmarks show: the bigger the file, the longer the search takes.

So far so good, we are done with this part.

Conclusion

So what have we learned here?

We created a simple data source, learned how to parse it into a data structure, and finally learned how to search that data structure.

With all that said, even though our algorithm is not efficient, it does not take many steps to create a primitive and simple search engine.

Now we need to ask a few questions:

  1. Does the algorithm find what we are asking for?
    Yes.
  2. Is our algorithm efficient enough to handle bigger files?
    No, not really. Our algorithm will still work on a bigger data file, but it will slow down when searching for results, because each lookup takes O(n) time to go through the list. This is what we will dig into, and try to optimize with a better solution, in the next article.
