Inverted index database setup
I'm looking to parse stories with SyntaxNet, but I'm extremely new to both this program and databases. https://research.googleblog.com/2016...rlds-most.html
I'm told that, in the end, I'll want it to be in an inverted index.
http://nlp.stanford.edu/IR-book/html...d-index-1.html
To gain the speed benefits of indexing at retrieval time, I'll have to build the index in advance. The major steps in this are:
1. Collect the documents to be indexed.
2. Tokenize the text, turning each document into a list of tokens.
3. Do linguistic preprocessing, producing a list of normalized tokens, which are the indexing terms.
4. Index the documents that each term occurs in by creating an inverted index, consisting of a dictionary and postings.
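To make the four steps concrete, here is a minimal in-memory sketch in plain Python. It does not use SyntaxNet: the regex tokenizer, the lowercase-only normalization, the function names, and the two sample stories are all assumptions made up for illustration, not part of the quoted book or of any library.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Step 2: split text into raw tokens (a crude regex stand-in for a real tokenizer)."""
    return re.findall(r"[A-Za-z0-9']+", text)


def normalize(tokens):
    """Step 3: lowercase only; real preprocessing might also stem or drop stop words."""
    return [token.lower() for token in tokens]


def build_inverted_index(documents):
    """Step 4: map each term (the dictionary) to a sorted list of doc IDs (the postings)."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for term in normalize(tokenize(text)):
            postings[term].add(doc_id)
    return {term: sorted(doc_ids) for term, doc_ids in postings.items()}


# Step 1: a tiny hypothetical collection, keyed by document ID.
stories = {
    1: "The cat sat on the mat.",
    2: "The dog chased the cat.",
}

index = build_inverted_index(stories)
print(index["cat"])  # document IDs whose text contains "cat"
```

Looking up a term is then a single dictionary access, which is where the retrieval-time speed benefit comes from.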
My question is: if I have to get everything in order before putting the information into an inverted index database, what holds my data while I'm collecting it from thousands of stories? Can I append to the information already in the index by re-indexing?