Solr Developer with 10 Years of Experience
Solr Developer Skills: CentOS, Lucene, Solr, Hadoop HDFS, YARN, MapReduce, HBase, MySQL, Storm, Spark, Kafka, Redis, MongoDB, Node.js, Spring, Tomcat, HAProxy, Nginx, Android, iOS, Java, Parse, Firebase, geolocation, Linux, Data Center, Networking, Elasticsearch. Solr Developer in New York.
Shopping Search Engine: 300 Million Products. 1.2B Titles and Descriptions. 1B+ Images. 12,000 Merchants.
● Voice Search and Image matching, recognition and search.
● Highly scalable infrastructure with average response times under 100ms.
● Contextual + Relational, neural-network-based Shopping Search Engine able to understand user queries and provide exact results for “Blue Bedspread by Martha Stewart from Walmart or Macy's.com or around me for under $500”.
● High Volume Search + Big Data Infrastructure allowing Products and Search Queries to reflect their most recent state.
● Built and Managed 40 High Performance Servers in a Datacenter with 10G uplinks.
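The example query above packs several structured constraints into one line of free text. The engine described here is neural-network based. The sketch below is not that method. It is a much simpler, hypothetical rule-based parser that uses regular expressions to extract a brand, a list of merchants, an "around me" flag and a price ceiling from such a query. The class, function and pattern names are invented for illustration. The patterns only cover the phrasings used in this one example.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ParsedQuery:
    keywords: str
    brand: Optional[str] = None
    merchants: List[str] = field(default_factory=list)
    near_me: bool = False
    max_price: Optional[float] = None


# Hypothetical hand-written patterns covering only a few English phrasings.
PRICE_RE = re.compile(r"\bfor\s+under\s+\$(\d+(?:\.\d+)?)", re.IGNORECASE)
NEAR_RE = re.compile(r"\b(?:or\s+)?around\s+me\b", re.IGNORECASE)
MERCHANT_RE = re.compile(r"\bfrom\s+(.+?)(?=\s+by\s|$)", re.IGNORECASE)
BRAND_RE = re.compile(r"\bby\s+(.+?)(?=\s+from\s|$)", re.IGNORECASE)


def _remove(pattern: re.Pattern, text: str) -> str:
    """Delete the first match of pattern and collapse leftover whitespace."""
    return " ".join(pattern.sub(" ", text, count=1).split())


def parse_query(query: str) -> ParsedQuery:
    text = " ".join(query.split())
    parsed = ParsedQuery(keywords="")

    # Peel off constraints one at a time so later patterns see less noise.
    if m := PRICE_RE.search(text):
        parsed.max_price = float(m.group(1))
        text = _remove(PRICE_RE, text)
    if NEAR_RE.search(text):
        parsed.near_me = True
        text = _remove(NEAR_RE, text)
    if m := MERCHANT_RE.search(text):
        parsed.merchants = re.split(r"\s+or\s+", m.group(1), flags=re.IGNORECASE)
        text = _remove(MERCHANT_RE, text)
    if m := BRAND_RE.search(text):
        parsed.brand = m.group(1)
        text = _remove(BRAND_RE, text)

    # Whatever remains is treated as free-text keywords for the full-text query.
    parsed.keywords = text
    return parsed
```

On the quoted query, this returns the keywords "Blue Bedspread", the brand "Martha Stewart", the merchants Walmart and Macy's.com, a near-me flag and a $500 ceiling. The extracted fields could then become filters on a search request.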
Solr Developer / Search Engine Architect / AI Researcher / Supercomputing
● Added partial NLP to search, letting the computer “understand” the query.
● Created a self-healing, self-learning “Auto Product Categorization” algorithm that automatically analyses products and assigns each one to any of over 30,000 available categories. It was used successfully and demonstrated a success rate of around 85%.
● Built a 20 TFLOPS CPU-based supercomputer with over 12 TB RAM, 640 cores and 1 PB storage. GPUs can easily be added to increase total floating-point performance.
● Theorized and worked on computer vision with a small GPU-based cluster, to better understand how neural networks can interpret images and “see” videos.
● Theorized and researched artificial intelligence using various current open-source projects, studying how using them together could improve our understanding of neural networks and their application to live, real-world data by giving computers the ability to “understand” different datasets, “see” images and videos, and roughly match their interconnections.
Building a fast, feature-rich, real-time search application on top of Apache Solr or Elasticsearch, both based on Apache Lucene:
● Full-text search
● Geo-spatial search
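At their core, both features combine two pieces: an inverted index, which maps each term to the document IDs that contain it, and a distance filter. The sketch below is a minimal in-memory version of that idea. It makes three assumptions: a simple lowercase alphanumeric tokenizer, AND semantics for multi-term queries, and the haversine formula on a spherical Earth. `TinyIndex`, `Doc` and the helper names are hypothetical. Solr and Elasticsearch do this with Lucene's inverted index and dedicated geo field types, not with code like this.

```python
import math
import re
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class Doc:
    doc_id: int
    text: str
    lat: float
    lon: float


def tokenize(text: str) -> List[str]:
    # Lowercase and split on non-alphanumerics; real analyzers also stem, drop stop words, etc.
    return re.findall(r"[a-z0-9]+", text.lower())


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres on a spherical Earth (radius 6371 km)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


class TinyIndex:
    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)  # term -> doc ids
        self.docs: Dict[int, Doc] = {}

    def add(self, doc: Doc) -> None:
        self.docs[doc.doc_id] = doc
        for term in tokenize(doc.text):
            self.postings[term].add(doc.doc_id)

    def search(self, query: str, lat: float, lon: float, radius_km: float) -> List[int]:
        terms = tokenize(query)
        if not terms:
            return []
        # Full-text part: intersect the posting lists (AND semantics).
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        # Geo part: keep only documents within radius_km of the given point.
        near = [d for d in hits
                if haversine_km(lat, lon, self.docs[d].lat, self.docs[d].lon) <= radius_km]
        return sorted(near)
```

For example, suppose the index holds a blue bedspread listed in Manhattan and another listed in Los Angeles. Searching "blue bedspread" within 25 km of lower Manhattan returns only the Manhattan listing.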
Solr is fast. Solr became a standard among search engines for a reason: it’s stable and reliable, and for basic searches it outperforms nearly every search solution except Elasticsearch. Yet all it takes to break this powerful search engine is to search while concurrently updating the index with new content. Throw a few million documents into the index and Solr will struggle seriously while Elasticsearch still performs without a hitch. This becomes a serious problem if you need to update your search index regularly.
Solr simply was not designed for real-time big-data search applications. Today's web applications demand that new user-generated content be indexed in real time. The distributed nature of Elasticsearch allows it to keep up with concurrent search and index requests without skipping a beat.
● Elasticsearch over Solr:
● Distributed Search/Cloud-ready
The area where Elasticsearch really takes the stage is distributed search. Unlike Solr, Elasticsearch was built with distribution in mind and designed to be EC2-friendly. In practice, this means Elasticsearch runs a search index across multiple servers in a fail-safe and efficient way, which is quite a challenge. Distributed systems are, in general, hard to program. When done correctly, though, such a system tolerates the failure of individual machines, degrades gracefully, and is far more resilient than a single-server setup.
Elasticsearch allows you to break indices into shards, each with one or more replicas. The shards are hosted on data nodes within the cluster, and the cluster delegates each operation to the correct shards, with rebalancing and routing done automatically. As a result, even after a catastrophic hardware or software failure, the chances of your search service going completely offline are close to none. Elasticsearch also provides cloud support for Amazon S3, as well as GigaSpaces, Coherence and Terracotta.
Although some steps have been taken to make Solr cloud-ready, its original architecture and design did not include this. It will therefore take Solr more time to get where Elasticsearch is out of the box.
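The automatic routing and replica failover described here can be sketched in a few lines. In simplified form, Elasticsearch's documented default picks a shard as `hash(_routing) % number_of_primary_shards`, with the document ID as the default routing value. This sketch makes three assumptions: `zlib.crc32` stands in for Elasticsearch's own murmur3-based hash, replicas are placed with a naive round-robin, and the function names are hypothetical. It illustrates the idea and is not Elasticsearch's shard allocator.

```python
import zlib
from typing import Dict, List, Set


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    # Simplified routing rule: shard = hash(routing value) % number of primary shards.
    # crc32 is only a stand-in for Elasticsearch's own (murmur3-based) hash.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def allocate(num_shards: int, num_replicas: int, nodes: List[str]) -> Dict[int, List[str]]:
    """Place each shard's primary and replicas on distinct nodes (round-robin)."""
    copies = num_replicas + 1
    if copies > len(nodes):
        raise ValueError("need at least as many nodes as copies of each shard")
    return {s: [nodes[(s + i) % len(nodes)] for i in range(copies)] for s in range(num_shards)}


def node_for_read(shard: int, allocation: Dict[int, List[str]], down: Set[str]) -> str:
    """Return the first live node holding a copy of the shard (primary first)."""
    for node in allocation[shard]:
        if node not in down:
            return node
    raise RuntimeError(f"all copies of shard {shard} are unavailable")
```

Consider three shards with one replica on nodes n1 to n3. Every shard survives the loss of any single node, because its two copies always sit on different nodes. The shard number depends on the primary shard count, so changing that count would send existing documents to different shards. This is why Elasticsearch fixes the number of primary shards when an index is created.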
● Real-time search
Elasticsearch is real-time and distributed, and you can control through the API how quickly new changes become visible. It also supports percolation, an innovative search model similar to webhooks. The application registers its queries (filters) with Elasticsearch, and each new document is checked against them. The application then learns right away which filters the document matches, instead of constantly polling the search engine for new updates. Elasticsearch has a default refresh interval of one second, so a document becomes searchable within about a second of being indexed.
This is the perfect architecture for real-time search.
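Percolation inverts the usual flow: queries are stored, and each incoming document is matched against them. The sketch below illustrates that idea under one assumption: a stored query is a simple rule that every listed term must appear in the title, with an optional price ceiling. `Percolator` and its methods are hypothetical. In Elasticsearch itself, queries are stored in a field of type `percolator` and matched with a `percolate` query. Any notification, such as a webhook or push message, is sent by your application and is not shown here.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


def tokenize(text: str) -> Set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class StoredQuery:
    terms: Set[str]                   # every term must appear in the document title
    max_price: Optional[float] = None


class Percolator:
    """Reverse search: queries are stored, documents are matched against them."""

    def __init__(self) -> None:
        self.queries: Dict[str, StoredQuery] = {}

    def register(self, query_id: str, text: str, max_price: Optional[float] = None) -> None:
        self.queries[query_id] = StoredQuery(tokenize(text), max_price)

    def percolate(self, doc: dict) -> List[str]:
        """Return the ids of stored queries that the new document satisfies."""
        doc_terms = tokenize(doc.get("title", ""))
        price = doc.get("price")
        matches = []
        for qid, q in self.queries.items():
            if not q.terms <= doc_terms:  # some required term is missing
                continue
            if q.max_price is not None and (price is None or price > q.max_price):
                continue
            matches.append(qid)
        return sorted(matches)
```

For instance, suppose a shopper has saved the alert "blue bedspread" under $500. A newly indexed "Martha Stewart Blue Bedspread" at $129.99 then matches that alert and no other, so the application knows whom to notify without re-running every saved search.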
● JSON-based API
The Elasticsearch API is clean and easy to use for building modern applications, and its JSON query language provides a more powerful and useful abstraction for querying documents. Elasticsearch is more accessible and pleasant to interact with than Solr.
Fewer settings to configure and sensible defaults make it much more user-friendly. No schema is required, which means you can start indexing content right away and let Elasticsearch infer the field types. You can still use a mapping to define your index structure, which Elasticsearch applies when new indices are created.
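The sketch below illustrates the JSON query language and an explicit mapping by building request bodies as Python dicts. The `products` index and its field names are hypothetical. `match`, `bool`, `range`, `terms` and `geo_distance` are standard Query DSL clauses, and `text`, `keyword`, `float` and `geo_point` are standard field types. No HTTP client is used. The mapping body would be sent with `PUT /products`, and the query body with `GET /products/_search`.

```python
import json
from typing import List, Optional, Tuple

# Explicit mapping for a hypothetical "products" index (field names are illustrative).
PRODUCTS_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},          # analyzed for full-text search
            "brand": {"type": "keyword"},       # exact-match filtering
            "merchant": {"type": "keyword"},
            "price": {"type": "float"},
            "location": {"type": "geo_point"},  # store/merchant coordinates
        }
    }
}


def product_search(text: str, max_price: Optional[float] = None,
                   merchants: Optional[List[str]] = None,
                   near: Optional[Tuple[float, float]] = None,
                   radius: str = "10km") -> dict:
    """Build a Query DSL body: a scored full-text match plus non-scoring filters."""
    filters = []
    if max_price is not None:
        filters.append({"range": {"price": {"lt": max_price}}})
    if merchants:
        filters.append({"terms": {"merchant": merchants}})
    if near is not None:
        lat, lon = near
        filters.append({"geo_distance": {"distance": radius,
                                         "location": {"lat": lat, "lon": lon}}})
    return {"query": {"bool": {"must": [{"match": {"title": text}}], "filter": filters}}}


if __name__ == "__main__":
    body = product_search("blue bedspread", max_price=500, merchants=["Walmart", "Macy's"])
    print(json.dumps(body, indent=2))
```

The full-text `match` goes under `must`, so it contributes to relevance scoring. The price, merchant and distance clauses go under `filter`, so they only include or exclude documents, which also makes them eligible for caching.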
● Solr over Elasticsearch
Solr has a mature community, and this should be a major criterion when deciding whether to use Elasticsearch or Solr as the base for your application. Solr has a large number of active contributors, which indicates it’s a stable and trustworthy search engine. That’s not to say Elasticsearch is far behind: although quite young, its community is expanding rapidly.
Solr is well documented, with the necessary context and examples of how its different APIs and components are used. Elasticsearch's documentation is slightly better organized, but it lacks good working examples and configuration instructions.
Both are Lucene-based and both are open source. Solr is a solid search server for standard search applications where massive indexing and real-time updates are not required. The Elasticsearch architecture is on a whole new level, aimed at building modern real-time search applications. If you want distributed indexing, you need to choose Elasticsearch; it is the only true option for cloud and distributed environments. Elasticsearch is scalable, lightning fast and a breeze to integrate with. Its API is more intuitive and accessible than Solr’s, and minimal configuration with sensible defaults lets you get a project into production very quickly.
Marketplaces: Product Development for B2B Wholesale Marketplaces and B2C, O2O and C2C Mobile Shopping Marketplaces, built around a Social Network and live Messaging with Photo and Video Sharing, Private Group Buying and Selling, plus location-based discovery of Local Users, Groups, Products and Deals. For Indian and global markets. Status: Launched on the Play Store.
● WANT - Global B2B shopping
● Benipal - India B2B marketplace
● WANT - B2C Marketplace in India
● want local - O2O shopping marketplace
● beni - C2C shopping marketplace
Logistics: Stealth Mode Product Development for a pan-India plus hyperlocal logistics service to complement Shipping and Delivery for the “Newco” Shopping Marketplace.
● Created algorithm-based optimum routing, unique 10-digit ID-based delivery locations, live map delivery status, client-initiated rerouting, image-recognition-based trusted recipients for delivery acceptance, and live package plus payment confirmation.